[PDF] Analysis of multiscale integrators for multiple attractors and irreversible Langevin samplers

Abstract

We study multiscale integrator numerical schemes for a class of stiff stochastic differential equations (SDEs). We consider multiscale SDEs with potentially multiple attractors that behave as diffusions on graphs as the stiffness parameter goes to its limit. Classical numerical discretization schemes, such as the Euler-Maruyama scheme, become unstable as the stiffness parameter converges to its limit and appropriate multiscale integrators can correct for this. We rigorously establish the convergence of the numerical method to the related diffusion on graph, identifying the appropriate choice of discretization parameters. Theoretical results are supplemented by numerical studies on the problem of the recently developing area of introducing irreversibility in Langevin samplers in order to accelerate convergence to equilibrium.

Full PDF

aa r X i v : . [ m a t h . NA ] O c t ANALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLEATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS

JIANFENG LU AND KONSTANTINOS SPILIOPOULOS

Abstract.

We study multiscale integrator numerical schemes for a class of stiﬀ stochastic diﬀeren-tial equations (SDEs). We consider multiscale SDEs with potentially multiple attractors that behaveas diﬀusions on graphs as the stiﬀness parameter goes to its limit. Classical numerical discretizationschemes, such as the Euler-Maruyama scheme, become unstable as the stiﬀness parameter convergesto its limit and appropriate multiscale integrators can correct for this. We rigorously establish theconvergence of the numerical method to the related diﬀusion on graph, identifying the appropriatechoice of discretization parameters. Theoretical results are supplemented by numerical studies onthe problem of the recently developing area of introducing irreversibility in Langevin samplers inorder to accelerate convergence to equilibrium. Introduction

The main focus of this work is numerical integrators for stochastic diﬀerential equations (SDEs)with multiscale coeﬃcients, with the focus on irreversible ﬁrst order (overdamped) Langevin dy-namics with additive noise. The main motivation of this work is to design numerical integratorsfor SDEs arising from recent works on irreversible Langevin samplers [22, 25, 26]. In those works,an irreversible drift term is added to the overdamped Langevin dynamics (details will be speciﬁedin section 2), and it is proved that under the proper assumptions the sampling eﬃciency increasesas the magnitude of the irreversible drift goes to inﬁnity, which is validated by numerical studies,see [7, 8, 25, 26]. However, at the same time, when a strong irreversible drift is added to the originaloverdamped Langevin equation, the stiﬀness of the system is inevitably increased, and thus pre-vents the application of standard numerical integrators to the resulting systems. The goal of thiswork is to study multiscale integrators that allow to enlarge the magnitude of the irreversible driftwithout having to sacriﬁce the stability of the numerical algorithm.For SDEs with multiscale coeﬃcients, it is well understood that we shall take into account themultiscale structure in order to design better integrators (see e.g., the books [10, 24]). The key ideais to use the averaged limit of the SDE when the scale is well separated. Hence, it is not necessaryto accurately resolve the scales of the original system, but we can rather work with the averagedlimit. This has been the underlying principle of the heterogeneous multiscale methods (HMM)[3, 11, 12], in particular see [13, 31, 32] for its applications to stochastic diﬀerential equations. Othernumerical approaches for stiﬀ SDEs were also developed in [1, 2, 4, 6, 20, 30].Our numerical scheme follows the ideas of the FLAVORS method developed in [30], which onthe algorithmic level is very similar to the seamless version of HMM method developed in [14, 15].The basic idea is to use a split-step integrator which combines a short time integration of the wholeSDE and a longer time integration of the SDE without the stiﬀ terms. The numerical analysis ofsuch schemes [30] shows that in the case that the variables of the SDEs can be one-to-one mapped

Date : October 10, 2018.J.L. was partially supported by the National Science Foundation under the grant DMS-1415939. K.S. was partiallysupported by the National Science Foundation (NSF) CAREER award DMS 1550918. to a set of “fast” and “slow” variables, the numerical scheme converges to the averaged limit whichconsists of the dynamics of the slow component. We emphasize that the algorithm does not requireexplicit knowledge of the mapping that transforms the system into fast and slow variables, whileit does require the forcing terms of the SDE can be separated into stiﬀ and non-stiﬀ terms.The main contribution of this work is to extend the analysis of [30] to situations that a one-to-one mapping of the original degree of freedom into fast and slow variables is not possible. Inparticular, for the irreversible Langevin sampler the function U that maps the conﬁgurationalspace to the energy is clearly not one-to-one. In fact, it is well known that in the limit the SDEconverges to a diﬀusion on an associated graph [17, 18], for which besides the energy, one has toadd the index variable to represent the state space. Our main result proves that the multiscaleintegrator converges to a diﬀusion on graph as the scale separation parameter tends to inﬁnityand the discretization parameters are appropriately chosen. In the one well case, our proof followsideas of [30] appropriately adjusting for the diﬀerent limiting behavior that we have here. Then,the results are being extended to the multiple well case by using techniques similar to those of theclassical averaging techniques of [5, 16, 18]. However, since we work in the discrete time frameworkand not in the continuous time framework, we need to obtain bounds with explicit dependence onthe discretization parameters.In this paper, we mainly study convergence to the invariant measure of the limiting dynamics.The mathematical analysis suggests how to choose the parameters of the problem (micro step andmacro step) with respect to a given value of the stiﬀness parameter in order for the HMM integratorto sample from the correct measure. In addition, being able to numerically approach the limit ofthe stiﬀness parameter allows us to approximate via simulation the limiting transition probabilitiesbetween the diﬀerent attractors of the system.In regards to future research, it would be of great interest to address the challenges that comeup in convergence in ﬁnite time points, as it done in [21] for Euler’s method. It would also be ofgreat interest to obtain nonasymptotic bounds in the spirit, for example, of [9].This paper is organized as follows. We will introduce the SDEs from the irreversible Langevinsampler and the HMM multiscale integrator in Section 2. Some numerical results are presented inSection 3 to validate the method. The averaging results of the SDEs, in particular, convergenceto the diﬀusion on graphs are recalled in Section 4. The main results and the proofs are given inSection 5. 2. HMM integrator for irreversible Langevin sampling scheme

Consider the overdamped Langevin equation(1) d Z t = −∇ U ( Z t ) d t + p β d W t , Z = z , where U : E → R is a given potential, β = ( k B T ) − is the temperature, and W t is the standardmulti-dimensional Wiener process. Here E ⊆ R d denotes the state space. See Section 4 for condi-tions on U . The overdamped Langevin dynamics (1) is often used to sample the Boltzmann-Gibbsmeasure, see [28], with density given by ̺ ( z ) ∝ e − U ( z ) /β , which is the invariant measure of (1) under mild conditions. Note that the inﬁnitesimal generatorof (1) is symmetric with respect to the invariant measure, and thus the dynamics (1) is reversiblein time, i.e. , it satisﬁes detailed balance. NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS3

In [22, 25, 26], it was proposed to add to the overdamped Langevin dynamics an irreversibleforcing to accelerate the sampling, the resulting dynamics reads(2) d Z εt = (cid:2) −∇ U ( Z εt ) + 1 ε C ( Z εt ) (cid:3) d t + p β d W t , Z = z , where the vector ﬁeld C : R d R d . The invariant measure is maintained if the vector ﬁelds C satisﬁes div( Ce − U/β ) = 0, or equivalentlydiv C = β − C · ∇ U , where C · ∇ U denotes the classical inner product between C and ∇ U . A convenient choice, whichwe assume henceforth, is to pick C such thatdiv C = 0 , and C · ∇ U = 0 . This is not the most general choice for C , but it has the advantage that allows to choose C independently of β . One such choice of C ( z ) is C ( z ) = J ∇ U ( z ), where J is any antisymmetricmatrix. These conditions mean that the ﬂow generated by C preserves Lebesgue measure sinceit is divergence-free, at the same time, since U is a constant of the motion, the micro-canonicalmeasure on the surfaces { U = z } are preserved as well. Let us remark that in physics terminology,a Langevin equation is a second-order dynamics which also include momentum variables in additionto the “position variable” Z t as in the overdamped equation (1). The physical Langevin equationis in fact irreversible due to the momentum degree of freedom, while here we have adopted theconventional name of irreversible Langevin sampler for the ﬁrst-order dynamics (2) with additionalirreversible drift on the overdamped Langevin equations.The amplitude of the irreversible drift in (2) is chosen to be ε . We will consider the regimethat ε ≪

1. Using the large deviation action functional of the empirical measure, it is shown in[25, 26] that the dynamics (2) converges faster to the invariant measure for a larger irreversibledrift, i.e., as ε becomes smaller. From another point of view, as will be recalled in section 4, inthe limit ε →

0, the slow component associated to the solution of the SDE (2) converges to theaveraging limit which is a diﬀusion on an associated graph, and hence the entropy associated withthe iso-surfaces of U is completely removed and only the energetic barrier is left in the limit. In[26] it is also established that the asymptotic (as t → ∞ ) variance of the estimator is decreasing in ε and in the limit as ε ↓

0, it converges to the asymptotic (as t → ∞ ) variance of the correspondingsampling problem on the graph where the limiting diﬀusion lives.Increasing the irreversible drift however comes with a price: The right hand side of the SDE(2) becomes rather stiﬀ as ε →

0, and as a result, standard integrators (for example the Euler-Maruyama scheme) would require vanishingly small time step size to resolve the fast scale of thedynamics. As ε goes to zero, the SDE contains multiple time scale, and thus it is better to usemultiscale integrators for such dynamics.In this work, we investigate a multiscale integrator for stiﬀ SDEs as (2) proposed in [30], whichis also rather close to the seamless version of HMM scheme [13–15]. For a macro time step δ andmicro time step τ such that τ ≪ ε ≪ δ , from t n to t n + δ , we evolve the dynamicsd s Z t = (cid:2) −∇ U ( s Z t ) + 1 ε C ( s Z t ) (cid:3) d t + p β d W t t ∈ [ t n , t n + τ );(3a) d s Z t = −∇ U ( s Z t ) d t + p β d W t t ∈ [ t n + τ, t n + δ ) . (3b) While it is possible to combine the irreversible sampling with other techniques to overcome the energetic barrier,we will not go further in this direction as it is not the focus of the current work.

JIANFENG LU AND KONSTANTINOS SPILIOPOULOS

This can be understood as a split-step time integrator where for the short time step τ we usethe whole SDE and for the long time step δ − τ we neglect the irreversible drift. The equationsabove can be integrated using standard numerical schemes, and for deﬁniteness, in this work wewill discretize using the standard Euler-Maruyama method, which gives s Z t n + τ − s Z t n = − τ ∇ U ( s Z t n ) + τε C ( s Z t n ) + p βτ ξ n ;(4a) s Z t n + δ − s Z t n + τ = − ( δ − τ ) ∇ U ( s Z t n + τ ) + p β ( δ − τ ) ξ ′ n , (4b)where ξ n and ξ ′ n are independent standard normal random variables.As will be discussed in Sections 4 and 5, if the dynamical system ˙ z t = C ( z t ) does not have aunique invariant measure on each connected component of the level sets of U ( z ) and the dimensionis bigger than two, then one needs to modify the scheme by considering an additional regularizingnoise, see Condition 1. In particular, in this case we may need to regularize the problem byintroducing an additional artiﬁcial noise component in the fast dynamics, i.e.,(5) dZ ǫt = h −∇ U ( Z ǫt ) dt + p βdW t i + (cid:20) ǫ e C ( Z ǫt ) dt + r κǫ σ ( Z ǫt ) dW ot (cid:21) . Here, W and W o are independent standard Wiener processes, the matrix σ will be speciﬁed inCondition 3 below, and we have deﬁned(6) e C i ( z ) = C i ( z ) + κ d X j =1 ∂ (cid:2) σσ T ( z ) (cid:3) j,i ∂z j , i = 1 , . . . , d. If κ = 0 then the fast motion is the deterministic dynamical system ˙ z t = C ( z t ) and Z ǫt is arandom perturbation of this dynamical system. For example, if d is even we can take C to bethe Hamiltonian vector ﬁeld C ( z ) = J ∇ U ( z ). If κ > ρ ( z ) ∝ e − U ( z ) /β . In addition, we emphasize here that thelimiting behavior as ǫ → κ nor σ ( z ) appear in the limiting dynamics, see Theorem 6. Additionally, as it is proven in [27] addingsuch perturbations does not make the performance worse in terms of all three criteria, spectral gap,asymptotic variance and large deviations rate function.As we shall also see in (14) in Section 5, in the case of the perturbation (5), the algorithmnaturally extends to the form s Z t n + τ − s Z t n = − τ ∇ U ( s Z t n ) + τε e C ( s Z t n ) + p βτ ξ n + r τǫ √ κσ ( s Z t n ) ξ ′′ n ;(7a) s Z t n + δ − s Z t n + τ = − ( δ − τ ) ∇ U ( s Z t n + τ ) + p β ( δ − τ ) ξ ′ n , (7b)where ξ n , ξ ′ n , ξ ′′ n are independent standard normal random variables.We will show that with proper choices of the time steps τ and δ as ε →

0, the numerical schemes(4), and more generally (7), converge to the diﬀusion on graphs, which is the averaging limit of (2).Thus, we may use (4), or more generally (7), to numerically discretize the SDE which is consistentin the asymptotic regime as ε → τ rather than justone such step as in (4), which is analogous to the original HMM integrators. For the purposeof sampling invariant measure, one could also combine the integrator with Metropolis adjustmentsteps as in the MALA method [28], see [23] for some preliminary results towards this direction. NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS5

We will focus on the numerical analysis of the scheme (4), and more generally (7), and leave theseextensions to future works. 3.

Numerical examples

Before we turn to the analytical results, let us present a few numerical tests for the multiscaleintegrator. We will ﬁrst consider the sampling eﬃciency of the irreversible Langevin sampler withthe multiscale HMM integrator. We will then show some numerical examples illustrating propertiesof the integrator. We limit ourselves to simple toy examples as the focus is to demonstrate thenumerical properties of the integrator and validate the numerical analysis results, rather thanapplying to scheme to realistic problems.For the ﬁrst test, we consider a 2 D symmetric double well potential given by(8) U ( x, y ) = 14 ( x − + y , with inverse temperature β = 0 .

1. We choose C = J ∇ U with J = (cid:0) − (cid:1) . The initial condition isset to be the origin, and we consider the empirical average of the observable f ( x, y ) = x + y over thetime interval [0 , T burn = 20. To test the performance of the samplingscheme based on the multiscale integrator, we compare the empirical average with the true averageof the observable with respect to the invariant measure. Note that in this case, due to the choiceof the potential and the observable, the true average is explicitly given by h f i = β = 0 .

05. Wealso estimate the asymptotic variance of the sampling scheme by dividing the sampled data pointsinto 20 batches.Denote the sampling error (with respect to the observable f ) as Err f and the asymptotic variance(with respect to the observable f ) as AVar f . The numerical results for various choice of ε areshown in Table 1, in which we also include the results for direct Euler-Maruyama discretization forcomparison. In these tests, we ﬁx the macro time step δ = 5 e -3 (for the direct Euler-Maruyamadiscretization, δ is the time step size), and choose the micro time step τ = 10 δε = 0 . ε . For a givenset of parameters, we report the mean and standard deviation estimated from 20000 independentruns of the algorithms. We note that the Euler-Maruyama scheme is unstable for ε below 0 . δ .We remark that the particular choice of the observable f = x + y makes the accurate samplingrather challenging in this case: As E x = 0 due to the symmetry, the correct sampling of the averagevalue requires ﬁne balance of the time the trajectory spent in left and right well of the double wellpotential. This explains the high relative error that E (Err f ) is on the same order of E f .We make several observations in regards to the numerical results in Table 1. First, from the resultof the Euler-Maruyama scheme for various ε , it is clear that a larger irreversible drift (smaller ε )enhances the sampling as the sampling error and also the asymptotic variance decrease. Second,while for a ﬁxed computational cost the Euler-Maruyama scheme becomes unstable for small ε ,the HMM scheme works well for smaller ε which further reduces the sampling error. Moreover, weremark that while the integrator works well for very small ε , the improvement in this example forgoing to a very small ε is limited, this is expected since when ε →

0, as will be shown later, thescheme becomes an approximation of the averaging limit of the SDE. Thus the sampling eﬃciencyis determined by the limiting system, and the impact of a ﬁnite but small ε may be negligible. Ofcourse, this depends on how fast ergodicity kicks in allowing the averaging limit to be achieved.Note that the HMM multiscale integrator allowed us to reach to the ε → ε keeping δ ﬁxed. JIANFENG LU AND KONSTANTINOS SPILIOPOULOS ε τ δ E (Err f ) Std(Err f ) E (AVar f ) Std(AVar f )E-M ∞ e -3 1 . e -1 9 . e -2 3 . e -1 7 . e -25 5 e -3 1 . e -1 9 . e -2 3 . e -1 7 . e -25 e -1 5 e -3 7 . e -2 5 . e -2 1 . e -1 4 . e -21 e -1 5 e -3 4 . . e -2 1 . e -2 5 . e -3HMM 1 e -2 5 e -4 5 e -3 3 . e -2 2 . e -2 1 . e -2 5 . e -31 e -3 5 e -5 5 e -3 3 . e -2 2 . e -2 1 . e -2 5 . e -31 e -4 5 e -6 5 e -3 3 . e -2 2 . e -2 1 . e -2 5 . e -3 Table 1.

Comparison of the Euler-Maruyama scheme and the HMM multiscaleintegrator for the double well potential (8). The macro time step is δ = 5 e -3 withthe micro time step τ = 0 . ε . The mean and standard deviation of the samplingerror Err f and asymptotic variance AVar f are reported for various choice of ε . Thecase ε = ∞ means sampling without adding the irreversible drift. The stabilitythreshold for the Euler-Maruyama scheme for the time step size δ = 5 e -3 is around ε = 8 . e -2. ε τ δ E (Err f ) Std(Err f ) E (AVar f ) Std(AVar f )E-M 5 e -2 1 e -3 3 . e -2 2 . e -2 1 . e -2 5 . e -3HMM 1 e -3 2 e -5 1 e -3 3 . e -2 2 . e -2 1 . e -2 5 . e -31 e -4 2 e -6 1 e -3 3 . e -2 2 . e -2 1 . e -2 5 . e -31 e -5 2 e -7 1 e -3 3 . e -2 2 . e -2 1 . e -2 5 . e -3HMM 1 e -4 1 e -6 1 e -3 3 . e -2 2 . e -2 3 . e -2 1 . e -21 e -5 1 e -7 1 e -3 3 . e -2 2 . e -2 3 . e -2 1 . e -2 Table 2.

Comparison of the Euler-Maruyama scheme and the HMM multiscaleintegrator for the double well potential (8) with reduced macro time step δ = 1 e -3(compared with δ = 5 e -3 in Table 1). The Euler-Maruyama scheme is now stablewith ε = 5 e -2; and the stability threshold is around ε = 3 . e -2. The micro time stepin HMM scheme is chosen to be either τ = 20 δε = 0 . ε or τ = 10 δε = 0 . ε . Themean and standard deviation of the sampling error Err f and asymptotic varianceAVar f are reported for various choice of ε .We further test the dependence of the HMM scheme on the choice of parameters in Table 2.In those tests, we decrease the value of macro time step to δ = 1 e -3. In comparison, we also listthe result of the Euler-Maruyama scheme with the same time step, which is now stable for smaller ε = 5 e -2 (but loses stability if we further reduce ε ). The results for the HMM scheme with diﬀerent ε and τ = 20 δε = 0 . ε suggest that it is better to take a smaller ε , though the improvement isagain marginal in this case. Compared with Table 1, we see that a smaller δ improves the samplingresults, though of course this comes with a higher computational cost.In Table 2, we also consider choice of the micro time step τ with diﬀerent ratios of τ / ( δε ) tosee the dependence. We observe that in the case with the smaller δ , if we still take τ such that τ = 10 δε as in Table 1, the performance of the sampling scheme is in fact worse than the directEuler-Maruyama scheme (with a larger ε ). Thus it motivates the choice of a larger τ , whichincreases the eﬀective sampling time of the fast dynamics, and hence is expected to lead to betterperformance. This is conﬁrmed in the numerical results with the choice of τ = 20 δε = 0 . ε . NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS7

Unfortunately, if we further increase τ (choosing for example τ = 30 δε = 0 . ε here), the HMMscheme becomes unstable. This instability can be understood in our theoretical analysis as theassumption that ( τ /ε ) / ≪ δ in the convergence result Theorem 6. With a ﬁxed macro time step δ (and hence ﬁxing computational budget), it seems that a good practice is to choose a larger ratio τ / ( εδ ) while making sure that the scheme is stable.In the next example, we consider a more complicated potential, still in two dimension, given by(9) U ( x, y ) = 14 h ( x − (( y − + 1) + 2 y − y/ i + e − x − y with inverse temperature β = 0 .

2. This is the potential considered in [26, Example 3]. We takethe observable f ( x, y ) = ( x − + y and setting δ = 5 e -3 and τ = 10 δε = 0 . ε . The totalsimulation time is T = 2000 with a burn-in period T burn = 20. 20000 independent runs of thealgorithms are used to get statistics of the sampling error and asymptotic variance. The results arereported in Table 3. Here the true average of the observable is obtained by a discretization of theGibbs distribution on the phase space with a ﬁne mesh, which gives approximately E f ≈ . ε smallerthan 0 .

1. The conclusion of the numerical results is similar to that of the double well example. ε τ δ E (Err f ) Std(Err f ) E (AVar f ) Std(AVar f )E-M ∞ e -3 4 . e -1 3 . e -1 2 . e

00 5 . e -15 5 e -3 4 . e -1 3 . e -1 2 . e

00 5 . e -15 e -1 5 e -3 3 . e -1 2 . e -1 1 . e

00 3 . e -11 e -1 5 e -3 1 . e -1 8 . e -2 3 . e -1 1 . e -16 e -2 5 e -3 1 . e -1 5 . e -2 6 . e -2 2 . e -2HMM 1 e -2 5 e -4 5 e -3 1 . e -1 7 . e -2 3 . e -1 9 . e -21 e -3 5 e -5 5 e -3 1 . e -1 7 . e -2 3 . e -1 9 . e -21 e -4 5 e -6 5 e -3 1 . e -1 7 . e -2 2 . e -1 9 . e -2 Table 3.

Comparison of the Euler-Maruyama scheme and the HMM multiscaleintegrator for the potential (9). The macro time step is δ = 5 e -3 and the micro timestep τ = 0 . ε . The mean and standard deviation of the sampling error Err f andasymptotic variance AVar f are reported for various choice of ε . The case ε = ∞ means sampling without adding the irreversible drift. The stability threshold forthe Euler-Maruyama scheme is around ε = 6 e -2, which is included in the abovetable.Next we plot the x coordinate of a sample trajectory and the corresponding potential energy U ( x ( t ) , y ( t )) for the double well potential in Figure 1. The simulation is done with ε = 1 e -5 andtime step sizes τ = 5 e -7 and δ = 0 .

05. The trajectory is plotted at the end of every macro step(so on the interval of δ ). The plot focuses on the time period [20 ,

21] during which the trajectorymainly stays in the left portion of the phase space { x < } . As can be clearly observed from theﬁgure, while the solution of the SDE oscillates very fast, the potential energy U changes much moreslowly, which suggests that U is a slow variable in the averaging limit.It is clear that the map from ( x, y ) to U is not one-to-one in these examples. For the double wellpotential, below the energy of the saddle point U (0 ,

0) = , the isopotential curve is disjoint andseparated into the left and right half-planes corresponding to the two minima of the potential at JIANFENG LU AND KONSTANTINOS SPILIOPOULOS t

20 20.2 20.4 20.6 20.8 21 x -1.5-1-0.500.5 t

20 20.2 20.4 20.6 20.8 21 U Figure 1.

A sample trajectory of the HMM multiscale integrator for the doublewell potential (8) with ε = 1 e -5, τ = 5 e -7, and δ = 0 .

05. (Left) The x cooridnate ofthe trajectory for t ∈ [20 , U associated with thetrajectory.( − ,

0) and (1 , U = corresponding to the saddle point (see section 4where these concepts are recalled). Therefore, to get from one component of the isopotential curveto the other, the trajectory has to go up in energy and cross the interior vertex; when the energyis decreased from above , the trajectory would go into one of the edges, corresponding to one ofthe disjoint components. In fact, such an event of energy goes above the saddle point energy canbe observed already in Figure 1 around t = 20 .

5, where the energy ﬁrst goes up and when it dropsdown, the trajectory goes back to the same potential well (on the left half-plane).To see how the multiscale integrator captures diﬀusion across the interior vertices on the graph,we record the number of transitions to each edge with lower energy connecting to the saddle pointwhen the energy of the trajectory is decreasing from above that of the vertex. For the doublewell potential, we count the transitions into each component for a long trajectory with total time T = 2000 (with burn-in time T burn = 20) when the energy decreases from above . The simulationparameters are ε = 1 e -5, τ = 5 e -7 and δ = 0 . N left = 3719 , and N right = 3659 , where N left denotes the number of times the trajectory goes to the left well and N right for the rightwell during the time period [ T burn , T ]. Note that the empirical probability of going to the left wellis 0 . . U ( x, y ) = 14 ( x − − x + y . The tilting by − x moves the local minima and the saddle point of the potential to approximately( − . , . , − . , N left = 2485 , and N right = 3704 , so that the empirical probability of going to the left well is 0 . NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS9

This is consistent with the theoretical results for the multiple well case that we will establish insection 5. 4.

The averaging problem

The irreversible perturbations with a small ǫ induce a fast motion on the constant potentialsurface and slow motion in the orthogonal direction. Using the theory of diﬀusions on graphsand the related averaging principle, see [5, 16, 18], we may identify the limiting motion of the slowcomponent, see [26]. The fast motion on constant potential surfaces decreases the variance of theestimator as the phase space is explored more eﬃciently. Let us consider the level set d ( x ) = { z ∈ E : U ( z ) = x } , where E ⊆ R d denotes the state space. We then denote by d i ( x ) the connected components of d ( x ),i.e., d ( x ) = [ i d i ( x ) . We deﬁne Γ to be the graph which is homeomorphic to the set of connected components d i ( x )of the level sets d ( x ). Exterior vertexes correspond to minima of U , whereas interior vertexescorrespond to saddle points of U . The edges of Γ are indexed by I , · · · , I m . Each point on Γ isindexed by a pair y = ( x, i ) where x is the value of U on the level set corresponding to y and i is the edge number containing y . Clearly the pair y = ( x, i ) forms a global coordinate on Γ. Let Q : E Γ with Q ( z ) = ( U ( z ) , i ( z )) be the corresponding projection on the graph. For an edge I k and a vertex O j we write I k ∼ O j if O j lies at the boundary of the edge I k . We endow the tree Γwith the natural topology. It is known that Γ forms a graph with interior vertexes of order two orthree, see for example [19].If the dynamical system ˙ z t = C ( z t ) does not have a unique invariant measure on each connectedcomponent d i ( x ), then we may need to regularize the problem by introducing an additional artiﬁcialnoise component in the fast dynamics, i.e.,(10) dZ ǫt = h −∇ U ( Z ǫt ) dt + p βdW t i + (cid:20) ǫ e C ( Z ǫt ) dt + r κǫ σ ( Z ǫt ) dW ot (cid:21) . Here, W and W o are independent standard Wiener processes, the matrix σ will be speciﬁed below,and we have deﬁned e C i ( z ) = C i ( z ) + κ d X j =1 ∂ (cid:2) σσ T ( z ) (cid:3) j,i ∂z j , i = 1 , . . . , d. If κ = 0 then the fast motion is the deterministic dynamical system ˙ z t = C ( z t ) and Z ǫt is arandom perturbation of this dynamical system. For example, if d is even we can take C to bethe Hamiltonian vector ﬁeld C ( z ) = J ∇ U ( z ). If κ > C ( z ), U ( z ) and σ ( z ) in order to guarantee that theaveraging principle applies to (10). We make these assumptions in order to guarantee that the fastprocess has a unique invariant measure and will have U as a smooth ﬁrst integral.Let us next identify the corresponding fast and slow components. The fast motion correspondsto the inﬁnitesimal generator(11) b L g ( z ) = e C ( z ) ∇ g ( z ) + κ (cid:2) σσ T ( z ) D g ( z ) (cid:3) = C ( z ) ∇ g ( z ) + κ ∇ (cid:2) σσ T ( z ) ∇ g ( z ) (cid:3) . Let us write b Z t for the diﬀusion process that has inﬁnitesimal generator b L . In order to guaranteethe existence of a unique invariant measure for the fast dynamics we assume: Condition 1.

In dimension d = 2 , we take κ ≥ . In dimension d > , we either assume that thedynamical system ˙ z t = C ( z t ) has a unique invariant measure on each connected component d i ( x ) ,in which case we take κ ≥ , or otherwise we assume that κ > . As far as the potential function U ( z ) and the perturbation C ( z ) are concerned, we shall assumeCondition 2. Condition 2 guarantees that C ( z ) does not aﬀect the invariant measure of the processand it imposes natural growth and structural conditions on the potential U . Condition 2.

The potential function U ( z ) and the perturbation C ( z ) satisfy (1) There exists a > such that U ∈ C (2+ a ) ( E ) and C ∈ C (1+ a ) ( E ) . (2) U ( z ) ≥ A | z | , ∇ U ( z ) ≥ A | z | and ∆ U ( z ) ≥ A for suﬃciently large | z | , where A , A , A are positive constants. (3) div C ( z ) = 0 and C ( z ) · ∇ U ( z ) = 0 . (4) U has a ﬁnite number of critical points z , . . . , z m and at these points the Hessian matrix D U is non-degenerate. (5) There is at most one critical point for each connected level set component of U . (6) If z k is a critical point of U , then there exists a constant d k > such that C ( z ) ≤ d k | z − z k | . (7) If d = 2 and κ = 0 , then C ( z ) = 0 implies ∇ U ( z ) = 0 and for any saddle point z k of U ( z ) ,there exists a constant c k > such that | C ( z ) | ≥ c k | z − z k | . In regards to the additional artiﬁcial perturbation by the noise W ot , i.e., when κ >

0, we assumeCondition 3. Condition 3 guarantees that the extra regularization with the noise does not aﬀectthe invariant measure and that it is such that the subsequent averaging analysis goes through, see[16].

Condition 3. (1)

The matrix σ ( z ) σ T ( z ) is symmetric, non-negative deﬁnite, and with smoothentries. (2) σ ( z ) σ T ( z ) ∇ U ( z ) = 0 for all z ∈ E . (3) For any z ∈ E and ξ ∈ R d such that ξ · ∇ U ( z ) = 0 we have that λ ( z ) | ξ | ≤ ξ T σ ( z ) σ T ( z ) ξ ≤ λ ( z ) | ξ | where λ ( z ) > if ∇ U ( z ) = 0 and there exists a constant K such that λ ( z ) < K for all z ∈ E . Moreover if z k is a critical point for U , then there are positive constants k , k such that for all z in a neighborhood of z k λ ( z ) ≥ k | z − z k | , and λ ( z ) ≤ k | z − z k | . (4) Let λ i,k be the eigenvalues of the Hessian of U ( z ) at the critical points z k where k = 1 , · · · , m and i = 1 , · · · , d . Then we assume that < κ ≪ ( K max i,k λ i,k ) − . We remark here that the end result does not depend on the additional regularizing noise, since σ ( z ) does not appear in the limiting dynamics. Recall that b Z t is the diﬀusion process that hasinﬁnitesimal generator b L . Conditions 2 and 3 guarantee that with probability one, if the initialpoint of b Z is in a connected component d i ( x ), then b Z t ∈ d i ( x ) for all t ≥

0. Indeed, by Itˆo formulawe have U ( b Z t ) = U ( b Z ) + Z t b L U ( b Z s ) ds + Z t ∇ U ( b Z s ) σ ( b Z s ) dW s . Since C ( z ) ∇ U ( z ) = 0 and σ ( z ) σ T ( z ) ∇ U ( z ) = 0 we obtain with probability one R t b L U ( b Z s ) ds = 0.The quadratic variation of the stochastic integral is also zero, due to σ ( z ) σ T ( z ) ∇ U ( z ) = 0, which NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS11 implies that with probability one R t ∇ U ( b Z s ) σ ( b Z s ) dW s = 0. Thus, we indeed get that for all t ≥ b Z t ∈ d i ( x ) given that the initial point belongs to the particular connected component of the levelset d i ( x ). In particular, Itˆo formula gives for a test function f ∈ C ( R ) f ( U ( Z εt )) = f ( U ( z )) + Z t (cid:16)h −|∇ U ( Z εs ) | + β tr (cid:2) D U ( Z εs ) (cid:3)i f ′ ( U ( Z εs )) + β |∇ U ( Z εs ) | f ′′ ( U ( Z εs )) (cid:17) ds + p β Z t f ′ ( U ( Z εs )) ∇ U ( Z εs ) dW s where Z ε is the solution to (10).Let m ( z ) be a smooth invariant density with respect to Lebesgue measure for the process b Z t .The fact that m ( z ) exists and is smooth follows from the discussion in Section 2.3 of [16]. Then, theproof of [16, Lemma 2.3] and the fact that t ≥ b Z t ∈ d i ( x ) if b Z ∈ d i ( x ) imply that if ( x, i ) ∈ Γ isnot a vertex, there exists a unique invariant measure µ x,i concentrated on the connected component d i ( x ) of d ( x ) which takes the form µ x,i ( A ) = 1 T i ( x ) I A m ( z ) |∇ U ( z ) | ℓ ( dz ) , where ℓ ( dz ) is the surface measure on d i ( x ) and T i ( x ) = H d i ( x ) m ( z ) |∇ U ( z ) | ℓ ( dz ). Notice that if ( x, i ) ∈ Γis not a vertex, then the invariant density on d i ( x ) is m x,i ( z ) = m ( z ) T i ( x ) |∇ U ( z ) | , z ∈ d i ( x ) . We remark here that in the case κ >

0, it is relatively easy to see that independently of the formof the matrix σ ( z ) σ T ( z ), the fact that div( C ) = 0 implies that the Lebesgue measure is invariantfor the diﬀusion process corresponding to the operator b L . Hence, in that case any constant functionis an invariant density. Also, in the case d = 2 and κ = 0, one immediately obtains from Condition2 that m ( z ) = |∇ U ( z ) || C ( z ) | , see [16, Proposition 2.1].Given a suﬃciently smooth function f ( z ), deﬁne its average over the related connected componentof the level set of U ( z ) by b f ( x, i ) = I d i ( x ) f ( z ) m x,i ( z ) ℓ ( dz ) = 1 T i ( x ) I d i ( x ) f ( z ) |∇ U ( z ) | m ( z ) ℓ ( dz ) . We write L for the inﬁnitesimal generator of the process Z t given by (2) with C ( z ) = 0. Let usthen set d L U ( x, i ) = I d i ( x ) L U ( z ) m x,i ( z ) ℓ ( dz ) = 1 T i ( x ) I d i ( x ) L U ( z ) |∇ U ( z ) | m ( z ) ℓ ( dz ) , b A ( x, i ) = I d i ( x ) β |∇ U ( z ) | m x,i ( z ) ℓ ( dz ) = 1 T i ( x ) I d i ( x ) β ∇ U ( z ) · ∇ U ( z ) |∇ U ( z ) | m ( z ) ℓ ( dz ) . and then consider the one-dimensional process Y t on the branch I i governed by the inﬁnitesimalgenerator(12) L Yi g ( x ) = d L U ( x, i ) g ′ ( x ) + 12 b A ( x, i ) g ′′ ( x ) . Within each edge I i of Γ, Q ( Z εt ) = ( U ( Z εt ) , i ( Z εt )) converges as ε ↓ L Yi . In order to uniquely deﬁne the limiting process, we need to specify the behavior atthe vertexes of the tree, which amounts to imposing restrictions on the domain of deﬁnition of thegenerator, denoted by L Y , of the Markov process. For this purpose, we have the following deﬁnition Deﬁnition 4.

We say that g belongs in the domain of deﬁnition of L Y , denoted by D ( L Y ), of thediﬀusion Y , if(1) The function g ( x ) is twice continuously diﬀerentiable in the interior of an edge I i .(2) The function x

7→ L Yi g ( x ) is continuous on Γ.(3) At each interior vertex O j with edges I i that meet at O j , the following gluing conditionholds X i : I i ∼ O j ± b ji D i g ( O j ) = 0where, if γ ji is the separatrices curves that meet at O j , we have set b ji = I γ ji β |∇ U ( z ) | |∇ U ( z ) | m ( z ) ℓ ( dz ) = 2 β I γ ji |∇ U ( z ) | m ( z ) ℓ ( dz ) . Here one chooses + or − depending on whether the value of U increases or decreasesrespectively along the edge I i as we approach O j . D i represents the derivative in thedirection of the edge I i .Moreover, within each edge I i the process Y t is a diﬀusion process with inﬁnitesimal generator L Yi .Consider now the process Y t that has the aforementioned L Y as its inﬁnitesimal generator withdomain of deﬁnition D ( L Y ), as deﬁned in Deﬁnition 4. Such a process is a continuous strongMarkov process, e.g., [19, Chapter 8]. Then, for any T > Q ( Z εt ) converges weakly in C ([0 , T ]; Γ)to the Markov process Y t on the tree as ε ↓

0. In particular, we have the following theorem.

Theorem 5 (Theorem 2.1 of [16]) . Let Z εt be the process that satisﬁes (10). Assume Conditions1, 2 and 3. Let T > and consider the Markov process on the tree { Y t , t ∈ [0 , T ] } as deﬁned byDeﬁnition 4. We have Q ( Z ε · ) → Y · , weakly in C ([0 , T ]; Γ) , as ε ↓ . Analysis of the numerical HMM method

To analyze the numerical scheme, following [30], let us assume that there exists a random variableΦ αh ( z ) and an h such that for all 0 < h ≤ h and α = 0 or α = ε one has the estimate (cid:18) E (cid:12)(cid:12)(cid:12) Φ αh ( z ) − z + h ∇ U ( z ) − αh e C ( z ) − √ h p βξ − √ h √ κασ ( z ) ξ ′ (cid:12)(cid:12)(cid:12) (cid:19) / ≤ Ch / (1 + α ) / , (13)where ξ, ξ ′ are independent standard normal random variable. Then, the algorithm becomes¯ Z ε = z ¯ Z ε ( κ +1) δ = (cid:18) Φ δ − τ ◦ Φ ε τ (cid:19) ( ¯ Z εκδ )(14)Recall from Theorem 5 that it is important to keep in mind that Z εt does not converge tosomewhere when ε ↓

0. What converges to somewhere, i.e., to the diﬀusion on the tree, is Q ( Z ε ) =( U ( Z ε ) , i ( Z ε )). Then, we have the following theorem. Theorem 6.

Assume the conditions of Theorem 5 and that ε, δ, τ ↓ are such that δετ , τε , (cid:0) τε (cid:1) / δ ↓ . Then, for τ < δ < τε ≪ suﬃciently small, the process Q ( ¯ Z εnδ ) = ( U ( ¯ Z εnδ ) , i ( ¯ Z εnδ )) (where ¯ Z ε is the process from (14)) converges in distribution to the process Y · as deﬁned in Deﬁnition 4. In NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS13 addition, convergence to the invariant measure µ of the Y process holds, in the sense that for anybounded and uniformly Lipschitz test function φ we have that for all t > h ↓ lim ε,δ, δετ , τε , ( τε ) / δ ↓ h Z t + ht E π φ ( ¯ Z εs ) ds = E µ b φ ( Y t ) , where π is the invariant measure of the continuous process Z ε . The proof of this theorem is done in two steps. In the ﬁrst step, in Section 5.1, we consider thecase of a single well. Then, in the second step in Section 5.2, we complete the proof by consideringthe general multiple well case.5.1.

The case of one well.

Let us assume that there is only one well, i.e., that for any

T > s ∈ [0 , T ], we have i ( ¯ Z εs ) = 1. In this case, we simply have Q ( ¯ Z ε · ) = ( U ( ¯ Z ε · ) ,

1) and we areinterested in the asymptotic behavior of the process Q s = U ( ¯ Z εs ). Going back to (13) we have thefollowing lemma. Lemma 1.

Consider z such that |∇ U ( z ) | ≤ C < ∞ . Let us deﬁne Ψ αh ( z ) = U (Φ αh ( z )) . Then, thereexists h < ∞ such that for all < h ≤ h and α = 0 or α = ε , one has the estimate (cid:18) E (cid:12)(cid:12)(cid:12) Ψ αh ( z ) − U ( z ) − h (cid:0) −|∇ U ( z ) | + β tr (cid:2) D U ( z ) (cid:3)(cid:1) − √ h p β ∇ U ( z ) · ξ (cid:12)(cid:12)(cid:12) (cid:19) / ≤≤ Ch / (1 + α ) / , where ξ is a standard multidimensional normal random variable.Proof. By applying Taylor expansion to U (Φ αh ( z )) up to second order with respect to 0 < h ≪ h and using (13) we get for h suﬃciently small U (Φ αh ( z )) = U ( z ) + ∇ U ( z ) (Φ αh ( z ) − z ) + 12 (Φ αh ( z ) − z ) T D U ( z ) (Φ αh ( z ) − z ) + o ((Φ αh ( z ) − z ) )= U ( z ) + ∇ U ( z ) (cid:16) h [ −∇ U ( z ) + α e C ( z )] + √ h p βξ + √ h √ κασ ( z ) ξ ′ + I ( α, h ) (cid:17) + 12 (cid:16) h [ −∇ U ( z ) + α e C ( z )] + √ h p βξ + √ h √ κασ ( z ) ξ ′ + I ( α, h ) (cid:17) T D U ( z ) ×× (cid:16) h [ −∇ U ( z ) + α e C ( z )] + √ h p βξ + √ h √ κασ ( z ) ξ ′ + I ( α, h ) (cid:17) + o ((Φ αh ( z ) − z ) ) , where (cid:0) E I ( α, h ) (cid:1) / ≤ Ch / (1 + α ) / . Using now the assumptions from Conditions 2 and 3 that ∇ U ( z ) C ( z ) = 0, σ ( z ) σ T ( z ) ∇ U ( z ) = 0 and expanding the quadratic term, the previous expressionsimpliﬁes to U (Φ αh ( z )) = U ( z ) + (cid:0) h (cid:2) −|∇ U ( z ) | + β tr (cid:2) D U ( z ) (cid:3)(cid:3)(cid:1) + √ h p β ∇ U ( z ) ξ + ∇ U ( z ) I ( α, h ) + R ( α, h ) , where (cid:0) E R ( α, h ) (cid:1) / = o ( h / (1 + α ) / ). The latter, essentially concludes the proof of the lemma. (cid:3) For notational convenience, let us deﬁne the operator on test functions f ∈ C ( R ), L Q f ( z ) = (cid:2) −|∇ U ( z ) | + β tr (cid:2) D U ( z ) (cid:3)(cid:3) f ′ ( U ( z )) + β |∇ U ( z ) | f ′′ ( U ( z ))(15)Next we have the following lemma for the numerical approximation HMM scheme (14). Lemma 2.

Let f ∈ C ( R ) . Then for ¯ Z εt given by (14) we have, as τ < δ < τ /ε , δ, τε ↓ E (cid:16) f ( U ( ¯ Z ε ( n +1) δ )) − f ( U ( ¯ Z εnδ )) (cid:17) = δ E L Q f ( ¯ Z εnδ ) + O (cid:18) δ / + (cid:16) τε (cid:17) / (cid:19) . Proof.

We start by noticing that Lemma 1 implies that U ( ¯ Z εnδ + τ ) = U ( ¯ Z εnδ ) + τ (cid:0) −|∇ U ( ¯ Z εnδ ) | + β tr (cid:2) D U ( ¯ Z εnδ ) (cid:3)(cid:1) + √ τ p β ∇ U ( ¯ Z εnδ ) ξ n + R ,n , where (cid:0) E R ,n (cid:1) / ≤ C (cid:0) τε (cid:1) / . The last display and smoothness of the test function f , implies that E f ( U ( ¯ Z εnδ + τ )) = E f ( U ( ¯ Z εnδ )) + τ E L Q f ( ¯ Z εnδ ) + E R ,n , where with some abuse of notation we still denote R ,n the error term which again satisﬁes (cid:0) E R ,n (cid:1) / ≤ C (cid:0) τε (cid:1) / .In a similar manner, we also obtain that E f ( U ( ¯ Z ε ( n +1) δ )) = E f ( U ( ¯ Z εnδ + τ )) + ( δ − τ ) E L Q f ( ¯ Z εnδ + τ ) + E R ,n , where (cid:0) E R ,n (cid:1) / ≤ C ( δ − τ ) / .Hence, we get E (cid:16) f ( U ( ¯ Z ε ( n +1) δ )) − f ( U ( ¯ Z εnδ )) (cid:17) = δ E L Q f ( ¯ Z εnδ )+ ( δ − τ ) E (cid:0) L Q f ( ¯ Z εnδ ) − L Q f ( ¯ Z εnδ + τ ) (cid:1) + E R ,n + E R ,n . We further notice that by the regularity of U and f we have (cid:16) E (cid:0) L Q f ( ¯ Z εnδ ) − L Q f ( ¯ Z εnδ + τ ) (cid:1) (cid:17) / ≤ C (cid:16) E (cid:12)(cid:12) ¯ Z εnδ + τ − ¯ Z εnδ (cid:12)(cid:12) (cid:17) / ≤ C ( √ τ + τ /ε ) ≤ C ( √ δ + τ /ε ) . Putting the estimates together we obtain the statement of the lemma. (cid:3)

Let us recall now the operator L Q f ( z ) deﬁned by (15) and let us recall the “averaged” generator L Y = L Y deﬁned via (12) (recall that the single edge case is considered at the moment). We wantto prove that the process U ( ¯ Z εnδ ) converges in distribution to the process with generator L Y .For this purpose, we may use Theorem 1 of [29, Chapter 2]. By Lemma 2 we have(16) E (cid:20) f ( U ( ¯ Z εnδ )) − f ( U ( z )) − Z nδ L Y f ( U ( ¯ Z εs )) ds (cid:21) == E n − X k =0 " f ( U ( ¯ Z ε ( k +1) δ )) − f ( U ( ¯ Z εkδ )) − Z ( k +1) δkδ L Y f ( U ( ¯ Z εs )) ds = δ n − X k =0 E (cid:2) L Q f ( ¯ Z εkδ ) − L Y f ( U ( ¯ Z εkδ )) (cid:3) + nδ E I = E Z nδ (cid:2) L Q f ( ¯ Z εs ) − L Y f ( U ( ¯ Z εs )) (cid:3) ds + nδ E I , where (cid:0) E I (cid:1) / ≤ C (cid:16) δ / + (cid:0) τε (cid:1) / (cid:17) .Notice that L Y f ( U ( z )) = d L Q f ( U ( z )). Essentially, for a nice function g = L Q f , we need tightestimates for E R nδ (cid:2) g ( ¯ Z εs ) − b g ( U ( ¯ Z εs )) (cid:3) ds , where we recall that ¯ Z εs is the approximating process NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS15 and b g is the average on graph as deﬁned in (4). We can write E Z nδ (cid:2) L Q f ( ¯ Z εs ) − L Y f ( U ( ¯ Z εs )) (cid:3) ds = nδ " n n − X k =0 E L Q f ( ¯ Z εkδ ) − τ Z ( k +1) τkτ E L Q f ( Z εs ) ds ! + nδ (cid:20) nτ Z nτ E L Q f ( Z εs ) ds − nτ Z nτ E L Y f ( U ( Z εs )) ds (cid:21) + nδ " n n − X k =0 Z ( k +1) τkτ E L Y f ( U ( Z εs )) ds − E L Y f ( U ( ¯ Z εkδ )) ! = nδ [ J n + J n + J n ] . By the estimate (A.103)-(A.104) of [30] we have that for an unimportant constant

C < ∞| J n + J n | ≤ C (cid:18)r τε + √ nδ + nδ + n (cid:16) τε (cid:17) / (cid:19) e C nτε . (17)It remains to treat the term J n = nτ R nτ E L Q f ( Z εs ) ds − nτ R nτ E L Y f ( U ( Z εs )) ds . Standard PDEarguments, e.g., [16, Section 3.2], show that for any point ( z, i ) ∈ I that is not a vertex, and for g ∈ C α , the PDE(18) − b L u ( z ) = g ( z ) − b g ( U ( z )) , for z ∈ d ( x ) , has a unique (up to constants) C α ′ solution with α ′ ∈ (0 , α ). We ﬁx the free constant by setting b u ( x,

1) = 0. Then, the solution u ( z ) can be written as u ( z ) = Z ∞ E z h g ( b Z s ) − b g ( U ( b Z s )) i ds. Moreover, there exist a constant λ = λ ( z, > z ∈ d ( x ), | u ( z ) | ≤ λ sup z ∈ d ( x ) (cid:12)(cid:12) g ( z ) − b g ( U ( z )) (cid:12)(cid:12) . Notice that in the case that we can take κ = 0, i.e. , when the dynamical system ˙ z t = C ( z t )has a unique invariant measure on the connected component d ( x ), then we simply have b L u ( z ) = C ( z ) ∇ u ( z ). If we cannot take κ = 0, then b L u ( z ) is given by (11). Applying Itˆo’s formula to u weobtain that u ( Z εnτ ) = u ( z ) + 1 ε Z nτ b L u ( Z εs ) ds + Z nτ L u ( Z εs ) ds + r κε Z nδ ∇ u ( Z εs ) σ ( Z εs ) dW os + p β Z nδ ∇ u ( Z εs ) dW s . Recalling now that u solves (18), we obtain by rearranging the last display and taking expectedvalue J n = 1 nτ Z nτ E L Q f ( Z εs ) ds − nτ Z nτ E L Y f ( U ( Z εs )) ds = εnτ (cid:20) E ( u ( Z εnτ ) − u ( z )) + E Z nτ L u ( Z εs ) ds (cid:21) , which then, due to the boundedness of u and its derivatives, gives | J n | ≤ C (cid:16) εnτ + ε (cid:17) . (19) Thus, using estimates (17)-(19), (16) gives (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20) f ( U ( ¯ Z εnδ )) − f ( U ( z )) − Z nδ L Y f ( U ( ¯ Z εs )) ds (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤≤ Cnδ (cid:20)(cid:18)r τε + √ nδ + nδ + n (cid:16) τε (cid:17) / (cid:19) e C nτε + εnτ + ε + δ / + (cid:16) τε (cid:17) / (cid:21) . (20)Choosing now n such that p nτε e C nτε ∼ (cid:0) τεδ (cid:1) / , and recalling the requirement τ < δ < τε ≪ nδ (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20) f ( U ( ¯ Z εnδ )) − f ( U ( z )) − Z nδ L Y f ( U ( ¯ Z εs )) ds (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤≤ C "(cid:18) δετ (cid:19) / + (cid:18) δετ (cid:19) / + (cid:16) τε (cid:17) / δ r δετ + 1log (cid:0) τεδ (cid:1) + τδ + δ / + (cid:16) τε (cid:17) / → , as δετ , (cid:16) τε (cid:17) / δ ↓ . Hence, by Theorem 1 of [29, Chapter 2], we have obtained that U ( ¯ Z εnδ ) converges in distributionto the process Y on the graph (for the moment with just one edge) with generator L Y .Let us next discuss convergence to the invariant measure. Since the invariant measure for theoriginal process Z ε is the Gibbs measure π , we get that the invariant measure for the process Y onthe tree Γ is nothing else but the projection of π on Γ, say µ . In particular for any Borel set γ ⊂ Γ,we have µ ( γ ) = π (Γ − ( γ )).Then, from the weak convergence of Q ( ¯ Z εnδ ) to the process Y and the uniform mixing propertiesof Z εt and Y t , we get that for any bounded and uniformly Lipschitz test function φ that for all t > h ↓ lim ε,δ, δετ , τε , ( τε ) / δ ↓ h Z t + ht E π φ ( ¯ Z εs ) ds = lim h ↓ lim ε,δ, δετ , τε , ( τε ) / δ ↓ h Z t + ht E π b φ ( Q ( ¯ Z εs )) ds = E µ b φ ( Y t ) . (21)The latter establishes Theorem 6 in the one well case.5.2. The multi-well case.

The goal of this section is to establish that Theorem 6 holds in thegeneral multi-well case, i.e., when m >

1. First we need to deﬁne certain objects. Let us consider θ > I i of the graph set I θi = { ( x, i ) ∈ I i : dist(( x, i ) , ∂I i ) > θ } and deﬁne ¯ τ i = inf { t > Q ( ¯ Z εt ) / ∈ I θi } . Thus, I θi is the interior part of the edge I i and ¯ τ i is the ﬁrst exit time of the interior part. NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS17

In addition, for ζ > O j and a segment I i ∼ O j , let us deﬁne thefollowing quantities as in [17, Chapter 8]: D i = { z ∈ E : Q ( z ) ∈ I ◦ i } D i ( U , U ) = { z ∈ D i : U < U ( z ) < U } D j ( ± ζ ) = { z ∈ E : U ( O j ) − ζ < U ( z ) < U ( O j ) + ζ } D ( ± ζ ) = [ j D j ( ± ζ ) C j = { z ∈ E : Q ( z ) = O j } C ji ( ζ ) = { z ∈ D i : U ( z ) = U ( O j ) ± ζ } C ji = C j \ ∂D i C i ( U ) = (cid:8) z ∈ ¯ D i : U ( z ) = U (cid:9) . Here I ◦ i denotes the open interior of I i . Let us then also deﬁne the ﬁrst exit time of the process¯ Z εt from D j ( ± ζ ) as follows ¯ τ εj ( ± ζ ) = inf { t > Z εt / ∈ D j ( ± ζ ) } . Following the proof of [17, Theorem 8.2.2], see also [19], the statement of Theorem 6 will follow ifwe show that in the limit as ε, δ, τ ↓ Z ε · behaves within a given i well according tothe generator L Yi , it spends zero time in exterior and interior vertices and that the probabilisticbehavior at the vertices leads to the gluing condition of Deﬁnition 4. To be precise, following theproof of [17, Theorem 8.2.2], Theorem 6 follows if we prove Lemmas 3, 4, 5 and 6 below. Lemma 3.

Let f ∈ C b ( R ) and θ > such that I θi = ∅ for all i ∈ { , · · · , m } . Assume the conditionsof Theorem 6. Then, uniformly in z ∈ D θi = { z ∈ E : Q ( z ) ⊂ I θi } , we have that (22) (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20) f ( U ( ¯ Z εnδ ∧ ¯ τ i )) − f ( U ( z )) − Z nδ ∧ ¯ τ i L Y f ( U ( ¯ Z εs )) ds (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤ Cnδ (cid:18)(cid:18)r τε + √ nδ + nδ + n (cid:16) τε (cid:17) / (cid:19) e C nτε + εnτ + τδ + δ / + (cid:16) τε (cid:17) / (cid:19) , for some constant C < ∞ . In particular, choosing n such that p nτε e C nτε ∼ (cid:0) τεδ (cid:1) / , we obtain that nδ (cid:12)(cid:12)(cid:12)(cid:12) E (cid:20) f ( U ( ¯ Z εnδ ∧ ¯ τ i )) − f ( U ( z )) − Z nδ ∧ ¯ τ i L Y f ( U ( ¯ Z εs )) ds (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) ≤≤ C "(cid:18) δετ (cid:19) / + (cid:18) δετ (cid:19) / + (cid:16) τε (cid:17) / δ r δετ + 1log (cid:0) τεδ (cid:1) + τδ + δ / + (cid:16) τε (cid:17) / → , as δετ , (cid:16) τε (cid:17) / δ ↓ . Lemma 3 follows directly by the arguments of Section 5.1. In particular Lemma 3 implies thatif δ, δετ , τε , (cid:0) τε (cid:1) / δ ↓

0, then the process Q ( ¯ Z εnδ ∧ ¯ τ i ) converges in distribution, within edge I i , to theprocess with generator L Yi as deﬁned by (12). Lemma 4.

Let O j be an exterior vertex of the graph Γ . Assume the conditions of Theorem 6.Then, there exists ζ > , such that for all < ζ ≤ ζ and for all z ∈ D j ( ± ζ ) , there exists a constant C < ∞ such that E z ¯ τ εj ( ± ζ ) ≤ C ( ζ + δ + ( τ /ε ) / ) . In other words, for every η > and for < max { δ, ( τ /ε ) / } < ζ ≤ ζ suﬃciently small, we havethat E z ¯ τ εj ( ± ζ ) ≤ η. Lemma 5.

Let O j be an interior vertex of the graph Γ . Assume the conditions of Theorem 6.Then, there exists ζ > ε α for some exponent α > , such that for all < ε α < ζ ≤ ζ and for all z ∈ D j ( ± ζ ) E z ¯ τ εj ( ± ζ ) ≤ Cζ | ln ζ | . In other words for every η > , there exists ζ > ε α for some exponent α > , such that for all < ε α < ζ ≤ ζ and for all z ∈ D j ( ± ζ ) E z ¯ τ εj ( ± ζ ) ≤ ηζ. Lemma 6.

For I i ∼ O j deﬁne b ji as in Deﬁnition 4 and set p ji = b ji P i : Ii ∼ Oj b ji . Assume theconditions of Theorem 6. Then, for every η > , there exists ζ > max { δ, ( τ /ε ) / } , such that forall < max { δ, ( τ /ε ) / } < ζ ≤ ζ there exists ζ ′ = ζ ′ ( ζ ) such that for all suﬃciently small ε, δ, τ (cid:12)(cid:12)(cid:12) P z (cid:16) ¯ Z ε ¯ τ εj ( ± ζ ) / ∈ ∂D j ( ± ζ ) ∩ I ◦ i (cid:17) − p ji (cid:12)(cid:12)(cid:12) < η. for all z ∈ D j ( ± ζ ′ ) . Lemmas 4, 5 and 6 follow as Lemmas 3.4, 3.5 and 3.6 in [17, Chapter 8]. The main diﬀerencebetween our situation and that of [17] is that we are working with the discrete approximation,which implies that we need information on the error bounds in terms of the parameters δ, ε, τ , as itwas also the case for Lemma 3. Given that the method of the proof is similar to the correspondingproofs of [17], we do not repeat all the details here.The principle idea is that Lemma 3 controls the behavior within each branch of the tree, whereasLemmas 4, 5 allow us to conclude that the approximating process spends in the limit zero time onexterior and interior vertices respectively (equivalently it spends zero time in the neighborhood ofstable and unstable points of the dynamical system). Then, Lemma 6 characterizes the splittingprobability in each interior vertex concluding the description of the limiting Markov process.In order to demonstrate the diﬀerences with the corresponding proofs of [17] and to see therole of the discrete approximation, we demonstrate the proofs of these lemmas emphasizing thediﬀerences.

Proof of Lemma 4.

Let us assume that U ( z j ) is a local minimum of U . It is clear that the followingrelation should hold | E U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( z j ) | ≥ ζ. Let us deﬁne k = max { k ∈ N : ( kδ + τ ) ∨ kδ ≤ ¯ τ εj ( ± ζ ) } . Let us assume that k is such that ( k δ + τ ) ≤ ¯ τ εj ( ± ζ ). The approach is the same if k is suchthat k δ ≤ ¯ τ εj ( ± ζ ). By adding and subtracting terms of the form U ( ¯ Z εmδ ) for m = 0 , , · · · , k we NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS19 get U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) = h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i + U ( ¯ Z εk δ + τ )= h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i + (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) + U ( ¯ Z εk δ )= h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i + (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) + k X m =1 h U ( ¯ Z εmδ ) − U ( ¯ Z ε ( m − δ ) i + U ( z ) . Taking expected value, Lemma 2 (with the test function f ( u ) = u ) implies that for δ, τ /ε suﬃcientlysmall E U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) = E h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i + E (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) + δ E k X m =1 (cid:20) L U ( ¯ Z ε ( m − δ ) + O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:21) + U ( z ) . Next we notice that up to an unimportant multiplicative constant | E U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( z ) | < ζ + δ and that the non-degeneracy of U implies that for every z ∈ D j ( ± ζ ) there is a constant C > L U ( z ) > C . Hence, we have obtained ζ + δ > E h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i + E (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) + C E ¯ τ εj ( ± ζ ) (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) + C E ( δk − ¯ τ εj ( ± ζ )) (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) ≥ E h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i + E (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) + C E ¯ τ εj ( ± ζ ) (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) − δ (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) . The latter inequality follows since by the deﬁnition of k we have that | δk − ¯ τ εj ( ± ζ ) | < δ .Rearranging the latter expression, we obtain for some unimportant constants 0 < C i < ∞ E ¯ τ εj ( ± ζ ) ≤ C ζ + 2 δ + δ / + ( τǫ ) / + (cid:12)(cid:12)(cid:12) E h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i(cid:12)(cid:12)(cid:12) + (cid:12)(cid:12) E (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3)(cid:12)(cid:12) O (cid:16) δ / + (cid:0) τε (cid:1) / δ (cid:17) ≤ C ζ + 2 δ + δ / + ( τǫ ) / + ( δ + τ ) + ( δ / + ( τ /ε ) / )1 + O (cid:16) δ / + (cid:0) τε (cid:1) / δ (cid:17) , which implies that for 0 < τ < δ < τε ≪ (cid:0) τε (cid:1) / δ ↓

0, we get E ¯ τ εj ( ± ζ ) ≤ C (cid:16) ζ + δ + ( τǫ ) / (cid:17) , or, in other words if we choose ζ > max { δ, ( τǫ ) / } , we indeed obtain that E ¯ τ εj ( ± ζ ) ≤ C ζ, from which the statement of the lemma follows. (cid:3) Proof of Lemma 5.

We start with the following usage of Lemma 2, U ( ¯ Z εnδ ) = n − X k =0 h U ( ¯ Z ε ( k +1) δ ) − U ( ¯ Z εkδ ) i + U ( z )= n − X k =0 h U ( ¯ Z ε ( k +1) δ ) − U ( ¯ Z εkδ + τ ) i + n − X k =0 (cid:2) U ( ¯ Z εkδ + τ ) − U ( ¯ Z εkδ ) (cid:3) + U ( z )= n − X k =0 (cid:2) ( δ − τ ) L U ( ¯ Z εkδ + τ ) + τ L U ( ¯ Z εkδ ) (cid:3) ++ p β n − X k =0 h √ δ − τ ∇ U ( ¯ Z εkδ + τ ) ξ ′ k + √ τ ∇ U ( ¯ Z εkδ ) ξ k i + n − X k =0 [ R ,k + R ,k ] + U ( z ) , where ξ ′ k , ξ k are independent standard normal random variables and R ,k , R ,k are as in the proofof Lemma 2. Using the independence of the involved normal random variables, we can then writethat in distribution U ( ¯ Z εnδ ) = n − X k =0 I δ,τ ,k + N , n − X k =0 I δ,τ ,k ! + n − X k =0 [ R ,k + R ,k ] + U ( z ) , (23)where I δ,τ ,k = (cid:2) ( δ − τ )( L U ( ¯ Z εkδ + τ ) − L U ( ¯ Z εkδ )) + δ L U ( ¯ Z εkδ ) (cid:3) ,I δ,τ ,k = 2 β (cid:2) ( δ − τ ) |∇ U ( ¯ Z εkδ + τ ) | + τ |∇ U ( ¯ Z εkδ ) | (cid:3) and N (cid:16) , P n − k =0 I δ,τ ,k (cid:17) represents a normal random variable with mean zero and variance P n − k =0 I δ,τ ,k .Let us recall now that  E n − X k =0 [ R ,k + R ,k ] !  / = O ( nδ / + n ( τ /ε ) / ) . We have that ¯ τ εj ( ± ζ ) is less or equal to the time when the random variable | U ( ¯ Z εnδ ) − U ( z ) | reaches the level 2 ζ . This happens if the term P n − k =0 h I δ,τ ,k + R ,k + R ,k i is small in absolute value,while the term N (cid:16) , β P n − k =0 ( δ − τ ) |∇ U ( ¯ Z εkδ + τ ) | + τ |∇ U ( ¯ Z εkδ ) | (cid:17) is large. In other words wehave the inclusion ( n − X k =0 h I δ,τ ,k + R ,k + R ,k i < ζ, N , n − X k =0 I δ,τ ,k ! > ζ ) ⊆ (cid:8) ¯ τ εj ( ± ζ ) < nδ (cid:9) . (24)We also have (cid:8) ¯ τ εj ( ± ζ ) ≥ nδ (cid:9) ⊆ ( ¯ τ εj ( ± ζ ) ≥ nδ, n − X k =0 I δ,τ ,k < ζ ) [[ ( ¯ τ εj ( ± ζ ) ≥ nδ, n − X k =0 I δ,τ ,k ≥ ζ , (cid:12)(cid:12) N (0 , ζ ) (cid:12)(cid:12) ≥ ζ ) [[ ( ¯ τ εj ( ± ζ ) ≥ nδ, n − X k =0 I δ,τ ,k ≥ ζ , (cid:12)(cid:12) N (0 , ζ ) (cid:12)(cid:12) < ζ ) . NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS21

Choose now nδ such that for the given ζ we have nδ < ζ and in particular that n − X k =0 h I δ,τ ,k + R ,k + R ,k i < ζ, for all trajectories ¯ Z ε for which ¯ τ εj ( ± ζ ) ≥ nδ . Then, by (24), the second inclusion in the last displaycannot hold. Thus we have P (cid:0) ¯ τ εj ( ± ζ ) ≥ nδ (cid:1) ≤ P ¯ τ εj ( ± ζ ) ≥ nδ, n − X k =0 I δ,τ ,k < ζ ! + P (cid:0)(cid:12)(cid:12) N (0 , ζ ) (cid:12)(cid:12) < ζ (cid:1) = P ¯ τ εj ( ± ζ ) ≥ nδ, n − X k =0 I δ,τ ,k < ζ ! + 0 . . Recall that we have chosen n such that nδ < ζ and in particular that P n − k =0 h I δ,τ ,k + R ,k + R ,k i <ζ . To be precise, the last requirement is that up to a deterministic constant n (cid:0) δ + δ / + ( τ /ε ) / (cid:1) <ζ . Let us enforce that by requiring that up to an appropriate deterministic constants ζ

0. Then, the probability of the ﬁrst term in the right hand sideof the last display can be made as small as we want, say less than 0 . n and for suﬃciently small ε, δ, τ and ζ such that ζ < n (cid:0) δ + δ / + ( τ /ε ) / (cid:1) < ζ and n (cid:0) δ + δ / + ( τ /ε ) / (cid:1) to be of the order of ζ | ln ε | , we have that P (cid:0) ¯ τ εj ( ± ζ ) ≥ nδ (cid:1) ≤ . . Then, by Markov property we obtain that P (cid:16) ¯ τ εj ( ± ζ ) ≥ N nδ (cid:17) ≤ . N , which then implies (usingthe fact that the random variable ¯ τ εj ( ± ζ ) is positive and that 0 . N is a geometric series) that upto a deterministic constant C < ∞ that may change from inequality to inequality E z ¯ τ εj ( ± ζ ) ≤ nδ − . ≤ Cζ | ln ε | ≤ Cζ | ln ζ | . The second to the last inequality of the previous display is true because nδ is chosen to be of order ζ | ln ε | and the last inequality because by assumption ζ ≥ ε α for some exponent α >

0. Thisconcludes the proof of the lemma. (cid:3)

Proof of Lemma 6.

Using [17, Lemma 8.6.2] for the discrete approximation ¯ Z εt we havelim ε,δ, τε ↓ max x ,x ∈ C i ( U ) max f : k f k≤ (cid:12)(cid:12)(cid:12) E x f ( ¯ Z ε ¯ τ ε ( U ,U ) ) − E x f ( ¯ Z ε ¯ τ ε ( U ,U ) ) (cid:12)(cid:12)(cid:12) = 0 , where f is deﬁned on ∂D i ( U , U ) and ¯ τ ε ( U , U ) is the ﬁrst time of exit of the process ¯ Z ε fromthe branch I i from either of the two sides U < U . Then, by Markov property, as in [17, Lemma8.6.3], we get that lim ε,δ, τε ↓ max x ,x ∈ C ji ( ζ ′ ) | F ε ( x ) − F ε ( x ) | = 0 , (25)where for ζ ′ < ζ , F ε ( x ) = P (cid:16) ¯ Z ε ¯ τ ε ( ± ζ ) / ∈ ∂D j ( ± ζ ) ∩ I ◦ i (cid:17) . The next thing to prove is that for every ζ > , κ > < ζ ′ < ζ such that for every ε, δ, τε suﬃciently smallmax x ,x ∈ ¯ D j ( ± ζ ′ ) | F ε ( x ) − F ε ( x ) | < κ. For edges I i ∼ O j let us set f ( x ) = 1 x/ ∈ ∂D j ( ± ζ ) ∩ I ◦ i . There are exactly three regions correspondingto I ◦ i , I ◦ i , I ◦ i that are separated by the separatrix C j . The region corresponding to I ◦ i adjoins thewhole curve C j , whereas I ◦ i , I ◦ i adjoins only part of it. In particular we have that C ji = C ji ∪ C ji .Then, as in the proof of [17, Lemma 8.3.6], it can be shown that(26) | F ε ( x ) − F ε ( x ) | ≤ " sup x ∈ ∂D j ( ± ζ ) f ( x ) − inf x ∈ ∂D j ( ± ζ ) f ( x ) × max n P x m (cid:16) ¯ Z ε ¯ τ / ∈ ( ∂D j ( ± ζ ) ∩ I ◦ i ) ∪ ( ∂D j ( ± ζ ) ∩ I ◦ i ) (cid:17) : m = 1 , o + " sup x ∈ C ji ( ζ ′ ) F ε ( x ) − inf x ∈ C ji ( ζ ′ ) F ε ( x ) , where ¯ τ is the ﬁrst time that the discrete approximation process ¯ Z εt exits ( ∂D j ( ± ζ ′ ) ∩ I ◦ i ) or( ∂D j ( ± ζ ) ∩ I ◦ i ) ∪ ( ∂D j ( ± ζ ) ∩ I ◦ i ). Clearly, we have that ¯ τ ≤ ¯ τ ε ( ± ζ ).By (25), the second additive term in (26) is arbitrarily small for suﬃciently small ε, δ, τ /ε . So, itremains to estimate P εx . = P x (cid:16) ¯ Z ε ¯ τ / ∈ ( ∂D j ( ± ζ ) ∩ I ◦ i ) ∪ ( ∂D j ( ± ζ ) ∩ I ◦ i ) (cid:17) . As in the proof of Lemma4 for an appropriate integer k and for δ, τ /ε suﬃciently small E U ( ¯ Z ε ¯ τ ) = E (cid:2) U ( ¯ Z ε ¯ τ ) − U ( ¯ Z εk δ + τ ) (cid:3) + E (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) + δ E k X m =1 (cid:20) L U ( ¯ Z ε ( m − δ ) + O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:21) + U ( z ) . (27)Notice now that U ( ¯ Z ε ¯ τ ) is either greater or equal than U ( O j ) ± ζ ′ on ( ∂D j ( ± ζ ′ ) ∩ I ◦ i ) or it isgreater or equal than U ( O j ) ∓ ζ on ( ∂D j ( ± ζ ) ∩ I ◦ i ) ∪ ( ∂D j ( ± ζ ) ∩ I ◦ i ). The latter implies that E U ( ¯ Z ε ¯ τ ) ≥ ( U ( O j ) ± ζ ′ )(1 − P εx ) + ( U ( O j ) ∓ ζ ) P εx . The latter and (27) imply that up to deterministic constants that do not depend on the smallparameters of the problem( ζ + ζ ′ ) P εx ≤ ζ ′ + | U ( O j ) − U ( z ) | + | E h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i | + | E (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) | + (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) δ E k X m =1 (cid:20) L U ( ¯ Z ε ( m − δ ) + O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:21)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ ζ ′ + | E h U ( ¯ Z ε ¯ τ εj ( ± ζ ) ) − U ( ¯ Z εk δ + τ ) i | + | E (cid:2) U ( ¯ Z εk δ + τ ) − U ( ¯ Z εk δ ) (cid:3) | + E ¯ τ (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) + | E ( δk − ¯ τ ) | (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) ≤ ζ ′ + δ + τ + δ / + (cid:16) τε (cid:17) / + E ¯ τ (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) ≤ ζ ′ + δ + τ + δ / + (cid:16) τε (cid:17) / + E ¯ τ ε ( ± ζ ) (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) ≤ ζ ′ + δ + τ + δ / + (cid:16) τε (cid:17) / + ζ | ln ζ | (cid:18) O (cid:18) δ / + (cid:16) τε (cid:17) / δ (cid:19)(cid:19) . NALYSIS OF MULTISCALE INTEGRATORS FOR MULTIPLE ATTRACTORS AND IRREVERSIBLE LANGEVIN SAMPLERS23 where for the last line we used Lemma 5. Therefore, we have obtained that for suﬃciently small τ < δ ≪ τ /ε ↓ P x (cid:16) ¯ Z ε ¯ τ / ∈ ( ∂D j ( ± ζ ) ∩ I ◦ i ) ∪ ( ∂D j ( ± ζ ) ∩ I ◦ i ) (cid:17) ≤ ζ ′ ζ + ζ | ln ζ | + δ + τ + δ / + (cid:0) τε (cid:1) / ζ + ζ ′ ≤ ζ ′ ζ + ζ | ln ζ | + δ + (cid:0) τε (cid:1) / ζ + ζ ′ . The right hand side of the last display can be made arbitrarily small, if we choose ζ ′ < ζ small butsuch that δ + ( τε ) / ζ + ζ ′ ↓

0. This means that ζ ′ < ζ should be chosen small, but greater than δ + (cid:0) τε (cid:1) / .Hence, under this condition, we get that P x (cid:16) ¯ Z ε ¯ τ / ∈ ( ∂D j ( ± ζ ) ∩ I ◦ i ) (cid:17) has approximately the samevalue for all x ∈ ¯ D j ( ± ζ ′ ) when τ < δ < τε ≪ (cid:0) τε (cid:1) / δ ≪

1. Then, it remains to show that,in the limit, this value is actually equal to p ji . This part of the proof however follows very closelythe corresponding part of the proof of [17, Lemma 3.6] for Z ε when δ, τ /ε are suﬃciently small andit will not be repeated here. This concludes the proof of the lemma. (cid:3) References [1] A. Abdulle and S. Cirilli,

S-ROCK: Chebyshev methods for stiﬀ stochastic diﬀerential equations , SIAM J. Sci.Comput. (2008), 997–1014.[2] A. Abdulle, D. Cohen, G. Vilmart, and K. C. Zygalakis, High weak order methods for stochastic diﬀerentialequations based on modiﬁed equations , SIAM J. Sci. Comput. (2012), A1800–A1823.[3] A. Abdulle, W. E, B. Engquist, and E. Vanden-Eijnden, The heterogeneous multiscale method , Acta Numerica(2012), 1–87.[4] A. Abdulle and T. Li,

S-ROCK methods for stiﬀ Ito SDEs , Commun. Math. Sci. (2008), 845–868.[5] M. Brin and M. I. Freidlin, On stochastic behavior of perturbed Hamiltonian systems , Ergodic Theory andDynamical Systems (2000), 55–76.[6] K. Burrage and T. Tian, Stiﬄy accurate Runge-Kutta methods for stiﬀ stochastic diﬀerential equations , Comput.Phys. Commun. (2001), 186–190.[7] A. Duncan, T. Leli`evre, and G.A. Pavliotis,

Variance reduction using nonreversible Langevin samplers , J. Stat.Phys. (2016), 457–491.[8] A.B. Duncan, G. Pavliotis, and K.C. Zygalakis,

Nonreversible langevin samplers: Splitting schemes, analysis andimplementation , arXiv: 1701.04247 (2017).[9] A. Durmus and E. Moulines,

Nonasymptotic convergence analysis for the unadjusted langevin algorithm , Ann.Appl. Probab. (2017), no. 3, 1551–1587.[10] W. E, Principles of multiscale modeling , Cambridge University Press, Cambridge, 2011.[11] W. E and B. Engquist,

The heterogeneous multi-scale methods , Commun. Math. Sci. (2003), 87–133.[12] W. E, B. Engquist, X. Li, and W. Ren, Heterogeneous multiscale methods: A review , Communications in Com-putaitonal Physics (2007), 367–450.[13] W. E, D. Liu, and E. Vanden-Eijnden, Analysis of multiscale methods for stochastic diﬀerential equations , Comm.Pure App. Math. (2005), 1544–1585.[14] W. E and J. Lu, Seamless multiscale modeling via dynamics on ﬁber bundles , Commun. Math. Sci. (2007),649–663.[15] W. E, W. Ren, and E. Vanden-Eijnden, A general strategy for designing seamless multiscale methods , J. Comput.Phys. (2009), 54375453.[16] M. I. Freidlin and M. Weber,

Random perturbations of dynamical systems and diﬀusion processes with conser-vation laws , Probability Theory and Related Fields (2004), 441–466.[17] M. I. Freidlin and A. D. Wentzell,

Random perturbations of dynamical systems , 2nd ed., Springer-Verlag, NewYork, 1988.[18] M. I. Freidlin and A. D. Wentzell,

Diﬀusion processes on graphs and the averaging principle , Annals of Probability (1993), 2215–2245. [19] M. I. Freidlin and A. D. Wentzell, Random perturbations of Hamiltonian systems , Memoirs of the AmericanMathematical Society (1994), no. 523.[20] D. Givon, I. G. Kevrekidis, and R. Kupferman,

Strong convergence of projective integration schemes for singularlyperturbed stochastic diﬀerential systems , Commun. Math. Sci. (2006), 707–729.[21] M. Hutzenthaler, A. Jentzen, and P.E. Kloeden, Strong and weak divergence in ﬁnite time of Euler’s methodfor stochastic diﬀerential equations with non-globally Lipschitz continuous coeﬃcients , Proceedings of the RoyalSociety A: Mathematical, Physical and Enginnering Science (2011), 15631576.[22] C.-R. Hwang, S.Y. Hwang-Ma, and S.-J. Sheu,

Accelerating diﬀusions , The Annals of Applied Probability (2005), 1433–1444.[23] M. Ottobre, N.S. Pillai, and K. Spiliopoulos, Optimal scaling of the MALA algorithm with irreversible proposalsfor Gaussian targets , submited, arXiv:1702.01777 (2017).[24] G.A. Pavliotis and A.M. Stuart,

Multiscale methods: Averaging and homogenization , Springer-Verlag, New York,2008.[25] L. Rey-Bellet and K. Spiliopoulos,

Irreversible Langevin samplers and variance reduction: a large deviationapproach , Nonlinearity (2015), 2081–2103.[26] L. Rey-Bellet and K. Spiliopoulos, Variance reduction for irreversible Langevin samplers and diﬀusion on graphs ,Electronic Communications in Probability (2015), 1–16.[27] L. Rey-Bellet and K. Spiliopoulos, Improving the convergence of reversible samplers , Journal of Statistical Physics (2016), 472–494.[28] G. Roberts and R. Tweedie,

Exponential convergence of Langevin distributions and their discrete approximations ,Bernoulli (1996), 341–363.[29] A. Skorokhod, Asymptotic methods in the theory of stochastic diﬀerential equations , AMS Translation of Math-ematical Monographs, vol. 78, American Mathematical Society, Providence, RI, 1989.[30] M. Tao, H. Owhadi, and J. E. Marsden,

Nonintrusive and structure presering multiscale integration of stiﬀODEs, SDEs, and Hamiltonian systems with hidden slow dynamics via ﬂow averaging , Multiscale Modeling andSimulation (2010), 1269–1324.[31] E. Vanden-Eijnden, Numerical techniques for multiscale dynamical systems with stochastic eﬀects , Commun.Math. Sci. (2003), 385–391.[32] E. Vanden-Eijnden, On HMM-like integrators and projective integration methods for systems with multiple timescales , Commun. Math. Sci. (2007), 495–505. Department of Mathematics, Department of Physics, and Department of Chemistry, Duke Univer-sity, Box 90320, Durham NC 27708, USA

E-mail address : [email protected] Department of Mathematics and Statistics, Boston University, Boston MA 02446, USA

E-mail address ::