Local and Global Perspectives on Diffusion Maps in the Analysis of Molecular Systems
Z. Trstanova, B. Leimkuhler and T. Lelièvre
School of Mathematics, University of Edinburgh, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD, UK
CERMICS (ENPC), INRIA, 77455 Marne-la-Vallée, France
November 26, 2019
Abstract
Diffusion maps approximate the generator of Langevin dynamics from simulation data. They afford a means of identifying the slowly-evolving principal modes of high-dimensional molecular systems. When combined with a biasing mechanism, diffusion maps can accelerate the sampling of the stationary Boltzmann-Gibbs distribution. In this work, we contrast the local and global perspectives on diffusion maps, based on whether or not the data distribution has been fully explored. In the global setting, we use diffusion maps to identify metastable sets and to approximate the corresponding committor functions of transitions between them. We also discuss the use of diffusion maps within the metastable sets, formalising the locality via the concept of the quasi-stationary distribution and justifying the convergence of diffusion maps within a local equilibrium. This perspective allows us to propose an enhanced sampling algorithm. We demonstrate the practical relevance of these approaches both for simple models and for molecular dynamics problems (alanine dipeptide and deca-alanine).
The calculation of thermodynamic averages for complex models remains a fundamental challenge in computational chemistry [2], materials modelling [6] and biology [54]. In typical situations, the potential energy $U$ of the system is known, and the states are assumed to be distributed according to the Boltzmann-Gibbs distribution with density $\rho_\beta = Z^{-1} e^{-\beta U}$. The difficulty arises due to the high-dimensional, multimodal nature of the target distribution in combination with limited computational resources. Because the target distribution is multimodal, the high-likelihood conformational states are separated by low-probability transition regions. In such a setting, the transitions between critical states become "rare events", meaning that naive computational approaches converge slowly (if at all) or produce inaccurate results. Moreover, standard MD simulations of large proteins (>500 residues) run on regular CPUs allow one to simulate dynamics over hundreds of ns (or a few µs at most) in a reasonable amount of time. This time scale is often not enough to sample relevant conformational transitions, which may occur on the ms time scale.

As an illustration, consider the problem of exploring all folded configurations of a deca-alanine molecule in vacuum at a specific temperature (300 K). This short peptide is a common model for illustrating the difficulties of folding due to its very complicated free energy landscape [14]. Three typical states are shown in Figure 1. The many interactions between atoms of the molecule mean that changes in structure are the result of a sequence of coordinated moves. Because the transitions between the states occur infrequently (they are 'rare events'), the results of a short Langevin dynamics simulation will typically constitute a highly localised exploration of the region surrounding one particular conformational minimum.
Figure 1: Folding of deca-alanine: three metastable conformations, (a) A, (b) B1, (c) B2.

The goal of enhanced sampling strategies is to dramatically expand the range of observed states by modifying in some way the dynamical model. Typical schemes [57, 51, 38, 16, 31, 20, 21] rely on the definition of collective variables in order to drive the enhanced sampling of the system only in relevant low-dimensional coordinates describing the slowest time scales.

In this article, we discuss the automatic identification of collective variables (CVs) for the purpose of enhancing sampling in applications such as molecular dynamics. In some cases, the natural collective variables relate to underlying physical processes and can be chosen using scientific intuition, but in many cases this is far from straightforward, as there may be competing molecular mechanisms underpinning a given conformational change. Methods capable of automatically detecting CVs have, moreover, much wider potential for application. Many Bayesian statistical inference calculations arising in clustering and classifying data sets, and in the training of artificial neural networks, reduce to sampling a smooth probability distribution in high dimension and are frequently treated using the techniques of statistical physics [41, 15]. In such systems, a priori knowledge of the collective variables is typically not available, so methods that can automatically determine collective variables are of high potential value.

Diffusion maps [17, 18] provide a dimensionality reduction technique which yields a parameterized description of the underlying low-dimensional manifold by computing an approximation of a Fokker-Planck operator on the trajectory point-cloud sampled from a probability distribution (typically the Boltzmann-Gibbs distribution corresponding to a prescribed temperature). The construction is based on a normalized graph Laplacian matrix.
In an appropriate limit (discussed below), the matrix converges (point-wise) to the generator of overdamped Langevin dynamics. The spectral decomposition of the diffusion map matrix thus yields an approximation of the continuous spectral problem on the point-cloud [42] and leads to natural collective variables. Since the first appearance of diffusion maps [17], several improvements have been proposed, including local scaling [50], variable bandwidth kernels [7] and target measure diffusion maps (TMDmap) [4]. The latter scheme extends diffusion maps to point-clouds obtained from a surrogate distribution, ideally one that is easier to sample from. Based on the idea of importance sampling, it can be used on biased trajectories, and improves the accuracy and application of diffusion maps in high dimensions [4]. Diffusion maps can be combined with the variational approach to conformation dynamics in order to improve the approximation of the eigenfunctions and provide eigenvalues that relate directly to physical relaxation timescales [10].

Diffusion maps underpin a number of algorithms that have been designed to learn the collective variable adaptively and thus enhance the dynamics in the learned slowest degrees of freedom [62, 47, 13, 12]. These methods are based on iterative procedures whereby diffusion maps are employed as a tool to gradually uncover the intrinsic geometry of the local states and drive the sampling toward unexplored domains of the state space, either by sequentially restarting [62] or pushing [13] the trajectory from the border of the point-cloud in the direction given by the reduced coordinates. In [37], time-structure-based independent component analysis was performed iteratively to identify the slowest degrees of freedom, which were then sampled directly using metadynamics [31]. All these methods try to gather local information about the metastable states to drive global sampling, using ad hoc principles.
In this paper, we provide a rigorous perspective on the construction of diffusion maps within a metastable state by formalizing the concept of a local equilibrium based on the quasi-stationary distribution [19]. Moreover, we provide the analytic form of the operator which is obtained when metastable trajectories are used in computing diffusion maps.

Diffusion maps can also be used to compute committor functions [30], which play a central role in transition path theory [59]. The committor is a function which provides dynamical information about the connection between two metastable states and can thus be used as a reaction coordinate (an importance sampling function for biasing or splitting methods, for example). Committors, or 'commitment probabilities', were first introduced as the 'splitting probability for ion-pair recombination' by Onsager [44] and appear, for example, in the definition of $p_{\mathrm{fold}}$, the probability of protein folding [25, 45]. Markov state models can in principle be used to compute committor probabilities [48], but high dimensionality makes grid-based methods intractable. The finite temperature string method [26] approximates the committor on a quasi-one-dimensional reaction tube, which is possible under the assumption that the transition paths lie in regions of small measure compared to the whole sampled state space. The advantage of diffusion maps is that the approximation of the Fokker-Planck operator holds on the whole space, and therefore we can compute the committor outside the reaction tube. Diffusion maps have already been used to approximate committor functions in [30, 56]. In [30], in order to improve the approximation quality, a new method based on a point-cloud discretization of Fokker-Planck operators is introduced (however, without any convergence result). A more recent work [56] uses diffusion maps for committor computations.
Finally, we mention that artificial neural networks were used to solve for the committor in [29], although the approach is quite different from that considered here.

The main conceptual novelty of this article lies in the insight on the local versus global perspective provided by the quasi-stationary distribution. Thus, to be precise, compared to the work of [30], we clarify the procedure for going from the local perspective to designing an enhanced sampling method, and we apply our methods to much more complicated systems (only toy models are treated in [30]). The second, more practical, contribution is the demonstration of the use of diffusion maps to identify the metastable states and to directly compute the committor function. We consider, for example, the use of the diffusion map as a diagnostic tool for transitions out of a metastable state. A third contribution lies in drafting an enhanced sampling algorithm based on the quasi-stationary distribution and diffusion maps. A careful implementation and application to small biomolecules shows the relevance and potential of this methodology for practical applications. Algorithm 1, detailed in Section 5, will provide a starting point for further investigations. The current article serves to bridge works in the mathematical and computational science literatures, and thus helps to establish foundations for future rigorously based sampling frameworks.

This paper is organized as follows: in Section 2, we start with the mathematical description of overdamped Langevin dynamics and diffusion maps. In Section 3, we formalize the application of diffusion maps to a local state using the quasi-stationary distribution. We present several examples illustrating the theoretical findings. In Section 4, we define committor probabilities and the diffusion map-based algorithm to compute them.
We also apply our methodology to compute collective variables, metastable states and committors for various molecules, including the alanine dipeptide and a deca-alanine system. In Section 5, we show how the quasi-stationary distribution can be used to reveal transitions between molecular conformations. Finally, we conclude by taking up the question of how the quasi-stationary distribution can be used as a tool for the enhanced sampling of large-scale molecular models, paving the way for a full implementation of the described methodology in a software framework.

We begin with the mathematical description of overdamped Langevin dynamics, which is used to generate samples from the Boltzmann distribution. By introducing the generator of the Langevin process, we make a connection to diffusion maps, which is formalized in the following section. We review the construction of the original diffusion maps and define the target measure diffusion map, which removes some of its limitations.
Langevin dynamics and the Boltzmann distribution
We denote the configuration of the system by $x \in D$, where, depending on the application, $D = \mathbb{R}^d$, $D$ is a subset of $\mathbb{R}^d$, or $D = (\mathbb{R}/\mathbb{Z})^d$ for systems with periodic boundary conditions. Overdamped Langevin dynamics is defined by the stochastic differential equation
$$dx_t = -\nabla V(x_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t, \qquad (1)$$
where $W_t$ is a standard $d$-dimensional Wiener process, $\beta > 0$ is proportional to the inverse temperature, and $V(x)$ is the potential energy driving the diffusion process. The Boltzmann-Gibbs measure is invariant under this dynamics:
$$\mu(dx) = Z^{-1} e^{-\beta V(x)}\,dx, \qquad Z = \int_D e^{-\beta V(x)}\,dx. \qquad (2)$$
The ergodicity property is characterized by the almost sure (a.s.) convergence of the trajectory average of a smooth observable $A$ to the average over the phase space with respect to a probability measure, in this case $\mu$:
$$\lim_{t \to \infty} \widehat{A}_t = \mathbb{E}_\mu(A) \quad \text{a.s.}, \qquad \widehat{A}_t := \frac{1}{t}\int_0^t A(x_s)\,ds. \qquad (3)$$
The infinitesimal generator of the Markov process $(x_t)_{t \ge 0}$, a solution of (1), is the differential operator
$$\mathcal{L}_\beta = -\nabla V \cdot \nabla + \beta^{-1}\Delta, \qquad (4)$$
defined for example on the set of $C^\infty$ functions with compact support. The fact that $\mathcal{L}_\beta$ is the infinitesimal generator of $(x_t)_{t \ge 0}$ means that (see for example [35]):
$$\forall x, \quad \frac{d}{dt}\Big[\mathbb{E}\big(A(x_t)\,\big|\,x_0 = x\big)\Big]\Big|_{t=0} = \mathcal{L}_\beta A(x),$$
where $A$ is a $C^\infty$ compactly supported function and $x_0 = x$ is the initial condition at time $t = 0$. Another way to make a link between the differential operator (4) and the stochastic differential equation (1) is to consider the law of the process $x_t$. Let us denote by $\psi(t,x)$ the density of $x_t$ at time $t$. Then $\psi$ is a solution of the Fokker-Planck equation
$$\partial_t \psi = \mathcal{L}_\beta^* \psi, \qquad \psi(0) = \psi_0,$$
where $\psi_0$ is the density of $x_0$ and $\mathcal{L}_\beta^*$ is the $L^2$ adjoint of $\mathcal{L}_\beta$. Under assumptions on the smoothness of the potential and the compactness of the domain $D$, the solution can be written as
$$\psi(t,x) = \sum_{k=0}^{\infty} c_k e^{\lambda_k t} \phi_k(x), \qquad (5)$$
with eigenvalues $\{\lambda_k\}_{k=0}^\infty$ satisfying $\lambda_0 = 0 > \lambda_1 \ge \lambda_2 \ge \dots$ and eigenfunctions $\{\phi_k\}_{k=0}^\infty$ of $\mathcal{L}_\beta^*$.
The eigenfunctions are smooth functions and the sum (5) converges uniformly in $x$ for all times $t > t_0 > 0$. Since $\lambda_0 = 0$ and all other eigenvalues are negative, $\psi(t,x) \to c_0\phi_0(x)$ as $t \to \infty$, and therefore the first term $c_0\phi_0$ in the sum (5) is equal to the density of $\mu$. The convergence rate is determined by the next dominant eigenfunctions and eigenvalues. A $k$-dimensional diffusion map at time $t$ is a lower-dimensional representation of the system defined as a nonlinear mapping of the state space to the Euclidean space with coordinates given by the first $k$ eigenfunctions:
$$G_k(t,x) := \big(e^{\lambda_1 t}\phi_1(x), \dots, e^{\lambda_k t}\phi_k(x)\big).$$
The diffusion distance is then the Euclidean distance between the diffusion coordinates (DCs). Finally, we would like to stress that although the eigenfunctions of the kernel matrix do approximate point-wise the spatial eigenfunctions of the Fokker-Planck operator, there is no relationship between the diffusion map matrix eigenvalues and the exact propagator eigenvalues, as shown in [10]. This connection would be provided by constructing a diffusion map based approximation of the generator of underdamped Langevin dynamics (with finite friction).
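As a concrete illustration of the dynamics (1) and the ergodic average (3), the SDE can be discretized with the simple Euler–Maruyama scheme and a time average accumulated along the trajectory. The sketch below is illustrative only (the paper itself uses a second-order scheme [33]); the harmonic potential $V(x) = x^2/2$, $\beta = 1$, step size and trajectory length are arbitrary toy choices.

```python
import numpy as np

def euler_maruyama(grad_V, x0, beta, dt, n_steps, rng):
    """Discretize dx_t = -grad V(x_t) dt + sqrt(2/beta) dW_t (Euler-Maruyama)."""
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps,) + x.shape)
    noise_scale = np.sqrt(2.0 * dt / beta)
    for n in range(n_steps):
        x = x - grad_V(x) * dt + noise_scale * rng.standard_normal(x.shape)
        traj[n] = x
    return traj

rng = np.random.default_rng(0)
# Harmonic potential V(x) = x^2/2, so grad V(x) = x; the invariant
# Boltzmann-Gibbs measure (2) is then the standard Gaussian N(0, 1/beta).
traj = euler_maruyama(lambda x: x, x0=[0.0], beta=1.0, dt=0.01,
                      n_steps=200_000, rng=rng)
# Ergodic average (3): the time average of A(x) = x^2 should approach
# E_mu[x^2] = 1/beta = 1, up to statistical and discretization error.
print(np.mean(traj**2))
```

The trajectory point-cloud produced this way is exactly the kind of data the diffusion map construction of the next section takes as input.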
Diffusion maps
The diffusion map [17] reveals the geometric structure of a manifold $\mathcal{M}$ from given data $D^{(m)}$ by constructing an $m \times m$ matrix that approximates a differential operator. The relevant geometric features of $\mathcal{M}$ are expressed in terms of the dominant eigenfunctions of this operator. The construction requires that a set of points $D^{(m)} := \{x_1, x_2, \dots, x_m\} \subset \mathbb{R}^N$ ($N > 0$), which have been sampled from a distribution $\pi(x)$, lie on a compact $d$-dimensional differentiable submanifold $\mathcal{M} \subset \mathbb{R}^N$ with dimension $d < N$. The original diffusion maps introduced in [17, 55] are based on the isotropic kernel $h_\varepsilon(x,y) = h(\|x-y\|^2/\varepsilon)$, where $h$ is an exponentially decaying function, $\varepsilon > 0$ is a scale parameter, and $\|\cdot\|$ is a norm in $\mathbb{R}^N$ (for example the Euclidean norm, or the RMSD, which is the most commonly used norm in molecular simulations [10]). A typical choice is
$$h_\varepsilon(x,y) = \exp\big(-(4\varepsilon)^{-1}\|x-y\|^2\big). \qquad (6)$$
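As an aside, the Gaussian kernel (6) is straightforward to evaluate on a point cloud; a minimal numpy sketch using the Euclidean norm follows (the sample data and the value of $\varepsilon$ are arbitrary toy choices, not taken from the paper).

```python
import numpy as np

def kernel_matrix(X, eps):
    """Gaussian kernel (6): K_ij = exp(-||x_i - x_j||^2 / (4*eps)).

    X   : (m, N) array of m data points in R^N (Euclidean norm assumed).
    eps : bandwidth parameter epsilon > 0.
    """
    sq = np.sum(X**2, axis=1)
    # Pairwise squared distances via ||x-y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against negative values from round-off
    return np.exp(-d2 / (4.0 * eps))

X = np.random.default_rng(1).normal(size=(100, 3))
K = kernel_matrix(X, eps=0.5)
# K is symmetric with unit diagonal and entries in (0, 1]
print(K.shape)
```

The broadcast-based distance computation avoids an explicit double loop, which matters once $m$ reaches the sizes typical of trajectory data.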
In the next step, an $m \times m$ kernel matrix $K_\varepsilon$ is built by the evaluation of $h_\varepsilon$ on the set $D^{(m)}$. This matrix is then normalized several times to give a matrix $P_\varepsilon$ that can be interpreted as the generator of a Markov chain on the data. To be precise, the kernel matrix $K_\varepsilon$ is normalized using the power $\alpha \in [0,1]$ of the estimate $q$ of the density $\pi$, usually obtained from the kernel density estimate $q_i = \sum_{j=1}^m (K_\varepsilon)_{ij}$ as the row sum of $K_\varepsilon$. In some cases the analytic expression of the density $\pi(x)$ is known and we can set directly $q(x) = \pi(x)$. After obtaining the transition matrix
$$P_\varepsilon = D^{-\alpha} K_\varepsilon, \qquad (7)$$
where $D^{-\alpha} = \mathrm{diag}(q_1^{-\alpha}, \dots, q_m^{-\alpha})$, we compute in the last step the normalized graph Laplacian matrix
$$L_\varepsilon = \varepsilon^{-1}(P_\varepsilon - I). \qquad (8)$$
As reviewed in [42], for given $\alpha$ and sufficiently smooth functions $f$, the matrix $L_\varepsilon$ converges in the limit $m \to \infty$ point-wise to an integral operator describing a random walk in continuous space and discrete time, which in turn converges in the limit $\varepsilon \to 0$ to a differential operator. Using the notation $[f] = (f(x_1), \dots, f(x_m))^T$ for representing functions evaluated on the data set $D^{(m)}$ as vectors, such that $[f]_i = f(x_i)$, we formally write the point-wise convergence: for $\alpha \in [0,1]$, as $m \to \infty$ and $\varepsilon \to 0$,
$$(L_\varepsilon[f])_j \to \mathcal{L}f(x_j), \quad \text{for all } x_j \in D^{(m)},$$
where $\mathcal{L}$ is a Fokker-Planck operator
$$\mathcal{L}f = \Delta f + (2 - 2\alpha)\,\nabla f \cdot \frac{\nabla \pi}{\pi}, \qquad (9)$$
where $\Delta$ is the Laplace-Beltrami operator on $\mathcal{M}$ and $\nabla$ is the gradient operator on $\mathcal{M}$. Note that in the special case $\alpha = 1/2$, and for the choice $\pi = Z^{-1}e^{-\beta V}$ (Boltzmann-Gibbs), the approximated operator corresponds to the generator of overdamped Langevin dynamics (4), in the sense that
$$\mathcal{L} = \beta \mathcal{L}_\beta. \qquad (10)$$
For this reason, we focus on the choice $\alpha = 1/2$, for which the eigenvectors of $L_\varepsilon$ approximate discretized eigenfunctions of $\mathcal{L}$. That is, solutions of $L_\varepsilon[\psi] = \lambda[\psi]$ approximate solutions of the eigenproblem associated to $\mathcal{L}$:
$$\mathcal{L}\psi(x) = \lambda\psi(x), \quad \forall x \in \mathcal{M} \subset \mathrm{supp}(\pi).$$
The spectral decomposition of $L_\varepsilon$ provides real, non-positive eigenvalues $0 = \lambda_0 > \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$ sorted in decreasing order. The dominant eigenfunctions allow for a structure-preserving embedding $\Psi$ of $D^{(m)}$ into a lower-dimensional space and hence reveal the geometry of the data. Singer [55] showed that for uniform density $\pi$, the approximation error for fixed $\varepsilon$ and $m$ is
$$(L_\varepsilon[f])_j = \mathcal{L}f(x_j) + O\big(\varepsilon,\ m^{-1/2}\varepsilon^{-1/2 - d/4}\big). \qquad (11)$$

Remark 2.1 (Infinite friction limit). In molecular dynamics, trajectories are usually obtained by discretizing [34] the (underdamped) Langevin dynamics:
$$dx_t = M^{-1}p_t\,dt, \qquad dp_t = -\nabla V(x_t)\,dt - \gamma M^{-1}p_t\,dt + \sqrt{2\gamma\beta^{-1}}\,dW_t, \qquad (12)$$
where $x \in \mathbb{R}^d$ are positions, $p \in \mathbb{R}^d$ momenta, $W_t$ is a standard $d$-dimensional Wiener process, $\beta > 0$ is proportional to the inverse temperature, $M = \mathrm{diag}(m_1, \dots, m_d)$ is the diagonal matrix of masses, and $\gamma > 0$ is the friction constant. The generator of this process is
$$\mathcal{L}_\gamma = M^{-1}p \cdot \nabla_x - \nabla V(x) \cdot \nabla_p + \gamma\Big(-M^{-1}p \cdot \nabla_p + \frac{1}{\beta}\Delta_p\Big).$$
Recall that diffusion maps approximate the operator (9), which is the generator of the overdamped Langevin dynamics, the infinite-friction limit of the Langevin dynamics (12). Diffusion maps therefore provide the dynamical information in the large friction limit $\gamma \to +\infty$, rescaling time as $\gamma t$, or in the small mass limit ($M \to 0$).

Remark 2.2.
Note that diffusion maps require the data to be distributed with respect to $\pi(x)\,dx$, which appears in the limiting operator (9). This implies that even though diffusion maps eventually provide an approximation of the generator of overdamped Langevin dynamics (10), one can use trajectories from any ergodic discretization of underdamped Langevin dynamics, for example the BAOAB integrator [33], to approximate the configurational marginal $\pi(x)\,dx$.

The target measure diffusion map
If $\pi$ is particularly difficult to sample, it might be desirable to use points which are not necessarily distributed with respect to $\pi$ in order to compute an approximation to the operator
$$\mathcal{L} = \nabla\log(\pi)\cdot\nabla + \Delta. \qquad (13)$$
For this purpose, the target measure diffusion map was recently introduced [4]. The main advantage is that it allows construction of an approximation of $\mathcal{L}$ even if the data points are distributed with respect to some distribution $\mu$ such that $\mathrm{supp}\,\pi \subset \mathrm{supp}\,\mu$. The main idea is to use the previously introduced kernel density estimator $q_\varepsilon$, already obtained as a row sum of the kernel matrix $K_\varepsilon$. Since we know the density of the target distribution $\pi$, the matrix normalization step can be done by re-weighting $q$ with respect to the target density, which allows for the matrix normalization in an importance sampling sense. More precisely, the target measure diffusion map (TMDmap) is constructed as follows. The construction begins with the $m \times m$ kernel matrix $K_\varepsilon$ with components $(K_\varepsilon)_{ij} = k_\varepsilon(x_i, x_j)$ and the kernel density estimate $q_\varepsilon(x_i) = \sum_{j=1}^m (K_\varepsilon)_{ij}$. Then the diagonal matrix $D_{\varepsilon,\pi}$ with components $(D_{\varepsilon,\pi})_{ii} = \pi^{1/2}(x_i)\,q_\varepsilon^{-1}(x_i)$ is formed, and the kernel matrix is right-normalized with $D_{\varepsilon,\pi}$: $K_{\varepsilon,\pi} = K_\varepsilon D_{\varepsilon,\pi}$. Let us define $\widetilde{D}_{\varepsilon,\pi}$ as the diagonal matrix of row sums of $K_{\varepsilon,\pi}$, that is,
$$(\widetilde{D}_{\varepsilon,\pi})_{ii} = (K_{\varepsilon,\pi}\,\mathbb{1})_i = \sum_{j=1}^m (K_{\varepsilon,\pi})_{ij}.$$
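Collecting these steps (kernel matrix, kernel density estimate, right-normalization by $\pi^{1/2}/q_\varepsilon$, row sums, and the final normalization (14)), a minimal numpy sketch of the TMDmap construction might look as follows; the Euclidean kernel, the Gaussian toy data and target, and the parameter values are illustrative assumptions, not choices from the paper.

```python
import numpy as np

def tmdmap_generator(X, eps, target_density):
    """Sketch of the TMDmap matrix (14) built from a point cloud.

    X              : (m, N) array of m data points.
    eps            : kernel bandwidth epsilon > 0.
    target_density : callable giving pi(x) (up to a constant) at a point x.
    """
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-d2 / (4.0 * eps))                 # kernel matrix (6)
    q = K.sum(axis=1)                              # kernel density estimate q_eps
    pi = np.asarray([target_density(x) for x in X])
    D = np.sqrt(pi) / q                            # diagonal of D_{eps,pi}
    K_pi = K * D[None, :]                          # right normalization K_eps D_{eps,pi}
    row = K_pi.sum(axis=1)                         # diagonal of tilde-D_{eps,pi}
    L = (K_pi / row[:, None] - np.eye(len(X))) / eps  # final normalization (14)
    return L

# Toy usage: points from a standard Gaussian, targeting pi proportional
# to exp(-|x|^2/2) in one dimension.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 1))
L = tmdmap_generator(X, eps=0.2,
                     target_density=lambda x: np.exp(-0.5 * (x @ x)))
# Each row of the resulting generator matrix sums to zero by construction.
print(L.shape)
```

The dominant eigenvectors of `L` (sorted by eigenvalue, largest first) would then supply the diffusion coordinates used throughout this section.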
Figure 2: (a) Eigenvectors obtained from diffusion maps. (b) Mean absolute error on the first five eigenfunctions over the number of samples.

Finally, the TMDmap matrix is built as
$$L_{\varepsilon,\pi} = \varepsilon^{-1}\big(\widetilde{D}_{\varepsilon,\pi}^{-1} K_{\varepsilon,\pi} - I\big). \qquad (14)$$
In [4], it is shown that $L_{\varepsilon,\pi}$ converges point-wise to the operator (13).

Remark 2.3 (Nyström extension of the eigenvectors [5]). Note that the $j$-th eigenvector $[\psi_j]$ of the matrix $L_\varepsilon$ can be extended to $x \in \mathcal{M}$ as
$$\psi_j(x) = \frac{1}{\lambda_j}\sum_{i=1}^m \frac{h_\varepsilon(x, x_i)}{D(x)}\,[\psi_j]_i,$$
where $\lambda_j \ne 0$ is the corresponding eigenvalue, $D(x) = \sum_{i=1}^m h_\varepsilon(x, x_i)$ and $[\psi_j]_i = \psi_j(x_i)$.

Dirichlet boundary problems
Diffusion maps provide a matrix $L_\varepsilon$ which converges point-wise to the generator $\mathcal{L}$ defined in (9). This method can be used to solve the following eigenvalue problem with homogeneous Dirichlet boundary conditions: find $(\lambda, f)$ such that
$$\mathcal{L}f = \lambda f \ \text{ in } \Omega, \qquad f = 0 \ \text{ on } \partial\Omega.$$
In the following example, we solve a linear eigenvalue problem with Dirichlet boundary conditions. In the first step, we construct $L_\varepsilon$ as the diffusion map approximation of $\mathcal{L}$ on the point-cloud $\{x_i\}_{i=1}^m$. In order to express the Dirichlet boundary condition, we identify the points outside the domain $\Omega$, which we collect in $C := \{x_j\}_{j\in J}$, where the set of indices is $J := \{j : x_j \notin \Omega\}$. Finally, we solve the eigenvalue problem with the matrix $L_\varepsilon$ in which the rows with indices in $J$ have been set to zero.

Let us illustrate this on the following one-dimensional eigenvalue problem:
$$\mathcal{L}v = \lambda v, \quad x \in (-1,1), \qquad v(x) = 0, \quad x \in \{-1,1\}, \qquad (15)$$
with $\mathcal{L} = -x\partial_x + \partial_x^2$. The eigenfunctions are $\psi_k(x) = \sin(\pi(x+1)(k+1))$. To approximate the solution of (15) using diffusion maps with $\alpha = 1/2$, we generated points from a discretized overdamped Langevin trajectory with potential $V(x) = x^2/2$, using a second-order numerical scheme [33]. In Figure 2(a) we show the diffusion map approximation of the eigenfunctions $\psi_k/\|\psi_k\|$. From Figure 2(b) we observe that the decay of the mean absolute error of the normalized eigenfunctions is asymptotically proportional to $N^{-1/2}$, where $N$ is the number of samples; different values of $N$ were obtained by sub-sampling the trajectory.

Defining a 'local' perspective in diffusion-map analysis
We now concentrate on the case when diffusion maps are built using trajectories of the Langevin dynamics. As we have reviewed in the introduction, many iterative methods aim at gradually uncovering the intrinsic geometry of the local states. The information obtained from the metastable states can then be used, for example, to accelerate the sampling towards unexplored domains of the state space.

The approximation error of diffusion maps (11) scales as $O(m^{-1/2})$, $m$ being the number of samples. In order to provide an approximation of the Fokker-Planck operator (4) using a point-cloud obtained from the process $x_t$, a solution of (a discretized version of) (1), the time averages should have converged with a sufficiently small statistical error. In this section, using the notion of the quasi-stationary distribution, we explain to which operator a diffusion map approximation constructed on samples from a metastable subset of the state space converges, and why it is possible to obtain convergence of the approximation in this setup.

Quasi-stationary distribution
The quasi-stationary distribution (QSD) is a local equilibrium macro-state describing a process trapped in a metastable state $\Omega \subset \mathbb{R}^d$ (see for example [19]). The QSD $\nu$ can be defined as follows: for all smooth test functions $A : \mathbb{R}^d \to \mathbb{R}$,
$$\forall X_0 \in \Omega, \quad \lim_{t\to\infty} \mathbb{E}\big(A(X_t)\,\big|\,\tau > t\big) = \int_\Omega A\,d\nu,$$
where $\Omega$ is a smooth bounded domain in $\mathbb{R}^d$ which is the support of $\nu$, and the first exit time $\tau$ from $\Omega$ for $X_t$ is defined by $\tau = \inf\{t > 0 : X_t \notin \Omega\}$.

The following mathematical properties of the QSD were proved in [32]. The probability distribution $\nu$ has a density $v$ with respect to the Boltzmann-Gibbs measure $\pi(dx) = Z^{-1}e^{-\beta V(x)}\,dx$. The density $v$ is the first eigenfunction of the infinitesimal generator $\mathcal{L}$ of the process $X_t$, with Dirichlet boundary conditions on $\partial\Omega$:
$$\mathcal{L}v = -\lambda v \ \text{ in } \Omega, \qquad v = 0 \ \text{ on } \partial\Omega, \qquad (16)$$
where $-\lambda < 0$ is the first eigenvalue. The density of $\nu$ with respect to the Lebesgue measure is thus
$$\forall x \in \Omega, \quad \nu(x) = \frac{v(x)\,e^{-\beta V(x)}}{\int_\Omega v(x)\,e^{-\beta V(x)}\,dx}. \qquad (17)$$
Let us consider the situation when $\Omega$ is a metastable state for the dynamics $(X_t)$, and $X_0 \in \Omega$. Then, for a long time, the process remains in $\Omega$. If the diffusion map is built using those samples in $\Omega$, it then provides an approximation of the Kolmogorov operator (9) where $\pi$ is replaced by the quasi-stationary distribution $\nu$, namely
$$\mathcal{L}_\Omega f := \Delta f + (2-2\alpha)\,\nabla f \cdot \frac{\nabla\nu}{\nu} = \Delta f + (2-2\alpha)\,\nabla f \cdot \big(\nabla\ln(v) - \beta\nabla V\big) \ \text{ on } \Omega. \qquad (18)$$
Notice that if $\Omega = D$, then $\nu = \pi$ and we recover the operator (9) with respect to the distribution $\pi$. In the case when $\Omega$ is in the basin of attraction of a local minimum $x_0$ of $V$ for the dynamics $\dot{x} = -\nabla V(x)$, the quasi-stationary distribution with density $\nu$ defined by (17) is exponentially close to $\pi(x) = Z^{-1}e^{-\beta V(x)}$ on any compact subset of $\Omega$: the two distributions differ essentially on the boundary $\partial\Omega$. More precisely, as proved in [24, Lemma 23, Lemma 85] for example (see also [28, Theorem 3.2.3]), for any compact subset $K$ of $\Omega$, there exists $c > 0$ such that, as $\beta \to \infty$,
$$\left\| \frac{\mathbb{1}_K \exp(-\beta V)}{\int_K \exp(-\beta V)} - \nu \right\|_{L^1(\Omega)} = O\big(\exp(-\beta c)\big).$$

Figure 3: Quadratic potential in 1D. (left) The approximation of the first eigenfunction $v$ of the Dirichlet boundary eigenvalue problem (16). (right) The quasi-stationary distribution $\nu$.

In order to illustrate these ideas, we compare the diffusion map constructed from a trajectory in a metastable state with one constructed from points of a trajectory which has covered the whole support of the underlying distribution. As explained above, the distribution of the samples in the metastable state is the quasi-stationary distribution, and diffusion maps provide an approximation of the operator (18).

Let us first illustrate the quasi-stationary distribution and explain how it differs from the stationary distribution on a simple one-dimensional example. The density of the quasi-stationary distribution (17) can be obtained using an accurate numerical approximation of the solution $v$ of the Dirichlet problem (16) by a pseudo-spectral Chebyshev method [58, 43] on the interval $[-L, L]$ with $L > 0$, with the grid chosen fine enough to provide sufficient numerical accuracy. We consider a simple double-well potential $V$. To illustrate the convergence of the quasi-stationary distribution $\nu = Z_\nu^{-1}\,v\,e^{-V}$ to $\pi = Z^{-1}e^{-V}$, we increase the interval size by plotting the approximation for increasing values of $L$. In Figure 3 (left), we plot the approximation of the solution $v$ of the Dirichlet eigenvalue problem (16) on $[-L, L]$ for several values of $L > 0$. As expected, we observe that $v(x) \to 1$ as $L \to \infty$. In Figure 3 (right), we plot the corresponding quasi-stationary densities. As the size of the domain increases, the quasi-stationary distribution converges to $\pi$.

In the next example, we sample from the Boltzmann distribution with a two-dimensional double-well potential [35, Section 1.3.3.1]:
$$V_{DW}(x,y) = \frac{1}{6}\Big(4\big(w - x^2 - y^2\big)^2 + 2h\big(x^2 - 2\big)^2 + \big((x+y)^2 - w\big)^2 + \big((x-y)^2 - w\big)^2\Big), \qquad (19)$$
with $h = 2$ and $w = 1$. We employ a second-order discretization of Langevin dynamics (1) at low temperature ($\beta = 10$). Due to this low temperature, the samples are trapped in the first well long enough to locally equilibrate to the quasi-stationary distribution. We compute the statistical error of averages of various observables, such as the configurational and the kinetic temperatures, because of the knowledge of the exact expected values of these observables. We also track the first dominant eigenvalues from the diffusion map, in order to detect when the sampling has reached a local equilibrium. From this short trajectory, the diffusion map uncovers the geometry of the local state: the dominant eigenvector clearly parameterizes the $y$-coordinate, i.e. the slowest coordinate within the metastable state, see Figure 4. On the other hand, when the trajectory explores also the second well and hence covers the support of the distribution, the diffusion map parameterizes $x$ as the slowest coordinate, see Figure 5.

Figure 4: Local geometry. (a) The sampling of the metastable state. (b) The dominant eigenvector parameterizes the $y$-coordinate.

Figure 5: Global geometry. (a) The samples cover the whole support of the distribution. (b) The dominant eigenvector correlates with the $x$-coordinate.

These examples demonstrate that diffusion maps can be constructed using points of the quasi-stationary distribution to uncover the slowest local modes.
As we will show in Section 5, this property will eventually allow us to define local collective variables which can guide the construction of sampling paths that exit the metastable state.

(The following equalities hold, respectively, for the kinetic and configurational temperatures: $d\,k_B T = \mathbb{E}_\pi[p\cdot\nabla U(p)]$ and $k_B T = \mathbb{E}_\pi\big[|\nabla V(x)|^2\big]/\mathbb{E}_\pi[\Delta V(x)]$, where $k_B$ is the Boltzmann constant, $T$ is the temperature and $U(p) = \frac{1}{2}p^{T}M^{-1}p$ is the kinetic energy.)

Global perspective: Identification of metastable states and committors
We illustrate in this section how diffusion maps applied to global sampling can be used to approximate the generator of Langevin dynamics and committor probabilities between metastable states in the infinite-friction limit. We also use diffusion maps to automatically identify metastable sets in high-dimensional systems.

The committor function is the central object of transition path theory [59, 40]. It provides a dynamical reaction coordinate between two metastable states $A \subset D$ and $B \subset D$. The committor function is the probability that the trajectory starting from $x_0 \notin A \cup B$ reaches $B$ before $A$:
$$q(x) = \mathbb{P}(\tau_B < \tau_A \mid x_0 = x).$$
The committor $q$ is also the solution of the following system of equations:
$$\mathcal{L}q = 0 \ \text{ in } D \setminus (A \cup B), \qquad q = 0 \ \text{ in } A, \qquad q = 1 \ \text{ in } B. \qquad (20)$$
From the committor function, one can compute the reaction rates, density and current of transition paths [36].

In the spirit of [30], we use diffusion maps to compute committors from the trajectory data. Given a point-cloud on $D$, diffusion maps provide an approximation of the operator $\mathcal{L}$ and can be used to compute $q$. After computing the graph Laplacian matrix (8) (choosing again $\alpha = 1/2$), we solve the linear system
$$L_\varepsilon[c,c]\,q[c] = -L_\varepsilon[c,b]\,q[b], \qquad (21)$$
where $c$ and $b$ denote the indices of points belonging to the sets $C = D \setminus (A \cup B)$ and $B$ respectively, and $L_\varepsilon[I,J]$ is the projection of the matrix (or vector) on the sets of indices $I = \{I_k\}$ and $J = \{J_k\}$.

One-dimensional Double-well.
First, we compute the committor function for a one-dimensional double-well potential V(x) = (x² − 1)². We use a second-order discretized Langevin scheme to generate the sample points and compute the TMDmap with ε = 0.1. We fix the sets A and B as small intervals around the two wells, with A ending at x = −1 and B starting at x = 1. (We are aware that diffusion maps do not provide access to dynamical properties. However, a better reaction coordinate such as the committor can be used to obtain dynamical properties, for example in combination with Adaptive Multilevel Splitting [11].)

Figure 6: Double-well in 1D. (a) A plot of the potential showing the three sets A, B and C = D \ (A ∪ B). (b) The committor function approximation by diffusion maps (DFM) and an accurate pseudospectral method.

The most commonly used method for identifying metastable subsets is the Perron-cluster cluster analysis (PCCA), which exploits the structure of the eigenvectors [52, 53, 49, 23]. In this work, we use the eigenvectors provided by diffusion maps in order to automatically identify the metastable subsets. The main idea is to compute the dominant eigenvectors of the transfer operator P, which are those with eigenvalues close to 1, excluding the first eigenvalue. We approximate P by P_ε defined in (7). The metastable states can be clustered according to the 'sign' structure of the first dominant eigenvector, which is moreover constant on these states. More precisely, we find the maximal and minimal points of the first eigenvector, which define the centers of the two sets. In the next step, we 'grow' the sets by including points whose Euclidean distance in diffusion space is smaller than a fixed threshold. See for example Figure 5(b) for an illustration in the case of the two-dimensional double-well potential.

Two-dimensional Double-well
In the second example, we consider a two-dimensional problem with potential (19) with h = 2 and w = 1. We generate samples using discretized Langevin dynamics with timestep ∆t = 0.1, and use points obtained by sub-sampling the trajectory for the diffusion map, which is computed with kernel (6) and ε = 0.1. We compute the first two dominant eigenvectors and define the metastable subsets A and B using the first dominant eigenvector as described above. (The results suggest that the method is robust with respect to variations of the definition domains, under the assumption that the domains A and B are within the metastable states.) In Figure 7(a) we see the sampling and the chosen sets. We compute the committor by solving the linear system (21) and extrapolate the solution linearly onto a two-dimensional grid in x and y; this is illustrated in Figure 7(b). The representation of the committor in the diffusion coordinates is depicted in Figure 8(b).

Figure 7: Two-dimensional double-well potential: committor approximations on the automatically identified metastable sets. (a) The sampled trajectory with the automatically chosen sets A (blue) and B (red). (b) The committor extended on a grid. Note that the value 0.5 is attained at x = 0, as expected.

Figure 8: (a) The flux streamlines coloured by the probability ρ_AB. (b) The diffusion map embedding and the chosen sets A, B (coloured blue and red).

We now use diffusion maps to compute dominant eigenfunctions of the transition operator, identify metastable sets and approximate the committors of small molecules: alanine dipeptide and deca-alanine. We discuss the local and global perspectives by computing diffusion maps inside the metastable states. For the example of deca-alanine, we use a trajectory from biased dynamics and use the TMDmap to approximate the committor, comparing the dynamics at various temperatures.

Figure 9: Alanine dipeptide. (a) The two metastable states uncovered by the first diffusion coordinate. (b) The committor function over sets A (red) and B (blue). The first eigenvector has a very high correlation with the committor function.
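Concretely, once the graph Laplacian matrix has been assembled and the index sets identified, the committor solve in (21) reduces to a single restricted linear system. The following is a minimal dense-matrix sketch of this step (the function name and interface are illustrative, not taken from any released code):

```python
import numpy as np

def committor_from_laplacian(L_eps, idx_A, idx_B):
    """Solve the restricted system L[c,c] q[c] = -L[c,b] q[b],
    with boundary conditions q = 0 on A and q = 1 on B."""
    n = L_eps.shape[0]
    idx_A = np.asarray(idx_A)
    idx_B = np.asarray(idx_B)
    # c indexes the transition region C = D \ (A u B)
    idx_C = np.setdiff1d(np.arange(n), np.concatenate([idx_A, idx_B]))

    q = np.zeros(n)
    q[idx_B] = 1.0  # boundary condition on B; q is already 0 on A

    # Restrict the discrete generator to C and solve the linear system (21).
    L_cc = L_eps[np.ix_(idx_C, idx_C)]
    L_cb = L_eps[np.ix_(idx_C, idx_B)]
    q[idx_C] = np.linalg.solve(L_cc, -L_cb @ q[idx_B])
    return q
```

For large point clouds one would of course store L_ε as a sparse matrix and use an iterative solver, but the structure of the computation is the same.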
Alanine dipeptide in vacuum
Alanine dipeptide (CH3−CO−NH−CαH(CH3)−CO−NH−CH3) is commonly used as a toy problem for sampling studies since it has two well-defined metastable states which can be parameterized by the dihedral angles φ, defined between C, N, Cα, C, and ψ, defined between N, Cα, C, N.

We simulate a 20 nanosecond Langevin dynamics trajectory (using the BAOAB integrator [34]) at temperature 300 K with a 2 fs stepsize, friction γ = 1 ps⁻¹ and periodic boundary conditions, using the openmmtools [1] library, where alanine dipeptide in vacuum is provided by the AMBER ff96 force field. We sub-sample and RMSD-align the configurations with respect to a reference one, leaving a small sub-sampled set of points for the diffusion map analysis. We use kernel (6) with ε = 1 and the Euclidean metric.

In Figure 9(a) we show the first diffusion coordinate, which has opposite signs on the two metastable states. Since the transition is very rare, there are only very few points in the vicinity of the saddle point; the dominant eigenvector nevertheless clearly parameterizes the dihedral angles and separates the two metastable sets. In order to define the reactive sets, we use the first diffusion coordinate as described in Section 4. Figure 9(a) shows the dominant diffusion coordinate, whereas Figure 9(b) shows the committor function. We observe that the committor strongly correlates with the first eigenvector, which is expected since the metastable states are defined by the dominant eigenvector. Figure 10 shows the free energy profile of the committor function, which was extended from the sub-sampled points to the trajectory of length 20 nanoseconds using nonlinear regression (a multi-layer perceptron).

In the previous example, we have used a globally converged trajectory. The diffusion map analysis therefore describes the dynamics from the global perspective.
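The basic diffusion-map pipeline used in this kind of analysis (Gaussian kernel on aligned configurations, density normalization, leading non-trivial eigenvector) can be sketched as follows; the kernel scaling and the choice α = 1 in this sketch are illustrative and may differ in detail from kernel (6) and the normalization used above:

```python
import numpy as np

def first_diffusion_coordinate(X, eps):
    """Sketch of a diffusion map: Gaussian kernel on configurations X
    (an (m, d) array), alpha = 1 density normalization, then the first
    non-trivial eigenvector of the resulting Markov matrix."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * eps))
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q, q)                       # remove sampling-density effects
    P = K_tilde / K_tilde.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    return evecs[:, order[1]].real                     # skip the constant eigenvector
```

On two well-separated clusters of configurations, the returned coordinate takes opposite signs on the two clusters, which is exactly the sign structure used above to identify the metastable sets. (For production use, a library such as pydiffmap provides sparse kernels and bandwidth selection.)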
In order to illustrate the local perspective, we compute the diffusion maps and the committor approximation from a trajectory which has not left the first metastable state. The first eigenvector parameterizes the two wells of this metastable state, and we define the reactive sets as before using the first DC. The resulting committor approximation, computed with the automatically chosen sets, is shown in Figure 11(b). (More precisely, for the nonlinear regression of Figure 10, the sub-sampled set served as a training set and the fitted model was evaluated on the full trajectory. We performed cross-validation to find the right parameters of the neural network.)

Figure 11: The first metastable state of alanine dipeptide. (a) The dominant eigenvector parameterizes the two wells of the first metastable state. (b) The committor approximation with sets A, B in blue and red.

Figure 12 shows the corresponding diffusion map embedding coloured with respect to the committor values. Note that the diffusion coordinates computed on the samples from this metastable state, whose distribution is the QSD, correlate with the dihedral angles. This observation suggests that alanine dipeptide in vacuum is a trivial example for testing enhanced dynamics methods using collective variables learned on-the-fly, because it is likely that the slow dynamics within the metastable state are similar to the global slow dynamics, which might not be the case in more complicated molecules.
Deca-alanine
In the following example, we use a long trajectory of the deca-alanine molecule. This molecule has 132 atoms and a very complex free-energy landscape. A kinetic analysis was done in [60], showing the force-field-dependent slowest dynamical processes. Representations of the molecular conformations associated with the two main metastable states are shown in Figure 1. The system is highly metastable: a standard simulation at 300 K requires at least 5 µs to converge [60]. Therefore, the trajectory studied here was obtained using the infinite-swap simulated tempering (ISST) [39] method with temperatures in the range from 300 K to 500 K (nominal stepsize 2 fs and nominal simulation length 2 µs). This method is incorporated into the MIST [8] library, which is coupled to the GROMACS and Amber96 force fields.

Figure 12: The embedding in the first two diffusion coordinates, coloured by the committor values. The sets A, B are shown in blue and red respectively.

ISST provides the weights which are necessary to recover the Boltzmann distribution at a given temperature. We use the weights within the TMDmap, allowing us to efficiently compute the committor iso-surfaces at various temperatures. We extend the committors in the root-mean-square deviation (RMSD) and the radius of gyration (Rg) for better visualisation in Figure 13 (note, however, that the diffusion maps were applied to all degrees of freedom). We automatically identify the metastable sets using the dominant eigenvector. Note that at the lower temperatures of 300 K and 354 K, the two states are A and B1, while at higher temperatures they are A and B2. See Figure 1 for the corresponding conformations. The reason is that at the lower temperatures the dominant barrier is enthalpic, while at higher temperatures it is rather entropic, suggesting that the slowest transition is then between the states A and B2, where B2 is a state which is not very probable at low temperatures. The different dynamical behaviours can also be seen in the varying committor isosurfaces (Figure 13).

In this section, we first illustrate how the spectrum computed from diffusion maps converges within the quasi-stationary distribution. Next, we use diffusion coordinates built from samples within the metastable state to identify local CVs, which are physically interpretable and valid over the whole state space. Finally, we demonstrate that when used in combination with enhanced dynamics such as metadynamics, these CVs lead to more efficient global exploration. The combination of these three procedures defines an algorithm for enhanced sampling which we formalize and briefly illustrate.
The local perspective introduced in Section 3 allows us to define a metastable state as an ensemble of configurations (snapshots) along a trajectory for which the diffusion map spectrum converges. The idea is that, when trapped in a metastable state, one can compute the spectrum of the infinitesimal generator associated with the QSD, which will typically change when the process moves to a new metastable state.
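A schematic reading of this exit-detection idea: recompute the leading eigenvalues of the discrete generator on successive windows of configurations and flag an exit when they move appreciably. The function names, the dense-kernel construction and the tolerance below are illustrative assumptions, not the implementation used in the experiments:

```python
import numpy as np

def leading_generator_eigenvalues(X, eps, k=4):
    """Leading eigenvalues of the discrete generator L_eps = (P - I)/eps
    built from a window of configurations X (an (m, d) array)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * eps))
    q = K.sum(axis=1)
    K_tilde = K / np.outer(q, q)
    P = K_tilde / K_tilde.sum(axis=1, keepdims=True)
    evals = np.sort(np.linalg.eigvals(P).real)[::-1]
    return (evals[:k] - 1.0) / eps     # generator eigenvalues (<= 0)

def exited_state(prev_eigs, new_eigs, tol):
    """Flag an exit from the metastable state when the dominant part of
    the spectrum moves by more than tol between successive windows."""
    diff = np.abs(np.asarray(new_eigs) - np.asarray(prev_eigs))
    return bool(diff.max() > tol)
```

While the walker stays within one metastable state, successive windows sample the same QSD and the eigenvalues stabilise; after an exit, they drift towards the spectrum of the operator on the larger explored domain.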
Figure 13: Committor of deca-alanine. Sets A (blue) and B1 (red, top) and B2 (red, bottom) were obtained from the first DC. The transition region is shown by committor isosurface lines, at low temperatures (300 K and 354 K) and high temperatures (413 K and 485 K). Panels: (a) 300 K, (b) 354 K, (c) 413 K, (d) 485 K.

Figure 14: Committor of deca-alanine in diffusion coordinates. Panels: (a) 300 K, (b) 354 K, (c) 413 K, (d) 485 K.

Figure 15: The first dominant eigenvalues of the matrix L_ε with a fixed number of points from an alanine dipeptide trajectory. The bottom right plot shows the values of the φ-angle over the simulation steps, with the colors corresponding to the iterations after which the spectrum is recomputed. Above, we plot the two functions of the dominant eigenvalues, computed over each iteration, which indicate the exit from the state as the eigenvalues suddenly change. Before the first transition to the second state, the spectrum is the one of L_Ω (see (18) with α = 1/2), where Ω is the metastable state where the process is trapped. After the transition, the spectrum evolves towards the spectrum of the operator L_{R^d} = −β∇V·∇ + ∆ on the whole domain.

To illustrate this, we analyze an alanine dipeptide trajectory. Every 4000 steps, we compute the first dominant eigenvalues of the diffusion map matrix L_ε (with α = 1/2) by sampling 2000 points from the trajectory {x_n}_{n=0}^m up to the end of the last iteration τ = m∆t. We observe in Figure 15(left) that the sampling has locally equilibrated during the first 3 iterations within the metastable state, and the eigenvalues have converged. The transition occurs after the fourth iteration, as can be seen in Figure 15(bottom right), which shows the values of the dihedral angle φ over the simulation steps. In the left figure, we clearly observe a change in the spectrum at this point, with an increase in the spectral gap. After the trajectory has exited the metastable state, the eigenvalues begin evolving to new values, corresponding to the spectrum of the operator on the whole domain. The change in the spectrum allows us to detect the exit from the metastable state. Instead of tracking each of the first eigenvalues separately, we compute the average and the maximal difference of the dominant eigenvalues, which we plot in Figure 15(top right).

This example illustrates the local perspective on diffusion maps: their application to a partially explored distribution provides an approximation of a different operator, since the samples used are distributed with respect to a quasi-stationary distribution. It is therefore possible to use samples from the local equilibrium to learn the slowest dynamics within the metastable state. This observation can be used together with biasing techniques to improve sampling.

We now describe an outline of an algorithm for enhanced sampling. Sampling from the QSD allows us to build high-quality local CVs (within the metastable state) by looking for the physical CVs most correlated with the DCs. In typical practice the CVs are chosen from a list of physically relevant candidates, for example the backbone dihedral angles or atom-atom contacts. There are several advantages to using physical coordinates instead of the abstract DCs. First, the artificial DCs are only defined on the visited states, i.e. inside the metastable state. Extrapolated DCs outside the visited state lose their validity further from the points used for the computation. By contrast, the physical coordinates can typically be defined over the entire state space. Second, the slowest physical coordinates might provide more understanding of the metastable state.

Once the best local CVs have been identified, we can use metadynamics to enhance the sampling, effectively driving the dynamics to exit the metastable state. In the next iteration, we suggest using the TMDmap to unbias the influence of metadynamics on the newly generated trajectory. The above strategy can be summarized in the following algorithm (enhanced sampling algorithm based on local-global analysis):

1. Run molecular dynamics until the spectrum has converged (the reference walker is trapped).
2. [Optional] Refine the convergence of the spectrum and the DCs by using a Fleming-Viot [9] process within the metastable state (each replica being initialized with the whole initial trajectory of the reference dynamics, for the diffusion map computation).
3. Identify the CVs which are most correlated with the DCs.
4. Build an effective bias using, for example, metadynamics based on these CVs.
5. When an exit is observed for the reference walker (through a change in the spectrum), restart the procedure from step 1, keeping in memory the bias built using the TMDmap.

As a proof of concept, we illustrate this method in a simplified setup based on metadynamics in the case of alanine dipeptide. We stress that the construction of sampling procedures for general macromolecules is a complex undertaking, and many challenges still need to be addressed to implement the full proposed methodology for larger systems; this is the subject of current study by the authors.

We first run a short trajectory to obtain samples from the first metastable state. (The 'first' state corresponds to the top left double well in Figure 9.) This sampling is done by a Fleming-Viot process [9] with a boundary defined by the converged spectrum, which we regularly recompute during the sampling. (Alternatively, one can define the boundary by the free energy in the φ, ψ angles. We also found that resampling using Fleming-Viot improves the quality of the diffusion maps.) We compute the two dominant DCs and use them to select the two most correlated coordinates from a list of physical candidates (a similar approach was presented in [27]). In this case, we looked at the collection of all possible dihedral angles over all atoms except the hydrogens. In order to select the two most correlated CVs we employed the Pearson correlation coefficient,

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y),

where σ_Z is the standard deviation of a random variable Z. We found the dihedral angles ACE1C-ALA2C-ALA2C-NME3N and ACE1C-ACE1O-ALA2N-NME3N to be the best candidates, with the highest correlations with the first and the second eigenvector respectively. Note, however, that there were also several other CVs with high correlations, among them the φ and ψ angles. In the next step, we use the most correlated physical CVs within metadynamics and track the φ-angle as a function of time. Moreover, we run metadynamics with a priori chosen CVs based on expert knowledge, specifically the φ, ψ angles. As shown in Figure 16, there is no transition for the standard dynamics (top). On the other hand, metadynamics with the learned CVs exhibits several transitions, see Figure 16(bottom), similarly to the knowledge-based CVs (middle); metadynamics with the φ, ψ angles (middle) is also more efficient than standard Langevin dynamics. We also compared the least correlated CVs for metadynamics, and this approach was much worse than Langevin dynamics: we did not observe any exit during the expected Langevin exit time. These numerical experiments are strongly suggestive of the relevance of the CVs most correlated with the dominant DCs. In conclusion, learning the slowest CVs from a local state provides important information allowing the dynamics to escape the metastable state and hence enhance the sampling. Once the process leaves the first visited metastable state, one keeps the bias, and new CVs are computed to update the bias once trapped elsewhere.
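Step 3 of the algorithm, matching candidate physical CVs to the dominant DCs via the Pearson coefficient, can be sketched as follows (the data layout and names are illustrative; in practice the candidate columns would be dihedral-angle time series such as those listed above):

```python
import numpy as np

def most_correlated_cvs(dcs, candidates, names):
    """For each dominant diffusion coordinate (columns of dcs, shape (m, k)),
    return the name of the candidate physical CV (columns of candidates,
    shape (m, n)) with the largest |Pearson correlation|."""
    picks = []
    for j in range(dcs.shape[1]):
        corrs = [abs(np.corrcoef(dcs[:, j], candidates[:, i])[0, 1])
                 for i in range(candidates.shape[1])]
        picks.append(names[int(np.argmax(corrs))])
    return picks
```

The selected names are then handed to the biasing method (here, metadynamics) as its collective variables.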
This process can be iterated, and the weights from metadynamics can subsequently be unbiased by the TMDmap to compute the relevant physical coordinates at every iteration. (In this example, we did not compute the full expected exit times, since the difference was already significant over several realizations. To estimate the actual speed-up of the sampling method, one would need to run thousands of trajectories and perform a detailed statistical analysis of the results, which is proposed for future study.)

We have thus demonstrated, at least in this specific case, how the local perspective can be used together with biasing techniques (metadynamics) to obtain more rapid sampling of the target distribution.

Diffusion maps are an effective tool for uncovering a natural collective variable as needed for enhanced sampling. In this work, we have formalized the use of diffusion maps within a metastable state, which provides insight into diffusion-map-driven sampling based on iterative procedures.
Data availability. The data is accessible as: Martinsson, Anton; Trstanova, Zofia (2019). ISST Deca-Alanine. University of Edinburgh, School of Mathematics. https://doi.org/10.7488/ds/2491. The following libraries were used:
- pydiffmap: https://github.com/DiffusionMapsAcademics/pyDiffMap
- OpenMM: Eastman, Peter, et al. "OpenMM 7: Rapid development of high-performance algorithms for molecular dynamics." PLoS Computational Biology 13.7 (2017): e1005659.
B.L. and Z.T. were supported by EPSRC Grant EP/P006175/1. B.L. was further supported by the Alan Turing Institute (EPSRC EP/N510129/1) as a Turing Fellow. T.L. is supported by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement number 614492.

The authors would like to thank Ben Goddard and Antonia Mey (both at the University of Edinburgh) for helpful discussions.
References

[1] OpenMMTools.
[2] Advanced Sampling in Molecular Simulation. J. Comput. Chem., 37(6):543–622, 2016.
[3] Rosalind J. Allen, Chantal Valeriani, and Pieter Rein ten Wolde. Forward flux sampling for rare event simulations. Journal of Physics: Condensed Matter, 21(46):463102, 2009.
[4] Ralf Banisch, Zofia Trstanova, Andreas Bittracher, Stefan Klus, and Péter Koltai. Diffusion maps tailored to arbitrary non-degenerate Itô processes. Appl. Comput. Harmon. Anal., 2018.
[5] Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-François Paiement, Pascal Vincent, and Marie Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197–2219, 2004.
[6] Tristan Bereau, Denis Andrienko, and Kurt Kremer. Research Update: Computational materials discovery in soft matter. APL Mater., 4(5):53101, 2016.
[7] Tyrus Berry and John Harlim. Variable bandwidth diffusion kernels. Appl. Comput. Harmon. Anal., 40(1):68–96, 2016.
[8] Iain Bethune, Ralf Banisch, Elena Breitmoser, Antonia B. K. Collis, Gordon Gibb, Gianpaolo Gobbo, Charles Matthews, Graeme J. Ackland, and Benedict J. Leimkuhler. MIST: A simple and efficient molecular dynamics abstraction library for integrator development. Comp. Phys. Comm., 236:224–236, 2019.
[9] A. Binder, G. Simpson, and T. Lelièvre. A generalized parallel replica dynamics. J. Comput. Phys., 284:595–616, 2015.
[10] Lorenzo Boninsegna, Gianpaolo Gobbo, Frank Noé, and Cecilia Clementi. Investigating molecular kinetics by variationally optimized diffusion maps. J. Chem. Theory Comput., 11(12):5947–5960, 2015.
[11] Frédéric Cérou and Arnaud Guyader. Adaptive multilevel splitting for rare event analysis. Stoch. Anal. Appl., 25(2):417–443, 2007.
[12] Wei Chen and Andrew L. Ferguson. Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration. J. Comput. Chem., 39(25):2079–2102, 2018.
[13] Eliodoro Chiavazzo, Roberto Covino, Ronald R. Coifman, C. William Gear, Anastasia S. Georgiou, Gerhard Hummer, and Ioannis G. Kevrekidis. Intrinsic map dynamics exploration for uncharted effective free-energy landscapes. Proc. Natl. Acad. Sci., 114(28):E5494–E5503, 2017.
[14] Christophe Chipot and Jérôme Hénin. Exploring the free-energy landscape of a short peptide using an average force. J. Chem. Phys., 123(24):244906, 2005.
[15] Nicolas Chopin, Tony Lelièvre, and Gabriel Stoltz. Free energy methods for Bayesian inference: efficient exploration of univariate Gaussian mixture posteriors. Stat. Comput., 22(4):897–916, 2012.
[16] Giovanni Ciccotti, Raymond Kapral, and Eric Vanden-Eijnden. Blue moon sampling, vectorial reaction coordinates, and unbiased constrained dynamics. ChemPhysChem, 6(9):1809–1814, 2005.
[17] R. Coifman and S. Lafon. Diffusion maps. Appl. Comput. Harmon. Anal., 21(1):5–30, 2006.
[18] Ronald R. Coifman, Ioannis G. Kevrekidis, Stéphane Lafon, Mauro Maggioni, and Boaz Nadler. Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Model. Simul., 7(2):842–864, 2008.
[19] Pierre Collet, Servet Martinez, and Jaime San Martin. Quasi-Stationary Distributions: Markov Chains, Diffusions and Dynamical Systems. Springer Science & Business Media, 2012.
[20] Jeffrey Comer, James C. Gumbart, Jérôme Hénin, Tony Lelièvre, Andrew Pohorille, and Christophe Chipot. The adaptive biasing force method: Everything you always wanted to know but were afraid to ask. J. Phys. Chem. B, 119(3):1129–1151, 2014.
[21] Eric Darve and Andrew Pohorille. Calculating free energies using average force. J. Chem. Phys., 115(20):9169–9183, 2001.
[22] Eric Darve, David Rodríguez-Gómez, and Andrew Pohorille. Adaptive biasing force method for scalar and vector free energy calculations. J. Chem. Phys., 128(14), 2008.
[23] Peter Deuflhard and Marcus Weber. Robust Perron cluster analysis in conformation dynamics. Linear Algebra Appl., 398:161–184, 2005.
[24] G. Di Gesu, T. Lelièvre, D. Le Peutrec, and B. Nectoux. Sharp asymptotics of the first exit point density. arXiv:1706.08728v1, 2017.
[25] Rose Du, Vijay S. Pande, Alexander Yu. Grosberg, Toyoichi Tanaka, and Eugene S. Shakhnovich. On the transition coordinate for protein folding. J. Chem. Phys., 108(1):334–350, 1998.
[26] Weinan E, Weiqing Ren, and Eric Vanden-Eijnden. Finite temperature string method for the study of rare events. J. Phys. Chem. B, 109(14):6688–6693, 2005.
[27] Andrew L. Ferguson, Athanassios Z. Panagiotopoulos, Pablo G. Debenedetti, and Ioannis G. Kevrekidis. Integrating diffusion maps with umbrella sampling: Application to alanine dipeptide. J. Chem. Phys., 134(13):04B606, 2011.
[28] Bernard Helffer and Francis Nier. Quantitative Analysis of Metastability in Reversible Diffusion Processes via a Witten Complex Approach: The Case with Boundary. Société Mathématique de France, 2006.
[29] Yuehaw Khoo, Jianfeng Lu, and Lexing Ying. Solving for high dimensional committor functions using artificial neural networks. arXiv preprint arXiv:1802.10275, 2018.
[30] Rongjie Lai and Jianfeng Lu. Point cloud discretization of Fokker–Planck operators for committor functions. Multiscale Model. Simul., 16(2):710–726, 2018.
[31] Alessandro Laio and Michele Parrinello. Escaping free-energy minima. Proc. Natl. Acad. Sci., 99(20):12562–12566, 2002.
[32] Claude Le Bris, Tony Lelièvre, Mitchell Luskin, and Danny Perez. A mathematical formalization of the parallel replica dynamics. Monte Carlo Methods Appl., 18(2):119–146, 2012.
[33] Benedict Leimkuhler and Charles Matthews. Robust and efficient configurational molecular sampling via Langevin dynamics. J. Chem. Phys., 138(17):174102, 2013.
[34] Benedict Leimkuhler, Charles Matthews, and Gabriel Stoltz. The computation of averages from equilibrium and nonequilibrium Langevin molecular dynamics. IMA J. Numer. Anal., 36(1):13–79, 2015.
[35] T. Lelièvre, M. Rousset, and G. Stoltz. Free Energy Computations: A Mathematical Perspective. Imperial College Press, 2010.
[36] Jianfeng Lu and James Nolen. Reactive trajectories and the transition path process. Probab. Theory Relat. Fields, 161(1-2):195–244, 2015.
[37] Mohammad M. Sultan and Vijay S. Pande. tICA-metadynamics: accelerating metadynamics by using kinetically selected collective variables. J. Chem. Theory Comput., 13(6):2440–2447, 2017.
[38] Luca Maragliano and Eric Vanden-Eijnden. A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. Chem. Phys. Lett., 426(1-3):168–175, 2006.
[39] Anton Martinsson, Jianfeng Lu, Benedict Leimkuhler, and Eric Vanden-Eijnden. Simulated tempering method in the infinite switch limit with adaptive weight learning. arXiv preprint arXiv:1809.05066, 2018.
[40] Philipp Metzner, Christof Schütte, and Eric Vanden-Eijnden. Illustration of transition path theory on a collection of simple examples. J. Chem. Phys., 125(8):84110, 2006.
[41] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA, 2013.
[42] Boaz Nadler, Stéphane Lafon, Ronald Coifman, and Ioannis G. Kevrekidis. Diffusion maps - a probabilistic interpretation for spectral embedding and clustering algorithms. In Alexander N. Gorban, Balázs Kégl, Donald C. Wunsch, and Andrei Y. Zinovyev, editors, Principal Manifolds for Data Visualization and Dimension Reduction, pages 238–260. Springer, Berlin, Heidelberg, 2008.
[43] Andreas Nold, Benjamin D. Goddard, Peter Yatsyshin, Nikos Savva, and Serafim Kalliadasis. Pseudospectral methods for density functional theory in bounded and unbounded domains. J. Comput. Phys., 334:639–664, 2017.
[44] Lars Onsager. Initial recombination of ions. Phys. Rev., 54(8):554, 1938.
[45] Vijay S. Pande, Alexander Yu. Grosberg, Toyoichi Tanaka, and Daniel S. Rokhsar. Pathways for protein folding: is a new view needed? Curr. Opin. Struct. Biol., 8(1):68–79, 1998.
[46] D. Perez, B. P. Uberuaga, and A. F. Voter. The parallel replica dynamics method - coming of age. Comput. Mater. Sci., 100:90–103, 2015.
[47] Jordane Preto and Cecilia Clementi. Fast recovery of free energy landscapes via diffusion-map-directed molecular dynamics. Phys. Chem. Chem. Phys., 16(36):19181–19191, 2014.
[48] Jan-Hendrik Prinz, Martin Held, Jeremy C. Smith, and Frank Noé. Efficient computation, sensitivity, and error analysis of committor probabilities for complex dynamical processes. Multiscale Model. Simul., 9(2):545–567, 2011.
[49] Jan-Hendrik Prinz, Bettina Keller, and Frank Noé. Probing molecular kinetics with Markov models: metastable states, transition pathways and spectroscopic observables. Phys. Chem. Chem. Phys., 13(38):16912–16927, 2011.
[50] Mary A. Rohrdanz, Wenwei Zheng, Mauro Maggioni, and Cecilia Clementi. Determination of reaction coordinates via locally scaled diffusion map. J. Chem. Phys., 134(12), 2011.
[51] Lula Rosso, Peter Mináry, Zhongwei Zhu, and Mark E. Tuckerman. On the use of the adiabatic molecular dynamics technique in the calculation of free energy profiles. J. Chem. Phys., 116(11):4389–4402, 2002.
[52] Ch. Schütte, Alexander Fischer, Wilhelm Huisinga, and Peter Deuflhard. A direct approach to conformational dynamics based on hybrid Monte Carlo. J. Comput. Phys., 151(1):146–168, 1999.
[53] Christof Schütte and Wilhelm Huisinga. Biomolecular conformations can be identified as metastable sets of molecular dynamics. Handb. Numer. Anal., 10:699–744, 2003.
[54] Patrick Shaffer, Omar Valsson, and Michele Parrinello. Enhanced, targeted sampling of high-dimensional free-energy landscapes using variationally enhanced sampling, with an application to chignolin. Proc. Natl. Acad. Sci., 113(5):1150–1155, 2016.
[55] Amit Singer. From graph to manifold Laplacian: The convergence rate. Appl. Comput. Harmon. Anal., 21(1):128–134, 2006.
[56] Erik H. Thiede, Dimitrios Giannakis, Aaron R. Dinner, and Jonathan Weare. Galerkin approximation of dynamical quantities using trajectory data. arXiv preprint arXiv:1810.01841, 2018.
[57] Glenn M. Torrie and John P. Valleau. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys., 23(2):187–199, 1977.
[58] Lloyd N. Trefethen. Spectral Methods in MATLAB, volume 10. SIAM, 2000.
[59] Eric Vanden-Eijnden. Transition path theory. In Computer Simulations in Condensed Matter Systems: From Materials to Chemical Biology, Volume 1, pages 453–493. Springer, 2006.
[60] Francesca Vitalini, Antonia S. J. S. Mey, Frank Noé, and Bettina G. Keller. Dynamic properties of force fields. J. Chem. Phys., 142(8):02B611, 2015.
[61] A. F. Voter. Parallel replica method for dynamics of infrequent events. Phys. Rev. B, 57(22):R13985, 1998.
[62] Wenwei Zheng, Mary A. Rohrdanz, and Cecilia Clementi. Rapid exploration of configuration space with diffusion-map-directed molecular dynamics.