[PDF] A Toolbox for Quantifying Memory in Dynamics Along Reaction Coordinates

Abstract

Memory effects in time-series of experimental observables are ubiquitous, have important cosequences for the interpretation of kinetic data, and may even affect the function of biomolecular nanomachines such as enzymes. Here we propose a set of complementary methods for quantifying conclusively the magnitude and duration of memory in a time series of a reaction coordinate. The toolbox is general, robust, easy to use, and does not rely on any underlying microscopic model. As a proof of concept we apply it to the analysis of memory in the dynamics of the end-to-end distance of the analytically solvable Rouse-polymer model, an experimental time-series of extensions of a single DNA hairpin measured by optical tweezers, and the fraction of native contacts in a small protein probed by atomistic Molecular Dynamics simulations.

Full PDF

AA Toolbox for Quantifying Memory in Dynamics Along Reaction Coordinates

Alessio Lapolla and Aljaˇz Godec ∗ Mathematical bioPhysics Group, Max Planck Institute for Biophysical Chemistry, G¨ottingen 37077, Germany (Dated: January 14, 2021)Memory eﬀects in time-series of experimental observables are ubiquitous, have important con-sequences for the interpretation of kinetic data, and may even aﬀect the function of biomolecularnanomachines such as enzymes. Here we propose a set of complementary methods for quantifyingconclusively the magnitude and duration of memory in a time series of a reaction coordinate. Thetoolbox is general, robust, easy to use, and does not rely on any underlying microscopic model. Asa proof of concept we apply it to the analysis of memory in the dynamics of the end-to-end distanceof the analytically solvable Rouse-polymer model and an experimental time-series of extensions ofa single DNA hairpin measured in an optical tweezers experiment.

The dynamics of complex, high-dimensional physicalsystems such as complex biomolecules is frequently de-scribed by means of memory-less, Markovian diﬀusionalong a low-dimensional reaction coordinate [1–9]. Suchsimpliﬁed models often accurately describe selected ob-servations in experiments [10–14] and computer simula-tions [1, 6, 7]. However, as soon as latent, hidden de-grees of freedom that become projected out do not relaxinstantaneously on the time scale we observe the reac-tion coordinate [15], or the reaction coordinate does notlocally equilibrate in long-lived meta-stable meso-states[16], almost any projection of high-dimensional dynamicsonto a lower dimensional coordinate introduces memory[15–24].Memory eﬀects can have intriguing manifestations inthe evolution of both, ensemble- [15, 25–28] and time-average observables [15, 29], and are often particularlywell-pronounced in observations that reﬂect, or coupleto, intra-molecular distances in conformationally ﬂexiblebiomolecules [16, 18, 20–22, 25, 30–38]. Moreover, if thedynamics is ergodic in the sense that the system relaxesto a unique equilibrium probability density function fromany initial condition (i.e. the reaction coordinate has aunique free energy landscape) then the memory is neces-sarily transient [15]. Whether or not memory is in factrelevant depends on how its extent compares to the re-laxation time and whether or not the latter is reached inan experiment. If the extent of memory is comparableto, or longer than, the time-scale on which biomoleculesoperate, such e.g. enzymes catalyzing chemical reactions[39, 40], non-Markovian eﬀects shape biological function.It is therefore important to asses the presence andduration of memory eﬀects in the dynamics along re-action coordinates. An elegant “test of Markovianity”of a reaction coordinate has recently been proposed byBerezhkovskii and Makarov, who considered the behav-ior of transition paths [41]. The authors provide a pair ofinequalities whose violation conclusively reﬂects that thedynamics is non-Markovian. However, memory-eﬀectsare typically transient [15] although their extent mayexceed the duration of experimental observations [33].There is thus a need to determine not only the presence of memory in a time-series of a reaction coordinate butalso its extent and attenuation at diﬀerent time-scales.Here, we ﬁll this gap by providing a toolbox for quan-tifying memory in a time-series of a low-dimensional re-action coordinate. We propose a set of model-free com-plementary methods that are easy to use and are suitedto treat reaction coordinates with arbitrary dimensional-ity. As a proof of concept we apply these methods to theanalysis of an experimental time-series of the extensionof a DNA-hairpin measured in an optical tweezers exper-iment and the exactly-solvable Rouse model of polymerchain.

Theory.—

Our approach rests on a comparison of thetrue time-evolution with a pair reference processes: oneconstructed as a ﬁctitious, Markovian time-series and theother testing for the semi-group property. Let q t with0 ≤ t ≤ T denote the time-series of the reaction coor-dinate and q M t the ﬁctitious Markovian series. Withoutany loss of generality we assume that the reaction co-ordinate is one-dimensional – the generalization to mul-tiple dimensions is straightforward. We assume q t and q M t to be ergodic with an equilibrium probability den-sity p eq ( q ) that is by construction identical for both pro-cesses. Let G ( q, t | q ) = (cid:104) δ ( q t − q ) (cid:105) q denote the proba-bility density that the reaction coordinate evolving from q t =0 = q is found at time t to have a value in an inﬁnites-imal neighborhood of q and G M ( q, t | q ) = (cid:104) δ ( q M t − q ) (cid:105) q the corresponding Markovian counterpart, where δ ( x ) de-notes Dirac’s delta function and the angular brackets (cid:104)·(cid:105) q the average over all realizations of q t evolving from q .We then have lim t →∞ G ( q, t | q ) = lim t →∞ G M ( q, t | q ) = p eq ( q ) as a result of ergodicity. In practice the limits areachieved as soon as t becomes suﬃciently larger than therelaxation time t rel , i.e. t (cid:38) t rel , which may or may notbe reached in an experiment. Note that the relaxationtimes of the true and Markovian reference process aretypically not equal.We use two descriptors, the Kullback-Leibler diver-gence between the transition probabilities of the true and a r X i v : . [ c ond - m a t . s t a t - m ec h ] J a n a ﬁctitious reference process deﬁned as [42] D a q ( t ) ≡ (cid:90) dqG ( q, t | q ) ln[ G ( q, t | q ) /G a ( q, t | q )] , (1)where a = M,CK denotes the particular kind of referenceprocess that we detail below. By construction D a q ( t ) (cid:54) = 0if and only if G ( q, t | q ) (cid:54) = G a ( q, t | q ) and thus non-zerovalues of D a q ( t ) reﬂect memory in the dynamics of q t .When q t reaches equilibrium in the coarse of the exper-iment we may also consider the normalized equilibriumautocorrelation function deﬁed as C ,M ( t ) ≡ (cid:104) q t q (cid:105) ,M − (cid:104) q (cid:105) (cid:104) q (cid:105) eq − (cid:104) q (cid:105) (2)where we have introduced (cid:104) q t q (cid:105) ,M ≡ (cid:90) (cid:90) qq G ,M ( q, t | q ) p eq ( q ) dqdq T (cid:29) t rel = ( T − t ) − (cid:90) T − t q τ + t q τ dτ, t (cid:28) T, (cid:104) q n (cid:105) eq ≡ (cid:90) q n p eq ( q ) dq, for n = 1 , T (cid:29) t rel = T − (cid:90) T q nτ dτ (3)where the respective deﬁnitions in terms of time-averageshold if the trajectory is much longer than the relaxationtime, i.e. T (cid:29) t rel . The absence of an index refers to thetrue process and the index M to the ﬁctitious Markoviancounterpart.In the comparison we consider two distinct referenceprocesses. The ﬁrst one builds on the semi-group prop-erty of Markov processes, i.e. the so-called Chapman-Kolmogorov equation that we may write as G CK τ ( q, t | q ) ≡ (cid:90) G ( q, t − τ | q (cid:48) ) G ( q (cid:48) , τ | q ) dq (cid:48) (4)because the Green’s function of a time-homogeneousMarkov process is time-translation invariant, G ( q, t − τ | q (cid:48) ) = G ( q, t | q (cid:48) , τ ) [43]. The physical interpretation ofthe construction is that we observe the true dynamics q t until time τ and then instantaneously reset the memory(if any) to zero.If q t is indeed memoryless we have identically G CK τ ( q, t | q ) = G ( q, t | q ) for any τ and thus D CK τ,q ( t ) =0 for any t and τ . If G CK τ ( q, t | q ) (cid:54) = G ( q, t | q ) forsome t and τ then q t is conclusively non-Markovian and D CK τ,q ( t ) > q t to reach equilibrium during an exper-iment and requires only G ( q, t | q ) that is straightforwardto determine from a time series q t . However, if equilib-rium is reached then lim t (cid:29) t rel D CK τ,q ( t ) = 0 for any q . By analyzing D CK τ,q ( t ) we can quantify the degree of mem-ory and its range as a function of τ and q which wedemonstrate below.In the second method we construct from q t a cor-responding Markovian time-series q M t evolving underthe inﬂuence of the potential of mean force w ( q ) ≡− k B T ln p eq ( q ) according to the Itˆo Langevin equation ddt q M t = Df ( q M t ) /k B T + ξ t (5)where f ( q M t ) ≡ − k B T ∂ q ln p eq ( q ) | q = q M t and ξ t denoteszero mean Gaussian white noise with variance (cid:104) ξ t ξ t (cid:48) (cid:105) =2 Dδ ( t − t (cid:48) ) and D is the diﬀusion coeﬃcient. This methodassumes that we are able to determine the equilibriumprobability density p eq ( q ) and thus requires q t to reachequilibrium. In the simplest model, which we adopt here,it is assumed that the diﬀusion coeﬃcient does not de-pend on q . In this case inferring D is straightforward.However, we note that the best possible Markovian ap-proximation would allow that the diﬀusion coeﬃcient de-pends on q , i.e. D → D ( q ) [45] and eﬃcient methodshave been developed to treat this case as well [46, 47].In this case Eq. (5) is to be interpreted in the thermo-dynamically consistent anti-Itˆo or post-point convention[16].On the level of the probability density function Eq. (5)corresponds to the Fokker-Planck equation ∂ t G M ( q, t | q ) = ∂ q D [ ∂ q − ( k B T ) − f ( q )] G M ( q, t | q ) (6)with initial condition G M ( q, t | q ) = δ ( q − q ) andnatural boundary conditions imposed by the under-lying physics. Depending on the speciﬁc problem G M ( q, t | q ) can be found by a numerical integration ofthe Langevin equation and subsequent histogram analy-sis, i.e. G M ( q, t | q ) = (cid:104) δ ( q M t − q ) (cid:105) q , or directly by solvingEq. (6) as e.g. [15, 48]. In the following we illustrate bothapproaches. End-to-end distance of a Rouse polymer.—

As a ﬁrstexample we consider a Rouse polymer chain with N + 1-beads ( N -bonds) in absence of hydrodynamic interac-tions [49, 50] and focus on the end-to-end distance as thereaction coordinate, i.e. q t ≡ | r − r N +1 | . An importantfeature of this model is that it is exactly solvable. We ex-press time in units of t Kuhn , the characteristic diﬀusiontime of a Kuhn-segment, i.e. t Kuhn = b /D , where b isthe Kuhn-length and D the diﬀusion coeﬃcient of a bead.The probability density function for the positions of allbeads { r i } is well-known [49, 50] and allows us to de-termine exactly the probability density of the end-to-enddistance. Introducing ν k ≡ kπ/ N + 1), α k = 4 sin ( ν k )as well as Q ik ≡ (cid:112) / ( N + 1) cos( ν k [2 i − η t ≡ (cid:80) Nk =1 ( Q k − Q k ) e − α k t / α k the equilibrium probabil-ity density of q is given by p eq ( q ) = q e − q / η / √ πη / for q ∈ [0 , ∞ ) with the mean extension (cid:104) d (cid:105) = 4 (cid:112) η /π and mean square extension (cid:104) d (cid:105) = 6 η . The probabilitydensity function of q reads exactly (for a derivation seeRef. [26]) G R ( q, t | q ) = qq e − η ( q + q ) / η − η t ) p eq ( q )2 πη t (cid:112) η − η t sinh (cid:18) η t qq η − η t ) (cid:19) . (7)The exact autocorrelation function in Eq. (2) is in turnobtained in the form C ( t ) = 6 (cid:112) η − η t (3 π − η + 4( η + η t ) arctan (cid:16) η t / (cid:112) η − η t (cid:17) (3 π − η η t (8)The Fokker-Planck equation in the Markovian approx-imation to the evolution of q for the Rouse polymer(i.e. Eq. (6)) can be obtained exactly in the formof a spectral expansion [48] and reads G M R ( q, t | q ) = (cid:80) ∞ k =0 ψ Rk ( q ) ψ Lk ( q )e − kt/η where ψ Lk ( x ) ≡ (cid:115) k ! √ π / k ) L / k (cid:18) x η (cid:19) (9)where Γ( x ) denotes the Gamma-function and L / k ( x ) thegeneralized Laguerre polynomial of degree k with pa-rameter 1 / ψ Rk ( x ) = p eq ( x ) ψ Lk ( x ). Here from it isstraightforward to obtain the autocorrelation function inthe Markovian approximation that reads C M ( t ) = √ π − /π ∞ (cid:88) k =1 e − kt/η [ k !Γ(3 / − k ) Γ(3 / k )] − . (10)A comparison of the true autocorrelation function C ( t )given by Eq. (8) and the Markovian approximation C M ( t )given by Eq. (10) is shown in Fig. 1a, with the inset de-picting the corresponding equilibrium probability densityfunction p eq ( q ). Note that when the free energy land-scape w ( q ) overestimates the conﬁning eﬀect of hiddendegrees of freedom on q t the approximate Markovian evo-lution always overestimates the relaxation rate (e.g. [15];see also SM). Namely, the Markovian evolution assumesthe hidden degrees of freedom to remain at equilibrium atall times whereas the instantaneous ﬂuctuating restoringforce on q t is in this case smaller than the force arisingfrom the free energy landscape that corresponds to theaverage of the ﬂuctuating force.The Chapman-Kolmogorov-construct for the Rousepolymer, G CK τ ( q, t | q ), is somewhat lengthy and is the-fore given explicitly in the Supplementary Material, SM. G CK τ ( q, t | q ) diﬀers from Eq. (7) for all expect large val-ues of t − τ . A quantiﬁcation of the discrepancy betweenthe true and “Chapman-Kolmogorov” evolution of theend-to-end distance of the Rouse-polymer in terms of theKullback-Leibler divergence (1) is shown in Fig. 2a.A typical time evolution of D CK τ,q ( t ) gradually increases . . . .

81 10 a) C ( t ) t TrueMarkovian . . . .

81 10 − − b) C ( t ) t [ ms ] TrueMarkovian p e q ( q ) q p e q ( q ) q [nm] Figure 1. Comparison of the true autocorrelation func-tion C ( t ) (orange) with the Markovian approximation C M ( t )(blue) for a) a Rouse-polymer with 1000 monomers with timeexpressed in units of the diﬀusion time of a Kuhn-segment t Kuhn ; b) the extension of a DNA-hairpin. The insets depictthe respective equilibrium probability density function p eq ( q ).The dashed lines depict the initial conditions we consider inFig. 3. . . . . .

025 10 a) D C K τ , q ( t ) tq = 40 ,τ = 10040 , , , . . . . . .

06 10 b) D C K τ , q ( t ) t [ms] q = 664 nm ,τ = 0 . ms , . , . , . Figure 2. Kullback-Leibler divergence deﬁned in Eq. (1)between the true Green’s function G ( q, t | q ) and the ap-proximate Green’s function constructed from the Chapman-Kolmogorov equation (4) as a function of time t for a) theRouse-polymer with 1000 beads and b) the extension of aDNA-hairpin. Several values of τ are considered in the con-struction of G CK τ ( q, t | q ) as well as a pair of diﬀerent initialconditions q . Note that due to the particular construction ofEq. (4) times shorter than depicted are not accessible due tonumerical instability and poor statistics. from zero, reaches a maximum and afterwards returnsback to 0, which reﬂects the gradual build-up and atten-uation of memory because q t “remembers” the initial con-dition of the hidden degrees of freedom [15]. As a result,the Chapman-Kolmogorov Green’s function G CK τ ( q, t | q )fails to predict the true evolution of q t , and D CK τ,q ( t ) con-structed this way depends on both, τ and initial condition q . For the Rouse-polymer with 1000 beads D CK τ,q ( t ) (cid:54) = 0at least up to t ∼ × t Kuhn .Next we examine D M q ( t ), the Kullback-Leibler diver-gence (1) between the true Green’s function G ( q, t | q )and the Markovian approximation corresponding to thewhite-noise Markovian diﬀusion in the exact free energy . . . . a) D M q ( t ) t q = 504060 00 . . . . . − − b) D M q ( t ) t [ms] q = 668 [nm] Figure 3. Kullback-Leibler divergence D M t deﬁned in Eq. (1)between the true Green’s function G ( q, t | q ) and the Marko-vian approximation to the Green’s function G M ( q, t | q ) corre-sponding to the evolution equation (5) as a function of time t for a) the Rouse-polymer with 1000 beads evolving from q = 40 (orange), q = 50 (black), q = 60 (blue) and b) theextension of a DNA-hairpin evolving from a distance withina bin of thickness 1nm centered at q = 664 nm (orange), q = 668 (black) and q = 671 nm (blue). landscape (i.e. Eq. (5)). The results are shown in Fig. 3a.The qualitative features of the time-dependence of D M q ( t ) are similar to those observed in Fig. 2 – mem-ory builds up in a ﬁnite interval and smoothly returnsback to zero from the attained maximum. The intuitionbehind this result is that it takes a ﬁnite time to allowfor distinct evolutions of hidden degrees of freedom thatintroduce memory in the dynamics of the reaction coor-dinate q t . At long times memory is progressively lost asa result of the gradual relaxation of the hidden degrees offreedom to their respective equilibrium that in turn ren-ders the dynamics of the reaction coordinate eﬀectivelymemory-less and correspondingly D M q ( t ) vanishes. Single-molecule experiments on a DNA hairpin.—

As asecond example we consider an experimental time-seriesof the end-to-end distance of a single-strand DNA hairpinmeasured in an optical tweezers experiment performedby the Woodside group [53]. The data-set contains 11million measurements of the extension of the DNA hair-pin 30R50T4 held in a pair of optical traps with stiﬀness0 .

63 pN/nm and 1 . . µ s temporal resolution. It has been shown that thistime-series is non-Markovian [38]. We ﬁnd the lengthof the time-series to be much larger that the relaxationtime (see Fig. 1b) and therefore slice it into several piecesthat are statistically independent. More precisely, we usea conservative estimate, namely the time-scale t cut wherethe autocorrelation function of the extension, C ( t ), fallsto below (cid:39) t cut (cid:29) t rel andyields an ensemble of 50 statistically independent trajec-tories.We determine the equilibrium probability density p eq ( q ) (see inset of Fig. 1b) and two-point joint prob-ability density p ( q, t, q ,

0) = p ( q, t + t, q , t ) by per-forming a standard histogram analysis with a bin-size of l bin =0.35 nm, such that q refers to a bin of width l bin centered at q . The Greens function is thereupon ob-tained by the law of conditional probability, G ( q, t | q ) = p ( q, t, q , /p eq ( q ) while the autocorrelation function inEq. (2) is determined directly from the respective secondlines of Eq. (3).The Chapman-Kolmogorov construct is determinedfrom G ( q, t | q ) by direct integration of Eq. (4) and isused to determine D CK τ,q ( t ), while the corresponding ﬁc-titious Markovian process evolves as Markovian diﬀusionin a free energy landscape w ( q ) ≡ − k B T ln p eq ( q ) witha diﬀusion coeﬃcient D that we parameterize as fol-lows. We ﬁrst determine the ﬁrst and second momentof the displacement from each bin-point q l after a singletime-step δt = 2 . µ s, i.e. (cid:104) δq δt ( l ) (cid:105) and (cid:104) δq δt ( l ) (cid:105) where δq δt ( l ) = q t + δt − q t | q t = q l . We consider two bin-sizes, l D =0.01 nm and l D =0.001 nm, and ﬁnd the result tobe essentially independent on the precise value of l D wechoose (see SM). Moreover, the results are also ratherindependent of the location of the bin q l (see SM), im-plying that to a good approximation D may indeed betaken as being constant. It is of course possible to extendthe analysis to include a coordinate dependent diﬀusioncoeﬃcient D ( q ) (see e.g. [47]), which, however, is beyondthe scope of this proof-of-concept paper.From the ﬁrst and second local moments of q the dif-fusivity D is in turn determined according to D = N b (cid:88) l =1 (cid:104) δq δt ( l ) (cid:105) − (cid:104) δq δt ( l ) (cid:105) N b δt , (11)where the brackets (cid:104)·(cid:105) here denote the average over alldisplacements from this bin observed during the entiretime-series. The analysis yields D = 447 ± / ms for l D =0.001 nm and D = 448 ± / ms for l D =0.01nm, respectively. Using the values for and D obtainedin this way we generate the Markovian time-series q M t byintegrating the Itˆo Langevin equation (5) using the Euler-Mayurama scheme, and determine D M t ( q ) in Eq. (1) and C M ( t ) in Eq. (2), respectively.In contrast to the Rouse-polymer the DNA hairpin ex-ists in two characteristic conformational states – foldedand unfolded. As a result, the equilibrium probabilitydensity function p eq ( q ) is bimodal and the dynamics of q t displays signatures of metastability [53]. However, sincethe two peaks corresponding to the two sub-populationsare not separated (see inset of Fig.1b) the potential ofmean force w ( q ) is expected to underestimate the freeenergy barrier and therefore the Markovian evolution islikely to overestimate the relaxation rate. In completeagreement Fig.1b displays an overestimation of the rateof decay of autocorrelations in the Markovian approxi-mation by two orders of magnitude in time. Moreover, along-lived plateau is observed in the true C ( t ) spanningmore than an order of magnitude in time.In order to assess whether the mismatch between trueand Markovian time evolution is predominantly due toan underestimation of the free energy barrier betweenfolded and unfolded states of the hairpin we inspect heKullback-Leibler divergence (1) between the true and“Chapman-Kolmogorov evolution” of the extension ofthe hairpin is shown in Fig. 2b. The result clearly showspronounced signatures of memory in the evolution of theend-to-end distance of the hairpin that extend over morethan ∼

10 ms. Note that the “Chapman-Kolmogorov evo-lution” describes the correct evolution until time t = τ whereupon memory is reset to zero. Therefore a non-zero D Ck τ,q ( t ) is a clear signature of memory arising fromthe dynamical coupling of q t to hidden degrees of free-dom. Similar to the Rouse-polymer D CK τ,q ( t ) depends onthe initial condition q .A build-up and decay of memory similar to the Rouse-polymer is also observed in the time evolution of D M q ( t ),the Kullback-Leibler divergence between the Green’sfunction of the true evolution and the white-noise Marko-vian diﬀusion in the exact free energy landscape shown inFig. 3b. Notably, Fig. 2b and Fig. 3b display essentiallythe same extent of memory (though the peak is attainedsooner in the white-noise Markovian diﬀusion), demon-strating that metastability does not necessarily destroynor dominate memory in the evolution of reaction coordi-nates. Note that the presence of memory in metastablesystems is not unusual (see e.g. [20, 21] and [28]). Intotal, the analysis conclusively identiﬁes the presence ofextended memory eﬀects in the dynamics of the extensionof the hairpin.It is important to note that the extent of memory (ofthe order of ∼ q t and the initial conditions of thehidden degrees of freedom [15]. The information encodedin C ( t ) and D M , CK ( t ) is therefore diﬀerent – D M , CK ( t ) isa genuine measure of the extent and duration of memory. Conclusion.—

We presented a set of complementarymethods to quantify conclusively the degree and dura-tion of memory in a time series of a reaction coordinate q t . The proposed toolbox does not assume any partic-ular physical model. Instead it exploits the semi-groupproperty of Markov processes and constructs a ﬁctitiousMarkovian diﬀusion process in the free energy landscapeof q t , and compares the artiﬁcially constructed transi-tion probability density with the observed probabilitydensity. The analysis not only determines whether thedynamics of q t has memory but also quantiﬁes the mag-nitude and duration of memory and thus complementsthe recently proposed “test for Markovianity” based ontransition paths [41]. Whereas in our examples we con-sidered only one-dimensional coordinates and constant diﬀusion coeﬃcients the toolbox generalizes straightfor-wardly to higher-dimensional reaction coordinates anddiﬀusion landscapes D ( q ). The method is general, ro-bust, and easy to use. We therefore hope that it will ﬁndnumerous applications involving time-series derived fromexperiments and computer simulations. Acknowledgments

We thank Krishna Neupane andMichael T. Woodside for providing unlimited access totheir DNA-hairpin data. The ﬁnancial support from theGerman Research Foundation (DFG) through the EmmyNoether Program GO 2762/1-1 to AG is gratefully ac-knowledged. ∗ [email protected][1] R. B. Best and G. Hummer, Reaction coordinates andrates from transition paths, Proc. Natl. Acad. Sci. ,6732–6737 (2005).[2] J. J. Portman, S. Takada, and P. G. Wolynes, Microscopictheory of protein folding rates. ii. local reaction coordi-nates and chain dynamics, J. Chem. Phys. , 5082–5096(2001).[3] B. Peters, P. G. Bolhuis, R. G. Mullen, and J.-E. Shea, Re-action coordinates, one-dimensional smoluchowski equa-tions, and a test for dynamical self-consistency, J. Chem.Phys. , 054106 (2013).[4] A. K. Faradjian and R. Elber, Computing time scales fromreaction coordinates by milestoning, J. Chem. Phys. ,10880–10889 (2004).[5] A. Berezhkovskii and A. Szabo, One-dimensional reactioncoordinates for diﬀusive activated rate processes in manydimensions, J. Chem. Phys. , 014503 (2005).[6] G. Hummer, Position-dependent diﬀusion coeﬃcients andfree energies from bayesian analysis of equilibrium andreplica molecular dynamics simulations, New J. Phys. ,34–34 (2005).[7] R. B. Best and G. Hummer, Coordinate-dependent dif-fusion in protein folding, Proc. Natl. Acad. Sci. ,1088–1093 (2009).[8] W. Zhang, C. Hartmann, and C. Sch¨utte, Eﬀective dy-namics along given reaction coordinates, and reaction ratetheory, Faraday Discussions , 365–394 (2016).[9] A. M. Berezhkovskii and D. E. Makarov, Communication:Coordinate-dependent diﬀusivity from single molecule tra-jectories, J. Chem. Phys. , 201102 (2017).[10] O. K. Dudko, G. Hummer, and A. Szabo, The-ory, analysis, and interpretation of single-molecule forcespectroscopy experiments, Proceedings of the NationalAcademy of Sciences , 15755–15760 (2008).[11] K. Neupane, A. P. Manuel, and M. T. Woodside, Pro-tein folding trajectories can be described quantitativelyby one-dimensional diﬀusion over measured energy land-scapes, Nat. Phys. , 700 (2016).[12] K. Neupane, D. A. N. Foster, D. R. Dee, H. Yu, F. Wang,and M. T. Woodside, Direct observation of transitionpaths during the folding of proteins and nucleic acids, Sci-ence , 239 (2016).[13] J. Gladrow, M. Ribezzi-Crivellari, F. Ritort, and U. F.Keyser, Experimental evidence of symmetry breaking oftransition-path times, Nat. Commun. , 10.1038/s41467- , 10.1126/sciadv.aaz4642 (2020).[15] A. Lapolla and A. Godec, Manifestations of Projection-Induced Memory: General Theory and the Tilted SingleFile, Front. Phys. , 10.3389/fphy.2019.00182 (2019).[16] D. Hartich and A. Godec, Emergent memory and kinetichysteresis in strongly driven networks, arXiv:2011.04628(2020), arXiv:2011.04628 [cond-mat.stat-mech].[17] N. van Kampen, Remarks on Non-Markov Processes,Brazilian J. Phys. , 10.1590/S0103-97331998000200003(1998).[18] S. S. Plotkin and P. G. Wolynes, Non-markovian con-ﬁgurational diﬀusion and reaction coordinates for proteinfolding, Phys. Rev. Lett. , 5015 (1998).[19] R. Zwanzig, Nonequilibrium statistical mechanics (Ox-ford Univ. Press, 2010).[20] D. E. Makarov, Interplay of non-markov and internal fric-tion eﬀects in the barrier crossing kinetics of biopolymers:Insights from an analytically solvable model, J. Chem.Phys. , 014102 (2013).[21] M. Ozmaian and D. E. Makarov, Transition path dynam-ics in the binding of intrinsically disordered proteins: Asimulation study, J. Chem. Phys. , 235101 (2019).[22] H. Meyer, P. Pelagejcev, and T. Schilling, Non-markovianout-of-equilibrium dynamics: A general numerical pro-cedure to construct time-dependent memory kernels forcoarse-grained observables, EPL (Europhys. Lett.) ,40001 (2020).[23] E. Herrera-Delgado, J. Briscoe, and P. Sollich, Tractablenonlinear memory functions as a tool to capture and ex-plain dynamical behaviors, Phys. Rev. Research , 043069(2020).[24] F. M¨uller, U. Basu, P. Sollich, and M. Kr¨uger, Coarse-grained second-order response theory, Phys. Rev. Research , 043123 (2020).[25] W. Min, G. Luo, B. J. Cherayil, S. C. Kou, and X. S.Xie, Observation of a power-law memory kernel for ﬂuctu-ations within a single protein molecule, Phys. Rev. Lett. , 198302 (2005).[26] A. Lapolla and A. Godec, Faster uphill relaxation in ther-modynamically equidistant temperature quenches, Phys.Rev. Lett. , 10.1103/physrevlett.125.110602 (2020).[27] A. Lapolla and A. Godec, Bethesf: Eﬃcient computationof the exact tagged-particle propagator in single-ﬁle sys-tems via the bethe eigenspectrum, Comput. Phys. Com-mun. , 107569 (2021).[28] A. Lapolla and A. Godec, Single-ﬁle diﬀusion in a bi-stable potential: Signatures of memory in the barrier-crossing of a tagged-particle, J. Chem. Phys. , 194104(2020).[29] A. Lapolla and A. Godec, Unfolding tagged particle his-tories in single-ﬁle diﬀusion: exact single- and two-tag lo-cal times beyond large deviation theory, New J. Phys. ,113021 (2018).[30] S. C. Kou and X. S. Xie, Generalized langevin equationwith fractional gaussian noise: Subdiﬀusion within a singleprotein molecule, Phys. Rev. Lett. , 180603 (2004).[31] T. Neusius, I. Daidone, I. M. Sokolov, and J. C. Smith,Subdiﬀusion in peptides originates from the fractal-likestructure of conﬁguration space, Phys. Rev. Lett. ,188103 (2008). [32] S. Press´e, J. Peterson, J. Lee, P. Elms, J. L. MacCallum,S. Marqusee, C. Bustamante, and K. Dill, Single moleculeconformational memory extraction: P5ab rna hairpin, J.Phys. Chem. B , 6597–6603 (2014).[33] X. Hu, L. Hong, M. Dean Smith, T. Neusius, X. Cheng,and J. Smith, The dynamics of single protein molecules isnon-equilibrium and self-similar over thirteen decades intime, Nature Physics , 171–174 (2015).[34] A. K. Sangha and T. Keyes, Proteins Fold by Subdiﬀu-sion of the Order Parameter, J. Phys. Chem. B , 15886(2009).[35] S. M. Avdoshenko, A. Das, R. Satija, G. A. Papoian,and D. E. Makarov, Theoretical and computational vali-dation of the Kuhn barrier friction mechanism in unfoldedproteins, Sci. Rep. , 269 (2017).[36] Y. Cote, P. Senet, P. Delarue, G. G. Maisuradze, andH. A. Scheraga, Anomalous diﬀusion and dynamical cor-relation between the side chains and the main chain ofproteins in their native state, Proc. Natl. Acad. Sci. ,10346 (2012).[37] I. Grossman-Haham, G. Rosenblum, T. Namani, andH. Hofmann, Slow domain reconﬁguration causes power-law kinetics in a two-state enzyme, Proc. Natl. Acad. Sci. , 513 (2018).[38] A. G. T. Pyo and M. T. Woodside, Memory eﬀectsin single-molecule force spectroscopy measurements ofbiomolecular folding, Phys. Chem. Chem. Phys. , 24527(2019).[39] H. P. Lu, Single-molecule enzymatic dynamics, Science , 1877–1882 (1998).[40] B. P. English, W. Min, A. M. van Oijen, K. T. Lee,G. Luo, H. Sun, B. J. Cherayil, S. C. Kou, and X. S.Xie, Ever-ﬂuctuating single enzyme molecules: Michaelis-menten equation revisited, Nat. Chem. Biol. , 87–94(2005).[41] A. M. Berezhkovskii and D. E. Makarov, Single-MoleculeTest for Markovianity of the Dynamics along a ReactionCoordinate, J. Phys. Chem. Lett. , 2190 (2018).[42] S. Kullback and R. Leibler, On information and suﬃ-ciency, Ann. Math. Statist , 79 (1951).[43] Gardiner, C.W., Handbook of Stochastic Methods forPhysics, Chemistry and Natural Sciences , 2nd ed.(Springer-Verlag, 1985).[44] W. Feller, Non-Markovian Processes with the SemigroupProperty, The Annals of Mathematical Statistics , 1252(1959).[45] A. Berezhkovskii and A. Szabo, Time scale separationleads to position-dependent diﬀusion along a slow coordi-nate, J. Chem. Phys. , 074108 (2011).[46] B. J. Berne, M. Borkovec, and J. E. Straub, Classical andmodern methods in reaction rate theory, J. Phys. Chem. , 3711–3725 (1988).[47] G. Hummer, Position-dependent diﬀusion coeﬃcientsand free energies from bayesian analysis of equilibrium andreplica molecular dynamics simulations, New J. Phys. ,34 (2005).[48] S. Sunagawa and M. Doi, Theory of Diﬀusion-ControlledIntrachain Reactions of Polymers, Polymer J. , 604(1975).[49] P. E. Rouse, A Theory of the Linear Viscoelastic Prop-erties of Dilute Solutions of Coiling Polymers, J. Chem.Phys. , 1272 (1953).[50] K. H. Ahn, J. L. Schrag, and S. J. Lee, Bead-spring chainmodel for the dynamics of dilute polymer solutions, J. Non-Newton Fluid Mech. , 349 (1993).[51] Abramowitz, Milton and Stegun, Irene A., Handbookof Mathematical Functions with Formulas, Graphs, andMathematical Tables, in Handbook of Mathematical Func-tions with Formulas, Graphs, and Mathematical Tables (Dover, New York, 1964) ninth dover printing, tenth gpoprinting ed.[52] F. Johansson, Arb: Eﬃcient Arbitrary-PrecisionMidpoint-Radius Interval Arithmetic, IEEE Transactionson Computers , 1281 (2017).[53] K. Neupane, A. P. Manuel, J. Lambert, and M. T. Wood-side, Transition-Path Probability as a Test of Reaction-Coordinate Quality Reveals DNA Hairpin Folding Is aOne-Dimensional Diﬀusive Process, J. Phys. Chem. Lett. , 1005 (2015). Supplementary Material for:A Toolbox for Quantifying Memory in Dynamics Along Reaction Coordinates

Alessio Lapolla and Aljaˇz Godec

Mathematical bioPhysics Group, Max Planck Institute for Biophysical Chemistry, 37077 G¨ottingen, Germany

Abstract

In this Supplementary Material (SM) we present the exact result for the “Champman-Kolmogorov”construction of the Green’s function of the end-to-end distance and other internal coordinates of the Rousepolymer deﬁned in Eq. (4) in the manuscript as well as a local error analysis of the inferred diﬀusioncoeﬃcient D entering the Markovian approximation of the end-to-end diﬀusion of the DNA hairpin. Inaddition, supplementary ﬁgures are included showing the various Green’s functions for the Rouse polymerand DNA hairpin. DETAILS OF THE PROJECTION AFFECT THE RELAXATION TIME AND EXTENT OF MEMORY

In the main text we consider Rouse polymer chain composed of 1000 beads and we focus on the autocorrelationfunction of its end-to-end distance as the reaction coordinate q t . We ﬁnd that the ﬁctitious Markovian reference processdescribing Brownian diﬀusion in the free energy landscape overestimates the relaxation rate; a similar observation isalso made in the case of the experimental hairpin data. However this diﬀerence in the rate of relaxation is non-uniqueand in fact depends on the observable, i.e. on details of the projection. . . . .

81 10 − − C ( t ) t True, 1-2Markovian, 1-2True, 1-1000Markovian, 1-1000

Figure S4. The autocorrelation function of the distance between the ﬁrst and the second bead (dashed lines) and ﬁrst and lastbead (full lines) of a Rouse chain composed by 1000 according to the true (orange) and ﬁctitious Markovian evolution (blue).Note the the free energy landscape for both orange-blue pairs is by construction identical. The continuous lines are those shownin the main text.

For example we demonstrate in Fig. S4 the opposite trend that arises when we observe the autocorrelation functionof the distance between the ﬁrst and the second bead of the same Rouse Chain (see dashed lines).In addition, is worth to note that if the Green’s function G describing the full many-dimensional system is diago-nalizable (like in the Rouse chain case [1] or any Markovian dynamics obeying detailed balance), it can be writtenas G ( x , t | x ) = (cid:88) k ψ Rk ( x ) ψ Lk ( x )e − λ k t , (S1)where ψ Rk and ψ Lk are respectively the right and left eigenfunctions of the underlying Fokker-Planck-Smoluchowskioperator, while λ k denotes the eigenvalues. Then the Green’s function of the projected observable – the reactioncoordinate q = Γ( x ) – can be written in full generality [2] as G ( q, t | q ) = (cid:88) k V Rk ( q ; Γ) V Lk ( q ; Γ)e − λ k t , (S2)where the elements V Rk and V Lk depend both on ψ Rk and ψ Lk , and on the projection Γ( x ). In turn the autocorrelationfunction can be easily computed as: C ( t ) = (cid:88) k ( (cid:90) qV Rk ( q ; Γ) dq )( (cid:90) q V Lk ( q ; Γ) dq )e − λ k t ≡ (cid:88) k a R, Γ k b L, Γ k e − λ k t , (S3)and one can show that for systems obeying detailed balance a R, Π k b L, Π k ≥ ∼ while in the ﬁrst-to-second distance is ∼ . This disparity is simply a result of the projection that determinesthe relative contribution of diﬀerent eigenfunctions. THE CHAPMAN-KOLMOGOROV CONSTRUCTION

In the case of the Rouse chain the integral deﬁned in Eq. (4) in the main text can be solved analytically via astraightforward but tedious calculation using Eq. (7) in the main text. The result of the integral reads exactly G CKt ( q, t | q ) = η / qq η t − τ η τ (cid:112) π Ξ τ,t − τ sinh (cid:18) qq η η τ η t − τ τ,t − τ (cid:19) × exp (cid:34) − q η − t − τ (cid:18) − η t − τ Ω − τ τ,t − τ (cid:19) − q − τ (cid:32) η Ω − τ − η t − τ Ω + τ Ω − t − τ − η η τ Ω − t − τ Ξ τ,t − τ (cid:33)(cid:35) (S4)having deﬁned Ω ± t = η ± η t , Ξ τ,t − τ = 4 η − Ω + τ Ω + t − τ . (S5)Notably, the structure of Eq. (S4) is identical to the structure of the plain Green’s function (Eq. (7) in the main text)but here the temporal dependence is obviously diﬀerent.Note that in when the observation time is much larger than the relaxation time of the observable t rel , we ﬁndfor t − τ > t rel that G CK τ ( q, t | q ) (cid:39) p eq ( q ) (cid:82) dq (cid:48) G ( q (cid:48) , t | q ) = p eq ( q ). Therefore, since lim t →∞ G ( q, t | q (cid:48) ) = p eq ( q ), thedeﬁnition of G CK τ ( q, t | q ) (Eq. (4) in the main text) by construction ensures lim t →∞ D CK τ,q ( t ) = 0. In Fig. S5 weexplicitly show the Green’s function that is required for the computation of the Kullback-Liebler divergence. ESTIMATION OF THE DIFFUSION COEFFICIENT

Eq. (12) in the main text demonstrates how one can estimate the ( q -independent) diﬀusion coeﬃcient D from atime-series using the ﬁrst two moments of the local displacement.The moments are in turn determined as follows: if q t is found during an epoch in the bin centered around q and ofsize l D its displacement from the previous epoch is determined and its square and both are averaged over all epochsin the time-series yielding (cid:104) δq δt ( l ) (cid:105) and (cid:104) δq δt ( l ) (cid:105) respectively.This procedure is repeated for diﬀerent positions q on the support of the equilibrium distribution. We ﬁnd that theresulting diﬀusion coeﬃcient changes little over the entire support even for diﬀerent bin-sizes l D (see Fig.S6). Thereforewe treat it, as a ﬁrst approximation, as being independent of q and average over all bins to obtain D (cid:39) D = 448nm / ms. Using this value of D we perform Brownian Dynamics simulations of the ﬁctitious Markovian processaccording to the Euler-Mayurama scheme.0 . . .

12 0 20 40 60 80 100 120 140 a) G ( q , t | q ) q t = 10100100000 . . . . . c) G M ( q , t | q ) q t = 101001000 00 . . . .

16 650 655 660 665 670 675 680 b) G ( q , t | q ) q [nm] t = 0 . ms . . . . . .

16 650 655 660 665 670 675 680 d) G M ( q , t | q ) q [nm] t = 0 . ms . . Figure S5. Green’s function at diﬀerent times for both considered systems. a) and b) depict the true Green’s function for theend-to-end distance of the Rouse chain and of the DNA hairpin respectively. Panels c) and d) show the Green’s function oftheir respective ﬁctitious Markovian processes at the same times. The initial conditions are q = 60 for the Rouse chain and q = 671 nm for the hairpin. − . − . − . . . . . ( D − D ) / D q [nm] ∆ l = 0 . nm ∆ l = 0 . nm Figure S6. Relative deviation of the local diﬀusion coeﬃcient in a given bin D l from the average value D = N − (cid:80) Nl =1 D l asa function of the position of the bin. In the case of ∆ l = 0.001 nm we ﬁnd D = 447 nm / ms with a deviation ± / msand for ∆ l = 0.01 nm we ﬁnd D = 448 nm / ms with a deviation ± / ms. In a ﬁrst approximation the values of D l areindependent of l and we thus set D ≈ D (cid:39)

448 nm / ms. ∗ [email protected][1] G. Wilemski and M. Fixman, Diﬀusion-controlled intrachain reactions of polymers. I Theory, J. Chem. Phys. , 866 (1974).[2] A. Lapolla and A. Godec, Manifestations of Projection-Induced Memory: General Theory and the Tilted Single File, Front.Phys.7