[PDF] OPES: On-the-fly Probability Enhanced Sampling Method

Abstract

Molecular simulations are playing an ever increasing role, finding applications in fields as varied as physics, chemistry, biology and material science. However, many phenomena of interest take place on time scales that are out of reach of standard molecular simulations. This is known as the sampling problem and over the years several enhanced sampling methods have been developed to mitigate this issue. We propose a unified approach that puts on the same footing the two most popular families of enhanced sampling methods, and paves the way for novel combined approaches. The on-the-fly probability enhanced sampling method provides an efficient implementation of such generalized approach, while also focusing on simplicity and robustness.

Full PDF

aa r X i v : . [ phy s i c s . c o m p - ph ] J a n PREPRINT

OPES: On-the-ﬂy Probability Enhanced Sampling Method

Michele Invernizzi ( )( )( ) ( ) Department of Physics, ETH Zurich - Lugano, Switzerland ( ) Institute of Computational Science, Universit`a Svizzera Italiana - Lugano, Switzerland ( ) Italian Institute of Technology - Genova, Italy

Summary. — Molecular simulations are playing an ever increasing role, ﬁndingapplications in ﬁelds as varied as physics, chemistry, biology and material science.However, many phenomena of interest take place on time scales that are out of reachof standard molecular simulations. This is known as the sampling problem and overthe years several enhanced sampling methods have been developed to mitigate thisissue. We propose a uniﬁed approach that puts on the same footing the two mostpopular families of enhanced sampling methods, and paves the way for novel com-bined approaches. The on-the-ﬂy probability enhanced sampling method providesan eﬃcient implementation of such generalized approach, while also focusing onsimplicity and robustness.

1. – Introduction

Despite the remarkable improvements over the last decades, both in computationalpower and in algorithms eﬃciency, molecular simulations are still limited in their scopeby the sampling problem. Many phenomena of interest, such as protein-ligand bindingor crystal nucleation, are rare events that take place on macroscopic time scales andthus would require an impractical amount of computation to be simulated using stan-dard molecular dynamics or Markov-chain Monte Carlo. To circumvent this problem,a plethora of enhanced sampling methods have been developed that aim at allowing asimulation to visit all the relevant metastable states, without being hindered by kineticbottlenecks. Apart from some remarkable exceptions [1, 2], all the most popular en-hanced sampling techniques can be roughly grouped in two main families, that we willrefer to as collective variables methods and expanded ensembles methods. We proposea uniﬁed perspective on enhanced sampling, that allows us to develop a general methodto perform both kinds of sampling in a robust and eﬃcient way. This new perspectivemakes enhanced sampling more accessible and simpler to use and can also open up tonovel sampling strategies. The method we developed, called on-the-ﬂy probability en-hanced sampling (OPES), is described in detail in Refs. [3] and [4]. Here we present itsmain features in a synthetic fashion. MICHELE INVERNIZZI

2. – Uniﬁed approach

The goal of enhanced sampling is to increase the probability of observing in a simu-lation certain rare events, and to do it in such a way that it is still possible to retrievestatistics about the original system. We call target distribution the modiﬁed probabilitydistribution that is sampled instead of the Boltzmann one. Rather than focusing on thevarious computational techniques used by the diﬀerent enhanced sampling methods, wepropose to group them according to how they deﬁne the target distribution they aim tosample. Following this criteria we can identify two main families.A ﬁrst family is the one of methods such as umbrella sampling [5] and metadynamics[6]. These methods make use of collective variables (CVs) or order parameters that aresmooth functions of the atomistic coordinates and encode the slow modes of the system.In this family, the target distribution is deﬁned by requiring that its marginal probabilitydistribution over such CVs has a given functional form. Typically the marginal is chosento be a constant ﬂat distribution, as in adaptive umbrella sampling [7], but other choicesare possible, such as the well-tempered distribution often used in metadynamics [8].A second family includes tempering methods, such as simulated tempering [9] andreplica exchange [10]. These methods deﬁne their target distribution as the combinationof slightly diﬀerent versions of the original system, for example the same system but athigher temperatures. These target distribution are also known as generalized ensemblesor expanded ensembles [11].The OPES method can be used to sample either kind of target distributions. It doesso by adding to the potential energy of the system U ( x ) a bias potential V ( x ) such thatthe sampled distribution is not the equilibrium Boltzmann distribution, P ( x ) ∝ e − βU ( x ) ,but the target one, p tg ( x ). This bias potential is deﬁned as(1) V ( x ) = − β log p tg ( x ) P ( x ) , where β is the inverse Boltzmann temperature. The bias potential is not known a priori ,but it is self-consistently learned during the simulation via an on-the-ﬂy estimate ofthe probability distributions. Statistics of the unbiased system can be retrieved via areweighting procedure, by assuming that the bias is updated in an adiabatic way. Givenany observable O ( x ), its ensemble average h O i over the unbiased system can be estimatedvia ensemble averages over the sampled biased system:(2) h O i = h Oe βV i V h e βV i V . In this way also free energy diﬀerences and free energy surfaces can be estimated [3, 4].

3. – OPES for collective variables sampling

Given a set of collective variables s = s ( x ), one can deﬁne the marginal probability(3) P ( s ) = Z P ( x ) δ [ s ( x ) − s ] d x . The well-tempered ensemble with respect to these CVs is obtained by requiring thatthe marginal of the target distribution is p WT ( s ) ∝ [ P ( s )] /γ , where γ > PES: ON-THE-FLY PROBABILITY ENHANCED SAMPLING METHOD bias factor. Notice that the exact target distribution p tg ( x ) is not known but this doesnot constitute a problem. In fact, the core requirements are that the corresponding biaspotential can be expressed and that the target distribution is easy to sample, thus thekinetic bottlenecks between metastable states are removed. This is indeed guaranteedfor the well-tempered distribution, given that the CVs are chosen properly and the biasfactor is large enough. The case of uniform marginal target distribution can be seen asa special case of the well-tempered one, where γ = ∞ .When using OPES for CVs sampling we need to estimate P ( s ). To do so, we use aweighted kernel density estimation with an automatic kernel merging algorithm, that isexplained in detail in Ref. [3]. We also introduce a regularization term ǫ and a normal-ization Z over the explored CV-space. At step n the bias, Eq. (1), can be written as afunction of the CVs:(4) V n ( s ) = (1 − /γ ) 1 β log (cid:18) P n ( s ) Z n + ǫ (cid:19) , where P n ( s ) is the estimate of P ( s ) obtained via reweighting. Reference [3] presents thefull derivation of this expression.

4. – OPES for expanded ensembles sampling

To deﬁne the expanded ensemble target distribution, we ﬁrst deﬁne a class of proba-bility distributions P λ ( x ) ∝ e − βU λ ( x ) , where λ can be a parameter (e.g. the temperature)or a set of parameters, and U is the original system potential. For simplicity we onlyconsider nonweighted expanded ensembles, as done in Ref. [4]. The expanded ensemblecontains a discrete set { λ } of N { λ } parameters such that the corresponding P λ ( x ) havean overlap in the conﬁguration space. We can write the expanded target distribution as:(5) p { λ } ( x ) = 1 N { λ } X λ P λ ( x ) . One can then deﬁne the expansion collective variables as ∆ u λ ( x ) = βU λ ( x ) − βU ( x )and use them to write the bias potential at step n :(6) V n ( x ) = − β log N { λ } X λ e − ∆ u λ ( x )+ β ∆ F n ( λ ) ! , where ∆ F n ( λ ) are the estimates of the free energy diﬀerences between the unbiasedsystem U and the one at a given λ . These are obtained via on-the-ﬂy reweighting,similarly to P n ( s ) in Sec. , but this time without the need for kernel density estimationas { λ } is a discrete set. The details of the derivation are explained in Ref. [4].Finally, we notice that often it is possible to rewrite Eq. (6) so that, similarly toEq. (4), the bias is a function of only a small number of CVs. For example, in case ofa multithermal expanded target distribution the bias can be expressed as a function ofthe potential energy only [4]. MICHELE INVERNIZZI (a) P r obab ili t y sampledreweighted 0 0.01 0.02 0.03 (a) W E LL − T E M PE R E D sampledreweighted 0 0.2 0.4 0.6 0.8 1 −3 −2 −1 0 1 2 3 (b) P r obab ili t y Angle f sampledreweighted 0 0.01 0.02 0.03 −50 0 50 100 150 200 (b) M U L T I − T H E R M A L Potential Energy Usampledreweighted

Fig. 1. – Marginal probabilities over the φ angle and the potential energy for OPES simulationsof alanine dipeptide with diﬀerent target distributions. The Boltzmann distribution as obtainedvia reweighting is also shown. Top row: well-tempered distribution over φ , γ = 50. Bottomrow: temperature-expanded ensemble, from 300 to 1000 K.

5. – Example: alanine dipeptide

As an example we consider alanine dipeptide in vacuum at 300 K, which is a pro-totypical system for enhanced sampling. It presents two main metastable basins, thatcan be characterized using as CV the torsion angle φ . The most stable one containstwo minima and lays in the region where φ is negative, while the second basin has oneminimum at φ ≃

1. A standard unbiased simulation would suﬀer the sampling problemand almost never make transitions between the two basins. In Fig. 1 we show diﬀerenttarget distributions that can be used to study alanine, by plotting their marginal along φ and the potential energy. On the top, Fig. 1a, is the well-tempered ensemble over φ ( γ = 50), which allows the system to easily visit both basins, and greatly increasesthe probability of sampling intermediate conﬁgurations around φ = 0. On the bottom,Fig. 1b, is an expanded ensemble that combines four diﬀerent temperatures, startingfrom 300 K up to 1000 K. At higher temperatures alanine explores conﬁgurations withhigher energy where the barrier between the two basins is smaller. Also in this case theprobability of visiting conﬁgurations around φ = 0 is increased, but less sensibly. On theother hand from this second simulation it is possible to retrieve statistics about a wholerange of temperatures, instead of 300 K only. The two target distributions presented areclearly diﬀerent but we can see that, once reweighting is performed, one retrieves thesame underlying Boltzmann probability (dashed lines).It is well known that is possible to enhance the sampling along a CV also using anexpanded ensemble as target, e.g. by combining multiple umbrella sampling distributions[12]. In Fig. 2a one can see that a multiumbrella target over φ can look very diﬀerent froma well-tempered target over the same CV, Fig. 1a. However, in both cases the system PES: ON-THE-FLY PROBABILITY ENHANCED SAMPLING METHOD (a) P r obab ili t y sampledreweighted 0 0.01 0.02 0.03 (a) M U L T I − U M B R E LL A sampledreweighted 0 0.2 0.4 0.6 0.8 1 −3 −2 −1 0 1 2 3 (b) P r obab ili t y Angle f sampledreweighted 0 0.01 0.02 0.03 −50 0 50 100 150 200 (b) M U L T I − T H E R M A L − U M B R E LL A Potential Energy Usampledreweighted

Fig. 2. – Marginal probabilities over the φ angle and the potential energy for OPES simulationsof alanine dipeptide with diﬀerent expanded target distributions. The Boltzmann distributionas obtained via reweighting is also shown. Top row: multiumbrella distribution over φ , with43 evenly spaced umbrellas. Notice that it looks quite diﬀerent from Fig. 1a, even though inthe limit of inﬁnite umbrellas and γ = ∞ they would both sample a uniform distribution over φ . Bottom row: combined multithermal and multiumbrella distribution, with 43 umbrellas andtemperature range from 300 to 1000 K. easily transitions from one metastable basin to the other and the probability of being inthe transition region (around φ = 0) is greatly increased compared to the unbiased case.By increasing the number of umbrellas in the target we can eventually reach a uniformmarginal, similar to the well-tempered one with γ = ∞ . Finally, it is also possible tocombine diﬀerent expansions in the same target, as shown in Fig. 2b, where the usedbias is a function both of the angle φ and of the potential energy U .All the simulation details, together with the inputs and the trajectories are availableonline on the PLUMED-NEST repository ( , plumID:21.006) [13].

6. – Conclusion

We brieﬂy presented a target distribution perspective on enhanced sampling and theon-the-ﬂy probability enhanced sampling method (OPES), that have been developed inRefs. [3, 4]. OPES is a general and ﬂexible method that can be used to sample diﬀerenttypes of target distributions. It is also easy to use and robust with respect to suboptimalcollective variables [14]. It has been implemented as a contributed module in the opensource library PLUMED [15, 16], and has been already used for various applications[17, 18, 19, 20, 21]. We believe OPES can be a handy tool for anyone interested inenhanced sampling and it also has the capability of supporting novel types of targetdistributions. ∗ ∗ ∗

MICHELE INVERNIZZI

The author thanks Luigi Bonati for carefully reading the manuscript. This researchwas supported by the NCCR MARVEL, funded by the Swiss National Science Founda-tion, and European Union Grant No. ERC-2014-AdG-670227/VARMET.

REFERENCES[1]

Bolhuis P. G. et al. , Annu. Rev. Phys. Chem. , (2002) 1[2] Skilling J. , Tech. Rep. , (2006) 4[3] Invernizzi M. and

Parrinello M. , J. Phys. Chem. Lett. , (2020) 7[4] Invernizzi M., Piaggi P. M. and

Parrinello M. , Phys. Rev. X , (2020) 4[5] Torrie G. M. and

Valleau J. P. , Chem. Phys. Lett. , (1974) 4[6] Laio A. and

Parrinello M. , P. Natl. Acad. Sci. , (2002) 20[7] Mezei M. , J. Comput. Phys. , (1987) 1[8] Barducci A., Bussi G. and

Parrinello M. , Phys. Rev. Lett. , (2008) 2[9] Marinari E. and

Parisi G. , Europhys. Lett. , (1992) 6[10] Sugita Y. and

Okamoto Y. , Chem. Phys. Lett. , (199) 1-2[11] Lyubartsev A. P. et al. , J. Chem. Phys. , (1992) 3[12] Sugita Y., Kitao A. and

Okamoto Y. , J. Chem. Phys. , (2000) 15[13] The PLUMED consortium ,. Nat. Methods , (2019) 8[14] Invernizzi M. and

Parrinello M. , J. Chem. Theory Comput. , (2019) 4[15] Tribello G. A. et al. , Comput. Phys. Commun. , (2014) 2[16] [17] Mandelli D., Hirshberg B. and

Parrinello M. , Phys. Rev. Lett. , (2020) 2[18] Bonati L., Rizzi V. and

Parrinello M. , J. Phys. Chem. Lett. , (2020) 8[19] Rizzi V. et al. , Nat. Commun. , (2021) 1[20] Piaggi P. M. et al. , arXiv preprint , (2021) arXiv:2101.04806[21] Karmakar T. et al. , arXiv preprintarXiv preprint