OPES: On-the-fly Probability Enhanced Sampling Method
aa r X i v : . [ phy s i c s . c o m p - ph ] J a n PREPRINT
OPES: On-the-fly Probability Enhanced Sampling Method
Michele Invernizzi ( )( )( ) ( ) Department of Physics, ETH Zurich - Lugano, Switzerland ( ) Institute of Computational Science, Universit`a Svizzera Italiana - Lugano, Switzerland ( ) Italian Institute of Technology - Genova, Italy
Summary. — Molecular simulations are playing an ever increasing role, findingapplications in fields as varied as physics, chemistry, biology and material science.However, many phenomena of interest take place on time scales that are out of reachof standard molecular simulations. This is known as the sampling problem and overthe years several enhanced sampling methods have been developed to mitigate thisissue. We propose a unified approach that puts on the same footing the two mostpopular families of enhanced sampling methods, and paves the way for novel com-bined approaches. The on-the-fly probability enhanced sampling method providesan efficient implementation of such generalized approach, while also focusing onsimplicity and robustness.
1. – Introduction
Despite the remarkable improvements over the last decades, both in computationalpower and in algorithms efficiency, molecular simulations are still limited in their scopeby the sampling problem. Many phenomena of interest, such as protein-ligand bindingor crystal nucleation, are rare events that take place on macroscopic time scales andthus would require an impractical amount of computation to be simulated using stan-dard molecular dynamics or Markov-chain Monte Carlo. To circumvent this problem,a plethora of enhanced sampling methods have been developed that aim at allowing asimulation to visit all the relevant metastable states, without being hindered by kineticbottlenecks. Apart from some remarkable exceptions [1, 2], all the most popular en-hanced sampling techniques can be roughly grouped in two main families, that we willrefer to as collective variables methods and expanded ensembles methods. We proposea unified perspective on enhanced sampling, that allows us to develop a general methodto perform both kinds of sampling in a robust and efficient way. This new perspectivemakes enhanced sampling more accessible and simpler to use and can also open up tonovel sampling strategies. The method we developed, called on-the-fly probability en-hanced sampling (OPES), is described in detail in Refs. [3] and [4]. Here we present itsmain features in a synthetic fashion. MICHELE INVERNIZZI
2. – Unified approach
The goal of enhanced sampling is to increase the probability of observing in a simu-lation certain rare events, and to do it in such a way that it is still possible to retrievestatistics about the original system. We call target distribution the modified probabilitydistribution that is sampled instead of the Boltzmann one. Rather than focusing on thevarious computational techniques used by the different enhanced sampling methods, wepropose to group them according to how they define the target distribution they aim tosample. Following this criteria we can identify two main families.A first family is the one of methods such as umbrella sampling [5] and metadynamics[6]. These methods make use of collective variables (CVs) or order parameters that aresmooth functions of the atomistic coordinates and encode the slow modes of the system.In this family, the target distribution is defined by requiring that its marginal probabilitydistribution over such CVs has a given functional form. Typically the marginal is chosento be a constant flat distribution, as in adaptive umbrella sampling [7], but other choicesare possible, such as the well-tempered distribution often used in metadynamics [8].A second family includes tempering methods, such as simulated tempering [9] andreplica exchange [10]. These methods define their target distribution as the combinationof slightly different versions of the original system, for example the same system but athigher temperatures. These target distribution are also known as generalized ensemblesor expanded ensembles [11].The OPES method can be used to sample either kind of target distributions. It doesso by adding to the potential energy of the system U ( x ) a bias potential V ( x ) such thatthe sampled distribution is not the equilibrium Boltzmann distribution, P ( x ) ∝ e − βU ( x ) ,but the target one, p tg ( x ). This bias potential is defined as(1) V ( x ) = − β log p tg ( x ) P ( x ) , where β is the inverse Boltzmann temperature. The bias potential is not known a priori ,but it is self-consistently learned during the simulation via an on-the-fly estimate ofthe probability distributions. Statistics of the unbiased system can be retrieved via areweighting procedure, by assuming that the bias is updated in an adiabatic way. Givenany observable O ( x ), its ensemble average h O i over the unbiased system can be estimatedvia ensemble averages over the sampled biased system:(2) h O i = h Oe βV i V h e βV i V . In this way also free energy differences and free energy surfaces can be estimated [3, 4].
3. – OPES for collective variables sampling
Given a set of collective variables s = s ( x ), one can define the marginal probability(3) P ( s ) = Z P ( x ) δ [ s ( x ) − s ] d x . The well-tempered ensemble with respect to these CVs is obtained by requiring thatthe marginal of the target distribution is p WT ( s ) ∝ [ P ( s )] /γ , where γ > PES: ON-THE-FLY PROBABILITY ENHANCED SAMPLING METHOD bias factor. Notice that the exact target distribution p tg ( x ) is not known but this doesnot constitute a problem. In fact, the core requirements are that the corresponding biaspotential can be expressed and that the target distribution is easy to sample, thus thekinetic bottlenecks between metastable states are removed. This is indeed guaranteedfor the well-tempered distribution, given that the CVs are chosen properly and the biasfactor is large enough. The case of uniform marginal target distribution can be seen asa special case of the well-tempered one, where γ = ∞ .When using OPES for CVs sampling we need to estimate P ( s ). To do so, we use aweighted kernel density estimation with an automatic kernel merging algorithm, that isexplained in detail in Ref. [3]. We also introduce a regularization term ǫ and a normal-ization Z over the explored CV-space. At step n the bias, Eq. (1), can be written as afunction of the CVs:(4) V n ( s ) = (1 − /γ ) 1 β log (cid:18) P n ( s ) Z n + ǫ (cid:19) , where P n ( s ) is the estimate of P ( s ) obtained via reweighting. Reference [3] presents thefull derivation of this expression.
4. – OPES for expanded ensembles sampling
To define the expanded ensemble target distribution, we first define a class of proba-bility distributions P λ ( x ) ∝ e − βU λ ( x ) , where λ can be a parameter (e.g. the temperature)or a set of parameters, and U is the original system potential. For simplicity we onlyconsider nonweighted expanded ensembles, as done in Ref. [4]. The expanded ensemblecontains a discrete set { λ } of N { λ } parameters such that the corresponding P λ ( x ) havean overlap in the configuration space. We can write the expanded target distribution as:(5) p { λ } ( x ) = 1 N { λ } X λ P λ ( x ) . One can then define the expansion collective variables as ∆ u λ ( x ) = βU λ ( x ) − βU ( x )and use them to write the bias potential at step n :(6) V n ( x ) = − β log N { λ } X λ e − ∆ u λ ( x )+ β ∆ F n ( λ ) ! , where ∆ F n ( λ ) are the estimates of the free energy differences between the unbiasedsystem U and the one at a given λ . These are obtained via on-the-fly reweighting,similarly to P n ( s ) in Sec. , but this time without the need for kernel density estimationas { λ } is a discrete set. The details of the derivation are explained in Ref. [4].Finally, we notice that often it is possible to rewrite Eq. (6) so that, similarly toEq. (4), the bias is a function of only a small number of CVs. For example, in case ofa multithermal expanded target distribution the bias can be expressed as a function ofthe potential energy only [4]. MICHELE INVERNIZZI (a) P r obab ili t y sampledreweighted 0 0.01 0.02 0.03 (a) W E LL − T E M PE R E D sampledreweighted 0 0.2 0.4 0.6 0.8 1 −3 −2 −1 0 1 2 3 (b) P r obab ili t y Angle f sampledreweighted 0 0.01 0.02 0.03 −50 0 50 100 150 200 (b) M U L T I − T H E R M A L Potential Energy Usampledreweighted
Fig. 1. – Marginal probabilities over the φ angle and the potential energy for OPES simulationsof alanine dipeptide with different target distributions. The Boltzmann distribution as obtainedvia reweighting is also shown. Top row: well-tempered distribution over φ , γ = 50. Bottomrow: temperature-expanded ensemble, from 300 to 1000 K.
5. – Example: alanine dipeptide
As an example we consider alanine dipeptide in vacuum at 300 K, which is a pro-totypical system for enhanced sampling. It presents two main metastable basins, thatcan be characterized using as CV the torsion angle φ . The most stable one containstwo minima and lays in the region where φ is negative, while the second basin has oneminimum at φ ≃
1. A standard unbiased simulation would suffer the sampling problemand almost never make transitions between the two basins. In Fig. 1 we show differenttarget distributions that can be used to study alanine, by plotting their marginal along φ and the potential energy. On the top, Fig. 1a, is the well-tempered ensemble over φ ( γ = 50), which allows the system to easily visit both basins, and greatly increasesthe probability of sampling intermediate configurations around φ = 0. On the bottom,Fig. 1b, is an expanded ensemble that combines four different temperatures, startingfrom 300 K up to 1000 K. At higher temperatures alanine explores configurations withhigher energy where the barrier between the two basins is smaller. Also in this case theprobability of visiting configurations around φ = 0 is increased, but less sensibly. On theother hand from this second simulation it is possible to retrieve statistics about a wholerange of temperatures, instead of 300 K only. The two target distributions presented areclearly different but we can see that, once reweighting is performed, one retrieves thesame underlying Boltzmann probability (dashed lines).It is well known that is possible to enhance the sampling along a CV also using anexpanded ensemble as target, e.g. by combining multiple umbrella sampling distributions[12]. In Fig. 2a one can see that a multiumbrella target over φ can look very different froma well-tempered target over the same CV, Fig. 1a. However, in both cases the system PES: ON-THE-FLY PROBABILITY ENHANCED SAMPLING METHOD (a) P r obab ili t y sampledreweighted 0 0.01 0.02 0.03 (a) M U L T I − U M B R E LL A sampledreweighted 0 0.2 0.4 0.6 0.8 1 −3 −2 −1 0 1 2 3 (b) P r obab ili t y Angle f sampledreweighted 0 0.01 0.02 0.03 −50 0 50 100 150 200 (b) M U L T I − T H E R M A L − U M B R E LL A Potential Energy Usampledreweighted
Fig. 2. – Marginal probabilities over the φ angle and the potential energy for OPES simulationsof alanine dipeptide with different expanded target distributions. The Boltzmann distributionas obtained via reweighting is also shown. Top row: multiumbrella distribution over φ , with43 evenly spaced umbrellas. Notice that it looks quite different from Fig. 1a, even though inthe limit of infinite umbrellas and γ = ∞ they would both sample a uniform distribution over φ . Bottom row: combined multithermal and multiumbrella distribution, with 43 umbrellas andtemperature range from 300 to 1000 K. easily transitions from one metastable basin to the other and the probability of being inthe transition region (around φ = 0) is greatly increased compared to the unbiased case.By increasing the number of umbrellas in the target we can eventually reach a uniformmarginal, similar to the well-tempered one with γ = ∞ . Finally, it is also possible tocombine different expansions in the same target, as shown in Fig. 2b, where the usedbias is a function both of the angle φ and of the potential energy U .All the simulation details, together with the inputs and the trajectories are availableonline on the PLUMED-NEST repository ( , plumID:21.006) [13].
6. – Conclusion
We briefly presented a target distribution perspective on enhanced sampling and theon-the-fly probability enhanced sampling method (OPES), that have been developed inRefs. [3, 4]. OPES is a general and flexible method that can be used to sample differenttypes of target distributions. It is also easy to use and robust with respect to suboptimalcollective variables [14]. It has been implemented as a contributed module in the opensource library PLUMED [15, 16], and has been already used for various applications[17, 18, 19, 20, 21]. We believe OPES can be a handy tool for anyone interested inenhanced sampling and it also has the capability of supporting novel types of targetdistributions. ∗ ∗ ∗
MICHELE INVERNIZZI
The author thanks Luigi Bonati for carefully reading the manuscript. This researchwas supported by the NCCR MARVEL, funded by the Swiss National Science Founda-tion, and European Union Grant No. ERC-2014-AdG-670227/VARMET.
REFERENCES[1]
Bolhuis P. G. et al. , Annu. Rev. Phys. Chem. , (2002) 1[2] Skilling J. , Tech. Rep. , (2006) 4[3] Invernizzi M. and
Parrinello M. , J. Phys. Chem. Lett. , (2020) 7[4] Invernizzi M., Piaggi P. M. and
Parrinello M. , Phys. Rev. X , (2020) 4[5] Torrie G. M. and
Valleau J. P. , Chem. Phys. Lett. , (1974) 4[6] Laio A. and
Parrinello M. , P. Natl. Acad. Sci. , (2002) 20[7] Mezei M. , J. Comput. Phys. , (1987) 1[8] Barducci A., Bussi G. and
Parrinello M. , Phys. Rev. Lett. , (2008) 2[9] Marinari E. and
Parisi G. , Europhys. Lett. , (1992) 6[10] Sugita Y. and
Okamoto Y. , Chem. Phys. Lett. , (199) 1-2[11] Lyubartsev A. P. et al. , J. Chem. Phys. , (1992) 3[12] Sugita Y., Kitao A. and
Okamoto Y. , J. Chem. Phys. , (2000) 15[13] The PLUMED consortium ,. Nat. Methods , (2019) 8[14] Invernizzi M. and
Parrinello M. , J. Chem. Theory Comput. , (2019) 4[15] Tribello G. A. et al. , Comput. Phys. Commun. , (2014) 2[16] [17] Mandelli D., Hirshberg B. and
Parrinello M. , Phys. Rev. Lett. , (2020) 2[18] Bonati L., Rizzi V. and
Parrinello M. , J. Phys. Chem. Lett. , (2020) 8[19] Rizzi V. et al. , Nat. Commun. , (2021) 1[20] Piaggi P. M. et al. , arXiv preprint , (2021) arXiv:2101.04806[21] Karmakar T. et al. , arXiv preprintarXiv preprint