Training collective variables for enhanced sampling via neural networks based discriminant analysis
arXiv preprint [physics.comp-ph]
Luigi Bonati
Department of Physics, ETH Zurich, Switzerland
Institute of Computational Science, Università della Svizzera italiana, Lugano, Switzerland
Atomistic Simulations, Italian Institute of Technology, Genova, Italy
Summary. — A popular way to accelerate the sampling of rare events in molecular dynamics simulations is to introduce a potential that increases the fluctuations of selected collective variables. For this strategy to be successful, it is critical to choose appropriate variables. Here we review some recent developments in the data-driven design of collective variables, with a focus on the combination of Fisher's discriminant analysis and neural networks. This approach makes it possible to compress the fluctuations of metastable states into a low-dimensional representation. We illustrate through several examples the effectiveness of this method in accelerating the sampling, while also identifying the physical descriptors that undergo the most significant changes in the process.
1. – Introduction: rare events in molecular dynamics
Atomistic simulations [1] play an important role in many fields of science. One of the prominent techniques is Molecular Dynamics (MD), where the time evolution of the system is obtained by integrating Newton's equations of motion. However, its direct application is hindered by the fact that many systems present high free energy barriers between metastable states, and consequently transitions occur on timescales too long to be simulated [2]. This is why such processes are called rare events. A variety of important phenomena fall into this category, including phase transitions and chemical reactions, as well as biological processes such as protein folding and ligand binding.
2. – Accelerating the sampling with collective variables
Several enhanced sampling methods have been devised to extend the scope of MD simulations. A large family of these techniques relies on the identification of a small set of order parameters s, called collective variables (CVs), which are functions of the atomic positions R [2]. The potential energy surface U(R) is modified by the addition of an external bias potential V(s(R)) which enhances the fluctuations of the selected variables by lowering the energetic barriers. This increases the sampling of the transition region, in turn accelerating the occurrence of rare events.

A popular method is metadynamics [3], in which the bias potential V(s) is expressed as a sum of repulsive Gaussians that are deposited at the points visited in the CVs space. In this way, the system is discouraged from re-exploring configurations that have already been visited and can escape local minima. A recent evolution of metadynamics, called on-the-fly probability enhanced sampling (OPES), builds the bias potential from an on-the-fly estimate of the probability distribution P(s) [4].
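As a minimal sketch of this idea, the bias can be written as a sum of repulsive Gaussians centered at the previously visited CV values. The height and width below are toy parameters, and the sketch omits the well-tempered scaling and the OPES probability estimate:

```python
import numpy as np

def metad_bias(s, centers, height=1.2, sigma=0.1):
    """Metadynamics-style bias: a sum of repulsive Gaussians
    deposited at the previously visited CV values `centers`."""
    s = np.asarray(s, dtype=float)
    bias = np.zeros_like(s)
    for c in centers:
        bias += height * np.exp(-(s - c) ** 2 / (2.0 * sigma ** 2))
    return bias

# Gaussians deposited while the system sits in a minimum near s = 0:
# the accumulated bias pushes the system away from that region,
# while the bias far from the minimum (e.g. at s = 0.5) stays negligible.
centers = [0.00, 0.02, -0.01, 0.01]
print(metad_bias([0.0, 0.5], centers))
```

In the actual algorithms the deposition is done on the fly during the simulation, and the bias along s flattens the underlying free energy barriers.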
3. – Data-driven identification of collective variables
For these methods to be successful, the collective variables must be related to the slow relaxation modes of the system or, in other terms, to the transition pathways connecting local minima. Since this information is usually not known beforehand, the quest for such variables often becomes a chicken-and-egg problem. If we have ergodic simulations we can extract appropriate CVs based on free energy paths [5] or slowest relaxation modes [6], but for simulations to be ergodic we typically need knowledge of good CVs.

Here we look at the problem from a different perspective, and review some recent developments which exploit machine learning (ML) techniques to design CVs using information limited to the metastable states. ML methods have risen to prominence in recent years, and there is no shortage of applications to atomistic simulations [7, 8, 9] and enhanced sampling [10, 11, 12]. Both supervised [13] and unsupervised methods [14, 15] have been applied to the search for data-driven CVs. We discuss here in detail the approaches that exploit Fisher's linear discriminant analysis [16, 17], and in particular recent advances in which this statistical method has been coupled with neural networks [18].

3.1. Linear Discriminant Analysis. – Linear Discriminant Analysis (LDA) [19] is a supervised method, first developed by Fisher, which is routinely used to perform classification and dimensionality reduction tasks. In summary, LDA seeks a linear projection of the input features into a lower-dimensional space such that the classes are maximally separated. We illustrate here for simplicity the case of two classes, i.e. two metastable states A and B. First, we choose a set of descriptors d that can be, for instance, distances, angles, dihedrals or coordination numbers. We run short MD simulations in basins A and B and compute the values of such descriptors.
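For instance, a vector of interatomic-distance descriptors can be computed from each sampled configuration. A minimal NumPy sketch (the coordinates here are made up for the example):

```python
import numpy as np
from itertools import combinations

def distance_descriptors(coords):
    """All pairwise interatomic distances of one configuration.

    coords: (n_atoms, 3) array of Cartesian positions.
    Returns a 1D vector of n_atoms*(n_atoms-1)/2 distances.
    """
    pairs = combinations(range(len(coords)), 2)
    return np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in pairs])

# toy configuration with 4 atoms -> 6 distance descriptors
rng = np.random.default_rng(0)
conf = rng.normal(size=(4, 3))
d = distance_descriptors(conf)
print(d.shape)  # (6,)
```

Evaluating this function along the trajectories in basins A and B yields the two labeled datasets used in the following.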
We use the average values μ_A, μ_B and the variance matrices Σ_A, Σ_B of the descriptors to characterize the basins, and define the within-scatter matrix as S_w = Σ_A + Σ_B and the between-scatter matrix as S_b = (μ_A − μ_B)(μ_A − μ_B)^T. Fisher's criterion states that the linear combination w̄ that optimally separates the classes is the one for which the ratio of the two scatter matrices in the projected space is maximal:

(1)    w̄ = arg max_w (w^T S_b w) / (w^T S_w w).

From a practical perspective, we can recast the above task as a generalized eigenvalue problem, S_b w̄ = u S_w w̄, where the eigenvalue u measures the degree of separation of the two states along the corresponding eigenvector w̄. Once w̄ is found, the one-dimensional projection s = w̄^T d can be used as CV. A modified version of this procedure has been proposed to better describe the fluctuations of metastable states, which replaces the arithmetic mean in the definition of S_w with a harmonic one [16].

Fig. 1. – Deep-LDA CV scheme. The NN performs a nonlinear transformation of the input descriptors into a compressed latent space, where LDA is applied.

3.2. Neural network based discriminant analysis. – While LDA provides a robust method for finding low-dimensional representations, it suffers from several limitations, the most important being that it can only provide a linear combination of the given descriptors. This precludes the application of the LDA CV to complex systems, where several descriptors need to be combined in a nonlinear way to fully capture the dynamics of the process. For this reason, we proposed to extend it with neural networks (NNs), similarly to what has been done in the context of computer science for classification [20]. This is achieved by using a NN to perform a nonlinear featurization of the input descriptors, which is optimized using Fisher's linear discriminant as the objective function.
In the following we refer to this method as Deep-LDA.

As shown in Fig. 1, the descriptors are fed into a NN that compresses the information into a lower-dimensional space. LDA is then performed in this latent space, and the
Fig. 2. – Comparison of LDA (left) and Deep-LDA (right) for a 2D Gaussian toy model. The points represent the two states, while the lines correspond to the CV isolines. In the case of LDA, the CV projection, orthogonal to the isolines, is a straight line, while for Deep-LDA it connects the fluctuations of the two states in a nonlinear fashion. In the insets we report the density distribution along the CVs, to highlight the different discriminative power.
eigenvalue u of the generalized eigenvalue equation is used as the loss function to maximize the discriminative power of the NN. By applying a Cholesky decomposition we can rewrite the generalized eigenvalue equation as a standard one. This makes it possible to train the NN via gradient descent using the machine learning library PyTorch [21]. Furthermore, a few regularizations are proposed to make the optimization more stable and to provide bounds to the objective function [18].

In Fig. 2 we exemplify the behavior of LDA and Deep-LDA in the simple case of two-dimensional Gaussians. We observe that, due to the nonlinear transformation performed by the NN, the Deep-LDA CV discriminates the two states better than its linear version.
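The linear building block of this construction can be sketched explicitly: form S_w and S_b from the two states, reduce the generalized eigenvalue problem to a standard one through a Cholesky decomposition of S_w, and take the leading eigenvector. Below is a minimal NumPy sketch on two-dimensional Gaussian data, in the spirit of the toy model of Fig. 2 (in Deep-LDA the same construction is applied to the NN latent variables rather than to the raw descriptors):

```python
import numpy as np

def fisher_lda(dA, dB):
    """Two-state Fisher LDA: returns (eigenvalue u, direction w).

    dA, dB: (n_samples, n_features) descriptor values in states A and B.
    """
    muA, muB = dA.mean(0), dB.mean(0)
    Sw = np.cov(dA.T) + np.cov(dB.T)          # within-scatter S_w
    Sb = np.outer(muA - muB, muA - muB)       # between-scatter S_b
    # Cholesky reduction: S_b w = u S_w w  ->  standard eigenproblem
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    w = Linv.T @ evecs[:, -1]                 # back-transform eigenvector
    return evals[-1], w / np.linalg.norm(w)

# two 2D Gaussian "metastable states", separated along the first feature
rng = np.random.default_rng(1)
dA = rng.normal([-2.0, 0.0], 0.5, size=(1000, 2))
dB = rng.normal([+2.0, 0.0], 0.5, size=(1000, 2))
u, w = fisher_lda(dA, dB)
s = dA @ w  # one-dimensional CV values, s = w^T d
```

For this toy data the recovered direction is essentially the axis separating the two clouds, and the eigenvalue u quantifies how well they are resolved along it.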
4. – Intermezzo: how to train and apply the Deep-LDA CV
Before moving on to applications, we summarize the main steps of this method from a practical viewpoint. Once a set of descriptors has been chosen, the first step is to simulate short MD trajectories in the metastable states and compute the descriptors' values. The NN can be trained on these datasets using the PyTorch library, following the tutorial on GitHub [27]. Finally, the CV can be used along with enhanced sampling methods through the open-source library PLUMED [22], taking advantage of an interface we developed to import PyTorch models. Codes and input files are also available on the PLUMED-NEST [23] repository with plumID:20.004. Below is an example of a PLUMED input file, which shows the ease of use of this method.

------------------------- PLUMED INPUT -------------------------
# COMPUTE DESCRIPTORS
d1: DISTANCE ATOMS=1,2
...
dN: DISTANCE ATOMS=17,19
# LOAD DEEP-LDA CV
deep: PYTORCH_MODEL MODEL=model.pt ARG=d1,...,dN
# APPLY BIAS POTENTIAL
opes: OPES_METAD ARG=deep.node-0 PACE=500 BARRIER=40
----------------------------------------------------------------
5. – Applications

5.1. Enhancing the sampling with Deep-LDA. – As an application of the Deep-LDA CV, we examine the chemical reaction between vinyl alcohol and formaldehyde, which has been studied in ref. [18]. Since the free energy barrier between reactants and products is on the order of 170 kJ/mol (68 k_B T at room temperature), it is impractical to observe any transition in a MD simulation without resorting to enhanced sampling methods.

The set of descriptors consists of all 40 interatomic distances, whose values are measured in short simulations (5 ps) of the reactants and products. The Deep-LDA CV is trained on these datasets, using a NN with 3 hidden layers and {24, 12, 4} nodes per layer. Further computational details can be found in ref. [18]. The CV is used to accelerate sampling in combination with the OPES method, which leads to the observation of several transitions between the two states. This allows the free energy profile of the reaction to be computed with great accuracy (Fig. 3a). This sampling efficiency implies that the CV, even if based only on the configurations of the metastable states, is able to capture the concerted reaction mechanism.

Fig. 3. – (a) Free energy as a function of the Deep-LDA CV, obtained from the OPES simulation via a reweighting procedure. The error, calculated with a block average, is about 1 kJ/mol. Snapshots of the reactants (left) and products (right) are also displayed. (b) Contributions of the descriptors to the CV. They are computed as the sum of the modulus of the weight parameters between the input and the first hidden layer, multiplied by the standard deviation of the inputs. The weights are normalized so that their sum equals one. The three dominant descriptors are colored in black, and the corresponding distances are reported in the box.

5.2. Extracting physical insights from the CVs. – A notable feature of these data-driven CVs is their ability to highlight the most important descriptors, thereby deepening the physical understanding of the process. We illustrate this property with two examples.

The first is the chemical reaction described above: in Fig. 3b the descriptors are ranked according to their contribution to the CV, from which we can recognize three dominant distances. This is in accordance with the chemical changes that the system undergoes during the reaction, in which a bond is formed between the two carbon atoms and, simultaneously, a proton transfer occurs between the oxygen atoms.

The second example is the study of the binding between a calixarene host and a set of ligands, reported in [24].
The Deep-LDA CV is used to include the effect of water in the study of the binding process, which is otherwise typically modeled using information limited to the ligand and the host. The analysis of the descriptors' weights makes it possible to understand the role of water in the binding and unbinding events, and the inclusion of this contribution in the CV set results in a very efficient sampling of the process [24].
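The ranking used in these analyses (cf. Fig. 3b) can be sketched as follows: sum the absolute weights between each input and the first hidden layer, multiply by the standard deviation of that input, and normalize so that the contributions sum to one. The weight matrix and data below are made up for illustration:

```python
import numpy as np

def descriptor_importance(W1, descriptors):
    """Rank input descriptors by |first-layer weights| times input std.

    W1:          (n_hidden, n_inputs) first-layer weight matrix.
    descriptors: (n_samples, n_inputs) training inputs.
    Returns contributions normalized so that their sum equals one.
    """
    contrib = np.abs(W1).sum(axis=0) * descriptors.std(axis=0)
    return contrib / contrib.sum()

# hypothetical 3-descriptor example with one dominant input
rng = np.random.default_rng(2)
W1 = np.array([[2.0, 0.1, 0.1],
               [1.5, 0.2, 0.1]])
data = rng.normal(size=(500, 3))
imp = descriptor_importance(W1, data)
print(imp.argmax())  # -> 0, the first descriptor dominates
```

In practice the first-layer weights are read from the trained Deep-LDA network, and the top-ranked descriptors point to the physical degrees of freedom that change most in the process.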
6. – Conclusions
The combination of Fisher's linear discriminant and neural networks allows the design of data-driven collective variables using information limited to local minima. This approach has proven effective in accelerating the sampling of different types of rare events and identifying the most relevant atomic descriptors. It represents a promising tool for studying complex systems, and can be used as an initial guess for methods designed to extract CVs from enhanced sampling simulations [25, 26].

∗ ∗ ∗
The author thanks Dr. Michele Invernizzi for carefully reading the manuscript. The research was supported by the NCCR MARVEL, funded by the Swiss National Science Foundation, and by European Union Grant No. ERC-2014-AdG-670227/VARMET.
REFERENCES
[1] Frenkel D. and Smit B., Understanding Molecular Simulation (Elsevier) 2001.
[2] Valsson O., Tiwary P. and Parrinello M., Annu. Rev. Phys. Chem. (2016).
[3] Laio A. and Parrinello M., Proc. Natl. Acad. Sci. U.S.A. (2002).
[4] Invernizzi M. and Parrinello M., J. Phys. Chem. Lett. (2020).
[5] Branduardi D., Gervasio F. L. and Parrinello M., J. Chem. Phys. (2007).
[6] Pérez-Hernández G. et al., J. Chem. Phys. (2013).
[7] Wang S. and Riniker S., Artificial Intelligence in Drug Discovery (2020).
[8] Noé F. et al., Annu. Rev. Phys. Chem. (2020).
[9] Bonati L. and Parrinello M., Phys. Rev. Lett. (2018).
[10] Sidky H., Chen W. and Ferguson A. L., Mol. Phys. (2020).
[11] Wang Y., Ribeiro J. M. L. and Tiwary P., Curr. Opin. Struct. Biol. (2020).
[12] Bonati L., Zhang Y. and Parrinello M., Proc. Natl. Acad. Sci. U.S.A. (2019).
[13] Sultan M. M. and Pande V. S., J. Chem. Phys. (2018).
[14] Chen W. and Ferguson A. L., J. Comput. Chem. (2018).
[15] Schöberl M., Zabaras N. and Koutsourelakis P.-S., J. Chem. Phys. (2019).
[16] Mendels D., Piccini G. and Parrinello M., J. Phys. Chem. Lett. (2018).
[17] Piccini G., Mendels D. and Parrinello M., J. Chem. Theory Comput. (2018).
[18] Bonati L., Rizzi V. and Parrinello M., J. Phys. Chem. Lett. (2020).
[19] Welling M., Dep. Comput. Sci. Univ. Toronto (2005).
[20] Dorfer M., Kelz R. and Widmer G., 4th Int. Conf. Learn. Represent. (ICLR) (2016).
[21] Paszke A. et al., Adv. Neural Inf. Process. Syst. (2019).
[22] Tribello G. A. et al., Comput. Phys. Commun. (2014).
[23] The PLUMED consortium, Nat. Methods (2019).
[24] Rizzi V., Bonati L., Ansari N. and Parrinello M., Nat. Commun. (2021).
[25] McCarty J. and Parrinello M., J. Chem. Phys. (2017).
[26] Sultan M. M. and Pande V. S., J. Chem. Theory Comput. (2017).
[27] Deep-LDA tutorial, available on GitHub.