Metadynamics with Discriminants: a Tool for Understanding Chemistry
MMetadynamics with Discriminants: a Tool forUnderstanding Chemistry
GiovanniMaria Piccini, † , ‡ Dan Mendels, † , ‡ and Michele Parrinello ∗ , † , ‡ † Department of Chemistry and Applied Biosciences, ETH Zurich, c/o USI Campus, ViaGiuseppe Buffi 13, CH-6900, Lugano, Ticino, Switzerland ‡ Facolt´a di Informatica, Instituto di Scienze Computationali, Universit’a della Svizzeraitaliana (USI), Via Giuseppe Buffi 13, CH-6900, Lugano, Ticino, Switzerland
E-mail: [email protected]
Abstract
We introduce an extension of a recently published method to obtain low-dimensionalcollective variables for studying multiple states free energy processes in chemical reac-tions. The only information needed is a collection of simple statistics of the equilibriumproperties of the reactants and product states. No information on the reaction mecha-nism has to be given. The method allows studying a large variety of chemical reactivityproblems including multiple reaction pathways, isomerization, stereo- and regiospeci-ficity. We applied the method to two fundamental organic chemical reactions. Firstwe study the SN2 nucleophilic substitution reaction of a Cl in CH2Cl2 leading to anunderstanding of the kinetic origin of the chirality inversion in such processes. Subse-quently, we tackle the problem of regioselectivity in the hydrobromination of propenerevealing that the nature of empirical observations such as the Markovinikov’s ruleslies in the chemical kinetics rather than the thermodynamic stability of the products. a r X i v : . [ phy s i c s . c o m p - ph ] S e p ntroduction Molecular dynamics (MD) is a powerful tool for studying several properties of differentchemical systems. However, chemistry is mostly interested in activated processes that bringsreactants to products, namely chemical reactions. Thermodynamically speaking, reactantsand products are free energy basins separated by a high barrier. In most cases, such barriersare associated to extremely long time scales that are incomparably larger than those reach-able in MD. For many years this has limited the application of MD to problems distant fromthe main interest of chemistry.Since the landmark paper of Torrie and Valleau in which umbrella sampling was intro-duced, a large variety of methods has been proposed to study rare events. One of the mostpopular and widely applied method is metadynamics . In this method a history dependentadaptive bias is recursively added to the underlying free energy of the system. Its effect isto enhance the fluctuations in the basins accelerating the transitions between reactants andproducts.Umbrella sampling and Metadynamics, like other methods, rely on the definition of someorder parameters, or collective variables (CVs), able to discriminate the thermodynamicsstates of interest in the free energy space. The CVs are expected to capture the slowestmotions of the system, i.e. those degrees of freedom whose sampling needs to be accelerated .More formally, a CV is a function s ( R ) of the microscopic atomic coordinates R . Over theyears a large variety of CVs has been proposed to enhance the sampling for different freeenergy processes ranging from nucleation problems to protein folding .From a simple point of view, chemical reaction can be seen as a drastic bonds rearrange-ment. Reactants and products can be discriminated by their bonding topology. After thesystem has reacted some bonds may have formed while others may have been broken. Agood CV should preferably combine all this information into the most simple mathematicalexpression. By means of chemical intuition one can select a set of simple bond distancesand combine them into one single linear combination. For simple cases this may be an easy2ask. However, if the reaction is somewhat complex an a priori evaluation of the coefficientsof the linear combination may lead to a very poor convergence or, even worse, to a wrongdescription of the physics. A further complication occurs if the reactant state can evolveinto different products or if the latter can isomerize into one another. In a recent publicationby our group we have shown that part of the chemical intuition process can be automa-tized by means of a method that we called harmonic linear discriminant anlysis (HLDA). Inthis work we extend this method to a multi class (MC) problem in order to treat multiplethermodynamic states simultaneously. We shall refer to this extension as MC-HLDA.The paper starts with an introduction to MC-HLDA. Two examples have been consideredto elucidate the power of the method in understanding fundamental chemical processes. Thefirst is the SN2 nucleophilic substitution of a chlorine atom in dichloromethane while thesecond is the electrophilic addition of hydrogen bromide the propene. Method
Even for rather small systems a chemical reaction may involve several slow degrees of freedomthat play an important role in determining the transition between reactants to products.However, enhanced sampling methods like metadynamics tend to converge slowly when alarge number of CVs is employed and the interpretation of the result may lack of sufficientclarity. Our goal is to pack large sets of descriptors into few linear combinations that canstill describe accurately the physics of the problem.We assume that we are dealing with a chemical reaction characterized by M states sep-arated by high free energy barriers, e.g. reactants that may evolve into different products,and that N d descriptors d ( R ) suffice to characterize and distinguish them like distances be-tween pairs of atoms involved in breaking and/or forming a chemical bond. Since transitionsbetween the M states are rare we can compute for each I -th state the expectation value µ I of the N d descriptors, a vector collecting the averages of each d ( R ) for state I , and3he associated multivariate variance Σ I , i.e. the covariance matrix. These quantities can bestraightforwardly evaluated in short unbiased runs. Our goal is to find an M − M sets of data along which their mutual overlap is minimal. Multi classlinear discriminant analysis aims at finding the directions along which the projected data areat best separated. To do this one needs a measure of their degree of separation. Following upour previous work we introduce the Fisher’s criterion. This is given by the ratio betweenthe so called “between class” S b and the “within class” S w scatter matrices transformed bythe rotation matrix W . The former is measured by the square of the distance between theprojected means W T S b W (1)with S b = M X I ( µ I − µ ) ( µ − µ I ) T (2)where µ I is the mean value of the I -th class and µ is the overall mean of the datasets, i.e. µ = 1 /M P MI µ I .The within scatter is the pooled covariance leading to the expression W T S w W (3)with S w = M X I Σ I (4)being Σ I the covariance matrix of state I . The Fishers object function then reads like aRayleigh ratio 4 ( W ) = W T S b WW T S w W . (5)We want W T S b W to be maximized subject to the condition W T S w W = 1, correspondingto Lagrangian, L LDA = W T S b W + λ (cid:0) W T S w W − (cid:1) . (6)This problem can be solved by means of a generalized eigenvalue solution of the form S b W = λ S w W . (7)Assuming that the inverse of S w exists the above equation can be reduced to the standardeigenvalue problem in the form of S − w S b W = λ W (8)The theory of linear discriminants tells us that for M classes we get M − N d space along which the distributions of the M states exhibit the least overlap. Thus, theycan be used as CVs in metadynamics or other CV based enhance sampling methods.As discussed in our previous work, from the rare event point it is more appropriate togive more weight to the state with the smaller fluctuations. In other words focus more onthose states that are more difficult to get into and escape from. For these reasons we haveproposed a different measure of the scatter and rather than using the arithmetic average webase the measure of the within scatter matrix on the harmonic average as follows: S w = 1 Σ A + Σ B + · · · + Σ M . (9)As discussed in our previous work, from a machine learning point of view the more compact5tates are better defined and thus have a larger weight in determining the discriminants.Experience has shown us that this second choice is far superior. Results and Discussion
Nucleophilic substitution of a Cl atom in CH2Cl2
In order to illustrate the power of MC-HLDA in the study of chemical reactions we startwith a simple SN2 nucleophilic substitution, namely the substitution of a chlorine atomin the compound CH2Cl2 by a Cl – anion. For this system reactants and products areformally identical and the statistics of the state descriptors in the free-energy basins ispermutationally equivalent. To enhance sampling between the three basins we employed thedistances between the chlorine atoms and the central carbon atom similarly to what donerecently by Pfaender et al. and by our group (see Fig. 1). Ab initio MD simulationsat 300 K were performed using the CP2K package patched with the PLUMED2 code.A time step of 0.5 fs was used. The PM6 Hamiltonian was used to calculate energies andgradients. To control the temperature the systems has been coupled to the velocity rescalingthermostat of Bussi et al. every 100 MD steps. Here, the thermostat aims at mimickingin a simple manner the effect of the environment as we do not intend to simulate a processin the gas phase for which a microcanonical simulation must be considered. Well-temperedMatadynamics has been used to enhance the sampling along the MC-HLDA CVs for a totalsimulation time of 3 ns. For further computational details we refer the interested reader tothe Supporting Information.These three descriptors represent a good choice in describing the slow motions of theatoms in going from reactants to products. More precisely, they are related to the slowestcollective oscillations, i.e. low vibrational frequencies, as they involve the dynamics of theheaviest atoms of the system. In order to calculate the three µ I and Σ I needed tosolve eq. 8 we just have to perform the calculation in one of the basins and use symmetry6igure 1: Carbon-chlorine distances used as descriptors for the SN2 reaction.to obtain the other two sets of data. Since this is a 3-class problem the dimensionalityreduction operated by MC-HLDA provides two linear combinations of descriptors. Hence,the CV space is two dimensional. Fig. 2 reports the free energy surface (FES) obtainedby enhancing the sampling along the two CVs by means of metadynamics. The variablesprovide an excellent discrimination of the states neatly reflecting the 3-fold symmetry of theproblem. Convergence of the FES with respect to simulation time and relative error barsestimated via bootstrap analysis can be found in the supporting information.To extract further information we calculated the free energy profile along the minimumfree energy path (MFEP). The latter is obtained by taking a series of points lyingalong a guess path and minimized according to the nudged elastic band algorithm (NEB). In principle such a MFEP is not required to go necessarily through the transition states.However, in the present case it does and the conformations extracted from the apparenttransition state do correspond to the classical back-side nucleophilic attack as depicted inblue in Fig. 2. This is a clear indication of the quality of the CVs. The estimated barrieralong this path is about 50 kJ / mol. The back-side attack is the most probable route in a7N2 reaction mechanism and it would be responsible for the chiral center inversion if anychirality were present. However, a closer inspection of the FES revealed the existence ofanother reaction pathway much higher in terms of free energy barriers. The MFEP analysisshows that along this possible, although much less probable path, the system must overcomea barrier in the order of 200 kJ / mol. Rather surprisingly, the analysis revealed the presence ofa high energy intermediate in which the nucleophile and the leaving group coordinate the Cl –atom bonded to the central carbon atom. In physical organic chemistry this mechanism isusually referred to as front-side attack and it is known experimentally and theoretically to be high in terms of energy barriers, thus, unlikely to happen. In fact, this mechanism isexpected to retain the chirality of the reactants a feature that is almost never observed inpure SN2 reactions. Therefore, it is clear that the nature of the SN2 chirality inversion isa consequence of the chemical kinetics as the barrier for the back-side attack mechanism ismuch lower than the front-side one.As a general and final remark in this section we would like to stress once more that noinformation on the route the system can take to go from reactants to products has beengiven as an input. All this wealth of information on the system was hidden in the simplestatistics that one collects from a short monitoring of the local fluctuations in the free energybasins. Electrophilic addition of HBr to propene
Another fundamental type of organic reaction is the electrophilic addition. One such processis the hydrobromination of propene. Here, a hydrogen bromide molecule is used to break thepropene double bond by adding the hydrogen atom and the bromine to the carbon atomsinvolved in the double bond. Such a reaction can give rise to two different isomers dependingon which carbon atoms the halide group will bind to (see Fig. 3).It is known that the halide atom prefers to bond to the most substituted carbon. Thisbecause the mechanisms starts with the abstraction of the acidic hydrogen by a carbon atom8igure 2: Upper panel: free energy surface for the nucloephilic substitution of a chlorineatom in dichloromethane showing the two possible reaction paths and associated relevantreference structures. Lower panel: minimum free energy paths along for the two possiblemechanisms of the reaction, namely the low-barrier back-side attack (blue line), and thehigh-barrier front-side attack (orange line).Figure 3: Reaction scheme for the hydrobromination of propene showing the two possibleisomers classified as Markovnikov and anti -Markovnikov products.9f the double bond implying the formation of a carbocation. Carbocations are much morestable if the carbon center is surrounded by other carbon groups rather than by hydrogenatoms. In the case of propene the central carbon atom has a hydrogen atom formallysubstituted by a methyl group. Therefore, this stabilizes the transient carbocation withrespect to the less substituted carbon in the dobule bond. The carbocation is formed in thetransition region and is unstable, thus, the preference of the bromine atom to sit in the mostsubstituted carbon has to be interpreted as a kinetic rather that a thermodynamic effect.These are to the so called Markovnikov rules and the associated reaction outcomes arereferred as Markovnikov and anti-Markovnikov products (see Fig. 3).To study this particular reaction we applied MC-HLDA to the three states associatedto reactants, Markovnikov, and anti-Markovnikov products states. We used as descriptorsthe five distances illustrated in Fig. 4. These CVs are both able to break and form thedesired bonds but at the same time they embody the necessary information to discriminatebetween Markovnikov and anti-Markovnikov products. Similarly to the previous case, abinitio MD simulations at 300 K were performed using the CP2K package patched withthe PLUMED2 code. A time step of 1.0 fs was used. The PM6 Hamiltonian was used tocalculate energies and gradients. To control the temperature the systems has been coupled tothe velocity rescaling thermostat of Bussi et al. every 100 MD steps. Again, the thermostathere aims at mimicking in a simple manner the effect of the environment. Well-temperedMatadynamics using 3 multiple-walkers has been used to enhance the sampling along theMC-HLDA CVs for a total simulation time of 2.5 ns. For further computational details werefer the interested reader to the Supporting Information.Fig. 5 reports in the upper panel the FES of the hydrobromination reaction. Convergenceof the FES with respect to simulation time and relative error bars estimated via bootstrapanalysis can be found in the supporting information. The lower elongated minimum isassociated to the reactants state. This state can evolve into the Markovnikov and anti-Markovnikov product states by crossing two different barriers. The lower panel of Fig. 510igure 4: Fundamental distances used as descriptors for the hydrobromination of propene.shows the free energy profile along the MFEP. Two things are worth noticing in this plot.Firstly, the difference in thermodynamic stability of the Markovnikov and anti-Markovnikovproducts is almost negligible. Secondly, the barriers separating the states differ from eachother of about 80 kJ / mol. Recalling that the transition probability for a system to transitfrom one metastable state to another is proportional to the exponential of the free energybarrier it is clear that the nature of the Markovnikov’s regioselectivity is almost purely kineticas deduced from empirical observations.To further support our conclusion we compare the barriers obtained with metadynam-ics with the ones calculated by means of a NEB optimization of 60 images for both theMarkovnikov and anti-Markovnikov route at 0 K on the potential energy surface. The graydashed dotted line in the lower panel of Fig. 5 represents the energy of the images optimizedby means of the NEB algorithm. It is clear that metadynamics is able to reproduce prettywell the energy landscape (see also Ref. ). One must keep in mind that metadynamics workson the free energy surface at finite temperature whereas the NEB optimization is applied onthe potential energy surface without any account for temperature effects. This fact is the11rigin of the slight differences in terms of barrier heights between the two methods.The large advantage of metadynamics is that all the entropic effects, even the anharmonicones, are automatically included since the simulation is performed at finite temperature.Moreover, NEB optimization may be a rather complicated method if the complexity ofthe reaction rises with an increasing number of important degrees of freedom. This happenswhen the interpolated images between reactants and products lie far from the ideal minimumenergy path resulting in a collection of unphysical configurations. Again, our method doesnot imply the knowledge of what lies in between reactants and products, as this will beexplored automatically by metadynamics directly on the free energy surface, but it rathersuggests the proper direction to follow in order to connect them. Conclusions
The method presented in this paper has the potential to become a routine tool in studyingchemical reactions. Its power lies in the combination of a biasing method like Metadynam-ics with a statistical classification method like harmonic linear discriminant analysis. Thefirst allows to enhance the fluctuations within the free energy basins in order to favour thetransition to other states while the second drives the reaction along the favoured directiontowards the desired products. Moreover, MC-HLDA needs no more information than theone that can be collected from a short unbiased run in the free energy basins of interest.This fact is fundamental because no assumptions on the possible mechanisms are needed butmore importantly no specific reaction pathway is preferentially chosen in order to go fromthe reactants to the different products. This approach gives a very clear physical pictureof the free energy landscape on which the process takes place. The accuracy of the derivedCVs is such that the estimated barriers along the apparent transition states are close tothose obtained by means of standard quantum chemical approaches such as NEB on thepotential energy surface at 0 K. The analysis of the possible reaction paths and relative ap-12igure 5: Upper panel: free energy surface for the hydrobromination of propene showing thetwo possible reaction paths and associated reactants and products structures. Lower panel:minimum free energy paths along for the two possible mechanisms of the reaction, namelythe high-barrier anti -Markovnikov’s route (orange line), and the low-barrier Markovnikov’sroute (blue line). 13arent barriers has allowed us to understand from a free energy perspective the nature of thechirality inversion in the SN2 nucleophilic substitution reactions and the kinetic groundingsof Markovikov’s rules in electrophilic additions. The method will be soon available as partof the PLUMED2 program. We are confident that this method may find a large range ofapplications, from molecular biology/medicine to heterogeneous and homogeneous catalysis. Acknowledgement
This research was supported by the European Union Grant No. ERC-2014-AdG-670227/VARMET.Calculations were carried out on the M¨onch cluster at the Swiss National SupercomputingCenter (CSCS).
References (1) Mendels, D.; Piccini, G.; Parrinello, M.
J. Phys. Chem. Lett. 0 , 2776–2781.(2) Torrie, G. M.; Valleau, J. P.
J. Comput. Phys. , , 187–199.(3) Laio, A.; Parrinello, M. Proc. Natl. Acad. Sci. U.S.A. , , 12562–12566.(4) Barducci, A.; Bussi, G.; Parrinello, M. Phys. Rev. Lett. , .(5) Valsson, O.; Tiwary, P.; Parrinello, M. Annu. Rev. Phys. Chem. , , 159–184.(6) Piaggi, P. M.; Parrinello, M. J. Chem. Phys. , .(7) Shaffer, P.; Valsson, O.; Parrinello, M. Proc. Natl. Acad. Sci. U.S.A. , , 1150–1155.(8) Fisher, R. A. Ann. Eugen. , , 179–188.(9) Fleming, K. L.; Tiwary, P.; Pfaendtner, J. J. Phys. Chem. A , , 299–305.1410) Piccini, G.; McCarty, J. J.; Valsson, O.; Parrinello, M. J. Phys. Chem. Lett. , ,580–583.(11) Hutter, J.; Iannuzzi, M.; Schiffmann, F.; VandeVondele, J. WIREs Comput Mol Sci , , 15–25.(12) Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. Comput. Phys.Commun. , , 604–613.(13) Bussi, G.; Donadio, D.; Parrinello, M. J. Chem. Phys. , , 14101.(14) Vande Linde, S. R.; Hase, W. L. J. Chem. Phys. , , 7962–7980.(15) Tucker, S. C.; Truhlar, D. G. J. Phys. Chem. , , 8138–8142.(16) Branduardi, D.; Faraldo-G´omez, J. D. J. Chem. Theory Comput. , , 4140–4154.(17) Ensing, B.; Laio, A.; Gervasio, F. L.; Parrinello, M.; Klein, M. L. J. Am. Chem. Soc. , , 9492–9493.(18) Jonsson, H.; Mills, G.; Jacobsen, K. W. Class. Quantum Dyn. Condens. Phase Simu-lations ; 1998; pp 385–404.(19) Dougherty, R. C.; Dalton, J.; David roberts, J.
Org. Mass Spectrom. , , 77–79.(20) Schlegel, H. B.; Mislow, K.; Bernardi, F.; Bottoni, A. Theor. Chim. Acta , ,245–256.(21) Szab´o, I.; Czak´o, G. Nat. Commun. , .(22) Xie, J.; Hase, W. L. Science (80-. ). , , 32–33.(23) Sæthre, L. J.; Thomas, T. D.; Svensson, S. J. Chem. Soc. Perkin Trans. 2 ,749–756. 1524) Aizman, A.; Contreras, R.; Galv´an, M.; Cedillo, A.; Santos, J. C.; Chamorro, E.
J.Phys. Chem. A , , 7844–7849.(25) Yang, Z.; Ding, Y.; Zhao, D. ChemPhysChem , , 2379–2389.(26) Franzen, S.; Cochran, K. H.; Weng, J.; Bartolotti, L.; Delley, B. Chem. Phys. , , 46–54.(27) Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. Comput. Phys.Commun. , , 604–613. 16 he graphics shows two possible reaction pathways in a SN2 reaction that can be explored by meansof multiclass linear discriminant analysis metadynamics. upporting InformationMetadynamics with Discriminants: a Tool forUnderstanding Chemistry GiovanniMaria Piccini Dan Mendels Michele Parrinello a r X i v : . [ phy s i c s . c o m p - ph ] S e p Computational details
Nucleophilic substitution of a Cl atom in CH2Cl2
Electronic structure calculation: • CP2K code[3] • PM6 Hamiltonian • SCF convergence threshold 10E-6 hartree • × ×
10 ˚A box • Total charge − • CP2K code[3] • Temperature 300K • NVT Bussi’s velocity rescaling thermostat[2] • Timestep 0.5 fsMC-HLDA details: • Reference run 5 ps • − Cl distances, see Fig. 1 main text • s HLDA coefficients = 0.554,-0.811,0.188 • s HLDA coefficients = 0.602,0.237,-0.762MC-HLDA metadynamics calculation: • Plumed2[4] code patched with CP2K • Single walker • Temperature 300K • Well-Tempered MetaD[1] with γ = 25 • / mol Gaussian height • C 5.09459 4.91798 4.93038Cl 5.00420 4.99636 2.06235Cl 3.82940 6.01611 4.33492H 4.89620 3.89061 4.56146H 6.08372 5.25871 4.56148Cl 5.09189 4.92023 6.86383 lectrophilic addition of HBr to propene
Electronic structure calculation: • CP2K code[3] • PM6 Hamiltonian • SCF convergence threshold 10E-7 hartree • × ×
10 ˚A boxMolecular dynamics calculation: • CP2K code[3] • Temperature 300K • NVT Bussi’s velocity rescaling thermostat[2] • Timestep 1.0 fsMC-HLDA details: • Reference run 85 ps • • s HLDA coefficients = -0.523,0.726,0.401,-0.029,-0.193 • s HLDA coefficients = 0.573,-0.339,-0.183,-0.636,-0.345MC-HLDA metadynamics calculation: • Plumed2[4] code patched with CP2K • • • Temperature 300K • Well-Tempered MetaD[1] with γ = 505 / mol Gaussian height • C -1.777893 0.103815 0.254504C -0.440198 0.405579 0.380877H -1.946432 -0.713641 -0.163503H -2.556681 0.529777 0.591753C 0.790177 -0.266775 -0.193057H -0.126084 1.089115 1.184146H 0.556427 -1.152008 -0.809697H 1.517651 -0.609212 0.509388H 1.274498 0.412571 -0.908895Br -1.131277 2.976166 -0.082215H -2.104167 3.470361 0.850613
EB calculations at 0 K
NEB details: • CP2K code •
60 images per reaction path (Markovnikov and anti-Markovnikov) • Algorithm Climbing Image NEB • K spring = 0 . • Optimization algorithm: DIISFig. 1 reportes the total energy profiles of the NEB calculations. In Fig. 5 of themain text the NEB images distance has been adjusted and rescaled in order to matchthe position of the metadynamics peaks and shifted to the reactants energy. Thisdoes not imply any loss of generality as long as what we are comparing are the barrierheights. 8igure 1: NEB energy profiles for the independent Markovnikov and anti-Markovnikov reaction paths. 9
C-HLDA code availability
The MC-HLDA code providing the coefficients of the linear combination has beenfirst developed using Python by GM. Piccini. A development version to be used inPLUMED2 has been recently developed by Dr. Y. I. Yang and is available at thefollowing URL: https://github.com/helloyesterday/plumed2 HLDA
We kindly acknowledge Dr. Y. I. Yang for implementing this first version of thecode. 10 eferences [1] Alessandro Barducci, Giovanni Bussi, and Michele Parrinello. Well-temperedmetadynamics: A smoothly converging and tunable free-energy method.
Phys.Rev. Lett. , 100(2), 2008.[2] Giovanni Bussi, Davide Donadio, and Michele Parrinello. Canonical samplingthrough velocity rescaling.
J. Chem. Phys. , 126(1):14101, 2007.[3] J¨urg Hutter, Marcella Iannuzzi, Florian Schiffmann, and Joost VandeVondele.cp2k: atomistic simulations of condensed matter systems.
WIREs Comput MolSci , 4(1):15–25, 2014.[4] Gareth A Tribello, Massimiliano Bonomi, Davide Branduardi, Carlo Camilloni,and Giovanni Bussi. PLUMED 2: New feathers for an old bird.