Exact Topology of Dynamic Probability Surface of an Activated Process by Persistent Homology

Abstract

To gain insight into the reaction mechanism of activated processes, we introduce an exact approach for quantifying the topology of high-dimensional probability surfaces of the underlying dynamic processes. Instead of Morse indexes, we study the homology groups of a sequence of superlevel sets of the probability surface over high-dimensional configuration spaces using persistent homology. For alanine-dipeptide isomerization, a prototype of activated processes, we identify locations of probability peaks and connecting ridges, along with measures of their global prominence. Instead of a saddle point, the transition state ensemble (TSE) of conformations is at the most prominent probability peak after reactants/products, when proper reaction coordinates are included. Intuition-based models, even those exhibiting a double well, fail to capture the dynamics of the activated process. Peak occurrence, prominence, and locations can be distorted upon subspace projection. While principal component analysis accounts for conformational variance, it inflates the complexity of the surface topology and destroys dynamic properties of the topological features. In contrast, the TSE emerges naturally as the most prominent peak beyond the reactant/product basins, when projected to a subspace of minimum dimension containing the reaction coordinates. Our approach is general and can be applied to investigate the topology of high-dimensional probability surfaces of other activated processes.


Farid Manuchehrfar, Huiyu Li, Wei Tian, Ao Ma ∗, and Jie Liang ∗

February 9, 2021

Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607. These authors contributed equally. ∗ Corresponding authors.

Keywords: surface topology; landscape analysis; persistent homology; active process; transition state; energy flow

Introduction

Activated processes are ubiquitous in molecular systems, ranging from chemical reactions of small molecules to dynamic conformational changes and enzymatic reactions of proteins. In proteins, all functionally important processes are activated processes, which provide well-defined rates essential for proteins to carry out their roles in the cellular context, as proper timing is required for proper function.

The prevalent picture describing an activated process is that of a transition between two meta-stable basins on the free energy landscape separated by a barrier, whose height is large compared to thermal energy [1]. The slow time scale of activation arises from the fact that the molecular system can rarely accumulate sufficient energy in the relevant degrees of freedom (DoFs) to surpass the transition barrier. This simple and elegant picture originates from reaction rate theories, such as the well-known transition state theory and Kramers' theory [1–7], developed in studies of the dynamics of chemical reactions of small molecules.

A key concept in reaction rate theories is that of reaction coordinates: a few special coordinates exist that can fully determine the progress of a reaction process [8–10]. A requirement for reaction coordinates is that they must accurately locate the transition barrier. Accordingly, the numerous DoFs in a complex molecular system (e.g., a protein molecule, or a system of solute and solvent) can be divided into reaction coordinates and heat bath. Reaction coordinates play central roles as they determine both the mechanism and the rate of activation. For example, to modify the activity of an enzyme, one should modify residues involved in the reaction coordinates of the enzyme activities [11, 12], as this will modify both the reaction pathway and the barrier height for activation.
In contrast, modifying residues that belong to the heat bath will not alter the enzymatic activity, as the role of the heat bath is to provide energy to the reaction coordinates to cross the activation barrier during rare fluctuations, which is largely a non-specific process.

Given such significance, it is important to develop a rigorous and quantitative criterion for determining the correct reaction coordinates. This task was accomplished with the development of the committor test, which is characterized by the committor value $p_B$ [8, 9, 13, 14]: the probability that a dynamic trajectory initiated from a given configuration reaches the product basin before visiting the reactant basin. By definition, the reactant and product states have committor values of 0 and 1, respectively, whereas the optimal transition state coincides with $p_B = 0.5$. The committor $p_B$ therefore provides a rigorous parameterization of the reaction process.

Thus, the intuitive, albeit qualitative, notion of reaction coordinates translates into a rigorous definition: the few coordinates that are sufficient for determining the committor value of any given configuration. Du et al. first adopted this rigorous definition of reaction coordinates in the context of protein folding [9]; Chandler and co-workers established its usage as a standard practice in the general context of activated processes [8].

While this rigorous criterion has been well accepted, identifying the correct reaction coordinates turns out to be rather difficult, even for systems of modest complexity. One example is the $C_{7eq} \to C_{7ax}$ isomerization reaction of the alanine dipeptide in vacuum, a prototype for studying biomolecular conformational transitions. Alanine dipeptide is the smallest molecule that satisfies the criterion that distinguishes complex molecules from small molecules: the non-reaction coordinates in the system constitute a large enough thermal bath to provide the reaction coordinates with adequate energy to cross the activation barrier.
As a result, the $C_{7eq} \to C_{7ax}$ transition displays features of activated dynamics that are unique to complex molecules but absent in small molecules. It was first found by Bolhuis et al. [15] that the conventional Ramachandran torsional angles φ and ψ, while sufficient for distinguishing the two stable basins, are inadequate for locating the transition state. Instead, another torsional angle θ was found to be an essential reaction coordinate, a rather counter-intuitive finding [16, 17]. The counter-intuitive nature of reaction coordinates has turned out to be more often the norm than the exception in complex systems [15, 18, 19], posing a formidable challenge: without intuition, there appears to be no guidance in sight.

The challenge in identifying reaction coordinates has motivated efforts to develop rigorous methods for their identification in complex systems since the early 2000s [8, 9, 15, 18, 20, 21]. Beyond unreliable intuition and trial and error, the first systematic method was based on machine learning, in which a neural network was used to automatically identify the optimal reaction coordinates from a prepared pool of candidates [21]. This method was used to successfully identify the key solvent coordinate that controls the isomerization dynamics of an alanine dipeptide in solution, which had defied prior intuition-based trial-and-error efforts. The success of this machine learning-based approach led to a series of further developments along similar lines [10, 18, 19, 22–28].

However, a major deficiency of machine learning-based methods is that they cannot answer the real question concerning reaction coordinates: why are some coordinates more important for activation than others? Instead, these methods can only inform us empirically which coordinates appear to be important based on well-defined criteria.
Recently, Ma and co-workers developed a rigorous theory for mapping out the flow of potential energy through individual coordinates [17, 29]. It was found that the reaction coordinates are the coordinates that carry high energy flows during the activation process. This result suggested an appealing physical picture: energy flows from fast coordinates into slow coordinates during activation, so that adequate energy can accumulate in the slow coordinates, enabling the system to cross the activation barrier along these slow coordinates. This physical picture also suggested that reaction coordinates are preferred channels of energy flows and are encoded in the protein structure. Through analysis of energy flow, one can obtain a prioritized list of coordinates that likely play the most significant roles in the activation process.

The most celebrated concept in reaction rate theories is that of the transition state (TS), which is the dynamic bottleneck of an activated process. The conventional view is that the TS is located at a critical point with a Morse index of 1 on the high-dimensional potential energy landscape of the molecule. If all the reaction coordinates of an activated process are known, the TS will be an index-1 critical point on the free energy surface of the reaction coordinates, with the sole direction of negative Hessian curvature pointing along the ideal one-dimensional reaction coordinate valid at the top of the activation barrier. Based on this picture, the surface of the multi-dimensional probability distribution of reaction coordinates constructed from an ensemble of reactive trajectories should be highly structured, with the important dynamic states (e.g., the TS) manifesting as critical topological features.

To gain insights into reaction mechanisms, a common practice is to study features of the free energy surface. However, all relevant information of an activated process is contained in the reactive trajectories.
Instead of the free energy surface, one can construct the dynamic probability surface of the transition state. For certain systems, this can be achieved using the transition path sampling (TPS) method. Once a large ensemble of reactive trajectories is generated, an ensemble of configurations that the system sampled during the transition process can be harvested. From this ensemble, one can generate a dynamic probability landscape or transition state surface, which is usually high-dimensional.

The focus of this work is the analysis of the exact topology of the high-dimensional dynamic probability landscape, and the establishment of its relationship with the transition state of the active process. Instead of the Morse index, we characterize the high-dimensional transition state surface by its topological structures in homology groups. We analyze topological changes in the superlevel sets of the probability surface at different probability levels. Using the technique of persistent homology, we identify the locations of probability peaks in the high-dimensional configuration space and the ridges connecting them, along with measures of their global prominence.

We apply this approach to study the active process of $C_{7eq} \to C_{7ax}$ isomerization of the alanine dipeptide, a well-characterized model system [15, 17, 29]. After the exact topological structures of the dynamic probability surface are constructed, we identify the location of the ensemble of transition state conformations. Instead of a saddle point with a Morse index of 1, the transition state ensemble (TSE) is found to be located at the top of the most prominent peak after those of the reactant and product. In addition, the dynamically important topological structures are retained when the surface is projected onto the 2-dimensional plane of (φ, θ), which are known to be the reaction coordinates [17].
In contrast, when projected to the intuition-derived (φ, ψ)-plane, the topological features of the probability surface no longer contain dynamic information on the transition state. Furthermore, we find that PCA dimension reductions distort surface topologies, such that the transition state ensemble cannot be recovered from PCA-derived topological features: instead of simplifying the surface, PCA destroys the dynamic properties of topological features of the original transition state surface.

Overall, we introduce a novel approach for quantifying the exact topology of high-dimensional surfaces. With this approach, we have characterized the precise topology of the transition state surface of the active process of alanine dipeptide isomerization. We have also established that the TSE is located at the most prominent probability peak beyond those of the reactant and the product, when the subspace of projection contains the proper reaction coordinates. The new language of homology groups and the technique of persistent homology on superlevel sets of high-dimensional probability surfaces introduced here are general and can be applied to investigations of the topology of high-dimensional surfaces over configuration space encountered in other problems of activated processes.

We briefly discuss how the dynamic probability surface can be constructed for the molecular system of our interest. We then discuss the problem of understanding the topological structures of the probability surface over the relevant configuration space. Our focus will be the introduction of the method of homology groups for analyzing the exact topological structures of the dynamic probability surface. This approach is based on the homological structures of the sequence of the superlevel sets of a probability surface, and differs from previous efforts that are based on critical points, Morse theory, and Euler characteristics.

We use the transition path sampling (TPS) method [8] to generate a sufficiently large ensemble of reactive trajectories for the active process we study. To ensure the transition process is fully covered without bias, the duration of the trajectories is much longer than that of the transition process. Along each trajectory, we further harvest a number of system configurations at a specific time interval to generate an ensemble of the relevant configurations that the system samples during the transition process. From this ensemble, we construct a dynamic probability landscape over a d-dimensional subspace, namely, the joint probability distribution over the d coordinates.

Configuration space and probability surface.

We now discuss the general problem of analyzing a probability surface over a configuration space of arbitrary dimension. We first introduce a few relevant concepts, and then review current approaches. This is followed by an exposition of the basics of homology groups and persistent homology. We focus on developing these concepts in the setting of cubic complexes and superlevel sets of the probability surface. We will also discuss the key concept of filtration and its construction, and how it can be used to uncover the exact high-dimensional topological features.
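As a rough illustration of how a joint probability distribution over d coordinates might be built from harvested configurations, the sketch below bins sample points onto a regular grid and normalizes the counts. The function name `probability_surface`, the bin width, and the sample values are hypothetical and are not taken from this work; the actual binning and normalization used here may differ.

```python
from collections import Counter
import math

def probability_surface(configs, bin_width):
    """Bin d-dimensional configurations onto a regular grid and normalize.

    configs   : iterable of d-tuples of coordinate values (e.g. torsional angles)
    bin_width : edge length of each grid cell, in the unit of the coordinates
    Returns a dict mapping integer grid indices (d-tuples) to relative frequencies.
    """
    counts = Counter(
        tuple(math.floor(c / bin_width) for c in x) for x in configs
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

# Hypothetical harvested configurations in a 2-dimensional subspace (degrees).
samples = [(-80.0, 5.0), (-82.0, 3.0), (-78.0, 8.0), (10.0, -2.0), (60.0, -40.0)]
surface = probability_surface(samples, bin_width=10.0)
```

Storing only the occupied cells in a dictionary keeps the representation sparse, which matters when d is large and most grid cells are never visited.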

Configuration space.

We begin our discussion in a general setting. We use d features to describe the configurations of a molecule. These can be bond lengths, bond angles, and torsional angles describing the structure of the molecule. For alanine dipeptide, there are a total of 60 features that fully describe the configuration of the molecule. For a molecule of finite size, the configuration space $M \subset \mathbb{R}^d$ is compact. It is likely that M lies in a subspace of the Euclidean space $\mathbb{R}^d$, as there is coupling between different degrees of freedom.

Probability surface.

Each configuration $x = (x_1, x_2, \cdots, x_d) \in M$ has a probability $f(x) \in [0, 1]$ associated with it. Namely, we have a function $f: M \to \mathbb{R}_{[0,1]}$ that assigns the probability value $f(x)$ to each configuration $x$.

Superlevel sets and sublevel sets.

For a value $0 \le a \le 1$, we can identify all $x$ whose probability values satisfy $f(x) \ge a$, which form the superlevel set $M_{f \ge a}$:

$$M_{f \ge a} \equiv \{ x \in M \mid f(x) \ge a \} = f^{-1}([a, 1]).$$

Similarly, we can have the sublevel set $M_{f \le a}$:

$$M_{f \le a} \equiv \{ x \in M \mid f(x) \le a \} = f^{-1}([0, a]).$$

Topology of probability surface: Critical points and Morse indices.

It is of great importance to understand the topological structures of the probability surface $f(x)$. The approach of analyzing the critical points is well-practiced for exploring the topological structures of high-dimensional surfaces.

Critical Points.

The critical points of $f$ on $M$ are where all first derivatives of $f$ vanish:

$$\frac{\partial f(x)}{\partial x_1} = 0, \quad \frac{\partial f(x)}{\partial x_2} = 0, \quad \cdots, \quad \frac{\partial f(x)}{\partial x_d} = 0.$$

Critical points are coordinate independent and can be further classified into different types by the second derivatives. We can organize the second derivatives into a $d \times d$ Hessian matrix $H_f(x)$, whose entries are:

$$(H_f)_{i,j} = \frac{\partial^2 f}{\partial x_i \partial x_j}.$$

At non-degenerate critical points, where the Hessian matrix is non-singular, every eigenvalue of the Hessian is either positive or negative. The number of negative eigenvalues η is the Morse index of the critical point.
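To make the definition of the Morse index concrete, the following sketch estimates the Hessian of a two-variable function by central finite differences and counts its negative eigenvalues. It is only an illustration under simplifying assumptions: the function names `hessian_2d` and `morse_index_2d`, the step size, and the tolerance are hypothetical, and the closed-form eigenvalue formula applies to symmetric 2 × 2 matrices only.

```python
import math

def hessian_2d(f, x, y, h=1e-3):
    """Central finite-difference Hessian of f at (x, y): returns (fxx, fxy, fyy)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def morse_index_2d(f, x, y, h=1e-3, tol=1e-6):
    """Number of negative Hessian eigenvalues at a critical point (x, y)."""
    fxx, fxy, fyy = hessian_2d(f, x, y, h)
    # Eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]].
    mean = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    eigenvalues = (mean - radius, mean + radius)
    if any(abs(ev) < tol for ev in eigenvalues):
        raise ValueError("degenerate critical point: singular Hessian")
    return sum(ev < 0 for ev in eigenvalues)

# A saddle at the origin has one negative eigenvalue; a peak has two.
saddle_index = morse_index_2d(lambda x, y: x**2 - y**2, 0.0, 0.0)
peak_index = morse_index_2d(lambda x, y: -(x**2 + y**2), 0.0, 0.0)
```

On a two-dimensional surface, a minimum therefore has index 0, a saddle index 1, and a peak index 2.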

Topology of surface by critical points and Morse theory.

At a critical point, the topology of sublevel sets changes. For a critical point $x$ with $f(x) = a$, consider the sublevel sets $M_{f < a - \epsilon}$ and $M_{f < a + \epsilon}$ for a small $\epsilon > 0$. By Morse theory, if $x$ is non-degenerate and is the only critical point with a value in this range, $M_{f < a + \epsilon}$ is obtained from $M_{f < a - \epsilon}$, up to homotopy, by attaching a cell whose dimension is the Morse index η of $x$.

The topology of surfaces over configuration space has been the subject of many investigations [41–45]. For example, thermodynamic phase transitions have been explored from the viewpoint of topological changes of submanifolds of the configuration space. Changes in Euler characteristics have been found to signal the occurrence of phase transitions for certain systems [42, 43].

However, it is difficult in practice to understand the global surface topology using critical points and Morse indices. There are only a few model systems where all critical points are known analytically [46]. Numerical computation faces numerous challenges. First, it is difficult to identify all critical points. Methods based on Newton-Raphson and other techniques require initial guesses and do not guarantee identification of all critical points. Furthermore, many initial guesses fall into the same large basins of attraction and yield no new information. Second, as the probability surface can only be constructed approximately from points sampled in the high-dimensional configuration space, the degree of sampling may not be sufficiently detailed to capture the original topology of the configuration space. Third, sampling has to be sufficiently detailed to accurately measure the first and second derivatives along each coordinate direction at all locations. Fourth, as derivatives reflect local properties of the probability surface, there can be numerous critical points that are trivial and of little importance if the probability surface is rugged. It is difficult to distinguish those reflecting the global features of the surface from those reflecting local dimples or even those that are due to noise in sampling. While there have been efforts in visualizations of potential surfaces of molecules and other heuristics (see [44] and references therein), to our knowledge, it is not yet possible to characterize all critical points and the homotopy type of the probability or potential surface of a molecule in three-dimensional space.
Background and overview.

Instead of analyzing the critical points, we adopt a different approach. We are interested in global features such as the occurrence of different peaks, and how they are connected. This can be achieved by examining globally the structures of holes of various dimensions and how such structures change at different sublevel or superlevel sets of the probability landscape. We adopt an approach based on the theory of homology groups and persistent homology [47–49]. Homology groups describe holes in topological spaces and are a classic topic in algebraic topology [50, 51]. Persistent homology computes these holes and measures their scales at different spatial resolutions [48, 49]. Compared to homotopy, homology groups are more amenable to computation. Our approach is feasible only because of recent progress in computational topology and topological data analysis [47–49, 52, 53].

To provide an intuitive picture illustrating this approach, we envision a sea level on top of the probability landscape over the configuration space (Fig 1). We are interested in how mountain peaks emerge from the sea when the sea level is lowered gradually, and how independent mountain peaks become connected by land ridges when the sea level is lowered further. These are related to 0-dimensional holes, which are connected components.
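The sea-level picture for connected components can be sketched with a union-find structure: grid cells are visited from the highest to the lowest value, a cell with no neighbor above water starts a new peak, and when a cell joins two components the one with the lower peak is absorbed. This is a simplified stand-in, not the cubic-complex computation of this work. The function name `peak_persistence`, the adjacency rule (cells differing by one step along one axis), the rule that the lower peak is the one absorbed, and the example profile are all assumptions made for illustration.

```python
def peak_persistence(surface):
    """Emergence and merging levels of peaks in superlevel sets of a grid.

    surface : dict mapping integer grid indices (d-tuples) to values f(x)
    Returns a list of (peak_cell, emerge_level, merge_level), sorted by
    decreasing emerge_level - merge_level. Peaks that never merge get 0.0.
    """
    parent = {}   # union-find forest over cells already above sea level
    peak = {}     # root cell -> highest cell of its component
    pairs = []

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Lower the sea level: visit cells from highest to lowest value.
    for cell in sorted(surface, key=surface.get, reverse=True):
        parent[cell] = cell
        peak[cell] = cell
        for axis in range(len(cell)):
            for step in (-1, 1):
                nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                if nb not in parent:
                    continue  # neighbor is still under water
                a, b = find(cell), find(nb)
                if a == b:
                    continue
                # Keep the component with the higher peak; absorb the other.
                if surface[peak[a]] < surface[peak[b]]:
                    a, b = b, a
                if surface[peak[b]] > surface[cell]:
                    pairs.append((peak[b], surface[peak[b]], surface[cell]))
                parent[b] = a
    # Components that were never absorbed persist down to level 0.
    for cell in surface:
        if parent[cell] == cell:
            pairs.append((peak[cell], surface[peak[cell]], 0.0))
    return sorted(pairs, key=lambda p: p[1] - p[2], reverse=True)

# Hypothetical one-dimensional profile with three peaks.
profile = {(0,): 0.1, (1,): 0.5, (2,): 0.2, (3,): 0.9,
           (4,): 0.4, (5,): 0.6, (6,): 0.05}
peaks = peak_persistence(profile)
```

For this profile the peak at cell 3 emerges at 0.9 and never merges, the peak at cell 1 emerges at 0.5 and merges at 0.2, and the peak at cell 5 emerges at 0.6 and merges at 0.4, so the cells are returned in the order 3, 1, 5.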

Figure 1: Sea levels of the probability landscape $f(x)$ on the configuration space $M$. The superlevel sets $M_i = M_{f \ge a_i} = f^{-1}([a_i, 1])$ at different sea levels can have different topology, with different numbers of components shown in white.

Complex and chain.

We first discuss how to represent the d-dimensional configuration space M [54]. In this study, we use cubic complexes [53, 54]. A d-dimensional cubic complex K is constructed from a union of points, line segments, squares, cubes, and their k-dimensional counterparts glued together properly, where $k \le d$ and all have unit length (except points, which have no length). We call each of these a k-cell or a k-cube (see Fig 2a for a 3-cell). While the topology of M is invariant whether it is represented by cubic complexes or other complexes such as simplicial complexes, the grid representation of the molecular configurations makes this choice convenient [53]. We can build up our cubic complex K from cubes to represent the configuration space.

Given a set of k-cells, we can sum them up. We call the sum of a set of k-cells a k-chain. Fig 2b shows an example where two 3-cells are summed up to form a 3-chain. Fig 2c shows how nine 2-cells are summed up to form a 2-chain. Here the binary operation of summing two k-cells is orientation sensitive: two k-cells of the same underlying space but opposite orientation cancel each other out when summed up. If a set is equipped with a binary operation satisfying certain requirements, it is called a group mathematically. The set of k-chains from the complex K with our binary operation of summation therefore forms a chain group $C_k(K)$.

Boundaries.

We now define boundaries. The boundary of an individual k-cell is the set of its (k − 1)-dimensional faces, which are (k − 1)-cells. The boundary of a 3-cell is shown in Fig 2a, which is the set of the six oriented squares. The boundary of a k-chain is the sum of the boundaries of its element k-cells. Because of the nature of our sum operation, internal structures cancel out. Consider the boundary of the two neighboring three-dimensional cubes in Fig 2b. The interfacial square is contributed twice, once from each cube, but with opposite orientations, as both are counter-clockwise around their outward normals. When these two cubes are glued together, these two boundary squares are summed up, and they cancel each other out. The overall outcome of this summation is indeed the outer boundary of the union of the two neighboring cubes. Fig 2c shows a 2-chain and its boundary. This holds true in other dimensions as well, namely, a (k − 1)-face shared by two neighboring k-cells has opposite orientations in the two cells and cancels out upon summation.

Figure 2: An illustration of cubic complex. a) A 3-cubic cell, with the orientation of its 2-faces shown. b) Two 3-cubes are summed to form a 3-chain. The internal square is contributed twice from the two cubes. As each surface is oriented (i.e., counter-clockwise by the outward surface normal), these two squares have opposite orientations and cancel each other when summed. c) An example of a 2-chain formed by 2-cells and its boundary.

With this summation, we obtain the boundary of a k-chain from K by applying the boundary operator $\partial_k$:

$$\partial_k : C_k(K) \to C_{k-1}(K). \qquad (1)$$

Cycles.

There are certain k-chains that have no boundaries. They are called k-dimensional cycles or k-cycles. With the binary operation of summation discussed earlier, the set of k-cycles from K forms the cycle group $Z_k(K)$:

$$Z_k(K) \equiv \{ c \in C_k(K) \mid \partial_k c = \emptyset \}.$$

As an example, we consider the three-dimensional cube again (Fig 2a). We take its six surface squares that fully enclose the solid cube. These squares form a 2-chain. As a whole, this 2-chain itself does not have boundaries, as the six squares are glued together along the borders and there are no openings.
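The cancellation of shared faces and the fact that the surface of a cube is a cycle can be checked with a small sketch of an oriented boundary operator on cubic chains. The encoding is an assumption made for illustration, not the representation used in this work: a cell is a tuple of integer intervals `(lo, hi)`, one per axis, each either degenerate (`lo == hi`) or of unit length, and a chain is a dictionary from cells to integer coefficients. The names `cell_boundary` and `boundary` are hypothetical.

```python
from collections import defaultdict

def cell_boundary(cell):
    """Oriented boundary of an elementary cube given as a tuple of intervals.

    Returns a dict mapping (k-1)-cells to signed integer coefficients.
    """
    chain = defaultdict(int)
    sign = 1
    for j, (lo, hi) in enumerate(cell):
        if lo == hi:
            continue  # a degenerate direction contributes no face
        upper = cell[:j] + ((hi, hi),) + cell[j + 1:]
        lower = cell[:j] + ((lo, lo),) + cell[j + 1:]
        chain[upper] += sign
        chain[lower] -= sign
        sign = -sign  # alternate over the non-degenerate directions
    return dict(chain)

def boundary(chain):
    """Extend cell_boundary linearly to a chain {cell: coefficient}."""
    out = defaultdict(int)
    for cell, coeff in chain.items():
        for face, s in cell_boundary(cell).items():
            out[face] += coeff * s
    # Faces whose contributions cancel are dropped from the result.
    return {c: v for c, v in out.items() if v != 0}

# Two unit 3-cubes sharing the square at x = 1, summed into a 3-chain (cf. Fig 2b).
cube_a = ((0, 1), (0, 1), (0, 1))
cube_b = ((1, 2), (0, 1), (0, 1))
outer_squares = boundary({cube_a: 1, cube_b: 1})
```

Here `outer_squares` contains the ten outer squares only, since the shared square receives coefficients +1 and −1 from the two cubes. Applying `boundary` once more returns an empty chain, so the resulting 2-chain is a 2-cycle.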

Kernel and image.

Analogous to the null space or kernel in linear algebra, the cycle group $Z_k(K)$ is the kernel of the operator $\partial_k$, as each of its member k-chains has no boundary by definition and $\partial_k$ sends it to null:

$$Z_k(K) = \ker \partial_k \equiv \{ c \in C_k(K) \mid \partial_k c = \emptyset \}.$$

We now move one dimension up and consider boundaries of (k + 1)-chains in K. Boundaries are one dimension lower and therefore the boundaries of (k + 1)-chains are k-chains. They are called the k-boundaries of K and form the k-boundary group $B_k(K)$. As each k-boundary is obtained when $\partial_{k+1}$ is applied to a (k + 1)-chain, collectively they are the image of $\partial_{k+1}$:

$$B_k(K) = \operatorname{im} \partial_{k+1} \equiv \{ c \in C_k(K) \mid \partial_{k+1}(c') = c, \; c' \in C_{k+1}(K) \}.$$

It turns out that all boundaries themselves have no boundaries. That is, applying the boundary operator twice to any k-chain with $k \ge 1$ gives:

$$\partial_{k-1} \circ \partial_k = \emptyset. \qquad (2)$$

From our previous example in Fig 2a, the 2-chain of the six squares that enclose the solid cube is the boundary of a 3-chain (a lone 3-cell in this case). The squares together do not have any opening, and hence the boundary of this 2-chain is ∅. A consequence of this general property is that we have $B_k(K) \subseteq Z_k(K) \subseteq C_k(K)$.

Homology group and Betti number.

There are two types of k-cycles: those that enclose (k + 1)-bodies and those that enclose (k + 1)-holes (Fig 3a). The former are boundaries of the enclosed bodies of (k + 1)-chains and can be collapsed into a point (Fig 3a, k-cycles in dark brown/brown enclosing (k + 1)-chains shaded in light brown). The latter are not boundaries of (k + 1)-bodies and cannot be collapsed into a point (Fig 3a, other cycles). We will first distinguish these two types of cycles. Furthermore, cycles that do not enclose a body may differ from one another, as the holes they contain may be different (Fig 3a). We will distinguish these different situations as well.

We consider all cycles containing the same hole essentially the same and group them into one equivalence class of cycles. As an illustration, the green/light green k-cycles in Fig 3a form an equivalence class $[h_\alpha]$ as they all contain hole $h_\alpha$. So do the purple/light purple k-cycles (class $[h_\beta]$ containing hole $h_\beta$), and the blue/light blue k-cycles (class $[h_\gamma]$ containing hole $h_\gamma$). A special equivalence class consists of cycles containing the ∅ hole or no hole (Fig 3a, brown/dark brown cycles, class [∅]). We call cycles in each equivalence class homologous to each other. If they encircle different holes, they belong to different equivalence classes.

We elaborate on this. Among all elements of $Z_k(K)$, which are k-cycles, we identify all k-boundaries, which contain (k + 1)-bodies and are elements of $B_k(K)$. Because of Eqn (2), they have no (k + 1)-holes. We put them into a class denoted as [∅] as they contain no holes (or the ∅-hole) (Fig 3a, brown/dark brown cycles ∈ [∅]). The remaining k-cycles of $Z_k(K)$ are not in [∅] but may contain different holes. We identify those that contain hole $h_\alpha$, and put them into the equivalence class denoted as $[h_\alpha]$ (Fig 3a, green/light green cycles ∈ $[h_\alpha]$). Remaining k-cycles that contain hole $h_\beta$ are put into the class $[h_\beta]$, and so on (Fig 3a, purple/light purple cycles ∈ $[h_\beta]$, and blue/light blue cycles ∈ $[h_\gamma]$). Each element of the set $\{[\emptyset], [h_\alpha], [h_\beta], \cdots\}$ is an equivalence class.

Figure 3: An illustration of homology classes of k-cycles in a (k + 1)-manifold $M^{k+1}$. a) The two k-cycles in brown and dark brown enclose a (k + 1)-body ((k + 1)-chains in light brown). They contain no holes and are boundary cycles. They belong to the same equivalence class [∅]. The k-cycles in green/light green, purple/light purple, and blue/light blue contain holes $h_\alpha$, $h_\beta$ and $h_\gamma$, respectively, and are part of equivalence classes $[h_\alpha]$, $[h_\beta]$ and $[h_\gamma]$, respectively. b) The k-cycle $l_1$ contains a hole. The boundary cycle $l_2$ contains no hole but a (k + 1)-body. Note that $l_1$ and $l_2$ share a common piece of boundary but in opposite orientations. c) When $l_1$ and $l_2$ are summed up, we obtain the k-cycle $l_3$, which contains the same hole as $l_1$. Both $l_1$ in b) and $l_3$ in c) belong to the same equivalence class of k-cycles containing this hole.

As these equivalence classes themselves form a set, and the outcome of the binary operation of summation on elements of $Z_k$ is preserved, this set forms a new group. This new group is called a quotient group, as it is obtained from $Z_k(K)$ after factoring out the boundaries $B_k(K)$. The k-th homology group $H_k(K)$ is this quotient group:

$$H_k(K) \equiv Z_k(K) / B_k(K).$$

The elements of $H_k(K)$ are equivalence classes of homologous cycles representing the holes (or lack thereof) they enclose.

Two k-cycles are homologous to each other if they contain the same hole, or equivalently, if one can be obtained from the other by adding a k-boundary (Fig 3b and Fig 3c). To illustrate this, note that the cycle labeled $l_3$ in Fig 3c can be obtained by adding the k-boundary of a (k + 1)-body labeled $l_2$ to the k-cycle labeled $l_1$ enclosing a hole (Fig 3b). Due to the cancellation nature of our summation, the commonly shared piece of the boundaries is cancelled out and we have the larger k-cycle $l_3$ in Fig 3c enclosing the same hole. Up to the difference of a boundary of a solid (k + 1)-body, these two k-cycles are the same and belong to the same equivalence class. It is not difficult to see that we can repeat this operation of adding certain k-boundaries and convert any homologous k-cycles into one another.

The number of independent k-dimensional holes is counted by the dimension of the homology group. It is called the k-th Betti number $\beta_k(K)$:

$$\beta_k(K) = \dim(H_k(K)).$$

Filtration.

We now examine the topological structures of holes in the probability landscapeon the conﬁguration space, when we restrict to conﬁgurations all with probabilities above certainvalue. By gradually adjusting this value, we will be able to trace out the details of topologicalchanges. For an illustration, envision a sea on top of the probability landscape (Fig 4). At thelevel of f ( x ) = 1, it covers the whole landscape. The domain of the part of the landscape abovethe sea level is ∅ . We gradually lower the sea level to value b , when the ﬁrst peak emerges fromthe sea (birth of the ﬁrst peak). At this time, we have the superlevel set M f ≥ b , which are theset of points { x ∈ M | f ( x ) ≥ b } . They form the white region(s) in Fig 4. We further lower the The probability landscape f ( M ) and the topology of its superlevel set M f ≥ . a) Thelandscape and a sea level. The superlevel sets M f ≥ are the regions of the domain M ⊂ R (shownas a plane) whose landscape value is above the sea. b) At f ( x ) = 1 , all is below the sea leveland M f ≥ = ∅ . At f ( x ) = b , b and d , the topology of M f ≥ (shown in white) changes. At f ( x ) = 0 , all is above sea level and we have M f ≥ = M . c) The persistent diagram of the birthand death value of f ( x ) for the -th homology group representing the two peaks. The sublevelsets below the sea M f< are shown in blue. sea level to b when another peak emerges (birth of the second peak), at which time we have thesuperlevel set M f ≥ b . Suppose we continue this process until sea level reaches d where the twopeaks are merged together (death place of the second peak) by a land ridge that has just emergedabove the sea level. At this sea level, we have M f ≥ d . At each of these levels, the topology of the uperlevel set changes, namely, one component, two components, and then one component again.These changes are captured by the changing homology groups and the Betti numbers.We now generalize. 
We have a descending sequence of probability values corresponding to the lowering sea level:

$$1 = a_0 > a_1 > a_2 > \cdots > a_n = 0,$$

and the corresponding superlevel sets, or the domains of the part of the landscape above the sea level, which are subspaces of $M$:

$$\emptyset = M_0 \subset M_1 \subset M_2 \subset \cdots \subset M_n = M.$$

Recall that we have the full configuration space $M$ represented by a cubic complex $K$. Each superlevel set $M_i$ is represented by a subcomplex $K_i \subset K$, which can be derived from the original full complex $K$. We then have the corresponding sequence of subcomplexes:

$$\emptyset = K_0 \subset K_1 \subset K_2 \subset \cdots \subset K_n = K.$$

This sequence of subcomplexes is called a filtration.

We are interested in how the topology of $M_{f \ge a_i}$ evolves at different $a_i$, $i = 0, \cdots, n$. This is represented by the corresponding sequence of homology groups connected by linear maps:

$$0 = H_k(K_0) \to H_k(K_1) \to \cdots \to H_k(K_n) = H_k(K).$$

Persistence and Persistent diagram.

As we move from $K_{i-1}$ to $K_i$, we may gain a new equivalence class (e.g., a new peak for the 0-th homology, as in our example), or we may lose one (e.g., when a peak is merged with another one). We say that an equivalence class of a $k$-cycle $[\alpha_i]$ is born at $a_i$ if it is present in $K_i$ but absent in $K_{i-1}$ for any value of $a_{i-1} > a_i$. The class dies at $a_i$ if it is present in $K_{i-1}$ for any value of $a_{i-1} > a_i$ but not at $a_i$. We record the location and the value of $a_i$, namely, the corresponding $k$-cube and its probability value, whose inclusion leads to the birth and death events.

The prominence of the topological feature of a $k$-cycle is encoded in its life-time or persistence. Denote the birth value and the death value of class $[\alpha_i]$ as $b_i$ and $d_i$, respectively. The persistence of class $[\alpha_i]$ is then $b_i - d_i$.

In the example shown in Fig 4, the equivalence class of 0-cycles (components) associated with the first peak is born at $f(x) = b_1$. The equivalence class associated with the second peak is born at $f(x) = b_2$. At $f(x) = d_2$, these two components merge together. We say that the second peak dies at $d_2$, and its persistence is $b_2 - d_2$. The first peak dies at $f(x) = 0$, and its persistence is $b_1 - 0 = b_1$.

We record the birth and death events of homology classes in a two-dimensional plot, which is called the persistent diagram. Each homology class is represented by a point in this diagram, where the birth value $b_i$ and the death value $d_i$ are taken as its coordinates $(b_i, d_i)$. Fig 4c shows the persistent diagram of our illustrative example.

In general, we have the $k$-th persistent diagram $P_k(f)$ of $k$-cycles for our probability function $f: M \to \mathbb{R}_{[0,1]}$. It is the set of points such that each point $(x_1, x_2)$, with birth value $x_1$ and death value $x_2$, represents a distinct topological feature of a $k$-cycle, which is present in $H_k(M_{f \ge a}) = H_k(f^{-1}([a, 1]))$ for $a \in (x_2, x_1]$.

Computation.
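The birth and death bookkeeping defined above can be illustrated with a minimal sketch for the 0-th homology only. It is not the cubic complex algorithm of [53]; it assumes a probability surface stored as a 2-d NumPy array, uses 4-neighbour adjacency between grid cells, and tracks components with a union-find structure. The function name `peak_persistence` and the toy landscape are hypothetical.

```python
import numpy as np

def peak_persistence(f):
    """0-th persistence of the superlevel sets of a 2-d array f.

    Cells are added in order of decreasing value (lowering sea level).
    Returns (birth_value, death_value, birth_cell, death_cell) tuples,
    sorted by decreasing persistence.
    """
    f = np.asarray(f, dtype=float)
    rows, cols = f.shape
    parent = {}  # union-find forest over cells already above sea level
    birth = {}   # component root -> cell where that peak was born

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    order = sorted(np.ndindex(rows, cols), key=lambda c: -f[c])
    pairs = []
    for cell in order:
        parent[cell] = cell
        birth[cell] = cell
        r, c = cell
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb not in parent:  # out of range or still below sea level
                continue
            a, b = find(cell), find(nb)
            if a == b:
                continue
            # The component with the lower peak dies at this cell (a ridge).
            if f[birth[a]] < f[birth[b]]:
                a, b = b, a
            if f[birth[b]] > f[cell]:  # skip zero-persistence merges
                pairs.append((f[birth[b]], f[cell], birth[b], cell))
            parent[b] = a
    # The highest peak never merges into another one; it dies at level 0.
    root = find(order[0])
    pairs.append((f[birth[root]], 0.0, birth[root], None))
    return sorted(pairs, key=lambda p: p[1] - p[0])

# Toy landscape with two peaks (0.9 and 0.6) joined by a ridge at 0.2.
landscape = np.array([[0.1, 0.9, 0.3, 0.2, 0.6, 0.25, 0.05]])
for b, d, where_born, where_died in peak_persistence(landscape):
    print(b, d, where_born, where_died)
```

In this toy case the lower peak plays the role of the second peak in Fig 4: it is born at its own height and dies at the ridge value, so its persistence is the difference of the two.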

The key to studying homology groups in high-dimensional space is the construction of the complex $K$ to represent the configuration space $M$. In this study, we use the cubic algorithm of [53] with modifications, so that it can be applied to higher dimensions. We consider only the 0-th persistent homology groups, which record the birth and death of probability peaks. The locations $x$ where birth and death events occur, namely, the corresponding $k$-cubes, are also computed.

Results

Ensemble of reactive trajectories and conformations.

The isomerization of alanine dipeptide in vacuum provides a tractable system for understanding the process of activation in detail, and has been well studied as a model for understanding protein conformational changes [21, 29, 55].

Figure 5:

The isomerization reaction of alanine dipeptide. (a) Conformations from the reactant and product basins before and after the isomerization. (b) The six reaction coordinates of the isomerization process of alanine dipeptide examined in this study.

Using transition path sampling [8], we harvested 6 million reactive trajectories. Each trajectory is of 2.5 ps duration. We further collect conformations every 50 steps at 1 fs/step along each trajectory. Altogether, we have a total of 1 . × conformations. All simulations are conducted using the molecular dynamics software suite GROMACS 4.5.4 [56]. The Amber94 force field was used to facilitate the comparison with previous results. The simulation was performed with a constant total energy of 36 kJ/mol, such that the averaged temperature is 300 K for the transition path ensemble. Note that the transition portion of each reactive trajectory is around 0 . . φ, ψ ) ∈ [( − . , − . × ( − . , . φ, ψ ) ∈ [(0 . , . × ( − . ,

Constructing dynamic probability surface of transition state. We then construct the dynamic probability surface of the isomerization reaction from the sampled 1 . × conformations. With a balanced consideration of the available MD simulation trajectories and the dimensionality, we construct a 5-dimensional configuration space for this study. Based on previous analysis using the energy flow theory [17], we selected the top 5 coordinates $(\phi, \theta_1, \psi, \alpha, \beta)$ that contribute most to the activation dynamics. The original 60-dimensional space is then projected onto this 5-dimensional space, where each dimension is divided into 15 bins. This leads to $15^5 = 759{,}375$ 5-dimensional hypercubes.
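The binning step can be sketched as follows. This is only an illustration under stated assumptions: the five coordinates are treated as angles in radians on $[-\pi, \pi]$ (the source does not give the ranges), the samples are synthetic random numbers standing in for the sampled conformations, and the helper name `probability_surface` is hypothetical.

```python
import numpy as np

def probability_surface(coords, bins=15):
    """Histogram samples on a regular grid and normalize to probabilities.

    coords: array of shape (n_samples, n_dims), assumed to hold angles
    in radians within [-pi, pi] (an assumption, not from the source).
    """
    coords = np.asarray(coords, dtype=float)
    n_dims = coords.shape[1]
    ranges = [(-np.pi, np.pi)] * n_dims
    counts, edges = np.histogramdd(coords, bins=bins, range=ranges)
    return counts / counts.sum(), edges

# Synthetic stand-in for conformations described by 5 coordinates.
rng = np.random.default_rng(0)
samples = rng.uniform(-np.pi, np.pi, size=(10_000, 5))
p, edges = probability_surface(samples)
print(p.shape, p.size)  # 15 bins per dimension, 15**5 hypercubes in total
```

Each entry of `p` is the probability assigned to one 5-dimensional hypercube, which is the value attached to a cube when the superlevel-set filtration is built.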

Computing topological structures of the dynamic probability landscape.

We then carry out persistent homology analysis. Computations are conducted on a machine with a 20-core Xeon E5-2670 CPU at 2.5 GHz, with a cache size of 20 MB and 128 GB of RAM. The computing time for finding the significant peaks and the ridges connecting them is ≈ 30 seconds.

Committor test for conformations at selected locations of the conﬁguration space.

We carry out committor tests for configurations of the dipeptide identified by persistent homology. The committor value of a configuration is defined as the probability that a dynamic trajectory initiated from this configuration, with initial momenta drawn from the Boltzmann distribution, reaches the product basin before the reactant basin. A configuration with committor value p B = 0 . e.g. , φ , θ ) that correspond to the location of the selected peak, with the other coordinates sampling the equilibrium distribution. For this, we add harmonic restraint potentials on the selected coordinates to the system potential energy function. The minima of the harmonic restraints are at the target values. Equilibrium MD simulations are then carried out. Conformations harvested from such simulations are filtered to generate an ensemble of conformations that all share the same target values for the selected coordinates. The restraint potential is used to enrich conformations that satisfy this criterion.

Dynamic probability surface on configuration space of $(\phi, \theta_1, \psi, \alpha, \beta)$. We examine the topological structures of this 5-d dynamic probability surface. There are four peaks, each located in a 5-d cube (Fig. 6a, red dots for birth locations of peaks and blue dots for their ridges or death locations; values are listed in SI Table 1). The most prominent peak with the largest persistence is peak b (see persistent diagram in Fig. 6b), which corresponds to the product basin. The second most prominent b is the reactant basin. The third prominent peak at b (Fig 9a) has roughly the same probability as the reactant basin b, but a shorter persistence; namely, it does not stand out from the surrounding landscape as much. It subsequently merges with the peak at the product basin b.
As expected, peaks have all negative eigenvalues for their Hessian matrices, and ridges have one positive and four negative eigenvalues (see Supplementary Info).

We then take conformations from the four peaks and the three ridges (SI Table 1), and carry out committor tests. We find that the most prominent peak b beyond the reactant and product basins fully captures the transition state ensemble ($p_B$ centered at 0.5, Fig 6c): trajectories initiated from configurations at this location have equal probability of going towards the reactant or the product basin. In contrast, all committor values $p_B$ for locations of b, d, and d are found to be 0. Reaction trajectories starting from conformations at these locations fall back to the reactant basin. The committor values for peak b are all 1.0: trajectories from this location all go to the product basin. The committor values for conformations at the ridges d and d follow one-sided distributions (Fig 6d-6e, respectively). Only a negligible amount of conformations have $p_B = 0.5$.

Figure 6:

The 5-d dynamic probability surface on the $(\phi - \theta_1)$ plane, its topological structures, and the committor values. (a) The 5-d dynamic probability surface $p(\phi, \psi, \theta_1, \alpha, \beta)$ shown on the $(\phi - \theta_1)$ plane. Red and blue dots are locations of probability peaks and ridges (see also SI Table 1). (b) The persistent diagram recording the birth and death probabilities $p(b_i)$ and $p(d_i)$ of the peaks in y and x, respectively. (c-e) Distributions of committor values $p_B$ for trajectories from locations of b, d, and d, respectively. (c) The transition state ensemble is located at b instead of a saddle point.

Dynamic probability surface on configuration space $(\phi, \psi, \alpha, \beta, \theta_2)$. To examine the importance of a proper choice of the coordinates, we construct another 5-d probability surface by omitting the coordinate $\theta_1$ and replacing it with $\theta_2$, which is on the other end of the molecule in a position symmetric to $\theta_1$. Fig.
7a shows the projection of the 5-d surface in $-\ln p(x)$ on the $(\phi - \theta_2)$ plane.

There are four significant peaks (Fig. 7a, red and blue dots), which are also shown in the persistent diagram (Figure 7b), each located in a 5-d cube (see SI Table 1). Peaks b and b are the most and second most prominent peaks, corresponding to the product and the reactant basins, respectively. In this projection, peaks are separated only in $\phi$, and they have almost identical $\theta_2$ values. This differs from the 5-d surface containing $\theta_1$ (Fig 6).

We then sample conformations from the location of each of the four peaks and the three ridges (SI Table 1) and perform committor tests. The committor values for peak b are all 1.0: trajectories starting here all go to the product basin. All committor values for conformations at d, b, and d are 0.0: reaction trajectories from these locations all go to the reactant basin. The committor values for d and d follow one-sided distributions (Fig. 7d-7e). Only a tiny amount of the conformations have $p_B = 0.5$, indicating that the transition state ensemble is located elsewhere.

The committor values for b have a flat distribution. While there are conformations with $p_B$ values close to 0.5, their frequency is similar to that of any other $p_B$ value. This indicates that the 5-d cube where b is located contains some transition state conformations, as its $\phi$ value is correct, but also a mixture of other conformations with diverse dynamic properties. Overall, without the reaction coordinate $\theta_1$, this 5-d dynamic probability surface does not describe the activated process adequately and cannot be used to identify the transition state ensemble.

Figure 7:

A different 5-d dynamic probability surface with $\theta_2$ replacing $\theta_1$ on the $(\phi - \theta_2)$ plane, its topological structure represented in the persistent diagram, and distributions of committor values. (a) The 5-d dynamic probability surface $p(\phi, \psi, \alpha, \beta, \theta_2)$ projected onto the $(\phi - \theta_2)$ plane. Red and blue dots are locations of probability peaks and ridges. (b) The persistent diagram recording the birth and death probabilities $p(b_i)$ and $p(d_i)$ of the peaks in y and x, respectively. (c-e) Distribution of the committor values $p_B$ for trajectories from locations of b, d, and d.

As it is difficult to directly study the topology of a high-dimensional surface, a common practice is to project the surface to a lower dimensional subspace and analyze the topology of the projected surface instead. The caveat of this practice is that the original topological features may be lost, and new features that are artifacts may arise due to the marginalization of the probability distributions. To assess how well the dynamic properties are retained in a subspace, we project the 5-d surface on $(\phi, \theta_1, \psi, \alpha, \beta)$ to 2-d planes. We then analyze the topological structures of the projected surfaces, and assess the dynamic behavior of the identified topological features. We carried out this analysis using the 2-d planes of $(\phi - \theta_1)$ and $(\phi - \psi)$.

Projecting 5-d dynamic probability surface to the $(\phi - \theta_1)$ plane. After projection, there are three significant probability peaks (Fig. 8a, Fig 9a, red and blue dots, and SI Table 2). The most and the next most prominent peaks b and b shown in the persistent diagram of Fig. 8b are the product and reactant basins, respectively.

It is informative to compare peak locations on the 5-d surface to those on the projected surface (SI Table 1 and SI Table 2). The $\phi$ coordinate for the product basin b is altered from 1.25 to 0.84 after projection, while the $\theta_1$ coordinate is unchanged. The $(\phi, \theta_1)$ coordinates of the reactant basin b are also changed from ( − . , − . 18) to ( − . , . θ of the ridge d is changed from − . 18 to 0 . 0, while $\phi$ is unchanged. The fourth prominent peak becomes undetectable after projection. The persistence diagram (Fig. 8b) is also significantly different: the third peak is much less prominent, with reduced persistence compared to that in Fig. 6b (see also SI Table 2).

We then carry out committor tests on conformations from locations of these topological features. The distribution of $p_B$ sampled from trajectories from b is centered around 0.5 (Fig 8c) and exhibits significant enrichment of transition state conformations. This is similar to peak b on the original surface over the 5-d configuration space (Fig 6c), although the distribution has a broader width. The committor values of trajectories from the product basin b are all 1.0, as they all fall back to the product basin. Similarly, trajectories from b all fall back to the reactant basin with a $p_B$ value of 0.0. The committor values for the bridges at d and d follow one-sided distributions, but only a small amount of conformations have p B = 0 . φ - θ ), which is formed by the two dominant reaction coordinates, the dynamic probability surface retains essential dynamic properties of the transition state surface, and contains rich information such that the transition state conformations can be recovered.

Figure 8:

The $(\phi - \theta_1)$-projection of the 5-d dynamic probability surface, its topological structure, and distributions of committor values. (a) The 5-d dynamic probability surface projected onto the $(\phi - \theta_1)$ plane. Red and blue dots are locations of probability peaks and ridges, respectively. (b) The persistent diagram recording the birth and death probabilities $p(b_i)$ and $p(d_i)$ of the peaks in y and x, respectively. (c-e) Distributions of committor values $p_B$ for trajectories from peaks and ridges b, d, and d, respectively. (c) Transition state conformations are at b.

Projecting 5-d dynamic probability surface to the $(\phi - \psi)$ plane. The $\phi$ and $\psi$ angles are the standard parameters to describe protein secondary structures.
After projection to the $(\phi - \psi)$ plane, there are three significant probability peaks (Fig. 10a, red and blue dots, and SI Table 2). The most and the next most prominent peaks b and b, as shown in the persistent diagram (Fig. 10b), are the product and reactant basins, respectively. Similar to projection to the $(\phi - \theta_1)$ plane, locations of both basins are altered upon projection (SI Table 2). Peak 3 becomes very minor after projection to the $(\phi - \psi)$ plane, and its location is also altered from that of the 5-d surface. This can be seen in Fig. 9b, where the location of peak 3 on the 5-d surface projected onto the $(\phi - \psi)$ plane (red point) and the changed location of the new peak after projecting onto the $(\phi - \psi)$ plane (blue dot) are shown.

These observations demonstrate that with projection, locations of topological features of probability peaks may change, and their prominence as measured by persistence may also change dramatically. Fig. 9c explains why the projection to $(\phi - \psi)$ results in the peak on the 5-d surface (red dot in Fig. 9a) changing location (Fig. 9b, peak location shown as a blue dot, red dot no longer at the peak). First, as shown in Fig. 9c (beginning bar plot), the probability at the location of the red dot is larger than that at the blue dot on the 5-d surface. The probability $p(\phi, \psi)$ at each point on the $(\phi - \psi)$ plane (Fig. 9b) is the sum over all points on the 5-d surface with the same $\phi$ and $\psi$ but with different values in any of the other three coordinates ($\theta_1$, $\alpha$, and $\beta$). Thus, each point on the $(\phi - \psi)$ plane of Fig. 9b corresponds to a 3-d hyper-surface $p(\theta_1, \alpha, \beta)$. When we sum up the 3-d hyper-surface along one direction (e.g., $\alpha$), we obtain a 2-d probability surface (e.g., $p(\theta_1, \beta)$). The 2-d surfaces for the red and blue dots are shown in the middle panel of Fig. 9c. Reducing the dimension further, we sum up the 2-d probability surface over $\beta$.
The resulting distributions are 1-d probability distributions along the $\theta_1$ direction, shown in Fig. 9c for the blue and red dots. From these 1-d distributions, we can see that when details in the $\theta_1$ direction are retained, the probability at the red dot (actual peak in 5-d) is still higher than the probability at the blue dot. We next sum up the probability along the $\theta_1$ direction (Fig. 9c, final bar plots). The summed values are the probability values over the 2-d $(\phi - \psi)$ plane shown in Fig. 9b. As shown by the final bar plot in Fig. 9c, the total probability mass for the red dot is now less than that for the blue dot, even though the red dot has a higher probability on the original 5-d surface. Hence, the location of peak 3 changes when projecting onto the $(\phi - \psi)$ plane.

We then carried out committor tests (Fig. 10c). None of the topological features are where the transition state ensemble is located: all trajectories starting from b, b, and d go to the reactant basin ($p_B = 0$), and all trajectories starting at b go to the product basin ( p B = 1 . b follow a one-sided distribution around $p_B = 0$ (Fig. 10c). Trajectories starting there mostly fall back to the reactant basin. Overall, none of the topological features after projection to the $(\phi - \psi)$ plane retain the dynamic properties of the transition state conformations seen on the original 5-d surface.

Figure 9:

The 5-d probability surface projected onto two different 2-d planes. (a) The probability surface projected onto the $(\phi - \theta_1)$ plane. The red dot is the location of the third probability peak after the reactant and product peaks, which is where the transition state conformations are located. (b) The probability surface projected onto the $(\phi - \psi)$ plane. Conformations with the correct transition state value of $\phi$ are at a non-peak location, which is on a slope below the new peak shown in blue. (c) The process of projecting the 5-d probability surface onto the 2-d $(\phi - \psi)$ plane for the blue and red dots shown in (b). While the probability at the location of the red dot is larger on the 5-d probability surface (left bars), after projection onto the $(\phi - \psi)$ plane, the probability at the red dot is smaller than that at the blue dot (right bar plots). As a consequence, the projected probability surface on the $(\phi - \psi)$ plane does not capture the actual location of the peak in the 5-d surface.

Figure 10:

The $(\phi - \psi)$-projection of the 5-d dynamic probability surface, its topological structure, and distribution of committor values. (a) The 5-d dynamic probability surface projected onto the $(\phi - \psi)$ plane. Red and blue dots are locations of probability peaks and ridges, respectively. (b) The persistent diagram recording the birth and death probabilities $p(b_i)$ and $p(d_i)$ of the peaks in y and x, respectively. (c) Distribution of committor values $p_B$ for peak b.

Principal Component Analysis (PCA) is a widely used technique for dimension reduction. It has found broad applications in molecular simulations [57–61]. However, whether such reduction retains the essential dynamics of the activation process and whether the surface topology on the PCA space can uncover the transition state conformations are not known. Here we assess the dynamic properties of probability surfaces after PCA dimension reduction.

Projection of $p(\phi, \psi, \theta_1, \alpha, \beta)$ onto $(PC_1, PC_2)$ by dPCA. We first applied dihedral principal component analysis (dPCA) to the 5-d probability surface [60, 61]. dPCA is widely used for dimension reduction on periodic dimensions. It first maps each periodic dimension of a circular angle to two new dimensions using the sin and cos functions. Regular PCA is then applied for dimension reduction. We use the dPCA procedure and obtain the first two principal components from the variance matrix of the 1 . × conformations. Collectively, they account for 80.4% of the variance.
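The two dPCA steps described above (sin/cos mapping of each angle, then regular PCA) can be sketched with NumPy alone. This is a minimal illustration, not the implementation used in the study: the function name `dpca` is hypothetical, the input is assumed to be angles in radians, and the data below are synthetic.

```python
import numpy as np

def dpca(angles, n_components=2):
    """Dihedral PCA sketch: map angles to (cos, sin), then apply PCA.

    angles: array of shape (n_samples, n_angles), in radians.
    Returns the projected samples and the fraction of variance that the
    retained components account for.
    """
    angles = np.asarray(angles, dtype=float)
    # Each periodic coordinate becomes two non-periodic ones.
    features = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = centered @ eigvecs[:, :n_components]
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return projected, explained

# Synthetic angles standing in for five dihedral-like coordinates.
rng = np.random.default_rng(1)
angles = rng.normal(0.0, 0.5, size=(2_000, 5))
pcs, fraction = dpca(angles, n_components=2)
print(pcs.shape, round(float(fraction), 3))
```

The returned fraction is the quantity reported in the text as the share of variance captured by the first two components; note that a high value says nothing about whether dynamic information is retained.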

Figure 11:

The dynamic probability surface on PCA space, its topological structure, and committor values on principal components space. (a) The projection of the 5-d dynamic probability surface $p(\phi, \psi, \theta_1, \alpha, \beta)$ to the plane of $(PC_1, PC_2)$. (b) The persistent diagram exhibits six probability peaks. (c-e) Distribution of committor values $p_B$ for trajectories from bridge d, peak b, and bridge d, respectively.

The dynamic probability surface after projection to the $(PC_1, PC_2)$ plane is dramatically more complex (Fig. 11a and SI Table 3) than the surfaces when projected to either the $(\phi - \theta_1)$ plane (Fig. 8) or the $(\phi - \psi)$ plane (Fig. 10). The persistent diagram (Fig. 11b) is also very different from that of the original 5-d probability surface (Fig. 6b).

The committor tests show that all committor values for conformations at b are 1.0, with trajectories going to the product basin. All committor values for conformations at b − and d − are 0.0: trajectories from there all go to the reactant basin. Committor values at ridge d and peak b follow a one-sided distribution at $p_B = 0$ (Fig. 11c). Committor values at ridge d have higher values at both $p_B = 0.0$ and $p_B = 1.0$. Conformations at this location are a mixture of those close to the reactant basin and those close to the product basin. There are few conformations from the transition state ensemble.

Overall, our results show that the dPCA procedure for dimension reduction removes dynamics-relevant information from the topological features of the probability surface. No conformations in the transition state of this activated process are captured by the topological features of the PCA surface.

Projection from 39-d angular space to 5-d dPCA subspace.

We also applied dPCA to the original full-dimensional dynamic probability surface. After removal of the 21 bond lengths, we apply dPCA to reduce the remaining 39-dimensional configuration space of angles to 5 principal components. The contour plot of the 5-d PCA surface projected onto the first two principal components is shown in Fig. 12a. Persistent homology analysis identifies 16 peaks (Fig. 12b), each located in a 5-d cube. The locations and probabilities of the 6 most dominant peaks and the ridges connecting them are listed in SI Table 3. This 5-d persistent diagram is very different from that shown in Fig. 6b, exhibiting a significantly more complex surface topology.

The committor tests show that all committor values for conformations located at the peaks and the ridges are either 0 or 1.0. The two exceptions are ridges d and d (Fig. 12c-d). None of the topological features in the dPCA-reduced 5-d dynamic probability surface retain the essential dynamics of the transition state ensemble. Overall, our results demonstrate that dimension reduction by dPCA destroys dynamic properties inherent in the original surface topology of the dynamic probability surface.

Figure 12:

The dynamic probability surface on the 5-d principal components space reduced by dPCA from the 39-d configuration space, its topological structure, and committor values. (a) The 5-d dynamic probability surface $p(PC_1, \cdots, PC_5)$ shown on the $(PC_1, PC_2)$ plane. (b) The persistent diagram exhibits 16 peaks. (c-d) Distribution of committor $p_B$ values for trajectories started at ridges d and d, respectively.

Direct PCA projections.

Results using direct PCA to project p ( φ, ψ, θ , α, β ) onto ( P C , P C he dynamic properties in their topological features (see SI for details).

In this study, we have introduced a novel approach for characterizing the exact topological features of dynamic probability surfaces. Instead of examining critical points and Morse indexes, our approach is based on homology groups of a series of superlevel sets of the probability surface. By quantifying the scales of these topological features with persistent homology, we are able to uncover the relationship between the topology of the dynamic probability surface and the dynamics of the activation process of the alanine-dipeptide isomerization reaction.

This approach allows us to define the topological properties of the high-dimensional dynamic probability surface that is associated with the transition state conformations. The probability surface over the transition state region is the most prominent peak after the reactant and product basins. Instead of having a Morse index of 1 as conventionally thought [5, 62], the transition state ensemble is on the top of a dynamic probability peak, and the surface goes downhill in all directions. As seen when projected to the $(\phi, \theta_1)$-plane (Fig 9a), it appears as a small peak rather than the saddle point commonly associated with the transition state on a multi-dimensional free energy surface. This is because the system undergoes a certain amount of correlated wandering motion at the barrier top before it goes down towards the product basin. Our finding is against the conventional wisdom that the $C_{7eq} \to C_{7ax}$ transition is a ballistic process, as it is a small peptide and the transition occurs in vacuum.

The dynamic probability surface was constructed from naturally occurring reactive trajectories connecting the reactant and product basins. These trajectories are unbiased and faithfully reflect how the $C_{7eq} \to C_{7ax}$ transition occurs. They contain all the relevant information about the dynamic process of the activation.
Unlike the free energy surface commonly used in examining the mechanism of an activated process, this probability surface contains additional information that reflects the non-equilibrium nature of the transition dynamics.

A common practice in the studies of protein conformational dynamics is to extract mechanistic insights from the geometry of a two-dimensional free energy surface of a double-well along certain collective variables, which are often chosen based on heuristics or by intuition. For example, one would associate a saddle region with the transition states. Our results illustrate the caveats of such procedures. All three probability surfaces shown in Figs. 6, 7, and 10 exhibit the canonical double-well feature. In Fig. 7, the 5-d surface on the $(\phi, \theta_2)$-plane has the product peak, the reactant peak, and the third peak laid out along the $\phi$-direction alone, indicating that $\theta_2$ is not a reaction coordinate. This is indeed verified by the committor test: configurations corresponding to the peak in the saddle region all share the correct value of $\phi$ but sample randomly along the $\theta_1$ direction, as illustrated by the flat committor distribution in Fig. 7c. Detailed examination shows that the 5-d cube containing the transition state conformations (red dot in Fig. 13) also contains conformations with other $\theta_1$ values, which are not at the transition state: some go to the product basin (e.g., green dot, Fig. 13b), and others to the reactant basin (e.g., blue dot, Fig. 13b).

Figure 13:

The horizontal band of the probability surface of Fig. 7a centered at $\theta_2 = 0$ ( − . < θ < . ) expanded to show the distribution in $\theta_1$. (a) This band, containing all peaks in Fig. 7a, is expanded in the $\theta_1$ direction. The $(\phi - \theta_2)$-square containing peak b in Fig. 7a is expanded in $\theta_1$ and is shown as a vertical strip between the two dashed lines. This strip contains a mixture of conformations. The red dot shows the location of the transition state conformations. (b) The distributions of committor values for conformations at the blue, red, and green dots in (a), respectively. Trajectories from the blue and green dots fall back to the reactant and product basins, respectively.

In contrast, projection of the 5-d surface on the $(\phi, \theta_1)$-plane (Fig 6) has the two basins along the $\phi$-direction alone, but the transition region aligned along both $\phi$ and $\theta_1$. This second feature is consistent with the importance of $\theta_1$ in determining the barrier crossing dynamics [1–4, 15, 17, 21].

Interestingly, the $(\phi, \psi)$-surface (Fig 10) has basins and the saddle region arranged along both directions. Conventional wisdom would have led to the conclusion that $\psi$ is important in defining both the reactant and product basins as well as the barrier crossing process. However, the double-well structure exhibited on the $(\phi - \psi)$ plane is profoundly misleading. None of the topological features on the surface over the $(\phi - \psi)$ plane retain the dynamic properties of the original 5-d surface, as the committor test showed that configurations corresponding to the peak at the transition region completely fall into the reactant basin. This demonstrates that the correlation between $\phi$ and $\psi$, due to minor roles of $\psi$ in the transition process [1, 2], distorted the probability distribution along $\phi$, such that the ridge/saddle extended into the reactant peak, leading to an incorrect $\phi$ value that marks the location of the peak in the transition region.
In contrast, $\theta_1$ did not impact the distribution of $\phi$, so the peak in the transition region on the $(\phi - \theta_1)$ plane still bears the correct $\phi$ value for the transition states.

In general, the dynamic properties of topological features of the probability surface are very sensitive to the subspace of projection. This is illustrated by the different locations of the transition state conformations, which are at a peak location on the $(\phi - \theta_1)$ plane (Fig 9a, red dot), but are at a slope location below a new peak when projected to the $(\phi - \psi)$ plane (Fig 9b, blue dot). While the probability at the correct $(\phi - \theta_1)$ square for the transition state conformations (Fig 9c, left, red) is at a peak, a different location in the $(\phi - \psi)$ plane (Fig. 9b, blue dot) has higher probability, as in this projection the probability mass distributed along the dimension of $\theta_1$ is integrated over all $\theta_1$ values (Fig. 9c, higher bar, right), resulting in the concentration of the probability peak at this new location.

Our results show that without the inclusion of the correct reaction coordinates, a probability surface of the same dimension no longer correctly characterizes the dynamic properties of the activated process. Without $\theta_1$, the 5-d dynamic probability surface over $(\phi, \theta_2, \psi, \alpha, \beta)$ fails to capture the dynamics of this activated process.

Together, our results show that intuition-based projections (such as $\phi$-$\psi$) or other arbitrary projections cannot be relied upon for understanding the dynamic properties of activated processes. Without rigorous examination such as the committor test, directly assigning mechanistic significance to features of a free energy surface is prone to mistakes, misinterpretations, and misunderstanding.

Finally, our results show that there are dramatic changes in the topological properties of the probability surface after dimension reduction, when techniques such as dPCA are applied.
While the simple probability surface on the properly constructed 2-d $(\phi - \theta_1)$ plane contains rich dynamic information and is sufficient to uncover the transition state conformations, the topological features on PCA-reduced surfaces can become more complex, no longer reflect essential dynamics, and cannot be used to identify the transition state conformations.

The approach of homology groups and the technique of analyzing the persistent homology of the filtration of the superlevel sets of high-dimensional probability surfaces introduced here are general. We envision that they can be applied to investigate the topology of high-dimensional probability surfaces encountered in other physical problems of activated processes.

Acknowledgement

We thank Drs. Herbert Edelsbrunner and Hubert Wagner for discussion and for generous help in extending the cubic complex algorithm. This work is supported by grants NIH R35 GM127084 and NSF CHE-1665104.

Conflict of Interest Statement

There are no conflicts of interest.

Supporting Information

Table 1:

Locations of birth and death of the probability peaks and the ridges connecting them for the dynamic probability surface projected to (φ, ψ, θ, α, β) and (φ, ψ, α, β, θ).

Columns, for each of the two 5-dimensional subspaces: Label; Coordinate; Probability. Rows list peaks (b) and ridges (d).

[Numerical entries not recoverable from the source text.]

Table 2:

Locations of birth and death of the probability peaks and the ridges connecting them for the 5-d probability landscape projected to the (φ-θ) plane and the (φ-ψ) plane.

Columns, for p(φ, ψ, θ, α, β) projected onto the (φ-θ) plane: Label; (φ, θ) Coordinate; Probability. Columns, for p(φ, ψ, θ, α, β) projected onto the (φ-ψ) plane: Label; (φ, ψ) Coordinate; Probability. Rows list peaks (b) and ridges (d).

[Numerical entries not recoverable from the source text.]

Table 3: Locations of the probability peaks and ridges connecting them for (a) the 5-d probability landscape projected to 2 principal components by dPCA and (b) the 39-d probability landscape projected to the 5 principal components by dPCA. In (a), the two most important eigenvectors ν1 and ν2 of the covariance matrix of the conformations are expressed over (cos(φ), sin(φ), cos(ψ), sin(ψ), cos(θ), sin(θ), cos(α), sin(α), cos(β), sin(β)), with eigenvalues r1 and r2, respectively. In (b), the persistence of each ridge d shown in the text is given by b − d.

Columns, for p(φ, ψ, θ, α, β) projected onto the (PC1, PC2) plane: Label; (PC1, PC2) Coordinate; Probability. Columns, for p(x ∈ R^39) projected onto (PCi : i ∈ {1, ..., 5}): Label; (PC1, ..., PC5) Coordinate; Probability.

[Numerical entries not recoverable from the source text.]

Dimension reduction by PCA

Projection of p(φ, ψ, θ, α, β) onto (PC1, PC2) by PCA. Here we applied direct PCA, without using dihedral PCA, to the 5-d probability surface. We first shift each direction to its mean value and then perform PCA to obtain the first two principal components from the covariance matrix of the conformations.

The dynamic probability surface after projection to the (PC1, PC2) plane is shown in Fig. 14a. This probability surface is dramatically different from that of the 5-d surface when projected onto the (φ-θ) plane or the (φ-ψ) plane. The persistence diagram also shows different peaks with different birth and death places (Fig. 14b). Committor tests show that all committor values for the conformations at a range of peaks b and a range of ridges d are 0.0, with trajectories going to the reactant basin. All committor values for conformations at another peak b are 1.0: trajectories from this location all go to the product basin. Committor values for two further ridges d follow a one-sided distribution. Trajectories starting from these locations mostly fall back to the reactant basin. Overall, none of the topological features after PCA retains the dynamic properties of the transition state conformations found on the original 5-d surface.

Figure 14: The dynamic probability surface on PCA space, its topological structure, and committor values on principal component space. (a) The projection of the 5-d dynamic probability surface p(φ, ψ, θ, α, β) to the (PC1, PC2) plane. (b) The persistence diagram exhibits six probability peaks. (c-d) Distributions of committor values p_B for trajectories started from two of the ridges d.

Projecting from 39-d space to 5-d PCA subspace. We applied direct PCA to the original full-dimensional dynamic probability surface, after removal of 21 bond lengths, to reduce the remaining 39-dimensional subspace to 5 principal components. The contour plot of the 5-d PCA surface projected onto the first two principal components is shown in Fig. 15a. Persistent homology analysis identifies 9 peaks (Fig. 15b), each located in a 5-d cube. This 5-d persistence diagram is very different from the persistence diagram shown for the original 5-d subspace.

The committor tests show that all committor values for conformations located at one range of peaks b are 0.0. That is, all the trajectories starting from these locations fall back to the reactant basin. In addition, all committor values for the conformations located at further peaks b and at several ridges d are 1.0. Trajectories from these locations move forward to the product basin. Committor values for a further range of ridges d follow one-sided distributions with the peak at p_B = 0 (Fig. 15c-e). Trajectories starting from these ridges mostly fall back to the reactant basin.

Overall, these results demonstrate that dimension reduction by PCA destroys the dynamic properties inherent in the original surface topology of the dynamic probability surface.

Figure 15: The dynamic probability surface on the 5-d principal component space reduced by dPCA from the 39-d configuration space, its topological structure, and committor values. (a) The 5-d dynamic probability surface p(PC1, ..., PC5) shown on the (PC1, PC2) plane. (b) The persistence diagram exhibits 9 peaks. (c-e) Distributions of committor values p_B for trajectories started at three of the ridges d.
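The peak births b, ridge deaths d, and persistence b − d reported in this Supporting Information can be illustrated on a toy surface. The sketch below is not the cubical complex algorithm used in this work. It is a minimal union-find computation of 0-dimensional superlevel-set persistence on a small 2-d grid, under the assumptions that grid cells are 4-connected and that the globally highest peak is assigned the grid minimum as its death value. The function name `peak_persistence` and the array `demo` are hypothetical and exist only for this illustration.

```python
import numpy as np

def peak_persistence(p):
    """Return [((row, col), birth, death), ...] for the peaks of a 2-d grid,
    sorted by decreasing persistence (birth - death).

    Cells are added from highest to lowest value (superlevel-set filtration).
    A new component is born at a local peak; when two components meet at a
    cell, the one with the lower peak dies there (elder rule).
    Assumptions of this sketch: 4-connectivity, and the global maximum is
    paired with the grid minimum.
    """
    p = np.asarray(p, dtype=float)
    rows, cols = p.shape
    parent = {}   # union-find forest over cells already added
    top = {}      # component root -> flat index of its highest cell
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for idx in np.argsort(p, axis=None)[::-1]:  # highest value first
        idx = int(idx)
        parent[idx] = idx
        top[idx] = idx
        r, c = divmod(idx, cols)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols):
                continue
            n = rr * cols + cc
            if n not in parent:          # neighbor not yet in the superlevel set
                continue
            a, b = find(idx), find(n)
            if a == b:
                continue
            if p.flat[top[a]] <= p.flat[top[b]]:
                a, b = b, a              # a survives, b dies
            if top[b] != idx:            # skip the cell that was just added
                pairs.append((divmod(top[b], cols),
                              float(p.flat[top[b]]), float(p.flat[idx])))
            parent[b] = a

    root = find(int(np.argmax(p)))
    pairs.append((divmod(top[root], cols), float(p.max()), float(p.min())))
    pairs = [t for t in pairs if t[1] > t[2]]   # drop zero-persistence pairs
    pairs.sort(key=lambda t: t[1] - t[2], reverse=True)
    return pairs

# Toy surface with two peaks joined by a ridge cell of height 0.3.
demo = np.array([[0.0, 0.1, 0.0, 0.1, 0.0],
                 [0.1, 0.8, 0.3, 0.6, 0.1],
                 [0.0, 0.1, 0.0, 0.1, 0.0]])
for loc, birth, death in peak_persistence(demo):
    print(loc, birth, death, birth - death)
```

On this toy grid the peak at (1, 1) is the most prominent, and the peak at (1, 3) is born at 0.6 and dies at the ridge value 0.3. The same bookkeeping applies in principle to higher-dimensional grids, although the sketch handles only two dimensions and only peaks, not higher-dimensional homology classes.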