Deep Generative Model Driven Protein Folding Simulations

Heng Ma, Debsindhu Bhowmik, Hyungro Lee, Matteo Turilli, Michael T. Young, Shantenu Jha, Arvind Ramanathan

Abstract

Significant progress in computer hardware and software has enabled molecular dynamics (MD) simulations to model complex biological phenomena such as protein folding. However, enabling MD simulations to access biologically relevant timescales (e.g., beyond milliseconds) still remains challenging. The challenges include (1) quantifying which set of states have already been (sufficiently) sampled in an ensemble of MD runs, and (2) identifying novel states from which simulations can be initiated to sample rare events (e.g., sampling folding events). With the recent success of deep learning and artificial intelligence techniques in analyzing large datasets, we posit that these techniques can also be used to adaptively guide MD simulations to model such complex biological phenomena. Leveraging our recently developed unsupervised deep learning technique to cluster protein folding trajectories into partially folded intermediates, we build an iterative workflow that enables our generative model to be coupled with all-atom MD simulations to fold small protein systems on emerging high performance computing platforms. We demonstrate our approach in folding Fs-peptide and the ββα (BBA) fold, FSD-EY. Our adaptive workflow enables us to achieve an overall root-mean-squared deviation (RMSD) to the native state of 1.6 Å and 4.4 Å for Fs-peptide and FSD-EY, respectively. We also highlight some emerging challenges in the context of designing scalable workflows when data intensive deep learning techniques are coupled to compute intensive MD simulations.

Multiscale molecular simulations are widely used to model complex biological phenomena, such as protein folding, protein-ligand (e.g., small molecule, ligand/drug, protein) interactions, and self-assembly [6, 14]. However, many of these phenomena occur at timescales that are fundamentally challenging for molecular simulations to access, even with advances in both hardware and software technologies [3].
Hence, there is a need to develop scalable, adaptive simulation strategies that can enable sampling of timescales relevant to these biological phenomena.

Many adaptive sampling techniques [4, 12, 13, 19, 24, 27] have been proposed. All these techniques share some similar characteristics, including (a) the need for efficient and automated approaches to identify a small number of relevant conformational coordinates (either through clustering and/or dimensionality reduction techniques) [9, 23, 25], and (b) the identification of the 'next' set of simulations to run such that more trajectories are successful in attaining a specific end goal (e.g., protein that is well folded, protein bound to its target ligand, etc.) [19, 24].

These adaptive simulations present methodological and infrastructural challenges. Ref. [13] provides important validation of the power of adaptive methods over traditional "vanilla" molecular dynamics (MD) simulations or "ensemble" simulations. Ref. [1] highlights challenges of such workflows on high-performance computing platforms.

We recently developed a deep learning based approach that uses convolutions and a variational autoencoder (CVAE) to cluster simulations in an unsupervised manner [2]. We have shown that our CVAE can discover intermediate states from protein folding pathways; further, the CVAE-learned latent dimensions cluster conformations according to biophysically relevant features such as the number of native contacts or the root mean squared deviation (RMSD) to the native state.

We posit that the CVAE-learned latent features can be used to drive adaptive sampling within MD simulations, where the next set of simulations to run is decided based on a measure of 'novelty' of the simulation/trajectory frame observed.

Integrating CVAE concurrently with large-scale ensemble simulations on high-performance computing platforms entails the aforementioned complexity of adaptive workflows [1], while introducing additional infrastructural challenges. These arise from the concurrent and adaptive execution of heterogeneous simulations and learning workloads requiring sophisticated workload and performance balancing, inter alia.

In this paper, we implement a baseline version of our deep learning driven adaptive sampling workflow with multiple concurrent instances of MD simulations and CVAEs. Our contributions can be summarized as follows:

• We demonstrate that deep learning based approaches can be used to drive adaptive MD simulations at scale. We demonstrate our approach in folding small proteins, namely Fs-peptide and the β-β-α fold (BBA) protein, and show that it is possible to fold them using a deep learning driven adaptive sampling strategy.

• We highlight parallel computing challenges arising from the unique characteristics of the workflow, viz., training of deep learning algorithms can take almost as much time as running simulations, necessitating novel developments to deal with heterogeneous task placement, resource management and scheduling.

Taken together, our approach demonstrates the feasibility of coupling deep learning (DL) and artificial intelligence (AI) workflows with conventional all-atom MD simulations.

Two key components of the workflow are the MD simulation module and the deep-learning based CVAE module, which are described below.

Molecular dynamics (MD) simulations:

The MD simulations are performed on GPUs with OpenMM 7.3.0 [7]. Both the Fs-peptide and BBA systems were modeled using the Amber ff99SB-ILDN force field [16] in the implicit Onufriev-Bashford-Case GBSA solvent model [20]. The non-bonded interactions are cut off at 10.0 Å and no periodic boundary condition was applied. All bonds to hydrogen are fixed to their equilibrium values and simulations were run using a 2 fs time step. A Langevin integrator was used to maintain the system temperature at 300 K with a friction coefficient of 91 ps⁻¹. The initial configuration was optimized using the L-BFGS local energy minimizer with a tolerance of 10 kJ/mol and a maximum of 100 iterations. The initial velocity is assigned to each atom from a Boltzmann distribution at 300 K. We also added a new reporter to calculate the contact matrix of Cα atoms in the protein (using a distance cut-off of 8 Å), written in HDF5 format using the MDAnalysis module [10, 18], which could be used as input to the deep learning module (described below). Each simulation run outputs a frame every 50 ps.

Convolutional Variational Autoencoder (CVAE):

An autoencoder is a deep neural network architecture that can represent high dimensional data in a low dimensional latent space while retaining the key information [5]. With its hourglass shaped architecture, an autoencoder compresses input data into a latent space of reduced dimension and reconstructs the original data from it. We use the CVAE to cluster the conformations from our simulations in an unsupervised manner [2, 21]. Currently in our workflow, we treat the number of latent dimensions as a hyperparameter (varying over 3, 4, 5, and 6) and use the CVAE that most accurately reconstructs the input contact maps [2, 21]. The CVAE was implemented using Keras/TensorFlow and trained on a V100 GPU for 100 epochs.
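The contact-map input to the CVAE can be illustrated with a minimal sketch: a binary matrix that marks every pair of Cα atoms closer than the 8 Å cut-off. The function name `contact_map` and the toy coordinates are assumptions made for illustration; the workflow itself computes the matrix inside an OpenMM reporter with MDAnalysis and stores it in HDF5, which this sketch does not attempt to reproduce.

```python
import math

CUTOFF_ANGSTROM = 8.0  # distance cut-off quoted in the text


def contact_map(ca_coords, cutoff=CUTOFF_ANGSTROM):
    """Return a symmetric 0/1 matrix; entry (i, j) is 1 when C-alpha atoms
    i and j are closer than `cutoff` (coordinates given in angstroms)."""
    n = len(ca_coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts[i][j] = contacts[j][i] = 1
    return contacts


# Toy example (hypothetical data): four C-alpha atoms on a straight line,
# spaced 3.8 angstroms apart.
toy_chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
toy_map = contact_map(toy_chain)
```

Because only pairwise distances enter the matrix, the representation is unchanged when the whole structure is rotated or translated.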

Figure 1. Deep generative model driven protein folding simulation workflow. Simulation tasks (MD simulations 1 to K in OpenMM, one per GPU) produce trajectories and contact maps (.h5), and 100,000 conformations are collected for training. Machine learning/deep learning tasks (CVAE training 1 to M in TensorFlow) perform hyperparameter optimization/training, and the best CVAE model is chosen for inference. CVAE inference clusters the conformational states and performs outlier detection: if novel states are found, new trajectories (MD simulations K+1, K+2, ...) are spawned from them; if not, the simulations are terminated. The loop iterates until a new training cycle is needed or the protein is folded.

Assembling our workflow:

As illustrated in Figure 1, our prototype workflow couples the two components. In the first stage, the objective is to initially train the CVAE to determine the optimal number of latent dimensions required to faithfully reconstruct the simulation data. We commence our runs as an ensemble of equilibrium MD simulations. Ensemble MD simulations are known to enable better sampling of the conformational landscape, and can also be run in an embarrassingly parallel manner. The simulation data are converted into a contact map representation (to overcome issues with rotation/translation within the simulation box) and are streamed at regular intervals into the CVAE module. The output from the first stage is an optimally learned latent representation of the simulation data, which organizes the landscape into clusters consisting of conformations with similar biophysical features (e.g., RMSD to the native state). Note that this is an emergent property of the clustering; the RMSD to the native state is not used as part of the training data.

In the second stage, our objective is to identify the most viable/promising next set of starting states for propagating our MD simulations towards the folded state. We switch the CVAE to inference mode on newly generated contact maps (from simulations) and observe how they are clustered. Based on their similarity to the native state (measured by the RMSD), a subset of these conformations is selected for propagating additional MD runs. The workflow is continued until the protein is folded (i.e., conformations reach a user-defined RMSD value to the native state).
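The decisions made in the two stages can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`pick_latent_dim`, `select_restarts`, `is_folded`), the reconstruction losses, the frame labels, and the RMSD values are all hypothetical, and the selection rule shown is the simplest reading of "a subset selected by RMSD", not necessarily the rule used in the actual runs.

```python
def pick_latent_dim(reconstruction_loss_by_dim):
    """Stage 1: keep the latent dimension whose CVAE reconstructs the
    contact maps most accurately (lowest reconstruction loss)."""
    return min(reconstruction_loss_by_dim, key=reconstruction_loss_by_dim.get)


def select_restarts(rmsd_by_frame, n_restarts):
    """Stage 2: choose the frames closest to the native state (lowest RMSD)
    as starting points for additional MD runs."""
    ranked = sorted(rmsd_by_frame, key=rmsd_by_frame.get)
    return ranked[:n_restarts]


def is_folded(rmsd_by_frame, target_rmsd):
    """Stop criterion: some conformation reached the user-defined RMSD."""
    return min(rmsd_by_frame.values()) <= target_rmsd


# Hypothetical reconstruction losses for four candidate latent dimensions.
losses = {3: 0.41, 4: 0.35, 5: 0.33, 6: 0.30}
best_dim = pick_latent_dim(losses)

# Hypothetical RMSD values (angstroms) for frames labelled "trajectory:frame".
rmsd = {"traj0:120": 9.8, "traj1:75": 5.2, "traj2:310": 6.9, "traj3:42": 11.3}
restarts = select_restarts(rmsd, n_restarts=2)
done = is_folded(rmsd, target_rmsd=2.0)
```

With these toy numbers the loop would keep the 6-dimensional model, restart from the two lowest-RMSD frames, and continue iterating because no frame has reached the target.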

We used the Celery software to implement the aforementioned workflow. Celery is an asynchronous task scheduler with a flexible distributed system to process messages and manage operations, which enables real-time task processing and scheduling. The tasks can be executed and controlled by the Celery worker server asynchronously or synchronously. Celery applications use callables to represent the modules that are part of the workflow. Once a task is called, the task client adds a message to the task queue that refers to the task by its unique name, so that the worker can locate the right function to execute. The flexibility of the Celery framework enables real-time interfacing to manage resources and exercise control over task scheduling and execution. All tasks can be monitored and controlled directly by the object functions. By calling the tasks at different stages of the program, we build multi-task workflows that support a large volume of concurrent tasks with real-time interfacing. The use of the Celery framework allows us to establish a baseline for estimating the compute requirements of our workflow. Our workflow comprises two callables, namely the MD simulations and the CVAE, used in either training or inference mode.

We tested our deep learning driven adaptive simulation framework on the NVIDIA DGX-2 system at Oak Ridge National Laboratory (ORNL). The DGX-2 system provides more than 2 petaflops of computational power from a single node that leverages its 16 interconnected NVIDIA Tesla V100-SXM3-32GB GPUs. This enables us to distribute the MD simulations and CVAE training onto 12 and 4 GPUs, respectively. All the components in the workflow are encapsulated within a Python script that manages the various tasks through Celery. It first initializes the Celery worker along with the selected broker, RabbitMQ. All 16 GPUs are then employed for MD simulations to first generate 100,000 conformers as the initial training data for the CVAE.
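The name-based dispatch described above (a queue message carries the task's unique name, and the worker looks up the matching callable) can be mimicked with the Python standard library. The sketch below is a single-process stand-in and deliberately does not use Celery or RabbitMQ; the `task` decorator, the `TASKS` registry, and the two placeholder callables are hypothetical names introduced only for illustration.

```python
import queue

TASKS = {}  # registry: unique task name -> callable


def task(func):
    """Register a callable under its function name (hypothetical decorator)."""
    TASKS[func.__name__] = func
    return func


@task
def run_md(gpu_id, start_state):
    # Placeholder for an MD simulation task.
    return f"MD on GPU {gpu_id} from {start_state}"


@task
def run_cvae(gpu_id, mode):
    # Placeholder for CVAE training or inference.
    return f"CVAE {mode} on GPU {gpu_id}"


def worker(messages):
    """Drain the queue, locating each function by the name in its message."""
    results = []
    while not messages.empty():
        name, kwargs = messages.get()
        results.append(TASKS[name](**kwargs))
    return results


messages = queue.Queue()
# 12 GPUs for MD and 4 for CVAE training, mirroring the DGX-2 split in the text.
for gpu in range(12):
    messages.put(("run_md", {"gpu_id": gpu, "start_state": "unfolded"}))
for gpu in range(12, 16):
    messages.put(("run_cvae", {"gpu_id": gpu, "mode": "training"}))
results = worker(messages)
```

A real deployment would replace the in-memory queue with a message broker and run workers in separate processes; the sketch only shows why two callables are enough to express the whole workflow.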
At 5 minute intervals between iterations, the trained CVAEs continuously compress the Cα contact maps of conformers from MD trajectories into data points in latent space, which are subsequently evaluated with density-based spatial clustering of applications with noise (DBSCAN) to identify outlier conformations [8]. We used DBSCAN for its relative simplicity and also to establish a baseline implementation of our code. For Fs-peptide, outliers were collected from all four trained CVAE models, whereas only the CVAE with a 6 dimensional latent space was applied for the BBA outlier search. In each iteration, the MD runs are examined for outliers. Simulations that pass an initial threshold of 20,000 frames (1 µs) for Fs-peptide and 10,000 frames (0.5 µs) for BBA, but do not produce any outliers in the last 5,000 frames (250 ns of simulation time), are purged. With the GPUs freed from such MD runs, new MD simulations are launched from the outliers to ensure appropriate resource management and usage.

In previous work [2], we have shown that the CVAE can learn a latent space from the Fs-peptide simulations such that the conformations from the simulations separate into distinct clusters consisting of folded and unfolded states. When parameters such as the RMSD (to the native state) and the fraction of native contacts are used to annotate the latent dimensions [11], we showed that these latent representations correspond to reaction coordinates that describe how a protein may fold (beginning with the unfolded state ensemble). Thus, we posit that we can propagate the simulations along these low-dimensional representations and can drive simulations to sample folded states of the protein in a relatively small number of iterations.

Figure 2 summarizes the results of our folding simulations of Fs-peptide.
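The outlier search and the purge rule described above can be sketched as follows. This is a minimal pure-Python illustration, not the workflow's code: `dbscan_noise` computes only the set of points that DBSCAN labels as noise (points that are neither core points nor within `eps` of a core point), and the values of `eps` and `min_samples`, the latent coordinates, and the helper names are assumptions, since the text does not state them. Reading "pass an initial threshold" as "at least that many frames" is likewise an assumption.

```python
import math

FRAME_INTERVAL_PS = 50  # one frame every 50 ps, as stated for the MD runs


def frames_to_ns(n_frames):
    """Convert a frame count to simulated time in nanoseconds."""
    return n_frames * FRAME_INTERVAL_PS / 1000.0


def dbscan_noise(points, eps, min_samples):
    """Indices of points DBSCAN would label as noise: not a core point and
    not within eps of any core point. A point counts as its own neighbor."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]


def should_purge(n_frames, outlier_frames, min_frames, window=5000):
    """Purge a run that has passed the initial threshold but produced no
    outliers in its last `window` frames."""
    if n_frames < min_frames:
        return False
    return not any(f >= n_frames - window for f in outlier_frames)


# Hypothetical 3D latent points: a tight cluster plus one distant conformation.
latent = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
          (0.1, 0.1, 0.0), (5.0, 5.0, 5.0)]
outliers = dbscan_noise(latent, eps=0.5, min_samples=3)

# Fs-peptide thresholds from the text: 20,000 frames, last 5,000 frames.
stale = should_purge(25000, outlier_frames=[3000, 12000], min_frames=20000)
active = should_purge(25000, outlier_frames=[24000], min_frames=20000)
```

The frame counts translate to simulated time through the 50 ps output interval: 20,000 frames correspond to 1,000 ns (1 µs) and 5,000 frames to 250 ns, matching the figures quoted in the text.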
The peptide consists of 21 residues, Ace-A₅(AAARA)₃A-NME, where Ace and NME represent the N- and C-terminal end caps of the peptide respectively, A represents the amino acid alanine, and R represents the amino acid arginine. It is often used as a prototypical system for protein folding and adopts a fully helical structure as part of its native state ensemble [17]. Previous simulations used implicit solvent with the GBSA-OBC potentials and the AMBER-FF99SB-ILDN force field, with an aggregate simulation time of 14 µs at 300 K [17]. We used the same settings for our MD simulations and initiated our workflow. Summary statistics of the simulations are provided in Table 1. A total of 90 iterations of the workflow were run to obtain a total sampling of 54.198 µs. Note that the sampling time of the MD simulations is an aggregate measure similar to the ones reported in previous studies.

Figure 2. CVAE-driven folding simulations of Fs-peptide. (A) Root mean squared deviation (RMSD) with respect to the native/folded state from the 31 trajectories generated using our adaptive workflow for the Fs-peptide system. Only productive simulations, i.e., simulations that achieve an RMSD cut-off of 4.5 Å or less, are highlighted for clarity. The rest of the simulations are shown in light gray. (B) A histogram of the RMSD values in panel (A) depicting the RMSD cut-off for identifying folded, partially folded, and unfolded ensembles from the data. The corresponding regions are also marked in panel (A). (C) Using the RMSD to the native state as a measure of foldedness of the system, we project the simulation data onto a three dimensional latent representation learned by the CVAE. Note that the folded states (low RMSD values highlighted in deeper shades of blue) are separated from the folding intermediates (shades of green and yellow) and the unfolded states (darker shades of red). (D) A zoomed in projection of the last 0.5 µs of simulations generated along with the original projections (shown in pale gray, subsampled at every 100th snapshot). (E) The same projection showing only the samples from the last 0.5 µs to highlight the differences between folded and unfolded states. (F) Representative snapshots from our simulations with respect to the unfolded, partially folded, and native state ensembles. Note that the cartoon representation shown in orange represents the native state (minimum RMSD of 1.6 Å to the reference structure) determined from our simulations.

| System | Total no. simulations | Total simulation time (µs) | (Shortest*, Longest) simulations (µs) | Iterations | Min. RMSD (Å) |
| Fs-peptide | 31 | 54.198 | 1.01, 3.4 | 90 | 1.6 |
| BBA (FSD-EY) | 45 | 18.562 | 0.517, 0.873 | 100 | 4.44 |

Table 1. Summary statistics of simulations. *Only considering the simulations that pass the initial threshold.

We began by examining the RMSD with respect to the native state from all of our simulations. As shown in Figure 2A, 13 of the total of 31 simulations are unproductive, i.e., they do not sample the native state consisting of the fully formed α-helix. This is not entirely surprising given that the starting state consists of a nearly linear peptide with no residual secondary structures. Based on this observation, we posited that our CVAE model can be used to identify partially folded states from the simulations. We also examined the histogram of the RMSD values computed for each conformation with respect to the native state ensemble (Figure 2B). Based on the histograms, we can reasonably choose a threshold of 3.1 Å or less to depict the folded state ensemble, followed by 4.6 Å for partially folded states, and 8.3 Å for the unfolded states. Any trajectory that shows RMSD values beyond 8.3 Å is only sampling the unfolded state of the protein.

The projection of all 31 simulations onto the latent space learned by the CVAE is depicted in Figure 2C. Collectively, z₁-z₃ provide a description of the Fs-peptide folding process. Notably, many of the folded conformational states (highlighted in blue, indicating low RMSD to the native state) are clustered together. Similarly, the unfolded conformations (colored in darker shades of red, with higher RMSD to the native state ensemble) are also clustered together. Taking this further, we examined whether the similarity in the conformations holds even with a smaller partition of the data (see Figures 2D and E), namely the last 10% of the overall simulation data. This can be treated as a test set from which new simulations are initiated.
Notably, from these simulations we observe the presence of roughly three arms in the projections (Figure 2E) consisting of: (1) partially folded states highlighted in shades of green/yellow, (2) the unfolded state ensemble highlighted in shades of red, and (3) a much smaller ensemble of folded states (highlighted in blue).

For each of these states, we can also extract the structural characteristics with respect to the folded state (Figure 2F). Many of the unfolded states do not contain any secondary structural features (top and bottom left panels). The partially folded states consist of partial turns/helical structures. The final folded state (with RMSD of 1.6 Å) consists of most (if not all) helical turns in the protein.

The BBA protein, namely FSD-EY, is a designed protein that adopts a β-β-α fold in its native state; however, this protein tends to be dynamic in solution [15, 22]. Similar to other zinc-finger proteins, the structure of the protein can potentially vary, and it represents a challenging use-case for testing our workflow. As shown in Figure 3, our simulations start with a completely unfolded state of the protein (average RMSD to the native state is about 12 Å). Using an aggregated MD sampling time of 18 µs, we note that we reach an RMSD value of 4.44 Å.

Although we do not sample the native state of the protein consisting of the β-β-α fold, we are still able to sample regions of the landscape that consist of a defined hydrophobic core consisting of the highlighted residues in Figure 3D. Except for the dynamic C-terminal end, where the hydrophobic interactions between F21 and F25 are not entirely stable, the conformations that exhibit low RMSD values to the native state depict the presence of this hydrophobic core. We expect that extending these simulations further using the CVAE-driven protocol will enhance these interactions, allowing the protein to fold completely.

Figure 3. CVAE-driven folding simulations of the BBA fold, FSD-EY. (A) RMSD plots with respect to the native state of FSD-EY depicting the near-native state (blue), partially folded states (green) and unfolded (red) trajectories, similar to Figure 2. (B) A histogram of the RMSD values to the native state. (C) The learned projections from the CVAE for the trajectories; similar to the Fs-peptide system, we can observe the clustering of conformations based on their RMSD values to the native state. We have used an RMSD cut-off of 10 Å to highlight states closer to the native state. (D) Although we could not fully fold the protein, we do observe the presence of a well-formed hydrophobic core except for one residue (F25) at the C-terminal end of the protein. Residues labeled in the figure: Y3, Y7, F12, F21, F25.

As artificial intelligence (AI) and deep learning (DL) techniques become more pervasive for analyzing scientific datasets, there is an emerging need for supporting AI/DL workflows coupled to traditional HPC applications such as MD simulations. Our approach provides a proof-of-concept for how we can guide MD simulations to sample the folded state ensemble of small proteins using DL techniques. The approach that we chose was based on building a generative model for protein conformations and identifying new starting conformations for additional MD sampling.
Although the generative model was only used to identify novel conformations for extending our MD simulations, it nevertheless allowed us to guide the MD simulations towards sampling folded conformations of the protein systems we considered.

Although DL approaches can take significantly longer to train, we deliberately chose a prototypic DL approach, namely CVAE, to train on our MD simulation data (Table 2). As can be seen from the table, the computational cost of training and inference for the CVAE model is on par with the cost of running our MD simulations. That is, within the time required to train our CVAE model, our MD simulations progress only by about a nanosecond. Thus, starting up of new MD [...]

| System | DL training (100 epochs; minutes) | Time per epoch (seconds) | Inference time (ms/frame) | MD simulations (ns per minute) |
| Fs-peptide | 7 | 5 | 5.13 | 1.25 |
| BBA | 11 | 7 | 1.27 | 1.20 |

Table 2.

Acknowledgements

We thank Vivek Balasubramanian and Jumana Dakka for helpful discussions and early contributions. We also thank Chris Layton for helping set up our runs on the NVIDIA DGX-2 compute systems. We also acknowledge support by NSF DIBBS 1443054 and NSF RADICAL-Cybertools 1440677. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

References

1. V. Balasubramanian, T. Jensen, M. Turilli, P. M. Kasson, M. R. Shirts, and S. Jha. Implementing adaptive ensemble biomolecular applications at scale. CoRR, abs/1804.04736, 2018.

2. D. Bhowmik, S. Gao, M. T. Young, and A. Ramanathan. Deep clustering of protein folding simulations. BMC Bioinformatics, 19(18):484, 2018.

3. G. R. Bowman, K. A. Beauchamp, G. Boxer, and V. S. Pande. Progress and challenges in the automated construction of Markov state models for full protein systems. J Chem Phys, 131(12):124101, 2009.

4. S. Doerr, I. Ariz-Extreme, M. J. Harvey, and G. De Fabritiis. Dimensionality reduction methods for molecular simulations. ArXiv e-prints, Oct. 2017.

5. C. Doersch. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908, 2016.

6. R. O. Dror, R. M. Dirks, J. Grossman, H. Xu, and D. E. Shaw. Biomolecular simulation: a computational microscope for molecular biology. Annu Rev Biophys, 41(1):429–452, 2012.

7. P. Eastman, J. Swails, J. D. Chodera, R. T. McGibbon, Y. Zhao, K. A. Beauchamp, L.-P. Wang, A. C. Simmonett, M. P. Harrigan, C. D. Stern, R. P. Wiewiora, B. R. Brooks, and V. S. Pande. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology, 13(7):1–17, 07 2017.

8. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. pages 226–231. AAAI Press, 1996.

9. G. Fox, J. A. Glazier, J. Kadupitiya, V. Jadhao, M. Kim, J. Qiu, J. P. Sluka, E. Somogyi, M. Marathe, A. Adiga, and S. Jha. Learning everywhere: Pervasive machine learning for effective high-performance computation. High-Performance Big Data Computing, IPDPS Workshop, Rio de Janeiro, 2019.

10. R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler, J. Domanski, D. L. Dotson, S. Buchoux, I. M. Kenney, and O. Beckstein. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In Sebastian Benthall and Scott Rostrup, editors, Proceedings of the 15th Python in Science Conference, pages 98–105, 2016.

11. J. Gsponer and A. Caflisch. Molecular dynamics simulations of protein folding from the transition state. Proceedings of the National Academy of Sciences, 99(10):6719–6724, 2002.

12. N. S. Hinrichs and V. S. Pande. Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. The Journal of Chemical Physics, 126(24):244101, 2007.

13. E. Hruska, V. Balasubramanian, J. R. Ossyra, S. Jha, and C. Clementi. Extensible and scalable adaptive sampling on supercomputers. arXiv preprint arXiv:1907.06954, 2019.

14. E. H. Lee, J. Hsin, M. Sotomayor, G. Comellas, and K. Schulten. Discovery through the computational microscope. Structure, 17(10):1295–1306.

15. K. Lindorff-Larsen, P. Maragakis, S. Piana, M. P. Eastwood, R. O. Dror, and D. E. Shaw. Systematic validation of protein force fields against experimental data. PLOS ONE, 7(2):e32131, 02 2012.

16. K. Lindorff-Larsen, S. Piana, R. O. Dror, and D. E. Shaw. How fast-folding proteins fold. Science, 334(6055):517–520, 2011.

17. R. T. McGibbon. Fs MD Trajectories. 5 2014.

18. N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. J Comput Chem, 32(10), 2011.

19. S. Mittal and D. Shukla. Recruiting machine learning methods for molecular simulations of proteins. Molecular Simulation, 44(11):891–904, 2018.

20. A. Onufriev, D. Bashford, and D. A. Case. Exploring protein native states and large-scale conformational changes with a modified generalized Born model. Proteins: Structure, Function, and Bioinformatics, 55(2):383–394, 2004.

21. R. Romero, A. Ramanathan, T. Yuen, D. Bhowmik, M. Mathew, L. B. Munshi, S. Javaid, M. Bloch, D. Lizneva, A. Rahimova, A. Khan, C. Taneja, S.-M. Kim, L. Sun, M. I. New, S. Haider, and M. Zaidi. Mechanism of glucocerebrosidase activation and dysfunction in Gaucher disease unraveled by molecular dynamics and deep learning. Proceedings of the National Academy of Sciences, 116(11):5086–5095, 2019.

22. C. A. Sarisky and S. L. Mayo. The ββα fold: explorations in sequence space (edited by M. F. Summers). Journal of Molecular Biology, 307(5):1411–1418, 2001.

23. A. J. Savol, V. M. Burger, P. K. Agarwal, A. Ramanathan, and C. S. Chennubhotla. QAARM: quasi-anharmonic autoregressive model reveals molecular recognition pathways in ubiquitin. Bioinformatics, 27(13):52–60, Jul 2011.

24. Z. Shamsi, K. J. Cheng, and D. Shukla. Reinforcement learning based adaptive sampling: Reaping rewards by exploring protein conformational landscapes. The Journal of Physical Chemistry B, 122(35):8386–8395, 09 2018.

25. M. R. Shirts and V. S. Pande. Mathematical analysis of coupled parallel simulations. Phys Rev Lett, 86(22):4983–4987, May 2001.

26. M. Turilli, V. Balasubramanian, A. Merzky, I. Paraskevakos, and S. Jha. Middleware building blocks for workflow systems. Computing in Science & Engineering (CiSE) special issue on Incorporating Scientific Workflows in Computing Research Processes, https://arxiv.org/abs/1903.10057.

27. J. K. Weber and V. S. Pande. Characterization and rapid sampling of protein folding Markov state model topologies.