Artificial intelligence techniques for integrative structural biology of intrinsically disordered proteins
Arvind Ramanathan, Heng Ma, Akash Parvatikar, S. Chakra Chennubhotla
Highlights

• Recent successes of artificial intelligence (AI) and machine learning (ML) techniques can be leveraged to obtain quantitative insights into how intrinsically disordered proteins function.
• This review highlights the use of AI/ML techniques to characterize the intrinsic statistical coupling in IDP atomistic fluctuations involved in coupled folding and binding processes using linear, non-linear, and hybrid approaches.
• AI/ML methods can also be used to learn force-field parameters from long time-scale simulations as well as to automatically coarse-grain IDP simulations.
• Bayesian inference methods in conjunction with AI/ML methods can be used to integrate sparse experimental observables to obtain a comprehensive picture of how IDPs function.
Arvind Ramanathan a,b,∗,1, Heng Ma a, Akash Parvatikar c and S. Chakra Chennubhotla c
a Data Science & Learning Division, Argonne National Laboratory, Lemont, IL, 60439, United States
b Consortium for Advanced Science and Engineering (CASE), University of Chicago, Hyde Park, IL, United States
c Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, 15260, United States
ARTICLE INFO
Keywords: artificial intelligence, statistical inference, intrinsically disordered proteins, ensembles
ABSTRACT
We outline recent developments in artificial intelligence (AI) and machine learning (ML) techniques for integrative structural biology of intrinsically disordered protein (IDP) ensembles. IDPs challenge the traditional protein structure-function paradigm by adapting their conformations in response to specific binding partners, leading them to mediate diverse and often complex cellular functions such as biological signaling, self-organization and compartmentalization. Obtaining mechanistic insights into their function can therefore be challenging for traditional structure determination techniques. Often, scientists have to rely on piecemeal evidence drawn from diverse experimental techniques to characterize their functional mechanisms. Multiscale simulations can help bridge critical knowledge gaps about IDP structure-function relationships; however, these techniques also face challenges in resolving emergent phenomena within IDP conformational ensembles. We posit that scalable statistical inference techniques can effectively integrate information gleaned from multiple experimental techniques as well as from simulations, thus providing access to atomistic details of these emergent phenomena.
1. Introduction
Our current understanding of protein structure-function relationships has been largely driven by the ability to visualize high-resolution three-dimensional (3D) structures of proteins with the aid of structure determination techniques including X-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM) [57]. These traditional structure determination techniques have often been supported with evidence from biochemical/biophysical methods to map out the functional consequences of perturbing protein structures through mutations and/or other modifications, and for drug discovery, protein design and other applications. However, the discovery of intrinsically disordered proteins (IDPs) and proteins with intrinsically disordered regions (IDRs) has challenged this traditional structure-function paradigm [61]. In particular, IDPs/IDRs adapt their 3D structures exquisitely in response to their substrates as well as post-translational modifications (such as phosphorylation) and/or other physiological conditions (such as pH, crowding, etc.) and can mediate context-specific functions within cells [46]. Indeed, IDPs/IDRs are known to be equally sensitive to perturbations to their primary sequence, where mutations can have devastating effects including misfolding, protein aggregation (e.g., Parkinson's, Alzheimer's and other “conformational diseases”) and dysregulation of signaling pathways (e.g., cancer, diabetes, cardiovascular diseases) [52]. Given their central role in mediating complex biological functions within cells, understanding the structure-function paradigm of IDPs/IDRs remains an important challenge for modern biophysics.

The remarkable plasticity of IDPs/IDRs is enabled by their ability to undergo folding upon binding, one of the key mechanistic processes whereby an IDP/IDR adopts distinct secondary or even tertiary structure upon binding to a specific substrate (Fig. 1). These coupled folding and binding processes occur at diverse length- and time-scales, ranging from finer conformational changes involving partial folding within IDR segments (e.g., helix-coil transitions) to disorder-to-order conformational transitions (e.g., formation of an α-helix) upon binding to a particular substrate [33]. These local interactions can then drive the formation of higher-order interactions, whereby repeated “segments” of hydrophobic/polar amino-acid residues can transiently (albeit specifically) interact with their target substrates. These multivalent interactions in turn lead to coacervation or liquid-liquid phase separation (LLPS), which has important biological implications, including compartmentalization (e.g., membraneless organelles) [55]. One of the key challenges then is to elucidate the mechanisms by which IDPs undergo coupled folding and binding processes leading to such diverse functions.

Although there has been tremendous progress in using traditional structure determination techniques to extend the length- and time-scales for studying IDPs [65], these techniques alone cannot fully describe the range of conformational flexibility of IDPs/IDRs. Further, given the intrinsic limitations in the length- and time-scales that these techniques can access, often multiple experiments are needed to probe the mechanisms by which IDPs/IDRs function, leading to a piecemeal approach in interpreting IDP/IDR ensembles [51]. Molecular dynamics (MD) simulations, whether all-atom simulations, enhanced sampling techniques or multiscale coarse-grained methods, provide a much needed ‘boost’ in terms of sampling IDP conformational landscapes, allowing one to obtain insights into complex phenomena such as LLPS [14]. Synergy between experiments and simulations has been quite successful in quantitatively probing how IDPs/IDRs function; however, such studies become challenging when different experiments provide seemingly conflicting evidence that is not necessarily explained by simulations [42].

∗ Corresponding author: A. Ramanathan (https://ramanathanlab.org)
A. Ramanathan et al.: Preprint submitted to Elsevier

Figure 1: Role of AI/ML techniques in IDP/IDR biology. Conformational fluctuations within IDPs occur at a wide range of time- (top panel) and length- (middle panel) scales. Further, IDP systems are sensitive to physiological conditions, presence of biological modulators, and other mechanisms such as post-translational modifications. Solution scattering (X-ray/neutron), smFRET and NMR techniques provide access to probe IDP fluctuations over a wide range of length- and time-scales, while X-ray and cryo-electron microscopy/tomography provide access to static snapshots across longer length scales. It is notable that even within cryo-EM and TEM datasets, inherent limitations in resolution can result in many of the flexible regions missing, leading to the use of multiscale molecular simulations to fill in the gaps. However, even with improvements in enhanced/adaptive sampling techniques, computational methods and computer hardware, it has been difficult to access details beyond O(μm) length-scales and O(ms) time-scales. We posit that AI/ML approaches will act as a ‘glue’ that can enable integrating insights from simulations with experiments while providing a platform to interpret mechanisms of IDP/IDR function.

Motivating the need for AI/ML approaches in integrative IDP structural biology.
Advances in machine learning (ML) and artificial intelligence (AI) techniques have recently made strides in a number of scientific disciplines including molecular biophysics [40]. We posit that AI/ML techniques can effectively act as a ‘glue’ to integrate disparate sources of experimental and simulation data and to infer functional mechanisms of IDPs/IDRs. In this review, we adopt a broad definition of AI/ML methods, in which traditional statistical inference methods can be combined with methods that include neural networks. We examine how AI/ML techniques are being utilized in addressing the aforementioned challenges in IDP integrative structural biology, namely: (1) characterizing the conformational heterogeneity of IDP ensembles (Sec. 2), (2) multi-scaling (length- and time-scales) IDP ensembles to model emergent phenomena such as LLPS (Sec. 3), and (3) integration of sparse experimental observations with simulations to infer mechanisms of IDP function (Sec. 4). Our review seeks to complement recent developments in AI/ML applications geared towards protein folding/dynamics [40]. Further, we seek to bridge these advances in the context of simulation techniques for studying emergent behavior [14]. We finally conclude with a perspective on how AI/ML techniques can be integral in elucidating structure-function relationships of IDPs/IDRs (Sec. 5).
2. AI/ML for characterizing IDP ensembles
The range of conformations that IDPs can adopt is primarily attributed to the distribution of amino-acid residues along their primary sequences, where the ratio of charged residues to hydrophobic residues gives rise to specific patterning, enabling them to vary their secondary (tertiary, and supra-molecular) structures in solution [37, 23]. Since sequence-based approaches by themselves are not sufficient to fully characterize IDP conformational landscapes, MD (and/or Monte Carlo) simulations are widely used to probe mechanisms of their functions, typically accessing timescales of O(10-100 μs) [49].

Dimensionality reduction methods to organize IDP conformational landscapes.
ML/AI methods are necessary to quantify the statistical dependencies in atomistic fluctuations and to obtain biophysically relevant low-dimensional representations of IDP landscapes. Dimensionality reduction methods summarize IDP ensembles in terms of a small number of collective variables or latent dimensions, where projections of the conformations from the simulations capture significant events along these dimensions [59]. These projections are referred to as embeddings, where each conformation is represented by the latent dimensions. An implicit requirement of these embedding techniques is that they group conformations in terms of biophysically relevant observables (e.g., root-mean-squared deviation/RMSD, radius of gyration/R_g). Most dimensionality reduction techniques are unsupervised: they exploit the intrinsic statistical structure within the data to discover dependencies without the need for explicit labels (e.g., within an IDP ensemble, no explicit notion of what constitutes a folded, partially folded, or unfolded state is made available to the ML algorithm). Dimensionality reduction techniques can leverage linear, non-linear, or hybrid methods to learn low-dimensional embeddings, and here we provide a succinct summary of how they have been used to characterize IDP ensembles [8].

Principal component analysis (PCA) is one such linear embedding method that is widely popular for analyzing simulation trajectory datasets [59]. However, PCA and its derivative methods cannot fully characterize conformational diversity because they rely on covariance in positional fluctuations alone. One key observation from several MD simulations as well as experimentally determined IDP ensembles is that their positional fluctuations exhibit long-tailed distributions, a natural consequence of their ability to undergo large conformational fluctuations and access rare states away from their mean positions.
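To make the limitation of covariance-only analysis concrete, the following is a minimal sketch on synthetic data, not on a real trajectory. It assumes a toy two-coordinate "ensemble" in which one coordinate fluctuates harmonically (Gaussian) and the other has long tails (Laplace) with the same variance; the function names `pca_embed` and `excess_kurtosis` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def pca_embed(X, n_components=2):
    """Project the rows of X (frames x coordinates) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # remove the mean structure
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S**2 / (X.shape[0] - 1)          # eigenvalues of the covariance matrix
    return Xc @ Vt[:n_components].T, variances[:n_components]

def excess_kurtosis(X):
    """Fourth-order statistic per coordinate; close to 0 for Gaussian fluctuations."""
    Xc = X - X.mean(axis=0)
    m2 = (Xc**2).mean(axis=0)
    m4 = (Xc**4).mean(axis=0)
    return m4 / m2**2 - 3.0

# Synthetic stand-in for positional fluctuations (an assumption, not simulation data).
rng = np.random.default_rng(0)
n_frames = 20000
harmonic = rng.normal(0.0, 1.0, n_frames)                      # unit variance, Gaussian
anharmonic = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n_frames)    # unit variance, long-tailed
X = np.column_stack([harmonic, anharmonic])

embedding, variances = pca_embed(X)
kurt = excess_kurtosis(X)
print("PCA variances:", variances)       # nearly equal: covariance cannot tell the two apart
print("excess kurtosis:", kurt)          # clearly different: fourth-order statistics can
```

In this toy setting the second-order statistics of the two coordinates are indistinguishable, while the fourth-order statistic separates them, which is the motivation for moving beyond covariance when fluctuations are long-tailed.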
These anharmonic fluctuations within IDPs are posited to be functionally relevant, since such fluctuations enable them to access conformational states relevant for binding to their specific substrates. The anharmonicity also gives rise to non-orthogonal correlations between individual atoms/amino-acid residues (depending on the resolution at which the data is being analyzed) [6].

ML techniques such as anharmonic conformational analysis (ANCA) provide a convenient framework to analyze IDP ensembles, especially in the context of disorder-to-order transitions [44]. ANCA uses fourth-order statistics to describe the atomic fluctuations and summarizes the internal motions using a small number of dominant anharmonic modes. In a recent study, time-resolved ANCA was used to characterize disorder-to-order transitions in the BCL2 homology 3 domain (BH3D) of BECN1 (BCL2-interacting coiled-coil protein) as it binds to the murine γ-herpesvirus 68 (M11) B-cell lymphoma 2 (BCL2) protein [47]. This approach identified a small number of conformational states that acted as intermediates in enabling M11-BCL2 to undergo partial unfolding in response to BECN1 binding. It identified a network of hydrophobic interactions, some farther than 10 Å from the BH3D binding cleft, that underwent specific conformational changes upon binding. These interactions were validated using mutagenesis and isothermal calorimetry, demonstrating that perturbing the intrinsic anharmonicity within M11 can adversely affect both protein stability and BECN1 binding.

Deep-learning methods in analyzing IDP ensembles.
Long-tailed fluctuations in IDP ensembles are a characteristic indicator of multiscale behavior (Fig. 1). Further, the linearity assumptions in PCA and ANCA can be limiting in extracting multiscale features from the conformational landscape, especially when such embeddings are non-trivial. Deep learning methods that leverage neural networks have proven to be successful in progressively extracting multiscale features from raw inputs [27].

Deep neural networks such as autoencoders employ an hourglass-shaped architecture where data is compressed into a low-dimensional latent space in the early layers and then reconstructed back [15]. The latent space learns to capture the most essential information required for accurate reconstruction in the original dimensional space. Variational autoencoders (VAEs) are one such instantiation of autoencoders, in which the latent space is constrained to be normally distributed. Several variations of the VAE neural network architecture have been used to characterize latent representations from protein folding trajectories, such as the variational dynamics encoder (VDE) [22], variational approaches for Markov processes (VAMP) [34], reweighted autoencoded variational Bayes for enhanced sampling (RAVE) [48], and the convolutional variational autoencoder (CVAE) [3, 63]. Although the conceptual use of the VAE is similar, their implementations can vary based on the essential features that they are used to learn. For example, within VDE, the loss function includes a term capturing the slowest processes in the simulation datasets, whereas the EncoderMap [29] utilizes a loss term that captures the proximity of conformations in the free-energy landscape. Complementary to these approaches, recurrent neural networks (RNNs) can serve as effective methods to learn time-dependent embeddings from MD simulations.
RNNs, which are used extensively in natural language processing and image processing applications, can be used to embed MD simulations so as to not only capture Boltzmann statistics from the system but also accurately reproduce the kinetics across multiple timescales [60]. Noé and colleagues developed a deep learning approach that is trained on a potential energy function and builds a generative model for conformational ensembles that respects Boltzmann statistics [39]. The uniqueness of this approach is that it is ‘one-shot’, meaning that it does not need any reaction coordinates and can produce unbiased samples, circumventing the expensive aspects of MD/Monte Carlo simulations.

IDRs often function as linkers between several folded domains (in multidomain proteins). This gives rise to an exponential number of states that they can sample, making it even more challenging to characterize such complex landscapes. Dynamic graphical models (DGMs) propose to address this problem by considering multidomain proteins as assemblies of coupled subsystems, where each subsystem is governed by the states it can access as well as the states its neighbors can access [41]. Although DGMs use fewer parameters than their deep learning counterparts, it is difficult to incorporate prior experimental knowledge into them and to recover atomistic configurations from their encoded representations.
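The statement in this section that a VAE constrains its latent space to be normally distributed can be made concrete through the loss it minimizes. The sketch below is an illustrative NumPy version of a generic VAE objective and is not the loss of VDE, RAVE, CVAE, or any other specific method cited here. It assumes a squared-error reconstruction term and a diagonal Gaussian encoder; the function names and the weighting parameter `beta` are hypothetical.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the standard normal N(0, I), per sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), the usual reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Reconstruction error plus beta times the KL term, averaged over samples."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)   # assumed squared-error reconstruction
    return np.mean(recon + beta * gaussian_kl(mu, log_var))

# Toy inputs standing in for encoder/decoder outputs (assumptions for illustration).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6))      # 4 hypothetical frames, 6 input features
mu = np.ones((4, 3))                 # pretend encoder means, 3 latent dimensions
log_var = np.zeros((4, 3))           # pretend encoder log-variances (unit variance)
z = reparameterize(mu, log_var, rng)
loss = vae_loss(x, x, mu, log_var)   # perfect reconstruction, so only the KL term remains
print(z.shape, loss)
```

The KL term is zero only when the encoder output matches a standard normal, so minimizing it pulls the latent embedding toward that distribution, while the reconstruction term keeps the embedding informative about the input conformation.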
3. AI/ML for multiscale simulations of IDP ensembles
In the previous section, we described some of the recent developments applying ML approaches to characterize folding conformational landscapes. In this section, we examine how AI/ML methods can (1) inform efficient sampling of IDP conformational landscapes and (2) enable multiscale simulations of emergent phenomena such as LLPS.
Determining reaction coordinates and enabling efficient sampling of IDP conformational landscapes.
The latent representations learned from MD simulations provide information relevant to reaction coordinates (RCs; also referred to as collective variables, or order parameters) that correspond to conformational changes along biophysically relevant observables (e.g., R_g values or helicity). In a recent paper, Romero and colleagues demonstrated that the CVAE-learned embeddings can be used to cluster conformations from long time-scale simulations of the lysosomal enzyme glucocerebrosidase-1 (GCase) and its facilitator protein saposin C (SAPC) along several reaction coordinates [50]. The proposed conformational changes along the CVAE-determined RCs provided insights into key loop movements at the entrance of the substrate-binding site within GCase that are stabilized by direct interactions with SAPC. Note that this approach only used the raw simulation trajectories of GCase to infer the RCs and did not use any prior information (such as distance between residues or other features within GCase or SAPC). Similar insights can be drawn from other approaches as well [54, 58, 17]; however, how the choice of a particular method affects the RCs that are extracted, and whether those RCs are interpretable (biophysically meaningful), remain open questions.

RCs extracted from the analyses of MD simulations can be used to drive additional sampling of the conformational landscape. This is the basis for many adaptive and enhanced sampling approaches [25]. Techniques such as variational enhanced sampling (VES) [4], VAMPnets [34], and RAVE [48] already include approaches for enhanced sampling. Both VES and VAMPnets utilize the variational approximation to enhance the sampling based on some set of reaction coordinates that can be determined by analyzing the MD simulations (see Sec. 2). RAVE, in contrast, utilizes the predictive information bottleneck principle as an RC, where it can predict the most likely future trajectory given a molecule’s past trajectory.
This principle, combined with the estimates for the most informative RCs (automatically determined from the information gain associated with sampling along subsets of RCs), the associated metastable states and equilibrium properties, provides simultaneous access to the unbiased kinetics for moving between different metastable states [26].

Generative adversarial networks (GANs) [20] have also been used for enhanced sampling, where on-the-fly training is used to modify the potential energy surface in order to drive the system to a user-defined target distribution where the free-energy barrier is lowered. This approach, called targeted adversarial learning optimized sampling (TALOS), uses MD simulations (for ‘generating’ protein conformations) and a discriminator (which differentiates samples generated by the biased sampler from those drawn from the desired target distribution) to automatically guide the sampling process [69]. This approach is inspired by actor-critic reinforcement learning ideas and is complementary to approaches such as reinforcement based adaptive sampling (REAP) [56].

While AI/ML-driven MD simulations have been demonstrated for smaller peptide/protein systems, there is a need for effective middleware that can orchestrate complex workflows and manage resources efficiently [25]. Conventional (non deep learning) ML approaches take from a few seconds to perhaps a couple of hours to run and can easily be run concurrently with MD simulation jobs as long as the data is made available for analysis. Training deep learning models, on the other hand, can potentially take several hours (or even days), a timeline similar to that of the MD simulations themselves, which means that resources and scheduling have to be managed carefully to make effective use of the available compute time. To address these issues, DeepDriveMD [28] couples the CVAE [3] with adaptive MD simulations to accelerate folding of small proteins (up to 45 residues) on emerging supercomputers.
DeepDriveMD’s adaptive protocol could accelerate the sampling by at least 2.3x compared to traditional approaches. The adaptive sampling protocols used within simulation frameworks can be cast more generally as an optimization problem that balances the cost of exploration (i.e., searching the IDP landscape) against exploitation (i.e., utilizing existing knowledge to accelerate the search). The AdaptiveBandit [45] technique uses a reinforcement learning based approach in which an action-value function and an upper confidence bound selection algorithm allow for substantial improvement of the sampling strategy.

AI/ML approaches for learning force-field parameters and multiscale approaches.
Sampling IDP landscapes implies the need to access a wide range of conformations, even those with relatively low probabilities. While enhanced and adaptive sampling techniques provide an opportunity to access such low-probability conformational states, the timescales that simulations can access are still limited [2]. Another potential challenge that limits the scale of sampling IDP/IDR landscapes arises from the force field parameters used for these simulations. Several recent advances in force field parameter development do address these limitations specifically for IDP/IDR systems (see [68, 66, 11]); however, artifacts related to how they are parameterized and how they end up capturing interfacial dynamics between IDPs and water (or other solvent conditions) still affect the overall quality of sampling [67, 1].

A complementary approach to this strategy is to use ML/AI to iteratively fit and refine force-field parameters in a data-driven fashion. One such approach, called ForceBalance-SAS [12], (1) uses an initial ‘best’ set of parameters, (2) computes ensemble-averaged small-angle scattering intensities from MD simulations, (3) measures the residual with respect to experimental data, along with the gradient and Hessian of the residual, and (4) iterates steps (1)-(3) until convergence criteria are met. Each iteration proceeds with the newly updated set of parameters and simulations, completing the cycle. ForceBalance-SAS can optimize parameters for IDPs with varying molecular weight and different charge-hydrophobicity characteristics, albeit in a system-specific manner. While ForceBalance-SAS fits to the global small-angle scattering profiles, the resulting force field parameters also gave better agreement with NMR chemical shifts (local observables).
Further, the learned parameters could be partially transferred and applied to other systems (for shorter time-scale simulations).

For simulating emergent phenomena such as LLPS, coarse-graining is an essential step for making simulations tractable. While there are many approaches to coarse-grain simulations for LLPS (see review by Mittal and colleagues [14]), ML/AI approaches can aid in the development of data-driven representations from all-atom simulations for parameters needed at the coarse-grained resolution. One recent approach called lattice simulation engine for sticker and spacer interactions (LASSI) utilizes Boltzmann inversion, non-linear regression and a Gaussian process Bayesian optimization approach to parameterize the coarse-grained method for modeling sequence-specific phase behaviors [10]. Additionally, force-field parameters for simulating sequence-specific phase behavior could be obtained with an approach such as CAMELOT [53].

Deep learning approaches can also be used to automatically infer coarse-grained representations from all-atom simulations [70, 64]. Advances in graph neural networks are aiding the development of accurate coarse-grained force field parameters [24]. It remains to be seen, however, how these approaches can in turn be generalized for IDP systems [38]. Similarly, the Multiscale Machine-learned Modeling Infrastructure (MuMMI) [13] was developed to couple a continuum model with coarse-grained MD simulations using ML approaches to characterize how the oncogene RAS interacts with complex biological membranes. Complementary to this approach, adversarial autoencoders were coupled to multiscale simulations of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike protein in complex with the angiotensin-converting enzyme 2 (ACE2) receptor protein to probe the mechanisms of its infectivity [7].
Automatic coarse-graining using AI approaches is attractive for tuning the scale of coarse-graining that needs to be performed, such that IDP/IDR landscapes can be adaptively sampled to obtain precise atomistic scale information about LLPS. Further, the ability to simulate self-consistent ensembles at multiple resolutions (continuum → coarse-grained → all-atom) will be critical for integrative structural biology applications in the context of combining information from diverse experimental techniques (see Sec. 4).
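The iterative parameter-fitting cycle described earlier in this section (start from an initial parameter set, compute a scattering profile, measure the residual against experiment together with its derivatives, update, and repeat until convergence) can be sketched on a toy problem. The example below is not ForceBalance-SAS. As stated assumptions, it replaces the MD simulation with the Guinier approximation I(q)/I(0) = exp(-(q R_g)^2 / 3), fits a single hypothetical parameter (R_g itself), uses noiseless synthetic "experimental" data, and approximates the Hessian with the Gauss-Newton form.

```python
import numpy as np

def model_intensity(rg, q):
    """Guinier approximation to the normalized scattering intensity.
    Stand-in for an ensemble-averaged profile computed from MD (assumption)."""
    return np.exp(-(q * rg) ** 2 / 3.0)

def fit_parameter(q, i_exp, theta0, tol=1e-8, max_iter=50, h=1e-6):
    """Gauss-Newton refinement of one parameter against a target profile."""
    theta = theta0
    for n_iter in range(1, max_iter + 1):
        residual = model_intensity(theta, q) - i_exp
        # Central finite difference for the derivative of the model w.r.t. theta.
        jac = (model_intensity(theta + h, q) - model_intensity(theta - h, q)) / (2.0 * h)
        grad = jac @ residual          # gradient of 0.5 * sum(residual**2)
        hess = jac @ jac               # Gauss-Newton approximation to the Hessian
        step = grad / hess
        theta -= step                  # parameter update closes one cycle
        if abs(step) < tol:
            break
    return theta, n_iter

# Synthetic "experimental" profile generated with a known R_g of 25 (arbitrary units).
q = np.linspace(0.005, 0.05, 20)
i_exp = model_intensity(25.0, q)

rg_fit, n_iter = fit_parameter(q, i_exp, theta0=15.0)
print(rg_fit, n_iter)
```

In a realistic setting each evaluation of the forward model would require new simulations with the updated parameters, which is why the number of cycles and the quality of the derivative estimates dominate the cost of such schemes.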
4. Statistical inference for integrating experimental data with simulations
The previous sections outlined the use of AI/ML for characterizing IDP/IDR ensembles. But the true power of obtaining insights into the mechanisms of how IDPs function and how their functions can be exploited for therapeutic design [52], novel material discovery [16], and synthetic biology applications (e.g., membraneless organelles for transport) [57] comes from the integration of theory and simulations with experimental data. The challenge with experimental data, however, is that it can be noisy and sparse, and it often provides only partial information when investigating a particular phenomenon [42]. For example, solution scattering data for IDPs are usually summarized using the scattering intensities against a coarse structural measure such as R_g [31], and in the case of single-molecule Förster resonance energy transfer (sm-FRET) experiments, a set of distances is measured across the IDP structure [36]. Simulations, on the other hand, represent a full-scale system with all degrees of freedom (e.g., 3N, where N represents the number of individual atoms), implying a mismatch with the intrinsic dimensionality of experimental data. In such cases, how can one fit sparse experimental observables with simulation datasets? A second challenge arises when experimental datasets are unable to resolve flexible regions in a protein (e.g., cryo-electron microscopy) [32]. Given that such flexible regions often hold key insights for understanding ensembles of multidomain proteins, simulations can fill in the gaps by providing probable states that these regions occupy. But the intrinsic gap in the timescales that can be accessed by simulations often ends up making it difficult to extract such information. Thus, AI approaches, augmented with Bayesian approaches, can be quite helpful in bridging the gaps between experiments and simulations [5, 42].

There are two broad strategies for fitting simulation datasets with experiments.
One strategy involves the use of unbiased simulations and then reweighting the generated ensembles using either maximum parsimony/entropy approaches or Bayesian strategies that use information known from simulations as a prior before the introduction of experimental observables. The complementary strategy involves the use of biased simulations that are parameterized from experiments, or the use of iterative approaches such as those outlined in [12] to refine the force field parameters to sample the IDP landscape of interest. Similarly, integrated experimental and computational simulations are also being used to understand energetics of interactions between an IDP and its binding partner [71]. Recent work by Lincoff and colleagues [30] also extends the experimental inferential structure determination using a Bayesian formulation that calculates the maximum log-likelihood of a conformational ensemble by accounting for the uncertainties across a variety of experimental data and back-calculation models. A similar integrated modeling approach by Gomes and colleagues [19] demonstrated how conformational restraints imposed using NMR, SAXS, and sm-FRET approaches could reach agreement in the ensembles of Sic1 and phosphorylated Sic1.
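The first strategy, reweighting an unbiased ensemble so that it agrees with an experimental observable, can be illustrated with a minimal maximum-entropy sketch. This is a generic textbook construction and not the method of any specific reference cited above. It assumes a single observable (a per-frame R_g), a uniform prior over frames, no experimental uncertainty, and synthetic data; the weights take the form w_i ∝ exp(-λ f_i), and the Lagrange multiplier λ is found by bisection. The function names are hypothetical.

```python
import numpy as np

def maxent_weights(f, lam):
    """Weights w_i proportional to exp(-lam * f_i), computed in a numerically stable way."""
    a = -lam * (f - f.mean())
    a -= a.max()                      # avoid overflow in the exponential
    w = np.exp(a)
    return w / w.sum()

def maxent_reweight(f, target, lam_lo=-50.0, lam_hi=50.0, tol=1e-10, max_iter=200):
    """Find lam so that the reweighted mean of f equals the target.
    The reweighted mean decreases monotonically with lam, so bisection suffices."""
    lam = 0.0
    for _ in range(max_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        mean = maxent_weights(f, lam) @ f
        if abs(mean - target) < tol:
            break
        if mean > target:
            lam_lo = lam              # mean too large: increase lam
        else:
            lam_hi = lam              # mean too small: decrease lam
    return lam, maxent_weights(f, lam)

# Synthetic per-frame R_g values standing in for a simulated ensemble (assumption).
rng = np.random.default_rng(0)
rg = rng.normal(30.0, 4.0, 5000)
rg_experiment = 28.0                  # hypothetical experimental average

lam, weights = maxent_reweight(rg, rg_experiment)
n_eff = 1.0 / np.sum(weights**2)      # Kish effective sample size after reweighting
print(lam, weights @ rg, n_eff)
```

The effective sample size is a useful diagnostic here: if it collapses to a handful of frames, the simulated ensemble barely overlaps with the experimental value and reweighting alone is unlikely to be reliable. Bayesian variants additionally account for experimental and back-calculation uncertainties, which this sketch omits.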
5. Challenges and outlook
From quantitatively probing the complex conformational landscapes of IDPs to identifying disorder-to-order transitions or modeling emergent phenomena such as LLPS, ML/AI approaches are proving to be an indispensable tool for both experimental biophysicists and modelers. However, ML/AI approaches for MD simulations still face some challenges that need to be addressed.

Current AI/ML applications (barring a few like [44, 39]) tend to use fitting procedures in a blind manner, without much physical grounding or attention to the underlying statistical physics of the system of interest. The resulting fitting procedure can end up overfitting and may not generalize to fully leverage the power of ML/AI in other domains [43, 21]. In particular, transferring an ML/AI model learned across simulations can be challenging. ML/AI methods may also get stuck in regimes that are not entirely physical, leading to issues in how appropriate (weighted) sampling can be achieved. Although techniques such as cross-validation and regularization alleviate these problems, there is a need to develop rigorous statistical techniques as well as interactive tools that can assess the performance of ML/AI models. A second challenge arises when force field parameters are designed using ML/AI. Here, the challenge is in maintaining control over the versions of the force field designed by ML/AI approaches, where initial conditions or datasets used for training, the inherent stochastic nature of how deep learning approaches work, and even program implementations (differences between how TensorFlow and PyTorch modules are implemented) can result in highly divergent results, even if the physically represented parameters may in fact lie reasonably within the same range.
While the ML/AI community already undertakes similar activities through rigorous benchmarking applications [35], a similar effort from the IDP/IDR community is needed to ensure robustness, reusability, and reproducibility of models across multiple studies. Efforts such as the Protein Ensemble Database [62] provide such an opportunity; however, community-wide engagement is needed to assess these intrinsic issues.
Further, many ML/AI results are considered black boxes, meaning that it is difficult to reason about how the ML/AI model made its inference. Even though there have been some advances in enabling interactivity with the outputs from ML/AI models [9], there is still the challenge of maintaining interactivity when large datasets are streamed. Developments in interactive data analysis and virtual reality can aid this, although significant developments are needed to make these approaches practical for emerging datasets.
Finally, computational infrastructure to support ML/AI workflows in concert with simulations has been a long-standing challenge [18]. Traditional approaches run MD simulations continuously, store these large datasets, and eventually analyze them with ML/AI methods. However, in the exascale computing era, such approaches will become infeasible, as the sheer volume of data generated by these machines can far exceed the capabilities of the analyses that need to be done (and occasionally, the computing resources for AI/ML can exceed those of the simulations). Approaches such as DeepDriveMD [28] are examining such emerging needs of complex workflows; however, we believe much research remains to be done to understand how AI/ML workloads will interact with future simulation workloads.
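As a small illustration of the alternative to the store-then-analyze pattern described above, the sketch below accumulates the mean and variance of a per-frame observable while frames are produced, so that no trajectory has to be kept. It uses Welford's online algorithm and is not taken from DeepDriveMD [28] or any other cited workflow. The `RunningStats` class, the `frame_stream` generator, and the numerical values are hypothetical stand-ins.

```python
class RunningStats:
    """Online mean and sample variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new value into the running statistics in O(1) memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 until at least two values are seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def frame_stream():
    """Hypothetical stand-in for a simulation emitting one observable per frame."""
    for value in (12.0, 15.0, 11.0, 14.0):
        yield value


stats = RunningStats()
for observable in frame_stream():
    stats.update(observable)  # each frame is analyzed once and then discarded

print(stats.n, stats.mean, stats.variance)
```

The same pattern carries over to richer in situ analyses, for example feeding mini-batches of frames to a model as they arrive, although the scheduling and data-movement questions raised in the text are beyond what this sketch addresses.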
With newer developments in AI techniques, there is an opportunity to accelerate our understanding of how IDPs play a role in disease, develop novel means to design small-molecule inhibitors, design new biomaterials, and engineer self-assembling systems for synthetic biology applications. We believe these represent exciting opportunities for the future.
Acknowledgements
A.R. thanks Anda Trifan for assistance in editing and proofreading the manuscript. This research was supported by the Argonne Laboratory Directed Research and Development Computing Expedition project (A.R.) and NIH/NIGMS GM105978 (S.C.C.).
References
[1] Best, R.B., 2020. Emerging consensus on the collapse of unfolded and intrinsically disordered proteins in water. Current Opinion in Structural Biology 60, 27–38. doi: https://doi.org/10.1016/j.sbi.2019.10.009. Folding and Binding Proteins.
[2] Bhattacharya, S., Lin, X., 2019. Recent advances in computational protocols addressing intrinsically disordered proteins. Biomolecules 9.
[3] Bhowmik, D., Gao, S., Young, M.T., Ramanathan, A., 2018. Deep clustering of protein folding simulations. BMC Bioinformatics 19, 484. URL: https://doi.org/10.1186/s12859-018-2507-5.
[4] Bonati, L., Zhang, Y.Y., Parrinello, M., 2019. Neural networks-based variationally enhanced sampling. Proceedings of the National Academy of Sciences 116, 17641–17647.
[5] Bottaro, S., Lindorff-Larsen, K., 2018. Biophysical experiments and biomolecular simulations: A perfect match? Science 361, 355–360. URL: https://science.sciencemag.org/content/361/6400/355.
[6] Burger, V.M., Ramanathan, A., Savol, A.J., Stanley, C.B., Agarwal, P.K., Chennubhotla, C.S., 2011. Quasi-anharmonic analysis reveals intermediate states in the nuclear co-activator receptor binding domain ensemble, in: Biocomputing 2012. Pacific Symposium on Biocomputing 1.
[7] Casalino, L., Dommer, A., Gaieb, Z., Barros, E.P., Sztain, T., Ahn, S.H., Trifan, A., Brace, A., Bogetti, A., Ma, H., Lee, H., Turilli, M., Khalid, S., Chong, L., Simmerling, C., Hardy, D.J., Maia, J.D.C., Phillips, J.C., Kurth, T., Stern, A., Huang, L., McCalpin, J., Tatineni, M., Gibbs, T., Stone, J.E., Jha, S., Ramanathan, A., Amaro, R.E., 2020. AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. bioRxiv.
[8] Ceriotti, M., 2019. Unsupervised machine learning in atomistic simulations, between predictions and understanding. The Journal of Chemical Physics 150, 150901. URL: https://doi.org/10.1063/1.5091842.
[9] Chae, J., Bhowmik, D., Ma, H., Ramanathan, A., Steed, C., 2019. Visual analytics for deep embeddings of large scale molecular dynamics simulations, in: 2019 IEEE International Conference on Big Data (Big Data), pp. 1759–1764.
[10] Choi, J.M., Dar, F., Pappu, R.V., 2019. LASSI: A lattice model for simulating phase transitions of multivalent proteins. PLOS Computational Biology 15, 1–39. URL: https://doi.org/10.1371/journal.pcbi.1007028.
[11] Choi, J.M., Pappu, R.V., 2019. Experimentally derived and computationally optimized backbone conformational statistics for blocked amino acids. Journal of Chemical Theory and Computation 15, 1355–1366. URL: https://doi.org/10.1021/acs.jctc.8b00572.
[12] Demerdash, O., Shrestha, U.R., Petridis, L., Smith, J.C., Mitchell, J.C., Ramanathan, A., 2019. Using small-angle scattering data and parametric machine learning to optimize force field parameters for intrinsically disordered proteins. Frontiers in Molecular Biosciences 6, 64.
[13] Di Natale, F., Bhatia, H., Carpenter, T.S., Neale, C., Kokkila-Schumacher, S., Oppelstrup, T., Stanton, L., Zhang, X., Sundram, S., Scogland, T.R.W., Dharuman, G., Surh, M.P., Yang, Y., Misale, C., Schneidenbach, L., Costa, C., Kim, C., D’Amora, B., Gnanakaran, S., Nissley, D.V., Streitz, F., Lightstone, F.C., Bremer, P.T., Glosli, J.N., Ingólfsson, H.I., 2019. A massively parallel infrastructure for adaptive multiscale simulations: Modeling RAS initiation pathway for cancer, in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery, New York, NY, USA. URL: https://doi.org/10.1145/3295500.3356197.
[14] Dignon, G.L., Zheng, W., Mittal, J., 2019. Simulation methods for liquid–liquid phase separation of disordered proteins. Current Opinion in Chemical Engineering 23, 92–98. doi: https://doi.org/10.1016/j.coche.2019.03.004. Frontiers of Chemical Engineering: Molecular Modeling.
[15] Doersch, C., 2016. Tutorial on variational autoencoders. arXiv:1606.05908.
[16] Dzuricky, M., Roberts, S., Chilkoti, A., 2018. Convergence of artificial protein polymers and intrinsically disordered proteins. Biochemistry 57, 2405–2414. URL: https://doi.org/10.1021/acs.biochem.8b00056.
[17] Fakharzadeh, A., Moradi, M., 2016. Effective Riemannian diffusion model for conformational dynamics of biomolecular systems. The Journal of Physical Chemistry Letters 7, 4980–4987. URL: https://doi.org/10.1021/acs.jpclett.6b02208.
[18] Fox, G., Glazier, J.A., Kadupitiya, J., Jadhao, V., Kim, M., Qiu, J., Sluka, J.P., Somogyi, E., Marathe, M., Adiga, A., Chen, J., Beckstein, O., Jha, S., 2019. Learning everywhere: Pervasive machine learning for effective high-performance computation. arXiv:1902.10810.
[19] Gomes, G.N.W., Krzeminski, M., Namini, A., Martin, E.W., Mittag, T., Head-Gordon, T., Forman-Kay, J.D., Gradinaru, C.C., 2020. Conformational ensembles of an intrinsically disordered protein consistent with NMR, SAXS, and single-molecule FRET. Journal of the American Chemical Society 142, 15697–15710. URL: https://doi.org/10.1021/jacs.0c02088.
[20] Goodfellow, I., 2016. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160.
[21] Goolsby, C., Moradi, M., 2020. Addressing the embeddability problem in transition rate estimation. bioRxiv.
[22] Hernández, C.X., Wayment-Steele, H.K., Sultan, M.M., Husic, B.E., Pande, V.S., 2018. Variational encoding of complex dynamics. Phys. Rev. E 97, 062412. URL: https://link.aps.org/doi/10.1103/PhysRevE.97.062412.
[23] Horvath, A., Miskei, M., Ambrus, V., Vendruscolo, M., Fuxreiter, M., 2020. Sequence-based prediction of protein binding mode landscapes. PLOS Computational Biology 16, 1–19. URL: https://doi.org/10.1371/journal.pcbi.1007864.
[24] Husic, B.E., Charron, N.E., Lemm, D., Wang, J., Pérez, A., Krämer, A., Chen, Y., Olsson, S., de Fabritiis, G., Noé, F., Clementi, C., 2020. Coarse graining molecular dynamics with graph neural networks. arXiv:2007.11412.
[25] Kasson, P.M., Jha, S., 2018. Adaptive ensemble simulations of biomolecules. Current Opinion in Structural Biology 52, 87–94. doi: https://doi.org/10.1016/j.sbi.2018.09.005. Cryo electron microscopy: the impact of the cryo-EM revolution in biology • Biophysical and computational methods - Part A.
[26] Lamim Ribeiro, J.M., Tiwary, P., 2019. Toward achieving efficient and accurate ligand-protein unbinding with deep learning and molecular dynamics through RAVE. Journal of Chemical Theory and Computation 15, 708–719. URL: https://doi.org/10.1021/acs.jctc.8b00869.
[27] LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444. URL: https://doi.org/10.1038/nature14539.
[28] Lee, H., Turilli, M., Jha, S., Bhowmik, D., Ma, H., Ramanathan, A., 2019. DeepDriveMD: Deep-learning driven adaptive molecular simulations for protein folding, in: 2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS), pp. 12–19.
[29] Lemke, T., Peter, C., 2019. EncoderMap: Dimensionality reduction and generation of molecule conformations. Journal of Chemical Theory and Computation 15, 1209–1215. URL: https://doi.org/10.1021/acs.jctc.8b00975.
[30] Lincoff, J., Haghighatlari, M., Krzeminski, M., Teixeira, J.M.C., Gomes, G.N.W., Gradinaru, C.C., Forman-Kay, J.D., Head-Gordon, T., 2020. Extended experimental inferential structure determination method in determining the structural ensembles of disordered protein states. Communications Chemistry 3, 74. URL: https://doi.org/10.1038/s42004-020-0323-0.
[31] Lipfert, J., Doniach, S., 2007. Small-angle X-ray scattering from RNA, proteins, and protein complexes. Annual Review of Biophysics and Biomolecular Structure 36, 307–327. URL: https://doi.org/10.1146/annurev.biophys.36.040306.132655. PMID: 17284163.
[32] Lyumkis, D., 2019. Challenges and opportunities in cryo-EM single-particle analysis. Journal of Biological Chemistry 294, 5181–5197.
[33] Majumdar, A., Dogra, P., Maity, S., Mukhopadhyay, S., 2019. Liquid–liquid phase separation is driven by large-scale conformational unwinding and fluctuations of intrinsically disordered protein molecules. The Journal of Physical Chemistry Letters 10, 3929–3936. URL: https://doi.org/10.1021/acs.jpclett.9b01731. PMID: 31260322.
[34] Mardt, A., Pasquali, L., Wu, H., Noé, F., 2018. VAMPnets for deep learning of molecular kinetics. Nature Communications 9, 5. URL: https://doi.org/10.1038/s41467-017-02388-1.
[35] Mattson, P., Reddi, V.J., Cheng, C., Coleman, C., Diamos, G., Kanter, D., Micikevicius, P., Patterson, D., Schmuelling, G., Tang, H., Wei, G., Wu, C., 2020. MLPerf: An industry standard benchmark suite for machine learning performance. IEEE Micro 40, 8–16.
[36] Metskas, L.A., Rhoades, E., 2020. Single-molecule FRET of intrinsically disordered proteins. Annual Review of Physical Chemistry 71, 391–414. URL: https://doi.org/10.1146/annurev-physchem-012420-104917. PMID: 32097582.
[37] Miskei, M., Horvath, A., Vendruscolo, M., Fuxreiter, M., 2020. Sequence-based prediction of fuzzy protein interactions. Journal of Molecular Biology 432, 2289–2303. doi: https://doi.org/10.1016/j.jmb.2020.02.017.
[38] Noé, F., 2020. Machine Learning for Molecular Dynamics on Long Timescales. Springer International Publishing, Cham. pp. 331–372. URL: https://doi.org/10.1007/978-3-030-40245-7_16.
[39] Noé, F., Olsson, S., Köhler, J., Wu, H., 2019. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science 365. URL: https://science.sciencemag.org/content/365/6457/eaaw1147.
[40] Noé, F., De Fabritiis, G., Clementi, C., 2020. Machine learning for protein folding and dynamics. Current Opinion in Structural Biology 60, 77–84. doi: https://doi.org/10.1016/j.sbi.2019.12.005. Folding and Binding Proteins.
[41] Olsson, S., Noé, F., 2019. Dynamic graphical models of molecular kinetics. Proceedings of the National Academy of Sciences 116, 15001–15006.
[42] Orioli, S., Larsen, A.H., Bottaro, S., Lindorff-Larsen, K., 2020. Chapter three - How to learn from inconsistencies: Integrating molecular simulations with experimental data, in: Strodel, B., Barz, B. (Eds.), Computational Approaches for Understanding Dynamical Systems: Protein Folding and Assembly. Academic Press. volume 170 of Progress in Molecular Biology and Translational Science, pp. 123–176. doi: https://doi.org/10.1016/bs.pmbts.2019.12.006.
[43] Pant, S., Smith, Z., Wang, Y., Tajkhorshid, E., Tiwary, P., 2020. Confronting pitfalls of AI-augmented molecular dynamics using statistical physics. bioRxiv.
[44] Parvatikar, A., Vacaliuc, G.S., Ramanathan, A., Chennubhotla, S.C., 2018. ANCA: Anharmonic conformational analysis of biomolecular simulations. Biophysical Journal 114, 2040–2043. doi: https://doi.org/10.1016/j.bpj.2018.03.021.
[45] Pérez, A., Herrera-Nieto, P., Doerr, S., De Fabritiis, G., 2020. AdaptiveBandit: A multi-armed bandit framework for adaptive sampling in molecular simulations. Journal of Chemical Theory and Computation 16, 4685–4693. URL: https://doi.org/10.1021/acs.jctc.0c00205.
[46] Phillips, A.H., Kriwacki, R.W., 2020. Intrinsic protein disorder and protein modifications in the processing of biological signals. Current Opinion in Structural Biology 60, 1–6. doi: https://doi.org/10.1016/j.sbi.2019.09.003. Folding and Binding Proteins.
[47] Ramanathan, A., Parvatikar, A., Chennubhotla, S.C., Mei, Y., Sinha, S.C., 2020. Transient unfolding and long-range interactions in viral BCL2 M11 enable binding to the BECN1 BH3 domain. Biomolecules 10.
[48] Ribeiro, J.M.L., Bravo, P., Wang, Y., Tiwary, P., 2018. Reweighted autoencoded variational Bayes for enhanced sampling (RAVE). The Journal of Chemical Physics 149, 072301. URL: https://doi.org/10.1063/1.5025487.
[49] Robustelli, P., Piana, S., Shaw, D.E., 2020. Mechanism of coupled folding-upon-binding of an intrinsically disordered protein. Journal of the American Chemical Society 142, 11092–11101. URL: https://doi.org/10.1021/jacs.0c03217. PMID: 32323533.
[50] Romero, R., Ramanathan, A., Yuen, T., Bhowmik, D., Mathew, M., Munshi, L.B., Javaid, S., Bloch, M., Lizneva, D., Rahimova, A., Khan, A., Taneja, C., Kim, S.M., Sun, L., New, M.I., Haider, S., Zaidi, M., 2019. Mechanism of glucocerebrosidase activation and dysfunction in Gaucher disease unraveled by molecular dynamics and deep learning. Proceedings of the National Academy of Sciences 116, 5086–5095.
[51] Rout, M.P., Sali, A., 2019. Principles for integrative structural biology studies. Cell 177, 1384–1403. doi: https://doi.org/10.1016/j.cell.2019.05.016.
[52] Ruan, H., Sun, Q., Zhang, W., Liu, Y., Lai, L., 2019. Targeting intrinsically disordered proteins at the edge of chaos. Drug Discovery Today 24, 217–227. doi: https://doi.org/10.1016/j.drudis.2018.09.017.
[53] Ruff, K.M., Harmon, T.S., Pappu, R.V., 2015. CAMELOT: A machine learning approach for coarse-grained simulations of aggregation of block-copolymeric protein sequences. The Journal of Chemical Physics 143, 243123. URL: https://doi.org/10.1063/1.4935066.
[54] Rydzewski, J., Valsson, O., 2020. Multiscale reweighted stochastic embedding (MRSE): Deep learning of collective variables for enhanced sampling. arXiv:2007.06377.
[55] Schuler, B., Borgia, A., Borgia, M.B., Heidarsson, P.O., Holmstrom, E.D., Nettels, D., Sottini, A., 2020. Binding without folding – the biomolecular function of disordered polyelectrolyte complexes. Current Opinion in Structural Biology 60, 66–76. doi: https://doi.org/10.1016/j.sbi.2019.12.006. Folding and Binding Proteins.
[56] Shamsi, Z., Cheng, K.J., Shukla, D., 2018. Reinforcement learning based adaptive sampling: Reaping rewards by exploring protein conformational landscapes. The Journal of Physical Chemistry B 122, 8386–8395. URL: https://doi.org/10.1021/acs.jpcb.8b06521.
[57] Shin, Y., Brangwynne, C.P., 2017. Liquid phase condensation in cell physiology and disease. Science 357. URL: https://science.sciencemag.org/content/357/6357/eaaf4382.
[58] Smith, Z., Ravindra, P., Wang, Y., Cooley, R., Tiwary, P., 2020. Discovering protein conformational flexibility through artificial-intelligence-aided molecular dynamics. The Journal of Physical Chemistry B 124, 8221–8229. URL: https://doi.org/10.1021/acs.jpcb.0c03985.
[59] Tribello, G.A., Gasparotto, P., 2019. Using dimensionality reduction to analyze protein trajectories. Frontiers in Molecular Biosciences 6, 46.
[60] Tsai, S.T., Kuo, E.J., Tiwary, P., 2020. Learning molecular dynamics with simple language model built upon long short-term memory neural network. arXiv:2004.12360.
[61] Uversky, V.N., 2019. Intrinsically disordered proteins and their “mysterious” (meta)physics. Frontiers in Physics 7, 10.
[62] Varadi, M., Tompa, P., 2015. The Protein Ensemble Database. Springer International Publishing, Cham. pp. 335–349. URL: https://doi.org/10.1007/978-3-319-20164-1_11.
[63] Varolgüneş, Y.B., Bereau, T., Rudzinski, J.F., 2020. Interpretable embeddings from molecular simulations using Gaussian mixture variational autoencoders. Machine Learning: Science and Technology 1, 015012. URL: https://doi.org/10.1088%2F2632-2153%2Fab80b7.
[64] Wang, J., Olsson, S., Wehmeyer, C., Pérez, A., Charron, N.E., de Fabritiis, G., Noé, F., Clementi, C., 2019. Machine learning of coarse-grained molecular dynamics force fields. ACS Central Science 5, 755–767. URL: https://doi.org/10.1021/acscentsci.8b00913.
[65] Xie, M., Yu, L., Bruschweiler-Li, L., Xiang, X., Hansen, A.L., Brüschweiler, R., 2019. Functional protein dynamics on uncharted time scales detected by nanoparticle-assisted NMR spin relaxation. Science Advances 5. URL: https://advances.sciencemag.org/content/5/8/eaax5560.
[66] Yang, S., Liu, H., Zhang, Y., Lu, H., Chen, H., 2019. Residue-specific force field improving the sample of intrinsically disordered proteins and folded proteins. Journal of Chemical Information and Modeling 59, 4793–4805. URL: https://doi.org/10.1021/acs.jcim.9b00647.
[67] Zapletal, V., Mládek, A., Melková, K., Louša, P., Nomilner, E., Jaseňáková, Z., Kubáň, V., Makovická, M., Laníková, A., Žídek, L., Hritz, J., 2020. Choice of force field for proteins containing structured and intrinsically disordered regions. Biophysical Journal 118, 1621–1633. doi: https://doi.org/10.1016/j.bpj.2020.02.019.
[68] Zerze, G.H., Zheng, W., Best, R.B., Mittal, J., 2019. Evolution of all-atom protein force fields to improve local and global properties. The Journal of Physical Chemistry Letters 10, 2227–2234. URL: https://doi.org/10.1021/acs.jpclett.9b00850.
[69] Zhang, J., Yang, Y.I., Noé, F., 2019. Targeted adversarial learning optimized sampling. The Journal of Physical Chemistry Letters 10, 5791–5797. URL: https://doi.org/10.1021/acs.jpclett.9b02173.
[70] Zhang, L., Han, J., Wang, H., Car, R., E, W., 2018. DeePCG: Constructing coarse-grained models via deep neural networks. The Journal of Chemical Physics 149, 034101. URL: https://doi.org/10.1063/1.5027645.
[71] Zou, J., Simmerling, C., Raleigh, D.P., 2019. Dissecting the energetics of intrinsically disordered proteins via a hybrid experimental and computational approach. The Journal of Physical Chemistry B 123, 10394–10402. URL: https://doi.org/10.1021/acs.jpcb.9b08323.