Advances to tackle backbone flexibility in protein docking
Ameya Harmalkar a, Jeffrey J. Gray a,b,∗
a Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
b Program in Molecular Biophysics, Institute for Nanobiotechnology, and Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
Abstract
Computational docking methods can provide structural models of protein-protein complexes, but protein backbone flexibility upon association often thwarts accurate predictions. In recent blind challenges, medium or high accuracy models were submitted in less than 20% of the “difficult” targets (with significant backbone change or uncertainty). Here, we describe recent developments in protein-protein docking and highlight advances that tackle backbone flexibility. In molecular dynamics and Monte Carlo approaches, enhanced sampling techniques have reduced time-scale limitations. Internal coordinate formulations can now capture realistic motions of monomers and complexes using harmonic dynamics. And machine learning approaches adaptively guide docking trajectories or generate novel binding site predictions from deep neural networks trained on protein interfaces. These tools poise the field to break through the longstanding challenge of correctly predicting complex structures with significant conformational change.
Keywords: backbone flexibility | protein-protein interactions | conformational space
Introduction
Protein-protein interactions are involved in nearly all of the biological processes in human health and disease. Understanding the dynamics of binding and the structure of protein complexes at the molecular level can be instrumental in delineating biological mechanisms and developing intervention strategies. Computational protein-protein docking provides a route to predict the three-dimensional structures of protein assemblies or complexes from known structures of individual monomeric proteins [1].
Docking methods are tested in the blind prediction challenge known as the Critical Assessment of PRediction of Interactions (CAPRI) [2], which in recent rounds pushed the field by including a wide array of target types such as transport proteins, higher order assemblies and host-virus interactions [3, 4]. Out of the 28 protein-protein targets evaluated in CAPRI over the past four years [4, 5], predictors achieved high quality structures for 11 “easy” targets, defined as those with little backbone motion (unbound to bound Cα root mean square deviation (RMSD_BU) of less than 1.2 Å [6, 7]; Figure 1). The remaining 17 targets were categorized as “difficult” (RMSD_BU over 2.2 Å and/or poor monomer template availability). For these targets, predictors only achieved acceptable quality in 8 of 17 targets (47%) and high quality in only 2 (12%) [4, 5]. Thus, the intrinsic flexibility of biomolecules still confounds the protein docking community at large.
∗ Corresponding author. Email addresses: [email protected] (Ameya Harmalkar), [email protected] (Jeffrey J. Gray)
arXiv preprint [q-bio.BM], Oct
Figure 1: Performance of protein docking approaches on blind targets in CAPRI Rounds 38-46 [4, 5]. Distribution of DockQ scores for the best model submitted by each predictor group (points) for each individual target (x-axis). DockQ measures a combination of intermolecular residue-residue contacts, interface RMSD, and ligand RMSD on a scale of 0 (incorrect) to 1 (matching the experimental structure) [5]. Targets are labelled by their CAPRI target number and, when needed, interface number (after the decimal). The targets are classified into rigid (easy) targets (high-homology monomer templates and under 1.2 Å unbound-bound backbone motion) and flexible targets (poor template availability and/or over 1.2 Å RMSD_BU). DockQ scores are color-coded by CAPRI model quality ranking: blue, high; green, medium; yellow, acceptable; gray, incorrect. Data graciously provided by Marc Lensink [4, 5].
In this review, we focus on the central docking challenge of capturing larger binding-induced conformational changes. We summarize progress by recent algorithms and frameworks, additionally augmented by growth in databases and computational power (CPU- and GPU-based). These new methods have achieved greater accuracy on more challenging targets and additionally yielded insight into binding mechanisms.
We first present progress in binding site identification and then docking methods including molecular dynamics (MD) and Monte Carlo (MC) approaches, normal modes, and machine learning. Together, these techniques have helped better explore broader regions of conformational space and more thoroughly evaluate the energy landscape to improve protein-protein docking.
Figure 2: Reducing the degrees of freedom in protein docking. 1. Coarse-grained models: From left to right: Some approaches use all-atom representations (except solvent). The UNRES (united residue) model [8] represents the side chains as variable-size ellipsoids attached to the Cα atom by peptide linkages, and backbone N, C and O atoms are accounted for with peptide-bond centers. The CABS (Cα, Cβ and side chains) model adds a Cβ atom and approximates the rest of the side chain by a single sphere. The Rosetta centroid model [9] uses a CEN atom to represent the side chain while the backbone stays intact. The ATTRACT reduced protein model comprises 2-3 atoms per residue with only Cα in the backbone and 1-2 atoms in the side chain [10]. Knowledge-based model derived from residue pair transforms of protein motifs from bound complexes in the PDB [11, 12]**. 2. Fast manifold Fourier transforms (FMFT): The 5D FMFT method implicitly matches protein shapes over three translations and two rotations in Fourier space (adapted from Padhorny et al.). 3. MaSIF identifies binding sites using interface “fingerprints” in a geometric deep learning model [14]**.
Identifying putative binding sites: a global search
To reduce the complexity of the immense conformation space of flexible proteins, coarse-grained models are frequently used to reduce the degrees of freedom (Figure 2). In the extreme, global docking approaches typically first treat protein partners as rigid bodies by restricting to six degrees of freedom (three rotational and three translational). A prime method to exhaustively sample the global 6D space is enumerating and scoring different rigid-body orientations on a dense grid. Approaches such as ClusPro [15, 16], ZDOCK [17, 18], PIPER [19] and HexServer [20] rely on the fast Fourier transform (FFT) correlation, which projects protein binding partners on a discretized three-dimensional grid. Conventional FFT approaches accelerate sampling only in the translational space and require new FFTs for every rotation. In 2015, Kazennov et al. developed fast manifold Fourier transforms (FMFT) to search arrangements of two rigid bodies in a 5D manifold (Figure 2) [21]. Relative to traditional FFT-based docking, FMFT accelerates calculations ten-fold [13]*. Another shape-based approach is geometric hashing, which indexes point sets or curves to match geometric features under arbitrary transformations like translations, rotations or even scaling [22]. Local 3D Zernike descriptor-based docking (LZerD), one of the top methods in CAPRI, projects 3D surfaces onto spheres to efficiently capture complementarity of protein surfaces [23]. Some rigid-body approaches exploit data from chemical cross-linking experiments [24] or small-angle X-ray scattering (SAXS) [25] to further improve discrimination of generated structures. These approaches provide fast, global exploration of the energy landscape, and in recent CAPRI rounds [4, 5], many predictors incorporated them as the first step to identify putative binding patches, supplementing them with other refinement tools to capture backbone flexibility.
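The translational FFT correlation described above can be illustrated with a minimal sketch. It assumes a toy scoring scheme in which both partners are plain occupancy grids and the score is their overlap under cyclic shifts; the function names `translational_scores` and `best_translation` are hypothetical and are not taken from ClusPro, ZDOCK, PIPER or HexServer, whose grids and scoring terms are considerably richer.

```python
import numpy as np


def translational_scores(receptor_grid, ligand_grid):
    """Score every cyclic translation of the ligand grid in one pass.

    The entry at index t equals
    sum(receptor_grid * np.roll(ligand_grid, t, axis=(0, 1, 2))).
    """
    receptor_ft = np.fft.fftn(receptor_grid)
    ligand_ft = np.fft.fftn(ligand_grid)
    # Cross-correlation theorem: a single inverse FFT yields the score
    # of all translations, instead of one explicit sum per translation.
    return np.fft.ifftn(receptor_ft * np.conj(ligand_ft)).real


def best_translation(receptor_grid, ligand_grid):
    """Return the highest-scoring translation and its score."""
    scores = translational_scores(receptor_grid, ligand_grid)
    index = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in index), float(scores[index])


# Toy data (an assumption for illustration): the receptor grid rewards
# occupancy of a 2x2x2 "pocket", and the ligand is a 2x2x2 block at the origin.
receptor = np.zeros((8, 8, 8))
receptor[4:6, 2:4, 5:7] = 1.0
ligand = np.zeros((8, 8, 8))
ligand[0:2, 0:2, 0:2] = 1.0

shift, score = best_translation(receptor, ligand)
print(shift, score)
```

In a conventional FFT docking run, this calculation would be repeated for each sampled rotation of the ligand grid, which is the per-rotation cost that the FMFT approach is reported to reduce.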
Methods accounting for backbone flexibility
Molecular dynamics
Molecular dynamics (MD) is one strategy that is often used after grid-search or template-based approaches for refinement (Figure 3) [26, 27]. Unbiased, all-atom MD simulations can provide a high-resolution, time-resolved microscopic model of protein-protein interactions. MD calculates Newtonian trajectories using physics-based energy functions to simulate protein association and dissociation events. MD use for protein docking has been limited because non-native local minima trap proteins, and dissociation is too slow [28]. Over the past decade, two modifications introduced to capture conformational changes are steered molecular dynamics (SMD) [29], which utilizes external force constraints, and Markov sampling, which breaks a long MD simulation into multiple short trajectories [30]. To accelerate dissociation of protein partners at sub-optimal binding regions, Ostermeir et al. developed a Hamiltonian replica exchange MD protocol (H-REMD) for protein docking [31]*. In H-REMD, biasing potentials are based on the shortest distance between protein partner atoms (defined as “ambiguity restraints”). As the biasing potential and associated ambiguity restraints vary across replicas, associated protein partners in one replica are forced to dissociate in another. Pan et al. simulated long timescales in a global search space for a benchmark set of five targets on the special purpose machine Anton [32, 33]. Their “tempered binding” protocol updates energy function parameters throughout the simulation: a soft-core van der Waals intermolecular potential is scaled so that long-lived states are dissociated more frequently, improving the sampling efficiency [33]**. Further, Pan et al. found that proteins often follow a repeated dissociation and association pattern rather than probing continually along the surface for the native binding site. Siebenmorgen et al. similarly scaled atomic repulsions with the vdW radii [34]**. They varied the vdW attraction energy across replicas relative to the Lennard-Jones and electrostatic interactions (owing to increased ligand-receptor atom distance). Compared to conventional MD methods, their simulations sampled native-like states 30% more often, resulting in blind docking predictions within 5 Å of native for moderately flexible targets. MD-based docking on proteins that move more than 2.2 Å RMSD upon binding has not yet been reported.
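The replica-swap step that Hamiltonian replica-exchange schemes rely on can be sketched with the standard Metropolis exchange criterion. This is a generic illustration, assuming two replicas at the same inverse temperature `beta` that differ only in their (biased) potential energy functions; the function and argument names are hypothetical, and the sketch does not reproduce the specific biasing potentials of [31] or [34].

```python
import math
import random


def exchange_probability(u_i_xi, u_i_xj, u_j_xi, u_j_xj, beta):
    """Metropolis probability of swapping configurations between two replicas.

    u_a_xb is the energy of configuration x_b evaluated with the potential
    of replica a; beta is the shared inverse temperature 1/(k_B T), in units
    consistent with the energies.
    """
    # Energy change of the combined two-replica system caused by the swap.
    delta = beta * ((u_i_xj + u_j_xi) - (u_i_xi + u_j_xj))
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchange(u_i_xi, u_i_xj, u_j_xi, u_j_xj, beta, rng=random):
    """Return True if the swap is accepted for one random draw."""
    p = exchange_probability(u_i_xi, u_i_xj, u_j_xi, u_j_xj, beta)
    return rng.random() < p


# Illustrative numbers only: a bound pose (x_i) held in an unbiased replica i,
# and a dissociated pose (x_j) held in a replica j with a repulsive bias.
p_swap = exchange_probability(u_i_xi=-50.0, u_i_xj=-10.0,
                              u_j_xi=-5.0, u_j_xj=-10.0, beta=0.1)
print(p_swap)
```

Because a swap is accepted whenever it lowers the combined energy, a pose that is strongly penalized by a biased replica tends to migrate toward less biased replicas, which is one way to picture how trapped complexes are released and resampled.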
Monte Carlo methods
In contrast to MD approaches that target flexibility with Newtonian dynamics, Monte Carlo (MC) methods sample by random moves often followed by minimization (MCM) [40, 41]. MC allows a wide variety of conformational move types to sample diverse conformations. MC algorithms have emulated the kinetic binding models, namely key-lock, conformer selection (CS) and induced-fit (IF) mechanisms [42, 43, 44]. The CS model chooses protein backbones from a pre-generated ensemble, so this approach has the advantage of docking one partner’s conformations at a time. However, CS docking can fail if the ensemble is devoid of native-like backbone conformations [45]. For targets with RMSD_BU up to 2.5 Å, Zhang et al. generated ensembles of 40 structures for MC-based docking [44]. This ensemble docking approach incorporates the ATTRACT coarse-grained protein model (Figure 2) [10] in conjunction with replica-exchange (RE) to sample in backbone as well as rigid body space. Although the ensemble does not always include bound-like conformations of the proteins, their REMC-ensemble docking method obtains higher quality structures than MCM and REMC approaches. RosettaDock4.0 [12]**, a conformer selection based MCM approach, performs backbone swaps with an adaptive strategy that modulates the sampling rate of each conformer to handle ensembles of 100 structures for each protein partner (RosettaDock3.0 [43] docked from an ensemble of 10 structures). To diversify backbone conformations, the protocol generates monomer structures by three methods: (1) normal modes [46], (2) backrub motions [47], and (3) all-atom backbone refinement [48]. Further, to discriminate between near-native and non-native structures, they developed a more accurate coarse-grained energy function with 6-dimensional residue-pair data obtained from protein-protein interfaces in the Protein Data Bank (Figure 2) [11]. Marze et al. report success on 49% of moderately flexible and 31% of flexible targets, the highest local-docking success rates yet reported [12]**.
Sampling backbone conformations with normal modes
Since intrinsic fluctuations in proteins contribute to conformational change, some docking approaches utilize harmonic dynamics to capture protein backbone motions [49, 50, 51]. Normal modes of vibration represent internal motions of a protein based on a Hookean potential between close residues. Normal mode analysis (NMA) is incorporated in docking approaches such as ATTRACT [52], FiberDock [53], SwarmDock [54] and EigenHex [55]. To mimic induced-fit, Schindler et al. developed iATTRACT [56] by moving interface residues in Cartesian coordinate space subject to NMA-generated harmonic potentials. iATTRACT served as a refinement stage and improved the fraction of native contacts predicted by 70%. For targets with unbound to bound interface RMSD over 4 Å, iATTRACT can achieve acceptable quality models [56]. Population-based methods such as particle swarm optimization (PSO) have also employed NMA. PSO is a heuristic approach that optimizes the multiple degrees of freedom using a set of multiple systems. The SwarmDock algorithm recently incorporated dynamic cross-docking [57]* of multiple backbone conformations within its PSO routine. It obtains an ensemble of conformational states of individual protein partners by using elastic network normal mode calculations and samples with the five lowest frequency non-trivial modes. SwarmDock achieved medium or high quality structures even for difficult targets with i-RMSD between 2.2 and 6 Å along with a challenging prior CAPRI target (T136) [57, 4]. Extending the swarm intelligence methods, the LightDock algorithm uses a “glowworm” swarm optimization to sample different backbone conformations in local regions of the protein surface with an anisotropic network model [58]. LightDock additionally uses multiscale modeling to combine all-atom and coarse grained scoring functions.
Figure 3: Enhanced sampling approaches in protein docking. 1. Temperature replica exchange MD/MC approaches utilize temperature as the variable parameter across replicas [35, 36]. The smoothening of the relatively rugged energy landscape enables sampling of distinct energy basins. 2. Umbrella sampling methods [37] split the reaction coordinate between an unbound and bound state into multiple windows. This enables biasing molecular dynamics trajectories along the reaction coordinate, driving the system from one thermodynamic state to another. 3. Hamiltonian replica exchange approaches introduce a biasing potential which can be either time-dependent, contact-dependent [31]* or geometry-dependent [34]**. Scaling methods: Top: Use of contact-dependent ambiguity constraints between protein partners. The weighted distance of the closest contacts of the partners defines bias potentials; Bottom: Bias based on increase in the effective pairwise vdW radii (an illustration to indicate the variable vdW radii across replicas for Hamiltonian-based tempering). Hamiltonian REMD: The exchange trajectories with the biasing harmonic potential (red) and the range of potentials used across all the replicas in the system. 4. Conformational flooding/Metadynamics utilize an exhaustive search within a local scope by introducing a funnel-shaped constraint potential [38]. Short metadynamics simulations have been used to obtain backbone conformations for ensemble-docking [39].
Figure 4: Internal coordinates NMA captures larger conformational change. (a) Schematic of the bound homodimer (PDB ID: 2EIA) and unbound monomer (1EIA) forms of equine infectious anemia virus (EIAV) capsid protein p26 (RMSD_BU of the binding domain is 5.2 Å). (b) Model generated by internal coordinate NMA at maximum amplitude (yellow) retains realistic bond lengths and angles. (c) iNMA with the optimal mode magnitudes yields a structure within 3 Å RMSD of the bound form. Panels (b) and (c) adapted from Frezza and Lavery (2019) [59]**.
While normal modes have typically been used on individual protein partners prior to docking, Oliwa and Shen introduced the complex NMA in docking to also sample molecular complex fluctuations [60].
By calculating modes of an encounter complex, this approach focuses on the binding region as it reduces the dimensionality of the search space [61]. One of the problems of NMA is that higher frequency modes often distort protein bonds. To overcome this limitation, Frezza and Lavery developed the internal coordinate NMA (iNMA) approach to move in the torsion angle space, that is, with fixed bond lengths and angles (Figure 4) [62]. With a reduced protein model in an internal coordinate space, they captured larger conformational changes from eigenvectors of low-frequency modes [59]**. iNMA can generate structures within 3 Å of the bound state when starting from the unbound for 39% of single-domain and 45% of multi-domain proteins in their benchmark.
Machine learning methods
Although protein folding has been one prime focus of deep learning methods in biology (e.g., AlphaFold [63, 64] and RaptorX [65]), in recent years, a few studies have explicitly addressed challenges relevant to protein docking [66]. Protein binding sites can be thought of as an information-rich molecular space that can be mined for elucidating protein interactions [67, 68, 69].
One approach is to use this information to create score functions for use with traditional docking approaches. For example, Geng et al. used graph representations to train a support vector machine (SVM) on native and non-native protein complex structures to develop a scoring potential (GraphRank) to rank docked poses [70]. And iScore, composed of the GraphRank and HADDOCK [71] scores, achieved top performance in CAPRI scoring rounds (medium or high quality structures for nine out of 13 targets).
Other teams have used deep learning techniques to identify protein interfaces by extrapolating image recognition tools to protein structures. RaptorX-ComplexContact [69] uses a deep residual neural network trained on single-chain proteins to predict contacts between binding partners, achieving the top contact prediction scores in CASP [72].
Another approach is to characterize interaction environments. Townshend et al. created “voxels,” i.e., volumetric pixels with local atomic information for every protein surface residue, and with this 3D representation, they trained a deep 3D convolutional neural network (SASNet) on a curated database of bound protein complex structures [73]. Pittala et al. employed graph convolutions with the nodes representing the amino acid residues and edges connecting residues with a Cβ-Cβ distance under 10 Å [74]. They placed geometric and chemical features on both nodes and edges and used a graph neural network to predict epitopes and paratopes in antigen-antibody interfaces. In a unique approach by Gainza et al., a geometric deep learning model (MaSIF) used molecular interaction “fingerprints” calculated using geometric and chemical features of protein surfaces [14]** (Figure 2). Their deep network was composed of geodesic convolutional layers, and they used it to predict binding sites, evaluate alternate docked interfaces, and assess likelihood of a given protein-protein interaction. Relative to conventional rigid docking methods on protein targets, MaSIF-search can rapidly scan for true binders with similar accuracy but far less computation (4 CPU-minutes vs. 45 hours for PatchDock and 93 days for ZDOCK to evaluate a benchmark of 100 bound protein complexes).
In a study to explore how neural networks might be used to generate structures with considerable backbone motion, Degiacomi trained an autoencoder with conformations from MD simulations, compressing the protein motion into a low-dimensional latent space [75]*. By training with simulations of both closed (bound) and apo conformations of a target protein, the autoencoder generated an intermediate closed-apo conformation at 0.8 Å RMSD [75] from the native state.
However, when the autoencoder was trained only with open conformations, the generator could only create structures far from the closed state (over 4.2 Å), limiting the utility of this approach for blind docking. In an approach suitable for blind cases, Cao and Shen developed a Bayesian active learning (BAL) model to quantify uncertainty in protein structure quality, and then they extended their model to flexible protein docking [76]*. The Bayesian framework determines the posterior probability as it samples backbone conformations [60]. Flexibility is captured with low-frequency complex-NMA modes, and in principle it can be extended to higher frequencies that capture loop and hinge motions. Compared to ZDOCK [17] and PSO, BAL improves the interface RMSD of the near-native predictions by 0.5 Å.
Conclusions
In conjunction with experimental data, docking has advanced a range of biological and health applications (e.g., Alzheimer’s disease [77], celiac disease [78], SARS-CoV-2 [79], influenza [80], cancer [81], and heart disease [82], to name just a few). Over the past few years, docking success rates have improved on “difficult” blind prediction targets, but rates need to be higher for docking to be a reliable stand-alone tool in all cases. Clearly, a diverse and impressive array of tools has steadily advanced toward reliably capturing large conformational changes in protein docking. Docking will be even more impactful when the field finally overcomes this challenge.
Acknowledgements
This work was supported by the National Institutes of Health through grant R01-GM078221. We thank Marc Lensink for generously providing us with data from CAPRI and Sai Pooja Mahajan and Sudhanshu Shanker for helpful comments on the manuscript.
Conflict of Interest
Dr. Jeffrey J. Gray is an unpaid board member of the Rosetta Commons. Under institutional participation agreements between the University of Washington, acting on behalf of the Rosetta Commons, Johns Hopkins University may be entitled to a portion of revenue received on licensing Rosetta software including applications mentioned in this review. As a member of the Scientific Advisory Board, Dr. Gray has a financial interest in Cyrus Biotechnology. Cyrus Biotechnology distributes the Rosetta software, which may include methods mentioned in this review.
References
[1] D. Ritchie, Recent Progress and Future Directions in Protein-Protein Docking, Current Protein & Peptide Science 9 (1) (2008) 1–15. doi:10.2174/138920308783565741.
[2] J. Janin, K. Henrick, J. Moult, L. T. Eyck, M. J. Sternberg, S. Vajda, I. Vakser, S. J. Wodak, CAPRI: A critical assessment of PRedicted interactions, Proteins: Structure, Function and Genetics 52 (1) (2003) 2–9. doi:10.1002/prot.10381.
[3] M. F. Lensink, S. Velankar, M. Baek, L. Heo, C. Seok, S. J. Wodak, The challenge of modeling protein assemblies: the CASP12-CAPRI experiment, Proteins: Structure, Function and Bioinformatics 86 (2018) 257–273. doi:10.1002/prot.25419.
[4] M. F. Lensink, G. Brysbaert, N. Nadzirin, S. Velankar, R. A. Chaleil, T. Gerguri, P. A. Bates, E. Laine, A. Carbone, S. Grudinin, R. Kong, R. R. Liu, X. M. Xu, H. Shi, S. Chang, M. Eisenstein, A. Karczynska, C. Czaplewski, E. Lubecka, A. Lipska, P. Krupa, M. Mozolewska, Ł. Golon, S. Samsonov, A. Liwo, S. Crivelli, G. Pagès, M. Karasikov, M. Kadukova, Y. Yan, S. Y. Huang, M. Rosell, L. A. Rodríguez-Lumbreras, M. Romero-Durana, L. Díaz-Bueno, J. Fernandez-Recio, C. Christoffer, G. Terashi, W. H. Shin, T. Aderinwale, S. R. Maddhuri Venkata Subraman, D. Kihara, D. Kozakov, S. Vajda, K. Porter, D. Padhorny, I. Desta, D. Beglov, M. Ignatov, S. Kotelnikov, I. H. Moal, D. W. Ritchie, I. Chauvot de Beauchêne, B. Maigret, M. D. Devignes, M. E. Ruiz Echartea, D. Barradas-Bautista, Z. Cao, L. Cavallo, R. Oliva, Y. Cao, Y. Shen, M. Baek, T. Park, H. Woo, C. Seok, M. Braitbard, L. Bitton, D. Scheidman-Duhovny, J. Dapkūnas, K. Olechnovič, Č. Venclovas, P. J. Kundrotas, S. Belkin, D. Chakravarty, V. D. Badal, I. A. Vakser, T. Vreven, S. Vangaveti, T. Borrman, Z. Weng, J. D. Guest, R. Gowthaman, B. G. Pierce, X. Xu, R. Duan, L. Qiu, J. Hou, B. Ryan Merideth, Z. Ma, J. Cheng, X. Zou, P. I. Koukos, J. Roel-Touris, F. Ambrosetti, C. Geng, J. Schaarschmidt, M. E. Trellet, A. S. Melquiond, L. Xue, B. Jiménez-García, C. W. van Noort, R. V. Honorato, A. M. Bonvin, S. J. Wodak, Blind prediction of homo- and hetero-protein complexes: The CASP13-CAPRI experiment, Proteins: Structure, Function and Bioinformatics 87 (12) (2019) 1200–1221. doi:10.1002/prot.25838.
[5] M. F. Lensink, N. Nadzirin, S. Velankar, S. J. Wodak, Modeling protein-protein, protein-peptide, and protein-oligosaccharide complexes: CAPRI 7th edition, Proteins: Structure, Function and Bioinformatics (2019) 1–23. doi:10.1002/prot.25870.
[6] T. Vreven, I. H. Moal, A. Vangone, B. G. Pierce, P. L. Kastritis, M. Torchala, R. Chaleil, B. Jiménez-García, P. A. Bates, J. Fernandez-Recio, A. M. Bonvin, Z. Weng, Updates to the Integrated Protein-Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2, Journal of Molecular Biology 427 (19) (2015) 3031–3041. doi:10.1016/j.jmb.2015.07.016.
[7] P. J. Kundrotas, I. Anishchenko, T. Dauzhenka, I. Kotthoff, D. Mnevets, M. M. Copeland, I. A. Vakser, Dockground: A comprehensive data resource for modeling of protein complexes, Protein Science 27 (1) (2018) 172–181. doi:10.1002/pro.3295.
[8] A. Liwo, M. Baranowski, C. Czaplewski, E. Gołaś, Y. He, D. Jagieła, P. Krupa, M. Maciejczyk, M. Makowski, M. A. Mozolewska, A. Niadzvedtski, S. Ołdziej, H. A. Scheraga, A. K. Sieradzan, R. Slusarz, T. Wirecki, Y. Yin, B. Zaborowski, A unified coarse-grained model of biological macromolecules based on mean-field multipole-multipole interactions, Journal of Molecular Modeling 20 (8) (2014) 2306. doi:10.1007/s00894-014-2306-5.
[9] S. Lyskov, J. J. Gray, The RosettaDock server for local protein-protein docking, Nucleic Acids Research 36 (Web Server issue) (2008) 233–238. doi:10.1093/nar/gkn216.
[10] M. Zacharias, ATTRACT: Protein-protein docking in CAPRI using a reduced protein model, Proteins: Structure, Function, and Bioinformatics 60 (2) (2005) 252–256. doi:10.1002/prot.20566.
[11] J. A. Fallas, G. Ueda, W. Sheffler, V. Nguyen, D. E. McNamara, B. Sankaran, J. H. Pereira, F. Parmeggiani, T. J. Brunette, D. Cascio, T. R. Yeates, P. Zwart, D. Baker, Computational design of self-assembling cyclic protein homo-oligomers, Nature Chemistry 9 (4) (2017) 353–360. doi:10.1038/nchem.2673.
[12] N. A. Marze, S. S. Roy Burman, W. Sheffler, J. J. Gray, Efficient flexible backbone protein-protein docking for challenging targets, Bioinformatics 34 (20) (2018) 3461–3469. doi:10.1093/bioinformatics/bty355. **
With a novel, six-dimension, coarse-grained score function and adaptive conformer selection, RosettaDock 4.0 succeeds in local docking on 49% of moderately flexible and 31% of flexible targets, the highest reported to-date.
[13] D. Padhorny, A. Kazennov, B. S. Zerbe, K. A. Porter, B. Xia, S. E. Mottarella, Y. Kholodov, D. W. Ritchie, S. Vajda, D. Kozakov, Protein-protein docking by fast generalized Fourier transforms on 5D rotational manifolds, Proceedings of the National Academy of Sciences of the United States of America 113 (30) (2016) E4286–E4293. doi:10.1073/pnas.1603929113. **
While traditional FFT algorithms transform over three translational degrees of freedom, the fast manifold Fourier transform algorithm encodes an additional two rotational dimensions using spherical functions and radial harmonics. The approach speeds up sampling by an order of magnitude.
[14] P. Gainza, F. Sverrisson, F. Monti, E. Rodolà, D. Boscaini, M. M. Bronstein, B. E. Correia, Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning, Nature Methods 17 (2) (2020) 184–192. doi:10.1038/s41592-019-0666-6. **
A geometric deep learning model that computes molecular interaction ‘fin-gerprints’ — geometric and chemical features of protein surface patches —to rapidly identify binding sites (MaSIF-site, MaSIF-ligand) or scan proteininterfaces (MaSIF-search). doi:10.1093/nar/gkh354 .[16] D. Kozakov, D. R. Hall, B. Xia, K. A. Porter, D. Padhorny, C. Yueh, D. Beglov, S. Vajda, TheClusPro web server for protein-protein docking, Nature Protocols 12 (2) (2017) 255–278. doi:10.1038/nprot.2016.169 .[17] R. Chen, L. Li, Z. Weng, ZDOCK: An initial-stage protein-docking algorithm, Proteins: Structure,Function and Genetics 52 (1) (2003) 80–87. doi:10.1002/prot.10389 .[18] B. G. Pierce, K. Wiehe, H. Hwang, B. H. Kim, T. Vreven, Z. Weng, ZDOCK server: Interactivedocking prediction of protein-protein complexes and symmetric multimers, Bioinformatics 30 (12)(2014) 1771–1773. doi:10.1093/bioinformatics/btu097 .[19] D. Kozakov, R. Brenke, S. R. Comeau, S. Vajda, PIPER: An FFT-based protein docking programwith pairwise potentials, Proteins: Structure, Function, and Bioinformatics 65 (2) (2006) 392–406. doi:10.1002/prot.21117 .[20] G. Macindoe, L. Mavridis, V. Venkatraman, M. D. Devignes, D. W. Ritchie, HexServer: An FFT-based protein docking server powered by graphics processors, Nucleic Acids Research 38 (SUPPL.2) (2010) 445–449. doi:10.1093/nar/gkq311 .[21] A. M. Kazennov, A. E. Alekseenko, D. Kozakov, D. N. Padhorny, Y. A. Kholodov, Efficient search forthe possible mutual arrangements of two rigid bodies with the use of the generalized five-dimensionalFourier transform, Mathematical Models and Computer Simulations 7 (4) (2015) 315–322. doi:10.1134/S2070048215040043 .[22] G. R. Smith, M. J. Sternberg, Prediction of protein-protein interactions by docking methods, CurrentOpinion in Structural Biology 12 (1) (2002) 28–35. doi:10.1016/S0959-440X(02)00285-3 .[23] V. Venkatraman, Y. D. Yang, L. Sael, D. 
Kihara, Protein-protein docking using region-based 3DZernike descriptors, BMC Bioinformatics 10 (2009). doi:10.1186/1471-2105-10-407 .[24] T. Vreven, D. K. Schweppe, J. D. Chavez, C. R. Weisbrod, S. Shibata, C. Zheng, J. E. Bruce,Z. Weng, Integrating Cross-Linking Experiments with Ab Initio Protein–Protein Docking, Journalof Molecular Biology 430 (12) (2018) 1814–1828. doi:10.1016/j.jmb.2018.04.010 .[25] M. Ignatov, A. Kazennov, D. Kozakov, ClusPro FMFT-SAXS: Ultra-fast Filtering Using Small-AngleX-ray Scattering Data in Protein Docking, Journal of Molecular Biology 430 (15) (2018) 2249–2255. doi:10.1016/j.jmb.2018.03.010 . 1226] J. Dapk¯unas, K. Olechnoviˇc, ˇC. Venclovas, Modeling of protein complexes in CAPRI Round 37using template-based approach combined with model selection, Proteins: Structure, Function andBioinformatics 86 (2018) 292–301. doi:10.1002/prot.25378 .[27] C. Christoffer, G. Terashi, W. H. Shin, T. Aderinwale, S. R. Maddhuri Venkata Subramaniya,L. Peterson, J. Verburgt, D. Kihara, Performance and enhancement of the LZerD protein assemblypipeline in CAPRI 38-46, Proteins: Structure, Function and Bioinformatics (2019) 1–14 doi:10.1002/prot.25850 .[28] D. E. Shaw, P. Maragakis, K. Lindorff-Larsen, S. Piana, R. O. Dror, M. P. Eastwood, J. A. Bank,J. M. Jumper, J. K. Salmon, Y. Shan, W. Wriggers, Atomic-Level Characterization of the StructuralDynamics of Proteins, Science 330 (6002) (2010) 341–346. doi:10.1126/science.1187409 .[29] M. Kr´ol, R. A. G. Chaleil, A. L. Tournier, P. A. Bates, Implicit flexibility in protein docking: cross-docking and local refinement., Proteins 69 (4) (2007) 750–757. doi:10.1002/prot.21698 .[30] N. Plattner, S. Doerr, G. De Fabritiis, F. No´e, Complete protein-protein association kinetics inatomic detail revealed by molecular dynamics simulations and Markov modelling, Nature Chemistry9 (10) (2017) 1005–1011. doi:10.1038/nchem.2785 .[31] K. Ostermeir, M. 
Zacharias, Accelerated flexible protein-ligand docking using Hamiltonian replica exchange with a repulsive biasing potential, PLoS ONE 12 (2017). doi:10.1371/journal.pone.0172072.
* Hamiltonian replica exchange (H-REMD) modifies parts of the force field across different replicas. In this paper, a repulsive potential between receptor and ligand surface residues promotes transient dissociation on switching replicas, accelerating exploration of the protein surface to identify possible binding sites.
[32] D. E. Shaw, J. P. Grossman, J. A. Bank, B. Batson, J. A. Butts, J. C. Chao, M. M. Deneroff, R. O. Dror, A. Even, C. H. Fenton, A. Forte, J. Gagliardo, G. Gill, B. Greskamp, C. R. Ho, D. J. Ierardi, L. Iserovich, J. S. Kuskin, R. H. Larson, T. Layman, L. Lee, A. K. Lerer, C. Li, D. Killebrew, K. M. Mackenzie, S. Y. Mok, M. A. Moraes, R. Mueller, L. J. Nociolo, J. L. Peticolas, T. Quan, D. Ramot, J. K. Salmon, D. P. Scarpazza, U. B. Schafer, N. Siddique, C. W. Snyder, J. Spengler, P. T. P. Tang, M. Theobald, H. Toma, B. Towles, B. Vitale, S. C. Wang, C. Young, Anton 2: Raising the Bar for Performance and Programmability in a Special-Purpose Molecular Dynamics Supercomputer, in: SC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (2014), pp. 41–53. doi:10.1109/SC.2014.9.
[33] A. C. Pan, D. Jacobson, K. Yatsenko, D. Sritharan, T. M. Weinreich, D. E. Shaw, Atomic-level characterization of protein–protein association, Proceedings of the National Academy of Sciences of the United States of America 116 (10) (2019) 4244–4249. doi:10.1073/pnas.1815431116.
** With long timescale MD simulations using a “tempered binding” protocol that scales a soft-core energy across replicas to promote dissociation of long-lived states, this work found that protein binding occurs through repeated association-dissociation events rather than prolonged in-contact exploration.
[34] T. Siebenmorgen, M. Engelhard, M. Zacharias, Prediction of protein–protein complexes using replica exchange with repulsive scaling, Journal of Computational Chemistry (2020) 1436–1447. doi:10.1002/jcc.26187.
** Using a novel replica exchange scheme with variable van der Waals radii for interface residue atoms, the RS-REMD approach promotes dissociation in some replicas, which improves sampling for both global searches and refinement.
[35] P. Liu, B. Kim, R. A. Friesner, B. J. Berne, Replica exchange with solute tempering: A method for sampling biological systems in explicit water, Proceedings of the National Academy of Sciences 102 (39) (2005) 13749–13754. doi:10.1073/pnas.0506346102.
[36] Z. Zhang, O. F. Lange, Replica Exchange Improves Sampling in Low-Resolution Docking Stage of RosettaDock, PLoS ONE 8 (8) (2013) e72096. doi:10.1371/journal.pone.0072096.
[37] J. Kästner, Umbrella sampling, Wiley Interdisciplinary Reviews: Computational Molecular Science 1 (6) (2011) 932–942. doi:10.1002/wcms.66.
[38] V. Limongelli, M. Bonomi, M. Parrinello, Funnel metadynamics as accurate binding free-energy method, Proceedings of the National Academy of Sciences of the United States of America 110 (16) (2013) 6358–6363. doi:10.1073/pnas.1303186110.
[39] A. Basciu, G. Malloci, F. Pietrucci, A. M. J. J. Bonvin, A. V. Vargiu, Holo-like and Druggable Protein Conformations from Enhanced Sampling of Binding Pocket Volume and Shape, Journal of Chemical Information and Modeling 59 (4) (2019) 1515–1528. doi:10.1021/acs.jcim.8b00730.
[40] J. J. Gray, High-resolution protein–protein docking, Current Opinion in Structural Biology 16 (2) (2006) 183–193. doi:10.1016/j.sbi.2006.03.003.
[41] S. Vajda, D. R. Hall, D. Kozakov, Sampling and scoring: A marriage made in heaven, Proteins: Structure, Function and Bioinformatics 81 (11) (2013) 1874–1884. doi:10.1002/prot.24343.
[42] C. Wang, P. Bradley, D. Baker, Protein–Protein Docking with Backbone Flexibility, Journal of Molecular Biology 373 (2) (2007) 503–519. doi:10.1016/j.jmb.2007.07.050.
[43] S. Chaudhury, J. J.
Gray, Conformer Selection and Induced Fit in Flexible Backbone Protein–Protein Docking Using Computational and NMR Ensembles, Journal of Molecular Biology 381 (4) (2008) 1068–1087. doi:10.1016/j.jmb.2008.05.042.
[44] Z. Zhang, U. Ehmann, M. Zacharias, Monte Carlo replica-exchange based ensemble docking of protein conformations, Proteins: Structure, Function, and Bioinformatics 85 (5) (2017) 924–937. doi:10.1002/prot.25262.
[45] D. Kuroda, J. J. Gray, Pushing the Backbone in Protein-Protein Docking, Structure 24 (10) (2016) 1821–1829. doi:10.1016/j.str.2016.06.025.
[46] A. R. Atilgan, S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, I. Bahar, Anisotropy of fluctuation dynamics of proteins with an elastic network model, Biophysical Journal 80 (1) (2001) 505–515. doi:10.1016/S0006-3495(01)76033-X.
[47] C. A. Smith, T. Kortemme, Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction, Journal of Molecular Biology 380 (4) (2008) 742–756. doi:10.1016/j.jmb.2008.05.023.
[48] M. D. Tyka, D. A. Keedy, I. André, F. DiMaio, Y. Song, D. C. Richardson, J. S. Richardson, D. Baker, Alternate states of proteins revealed by detailed energy landscape mapping, Journal of Molecular Biology 405 (2) (2011) 607–618.
[49] M. Zacharias, H. Sklenar, Harmonic modes as variables to approximately account for receptor flexibility in ligand-receptor docking simulations: Application to DNA minor groove ligand complex, Journal of Computational Chemistry 20 (3) (1999) 287–300.
[50] R. Grünberg, M. Nilges, J. Leckner, Flexibility and Conformational Entropy in Protein-Protein Binding, Structure 14 (4) (2006) 683–693. doi:10.1016/j.str.2006.01.014.
[51] M. Zacharias, Accounting for conformational changes during protein-protein docking, Current Opinion in Structural Biology 20 (2) (2010) 180–186. doi:10.1016/j.sbi.2010.02.001.
[52] S. de Vries, M. Zacharias, Flexible docking and refinement with a coarse-grained protein model using ATTRACT, Proteins: Structure, Function, and Bioinformatics 81 (12) (2013) 2167–2174. doi:10.1002/prot.24400.
[53] E. Mashiach, R. Nussinov, H. J. Wolfson, FiberDock: Flexible induced-fit backbone refinement in molecular docking, Proteins: Structure, Function and Bioinformatics 78 (6) (2010) 1503–1519. doi:10.1002/prot.22668.
[54] I. H. Moal, P. A. Bates, SwarmDock and the use of normal modes in protein-protein docking, International Journal of Molecular Sciences 11 (10) (2010) 3623–3648. doi:10.3390/ijms11103623.
[55] V. Venkatraman, D. W. Ritchie, Flexible protein docking refinement using pose-dependent normal mode analysis, Proteins: Structure, Function and Bioinformatics 80 (9) (2012) 2262–2274. doi:10.1002/prot.24115.
[56] C. E. M. Schindler, S. J. de Vries, M. Zacharias, iATTRACT: Simultaneous global and local interface optimization for protein-protein docking refinement, Proteins: Structure, Function, and Bioinformatics 83 (2) (2015) 248–258. doi:10.1002/prot.24728.
[57] M. Torchala, T. Gerguri, R. A. G. Chaleil, P. Gordon, F. Russell, M. Keshani, P. A. Bates, Enhanced sampling of protein conformational states for dynamic cross-docking within the protein-protein docking server SwarmDock, Proteins: Structure, Function, and Bioinformatics 88 (8) (2020) 962–972. doi:10.1002/prot.25851.
* A hybrid conformational-selection/induced-fit approach for dynamic cross-docking in SwarmDock, a particle swarm optimization algorithm. Ensembles are pre-generated with NMA and undergo cross-docking while sampling alternate protein conformations using low frequency normal modes.
[58] B. Jiménez-García, J. Roel-Touris, M. Romero-Durana, M. Vidal, D. Jiménez-González, J. Fernández-Recio, LightDock: A new multi-scale approach to protein-protein docking, Bioinformatics 34 (1) (2018) 49–55. doi:10.1093/bioinformatics/btx555.
[59] E. Frezza, R.
Lavery, Internal Coordinate Normal Mode Analysis: A Strategy to Predict Protein Conformational Transitions, Journal of Physical Chemistry B 123 (6) (2019) 1294–1301. doi:10.1021/acs.jpcb.8b11913.
** This work employs NMA in the internal coordinate space with a reduced protein model to capture large conformational changes of proteins with a faster compute time and no distortion of protein bonds.
[60] T. Oliwa, Y. Shen, cNMA: A framework of encounter complex-based normal mode analysis to model conformational changes in protein interactions, Bioinformatics 31 (12) (2015) i151–i160. doi:10.1093/bioinformatics/btv252.
[61] H. Chen, Y. Sun, Y. Shen, Predicting protein conformational changes for unbound and homology docking: learning from intrinsic and induced flexibility, Proteins: Structure, Function and Bioinformatics 85 (3) (2017) 544–556. doi:10.1002/prot.25212.
[62] E. Frezza, R. Lavery, Internal normal mode analysis (iNMA) applied to protein conformational flexibility, Journal of Chemical Theory and Computation 11 (11) (2015) 5503–5512. doi:10.1021/acs.jctc.5b00724.
[63] A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. R. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, D. Hassabis, Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13), Proteins: Structure, Function, and Bioinformatics 87 (12) (2019) 1141–1148. doi:10.1002/prot.25834.
[64] A. W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A. W. Nelson, A. Bridgland, H. Penedones, S. Petersen, K. Simonyan, S. Crossan, P. Kohli, D. T. Jones, D. Silver, K. Kavukcuoglu, D. Hassabis, Improved protein structure prediction using potentials from deep learning, Nature 577 (7792) (2020) 706–710. doi:10.1038/s41586-019-1923-7.
[65] S. Wang, S. Sun, Z. Li, R. Zhang, J. Xu, Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model, PLOS Computational Biology 13 (1) (2017) 1–34. doi:10.1371/journal.pcbi.1005324.
[66] W. Gao, S. P. Mahajan, J. Sulam, J. J.
Gray, Deep learning in protein structural modeling and design (2020). arXiv:2007.08383.
[67] A. Shulman-Peleg, R. Nussinov, H. J. Wolfson, Recognition of functional sites in protein structures, Journal of Molecular Biology 339 (3) (2004) 607–633. doi:10.1016/j.jmb.2004.04.012.
[68] A. Fout, J. Byrd, B. Shariat, A. Ben-Hur, Protein interface prediction using graph convolutional networks, Advances in Neural Information Processing Systems 2017-December (NIPS) (2017) 6531–6540.
[69] H. Zeng, S. Wang, T. Zhou, F. Zhao, X. Li, Q. Wu, J. Xu, ComplexContact: A web server for inter-protein contact prediction using deep learning, Nucleic Acids Research 46 (W1) (2018) W432–W437. doi:10.1093/nar/gky420.
[70] C. Geng, Y. Jung, N. Renaud, V. Honavar, A. M. J. J. Bonvin, L. C. Xue, iScore: a novel graph kernel-based function for scoring protein–protein docking models, Bioinformatics 36 (1) (2019) 112–121. doi:10.1093/bioinformatics/btz496.
[71] C. Dominguez, R. Boelens, A. M. Bonvin, HADDOCK: A protein-protein docking approach based on biochemical or biophysical information, Journal of the American Chemical Society 125 (7) (2003) 1731–1737. doi:10.1021/ja026939x.
[72] A. Kryshtafovych, T. Schwede, M. Topf, K. Fidelis, J. Moult, Critical assessment of methods of protein structure prediction (CASP)-Round XIII, Proteins 87 (12) (2019) 1011–1020. doi:10.1002/prot.25823.
[73] R. Townshend, R. Bedi, P. Suriana, R. Dror, End-to-End Learning on 3D Protein Structure for Interface Prediction, in: Advances in Neural Information Processing Systems 32, Curran Associates, Inc., (2019), pp. 15642–15651.
[74] S. Pittala, C. Bailey-Kellogg, Learning context-aware structural representations to predict antigen and antibody binding interfaces, Bioinformatics (Oxford, England) 36 (13) (2020) 3996–4003. doi:10.1093/bioinformatics/btaa263.
[75] M. T. Degiacomi, Coupling Molecular Dynamics and Deep Learning to Mine Protein Conformational Space, Structure 27 (6) (2019) 1034–1040.e3. doi:10.1016/j.str.2019.03.018.
* This paper describes a unique method of generating plausible motions of a protein using a generative neural network (autoencoder). When trained with conformations from an MD simulation, the autoencoder can quickly generate interpolated structures.
[76] Y. Cao, Y. Shen, Bayesian Active Learning for Optimization and Uncertainty Quantification in Protein Docking, Journal of Chemical Theory and Computation 16 (8) (2020) 5334–5347. doi:10.1021/acs.jctc.0c00476.
* With a framework to quantify uncertainty in docked models, the Bayesian approach uses a posterior distribution to guide sampling to likely low-energy conformations.
[77] C. V. Frost, M. Zacharias, From monomer to fibril: Abeta-amyloid binding to Aducanumab antibody studied by molecular dynamics simulation, Proteins: Structure, Function, and Bioinformatics (2020) 1–15. doi:10.1002/prot.25978.
[78] L. S. Høydahl, L. Richter, R. Frick, O. Snir, K. S. Gunnarsen, O. J. B. Landsverk, R. Iversen, J. R. Jeliazkov, J. J. Gray, E. Bergseng, S. Foss, S.-W. W. Qiao, K. E. A. Lundin, J. Jahnsen, F. L. Jahnsen, I. Sandlie, L. M. Sollid, G. Å. Løset, Plasma Cells Are the Most Abundant Gluten Peptide MHC-expressing Cells in Inflamed Intestinal Tissues From Patients With Celiac Disease, Gastroenterology 156 (5) (2019) 1428–1439.e10. doi:10.1053/j.gastro.2018.12.013.
[79] F. Cleri, M. Lensink, R. Blossey, DNA Aptamers Block the Receptor Binding Domain at the Spike Protein of SARS-CoV-2, chemRxiv (2020). doi:10.26434/chemrxiv.12696173.v1.
[80] H. Xu, T. Palpant, C. Weinberger, D. E. Shaw, Characterizing receptor flexibility to predict mutations that lead to human adaptation of influenza hemagglutinin, bioRxiv (2020). doi:10.1101/2020.07.15.204982.
[81] J. Kalin, M. Wu, A. Gomez, Y. Song, J. Das, D. Hayward, N. Adejola, M. Wu, I. Panova, H. Chung, E. Kim, H. Roberts, J. Roberts, P. Prusevich, J. Jeliazkov, S. Roy Burman, L. Fairall, C. Milano, A. Eroglu, C. Proby, A. Dinkova-Kostova, W. Hancock, J. Gray, J. Bradner, S. Valente, A. Mai, N.
Anders, M. Rudek, Y. Hu, B. Ryu, J. Schwabe, A. Mattevi, R. Alani, P. Cole, Targeting the CoREST complex with dual histone deacetylase and demethylase inhibitors, Nature Communications 9 (2018). doi:10.1038/s41467-017-02242-4.
[82] R. F. Alford, N. Smolin, H. S. Young, J. J. Gray, S. L. Robia, Protein docking and steered molecular dynamics suggest alternative phospholamban-binding sites on the SERCA calcium transporter, The Journal of Biological Chemistry 295 (32) (2020) 11262–11274. doi:10.1074/jbc.RA120.012948.