PARCE: Protocol for Amino acid Refinement through Computational Evolution
Rodrigo Ochoa, Miguel A. Soler, Alessandro Laio, Pilar Cossio
PPARCE: Protocol for Amino acid Refinement throughComputational Evolution
Rodrigo Ochoa a , Miguel A. Soler b , Alessandro Laio c,d , Pilar Cossio a,e, ∗ a Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia,050010 Medellin, Colombia b Italian Institute of Technology (IIT), Via Melen 83, B Block, 16152, Genova, Italy c International School for Advanced Studies (SISSA), Via Bonomea 265, I-34136 Trieste,Italy d The Abdus Salam International Centre for Theoretical Physics (ICTP), Strada Costiera11, 34151 Trieste, Italy e Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438Frankfurt am Main, Germany
Abstract
The in silico design of peptides and proteins as binders is useful fordiagnosis and therapeutics due to their low adverse effects and majorspecificity. To select the most promising candidates, a key matter is tounderstand their interactions with protein targets. In this work, we presentPARCE, an open source Protocol for Amino acid Refinement throughComputational Evolution that implements an advanced and promisingmethod for the design of peptides and proteins. The protocol performs arandom mutation in the binder sequence, then samples the boundconformations using molecular dynamics simulations, and evaluates theprotein-protein interactions from multiple scoring. Finally, it accepts orrejects the mutation by applying a consensus criterion based on bindingscores. The procedure is iterated with the aim to explore efficiently novelsequences with potential better affinities toward their targets. We alsoprovide a tutorial for running and reproducing the methodology.
Keywords:
Amino acids; Bioinformatics; Molecular simulations; Mutation
PROGRAM SUMMARY ∗ Corresponding author.
E-mail address: [email protected]; [email protected]
Preprint submitted to Computer Physics Communications April 29, 2020 a r X i v : . [ q - b i o . B M ] A p r rogram Title: PARCELicensing provisions: MIT LicenseProgramming language: Python 3Operating system: Linux, and docker container available to run the protocolunder other operating systems.Subprograms used: Gromacs 5.1.4, Scwrl 4, PDB2PQR, Scoring functions sourcecode.Nature of problem: Computational design of peptides and proteins as binders fordiagnosis and therapeutics.Solution method: A protocol that performs random mutations in the bindersequence, samples the bound conformations using molecular dynamicssimulations, and evaluates the protein-protein interactions from multiple scoringpredictions in order to accept or reject the mutations.Additional comments: This article describes version 1.0. PARCE is available at:https://github.com/PARCE-project/PARCE-1, and a Docker container can bedownloaded from: https://hub.docker.com/r/rochoa85/parce-1
1. Introduction
The rational design of proteins and peptides is a field that has beenexplored through experimental and computational approaches [1]. Whendetailed information of a particular system is available, conventionalknowledge-based methods provide tools for the design of more activeanalogs [2]. However, detailed information is not always available, and thus, de novo design strategies are required.Due to the significant amount of information related to the properties ofnatural amino acids, one strategy is to design proteins and peptides bymodifying directly the amino acid sequence, without taking into accountthe structure. This strategy is commonly used in the design ofantimicrobial peptides, where the mechanism of action has been elucidatedfor some families [3]. Physico-chemical properties such as hydrophobicity,charge, and molecular weight, have been used to create libraries of analogswith potentially similar activities [4]. However, these strategies lackinformation about the interacting partners, and on how the conformationallandscape of the residues can affect their activity [5]. This motivates theuse of structure-based design [6, 7].The most used structure-based strategies are molecular docking and the2se of structural libraries. Approaches based on experimental structuraldatabases take advantage on the huge quantity of protein structuralinformation to predict and optimize the most probable bindingconformations [8, 9, 10]. Molecular docking identifies the most probableposes of potential active molecules ranked by scoring functions that roughlyapproximate binding free energies or discriminate between active moleculesand decoys [11, 12]. In spite of these scoring-function improvements,protein and peptide design is still a computational challenge due to thelarge sequence space that has to be explored ( e.g. , for a peptide of 10 aminoacids in length, 20 sequences are possible). In addition, the intrinsicflexibility of non-structured protein regions, such as peptides or thecomplementary determining regions of antibodies, and the constraintsassociated to the protein interface region affect the reliability of theprediction [13]. To solve some of these problems, it has been proposed tosample the bound conformations using Molecular Dynamics (MD) [14].The prediction of the binding affinities is another challenge associatedto the design. Different alternatives exist based on the biological systemand the available computational infrastructure. One option is the use ofrepresentative structures from MD trajectories to calculate energy termsbased on molecular mechanics assumptions, entropy terms and free energiesof solvation using continuum-solvent models [15, 16, 17]. However, theseapproaches are computationally expensive. An alternative is averaging thescoring over conformational ensembles to estimate the affinity [18, 19, 20].We here describe a new tool for performing peptide and protein designusing MD. We called this tool Protocol for Amino acid Refinement throughComputational Evolution (PARCE). The algorithm implemented inPARCE is inspired by previous projects related to the exploration of thesequence space of peptides bound to proteins or small molecules[21, 14, 22]. The algorithm includes a single-point mutation protocol, whichhas been optimized for predicting the most common rotamers of peptideamino acids [5], an all-atom MD simulation of the mutated complex andthe estimate of the average score of the conformations from the trajectoryto assess the impact of the mutation in the binding [14, 23]. This approachhas been successfully used to design peptides and protein fragments, whosebinding affinity was afterward confirmed experimentally [24, 18, 25].PARCE implements multiple scoring functions that are evaluated using aconsensus metric for accepting or rejecting the mutations [25]. The processis iterated with the aim to evolve the original sequence and explore3fficiently novel sequences with potential better affinities toward theirtargets [7]. Figure 1 shows a graphical summary of PARCE. Figure 1: Summary of PARCE, which is composed of different phases. First, a startingprotein-peptide or protein-protein complex is used. Then, the MC cycle starts bygenerating a single-point mutation of a random amino acid in the peptide structure. Then,the conformational sampling of the mutated system is done with MD simulations. Scoringof the trajectory snapshots and the calculation of averages using multiple scoring functionsis performed. Finally, a consensus criterion accepts or rejects the mutation and an update(or not) of the sequence is performed. The process is iterated.
The computational protocol presented in this work is open source. Themanuscript is organized as follows. First, we explain the code architecture,the simulation parameters and the system used to test the PARCEfunctionalities. Then, we include details about the code requirements andthe performance of the protocol using a protein-peptide system as areference. Finally we discuss PARCE features in comparison with those ofsimilar software. 4 . Methods
The code is written in Python 3, with calls to Unix command lines formanipulation of files and external programs. Therefore, the currentprotocol depends on the availability of a bash environment to run. Toguarantee reproducibility, a docker container is provided with all therequirements, including the installation of the third-party open sourcesoftware, and the validation of the PARCE functionalities. However, a usercan configure her/his own machine following a simple setup validated withthe Travis CI framework (https://travis-ci.com). The documentationillustrates the definition of the parameters, the input options and files (seesupplementary Table A1). Figure 2 shows a workflow of the protocol basedon the tasks, scripts and software implemented at each step. In thefollowing, we present a description of each stage of PARCE.
Before applying the design protocol it is necessary to perform an MDsimulation of the starting complex to equilibrate the system, and also toobtain MD files required to start the protocol. When the starting system isobtained from a docked structural complex or a crystal structure, a longerprevious sampling of the complex is suggested. The equilibration of thecomplex can be detected using the RMSD or observables such as the numberof hydrogen bonds between the peptide and protein. We note that for mostsystems 200 ns is suitable for this purpose. If the protocol starts from acrystallized protein-peptide complex, the initial sampling time could be lower[24].The workflow requires as input: • A PDB and GRO (Gromacs format) file of the protein-peptide complexsolvated and equilibrated with MD using the Gromacs package [26]. • Topology files and MD parameters definitions.The protocol starts with an initial NPT simulation of the system usingthe Gromacs configuration files ( i.e. mdp files). Advanced users have thepossibility of modifying the Gromacs configuration files for adapting these tosimulate the specific systems. 5 igure 2: Workflow of PARCE. The protocol is managed by the script run protocol.py ,which accepts as input a configuration file with all the parameters required to run thedesign, and a folder with the structural input obtained from a previous MD trajectory.After configuring all the files and folders, and runing MD simulations of the initial complex,the first step is to perform a single-point mutation with the mutation.py script, whichselects a random position on the peptide and modifies its side chain by a random aminoacid. From there, the second step is to call the general.py script, responsible to samplethe new system. The third step is to score the trajectory using a set of scoring functionsavailable in the scoring.py file. The fourth step is to verify if the score difference betweenthe current state and the previous one satisfies a consensus threshold. If that happensthen the system is updated. The process is repeated during a number of attempts definedby the user.
The code has the main script run protocol.py , which calls three mainmodules: general functions, scoring functions and mutation functions. Thegeneral module creates an object responsible to run the simulations,perform the random mutations and score the trajectories to accept or rejectthe mutations. In a folder called design output , all the results are storedstep-by-step, including the different molecular entities (in complex orisolated), as well as the MD trajectories, the log files and the scores6alculated for each frame. In the end, a report file summarizes themutations attempted, if accepted or rejected, the average scores of eachfunction and the updated peptide sequence. An example is provided insupplementary Table A2. The number of MC steps and the MD simulationtime can be modified for making the protocol computationally efficient.
The script mutation.py randomly performs a single-point mutation overthe peptide. Currently, the code randomly selects a residue on the peptidechain and performs a random mutation using a uniform distribution for theaminoacids. However, this procedure can be customized, for example, byavoiding certain problematic amino acids, such cysteines, or including a non-uniform probability distribution for the location or aminoacid-type mutation.The prediction of the new amino acid rotamer is made with Scwrl4[27, 28], which selects the side chain conformations based on a library ofbackbone-dependent rotamers previously extracted from crystallized proteinstructures. The program was chosen based on a previous assessment ofdifferent mutation protocols to predict amino acid rotamers most similar tothose frequently explored by MD trajectories of protein-peptide systems [5].The mutation is first generated over the complex without solvent. Then afirst minimization is ran with the predicted side chain alone. Then themutated complex is combined with the equilibrated solvent box from theinput structure file, and a second minimization is ran with the new aminoacid and the water molecules surrounding it within 2˚A. This is done toavoid clashes between the new side chain and water molecules, which couldcrash the simulation. If the minimization crashes, the protocol attempts anew mutation, which is be repeated based on a number of times defined atthe beginning by the user. If the protocol continues presentingminimization problems, the new mutation is generated using as referencethe last accepted sequence. If the problems persists, the protocol stops. Inaddition, it is possible to minimize the input structure before mutating toavoid protein-solvent overlaps. Finally, the complete system is minimizedand subjected to NVT equilibration of 100 picosencods (ps). Details of theMD simulations are described next.
The script general.py runs the MD simulations using Gromacs [26]. Thecurrent PARCE code was tested using version 5.1.4, but any higher version7an be also selected by providing the path in the configuration file (seesupplementary Table A1). The MD simulation time is defined by the user,which by default is 10 nanoseconds (ns) per run. This production run isperformed in the NPT ensemble after minimizing and equilibrating thesystem. The Amber99SB-ILDN protein force field [29] is selected, with aTIP3P water model [30], a Parrinelo-Rahman barostat [31] and a modifiedBerendsen thermostat [32]. The Particle Mesh Ewald method (PME) isused to calculate electrostatic interactions with a threshold of 1.0 nm [33].The leap-frog integrator [34] is used to integrate the equations of motionwith a timestep of 2 femtoseconds (fs). A standard temperature of 310K isselected by default to run the simulations. We note that these variables canbe modified directly in the Gromacs configuration files depending on thesystem and the preferred conditions. After each run, the trajectory and thesimulation-log files are stored for further analysis.
The script scoring.py implements a scoring approach over theconformations from the complex trajectory. Its objective is to rank themutated complex by comparing the predicted activity to that of theprevious complex. A set of scoring functions used for protein-protein andprotein-peptide affinity predictions are implemented to score each snapshotand obtain an average score for each function. The average can becalculated over all the trajectory frames, or just the last half of thesimulation. The code includes six scoring functions: BACH [35, 36, 37],Pisa [38], FireDock [39], Irad [40], Zrank [41], BMF-Bluues [42, 43] andDFIRE-GOAP [44, 45] combinations. These software packages are opensource and are distributed with the PARCE code.
The mutation is accepted or rejected based on a consensus criterionusing multiple scoring functions. The sign of the score difference betweenthe new sequence and the previous one indicates if the mutation isfavorable or not for each scoring function. In this scenario, a negative signof the difference means a potentially better affinity when the mutation isperformed. The consensus criterion states that if the number of scoringfunctions that consider the mutation favorable is higher than a definedthreshold, then mutation is accepted [25]. By default the consensusthreshold is three, but this value can be changed by the user. The8mplemented scores can be customized by the user, and additional scoringfunctions can be added if necessary. After that, the protocol iterates overthe three main phases (mutating, sampling and scoring) and a number ofattempted modifications is achieved (Figure 2).
After running PARCE with a system of reference, a set of folderscontaining structures of each iteration, the MD trajectories, the calculatedscores and the sampling log files are generated locally. The explanation ofthe folders is provided in supplementary Table A3. The files and finalreport can be used for further analysis of the best sequence candidates.
3. Results
PARCE can be installed under any Linux operating system. However, thecode was optimized for Debian and Ubuntu OS distributions. PARCE canbe managed also through a docker container provided in the repository. Weshould note that the computational resources required for running PARCEare determined by the complexity of the system, since the design is based onrunning MD. A configuration file describes the path and characteristics of theinput files, as well as the necessary parameters to run the design protocol. Anexplanation of the parameters and output files is provided in Supplementarytables. The code contains instructions to configure the system and launchthe protocol. We note that all the dependencies required to run PARCE areopen source software, but some of them, such as Scwrl4, require academiclicences. In such cases, it is recommended to install these packages followingthe developer’s documentation to integrate their paths to the code. A listof all the required software, including the scoring functions, is provided inTable 1. All the steps associated to the mutation protocol, the sampling ofthe peptide-protein system and the application of scoring functions have beenoptimized in previous works of specific systems, supporting the choice of thedefault parameters. [5, 18, 14]. PARCE has an MIT licence that allows forthe distribution of the code and its improvement through new functionalities.
As a tutorial, we use a protease in complex with a modelled peptide.The protease is obtained from the crystallized structure (PDB code 1ppg).9 able 1: List of third-party tools and scoring functions required to run the PARCE.
Name Version/YearGromacs 5.1.4Scwrl 4.0GromacsWrapper (Python3) 0.8BioPython (Python3) 1.76PDB2PQR 2.1.1Bluues 2.0BMF 3.0Pisa 2011Zrank 2007BACH 6.0DFIRE-GOAP 2011FireDock 2007Irad 2011This is a neutrophil elastase, which is a serine protease part of thechymotrypsin family [46]. The enzyme is bound to a peptidomimeticinhibitor, so we modified its sequence to model a reported peptide-substrateavailable in the MEROPS database [47]. Specifically, the protease bindingpockets were annotated according to its catalytic site. Then, the modifiedamino acids were replaced by natural amino acids found in the substratesequence using Scwrl4. The peptide was completed to reach a final size ofeight amino acids (AAPAAAPP), which is characteristic of these proteas epeptide binders [48]. These missing amino acids were predicted with theModeller software [49]. The created complex was subjected to 100 ns of MDsimulations using the same MD configuration as explained previously. Weran the PARCE protocol to perform 30 random mutation attempts, usingsix scoring functions with a consensus threshold of three. The evolution ofthe scores was monitored to check if the scores -on average- decreased as afunction of the MC iteration step.The protocol was ran using 5 ns of conformational sampling permutation for 30 attempts. For this specific run, a total of 6 mutations wereaccepted based on the consensus criterion explained in Methods ( i.e., ifthree or more scores consider the mutation as favorable then it is accepted).During the PARCE run it is important to monitor if the scoring functionsare -on average- lowering their values. An example is shown in Figure 3 for10he scoring function BMF-Bluues. The sequence of the peptide shouldevolve towards a sequence with potentially better affinity.
Figure 3: Example of a PARCE run showing one (BMF-Bluues) of the 6 scoring functionswith the protease-peptide system (see the Methods). The dots represent the mutationsthat were accepted, with the structure of the protein in orange, the peptide in purple, andthe accepted mutations in green.
Despite the ideal behavior of BMF-Bluues, some of the scoring functionshave slightly different behaviours. An example of three other scoringfunctions behaviour is shown in Figure 4. For three of the four plottedfunctions, the scores get lower in the last steps. In the case of theDFIRE-GOAP combination (Figure 4D), the score fluctuates up and downwithout a defined trend. The consensus methodology ensures that onaverage most -but not all- of the scoring functions decrease. Therefore, ifthere is a poor-performing scoring function the method does not force it to11inimize ( e.g. , DFIRE-GOAP in Figure 4). This is an advantage becausesome scoring functions might perform well for some particular systems butfail for others, and knowing beforehand which scoring functions are the bestfor a particular system is challenging. We note that the evolution isstochastic (due to the random mutations) therefore for different runs therecould be different behaviors. However, if most of the scoring functions havean erratic performance, then the scoring function configuration should bere-evaluated. The user has the possibility to modify the selection of thescoring functions and the threshold to define if a mutation is accepted ornot. Moreover, if the system has available binding data, it might be usefulto benchmark the MD/scoring methodology with it. We applied the latestin the case of the Major Histocompatibility Complex class II (MHC-II),where we benchmarked a set of scoring functions and MD configurations torank bound peptides in agreement with available experimental data [24].We note that we have tested protocol extensively over other protein-peptide systems [7, 18].
Because the methodology is computationally expensive, theimplementation can take weeks of calculations for running 50 or 100mutation-attempts, depending on the system and the availablecomputational resources. The PARCE computational cost is dominated bythe MD simulation time ( i.e. , number of ns) multiplied by the number ofmutation attempts. Other relevant factors are the system size, and thenumber of frames used to score the trajectories. We recommend toconfigure the protocol to have a sufficiently good MD sampling and anumber of mutation-attempts sufficient to explore efficiently the peptidesequence space.
4. Discussion
The PARCE code is an open source software to design peptides or proteinscapable of binding with higher affinity to a protein target. The methodcombines computational biophysics tools with bioinformatics, in order toachieve an equilibrium between accuracy and computational efficiency. Oneadvantage is the possibility to obtain in an stochastic way peptide candidates,which can be different independent runs. This increases the pool of sequencesavailable for further filtering and validation. This is an advantage against12 igure 4: Evolution of four of the six scoring functions used in PARCE for the examplerun with the protease-peptide system. The large and red dots correspond to the acceptedand rejected mutations, respectively. The results are shown for (A) BMF-Bluues (samedata as in Figure 3), (B) Pisa, (C) Firedock and (D) DFIRE-GOAP. more deterministic or brute force alternatives. Moreover, since the code isopen source, it is possible to improve it according to the research-projectneeds.Most of the available protocols of peptide design have been built alsowith open source software, and some are available to users through webservers. This is the case of Peptiderive, which identifies the linearpolypeptide segment that contributes most to the binding energy given aprotein-protein interaction structure [50]. Another server, PepComposer,requires only the target protein structure in order to retrieve a set ofpeptide-backbone scaffolds derived from monomeric proteins. Then, thepeptides are optimized using a set of iterative mutations through controlled13ackbone movements [51]. Finally, other initiatives about the sampling andscoring peptides as binders are also available as web servers or codeprotocols that can be customized [52, 53, 54].Despite the efforts, the balance between the computational efficiencyand biological accuracy is still a major challenge, motivating thedevelopment and validation of novel pipelines as the one proposed here.PARCE can accelerate the discovery of novel peptides as potentialtherapeutics, biomarkers or vaccine sub-units. The code is flexible. Itallows adding protocols to modify different types of molecular targets, suchas small molecules, contributing to the engineering of peptides and proteinsfor a broad spectrum of applications.
5. Acknowledgements
R.O. and P.C. were supported by Colciencias, University of Antioquia,Ruta N, Colombia, and the Max Planck Society, Germany. Thecomputations were performed in a local server with an NVIDIA Titan XGPU. P.C. gratefully acknowledges the support of NVIDIA Corporation forthe donation of this GPU. Authors acknowledge Prof. Fogolari fromUniversity of Udine for providing the scoring functions Bluues and BMF.
References [1] P. Sormanni, F. A. Aprile, M. Vendruscolo, Third generation antibodydiscovery methods: in silico rational design, Chem. Soc. Rev. 47 (24)(2018) 9137–9157.[2] D. Jureti´c, D. Vukiˇcevi´c, D. Petrov, M. Novkovi´c, V. Bojovi´c,B. Luˇci´c, N. Ili´c, A. Tossi, Knowledge-based computational methodsfor identifying or designing novel, non-homologous antimicrobialpeptides, Eur. Biophys. J. 40 (4) (2011) 371–385. doi:10.1007/s00249-011-0674-7 .[3] W. Porto, O. Silva, O. Franco, Prediction and Rational Design ofAntimicrobial Peptides, in: Protein Structure, no. April 2012, InTech,2012, pp. 377–396. doi:10.5772/38023 .[4] K. Boone, K. Camarda, P. Spencer, C. Tamerler, Antimicrobialpeptide similarity and classification through rough set theory using14hysicochemical boundaries, BMC Bioinformatics 19 (1) (2018) 469. doi:10.1186/s12859-018-2514-6 .[5] R. Ochoa, M. A. Soler, A. Laio, P. Cossio, Assessing the capabilityof in silico mutation protocols for predicting the finite temperatureconformation of amino acids, Phys. Chem. Chem. Phys. 20 (40) (2018)25901–25909. doi:10.1039/C8CP03826K .[6] S. Hansen, J. D. Kiefer, C. Madhurantakam, P. R. Mittl, A. Pl¨uckthun,Structures of designed armadillo repeat proteins binding to peptidesfused to globular domains, Protein Sci. 26 (10) (2017) 1942–1952. doi:10.1002/pro.3229 .[7] F. Guida, A. Battisti, I. Gladich, M. Buzzo, E. Marangon, L. Giodini,G. Toffoli, A. Laio, F. Berti, Peptide biosensors for anticancerdrugs: Design in silico to work in denaturizing environment, Biosens.Bioelectron. 100 (2018) 298–303. doi:10.1016/j.bios.2017.09.012 .[8] P. Sormanni, F. A. Aprile, M. Vendruscolo, Rational design of antibodiestargeting specific epitopes within intrinsically disordered proteins,PNAS 112 (32) (2015) 9902–9907.[9] W. J. Van Patten, R. Walder, A. Adhikari, S. R. Okoniewski,R. Ravichandran, C. E. Tinberg, D. Baker, T. T. Perkins, Improvedfree-energy landscape quantification illustrated with a computationallydesigned protein–ligand interaction, Chem Phys Chem 19 (1) (2018)19–23.[10] J. Adolf-Bryfogle, O. Kalyuzhniy, M. Kubitz, B. D. Weitzner, X. Hu,Y. Adachi, W. R. Schief, R. L. Dunbrack Jr, Rosettaantibodydesign(rabd): A general framework for computational antibody design, PLOSComput. Biol. 14 (4) (2018) e1006112.[11] I. H. Moal, M. Torchala, P. a. Bates, J. Fern´andez-Recio, Thescoring of poses in protein-protein docking: current capabilities andfuture directions., BMC bioinformatics 14 (2013) 286. doi:10.1186/1471-2105-14-286 .[12] H.-J. B¨ohm, D. W. Banner, L. Weber, Combinatorial docking andcombinatorial chemistry: Design of potent non-peptide thrombin15nhibitors, J. Comput. Aided Mol. Des. 13 (1) (1999) 51–56. doi:10.1023/A:1008040531766 .[13] M. Kurcinski, M. Jamroz, M. Blaszczyk, A. Kolinski, S. Kmiecik, CABS-dock web server for the flexible docking of peptides to proteins withoutprior knowledge of the binding site, Nucleic Acids Res. 43 (W1) (2015)W419–W424. arXiv:1503.02032 , doi:10.1093/nar/gkv456 .[14] I. Gladich, A. Rodriguez, R. P. Hong Enriquez, F. Guida, F. Berti,A. Laio, Designing High-Affinity Peptides for Organic Molecules byExplicit Solvent Molecular Dynamics, J. Phys. Chem. B 119 (41) (2015)12963–12969. doi:10.1021/acs.jpcb.5b06227 .[15] S. Genheden, U. Ryde, The MM/PBSA and MM/GBSA methods toestimate ligand-binding affinities, Expert Opin. Drug Dis. 10 (5) (2015)449–461. doi:10.1517/17460441.2015.1032936 .[16] C. Obiol-Pardo, J. Rubio-Martinez, Comparative evaluation ofMMPBSA and XSCORE to compute binding free energy in XIAP-peptide complexes, J. Chem. Inf. Model. 47 (1) (2007) 134–142. doi:10.1021/ci600412z .[17] K. Wichapong, J. E. Alard, A. Ortega-Gomez, C. Weber, T. M.Hackeng, O. Soehnlein, G. A. F. Nicolaes, Structure-Based Design ofPeptidic Inhibitors of the Interaction between CC Chemokine Ligand5 (CCL5) and Human Neutrophil Peptides 1 (HNP1), J. Med. Chem.59 (9) (2016) 4289–4301. doi:10.1021/acs.jmedchem.5b01952 .[18] M. A. Soler, S. Fortuna, A. de Marco, A. Laio, Binding affinityprediction of nanobodyprotein complexes by scoring of moleculardynamics trajectories, Phys. Chem. Chem. Phys. 20 (5) (2018) 3438–3444. doi:10.1039/C7CP08116B .[19] R. Ochoa, S. J. Watowich, A. Fl´orez, C. V. Mesa, S. M. Robledo,C. Muskus, Drug search for leishmaniasis: a virtual screening approachby grid computing, J. Comput. Aided Mol. Des. 30 (7) (2016) 541–552. doi:10.1007/s10822-016-9921-4 .URL http://link.springer.com/10.1007/s10822-016-9921-4 [20] E. Sarti, I. Gladich, S. Zamuner, B. E. Correia, A. Laio, Protein–protein structure prediction by scoring molecular dynamics trajectories16f putative poses, Proteins: Struct. Funct. Bioinf. 84 (9) (2016) 1312–1320.[21] R. P. Hong Enriquez, S. Pavan, F. Benedetti, A. Tossi, A. Savoini,F. Berti, A. Laio, Designing short peptides with high affinity for organicmolecules: A combined docking, molecular dynamics, and Monte Carloapproach, J. Chem. Theory Comput. 8 (3) (2012) 1121–1128. doi:10.1021/ct200873y .[22] A. Russo, P. L. Scognamiglio, R. P. H. Enriquez, C. Santambrogio,R. Grandori, D. Marasco, A. Giordano, G. Scoles, S. Fortuna, In silicogeneration of peptides by replica exchange monte carlo: Docking-basedoptimization of maltose-binding-protein ligands, PLOS ONE 10 (8)(2015) 1–16. doi:10.1371/journal.pone.0133571 .[23] M. A. Soler, A. Rodriguez, A. Russo, A. F. Adedeji, C. J. DongmoFoumthuim, C. Cantarutti, E. Ambrosetti, L. Casalis, A. Corazza,G. Scoles, D. Marasco, A. Laio, S. Fortuna, Computational design ofcyclic peptides for the customized oriented immobilization of globularproteins, Phys. Chem. Chem. Phys. 19 (4) (2017) 2740–2748. doi:10.1039/C6CP07807A .[24] R. Ochoa, A. Laio, P. Cossio, Predicting the Affinity of Peptidesto Major Histocompatibility Complex Class II by Scoring MolecularDynamics Simulations, J. Chem. Inf. Model. 59 (8) (2019) 3464–3473. doi:10.1021/acs.jcim.9b00403 .[25] M. A. Soler, B. Medagli, M. S. Semrau, P. Storici, G. Bajc, A. de Marco,A. Laio, S. Fortuna, A consensus protocol for the in silico optimisationof antibody fragments, Chem. Commun. 55 (93) (2019) 14043–14046. doi:10.1039/C9CC06182G .[26] B. Hess, C. Kutzner, D. van der Spoel, E. Lindahl, GROMACS 4:Algorithms for highly efficient, load balanced, and scalable molecularsimulations, J. Chem. Theory Comput. 4 (2008) 435–447. doi:10.1021/ct700301q .[27] G. G. Krivov, M. V. Shapovalov, R. L. Dunbrack, Improved Predictionof Protein Side-Chain Conformations with SCWRL4, Proteins: Struct.Funct. Bioinf. 77 (4) (2009) 778–795. doi:10.1002/prot.22488 .1728] L. X. Peterson, X. Kang, D. Kihara, Assessment of protein side-chainconformation prediction methods in different residue environments,Proteins: Struct. Funct. Bioinf. 82 (9) (2014) 1971–1984. doi:10.1002/prot.24552 .[29] K. Lindorff-Larsen, S. Piana, K. Palmo, P. Maragakis, J. L. Klepeis,R. O. Dror, D. E. Shaw, Improved side-chain torsion potentials for theAmber ff99SB protein force field, Proteins: Struct. Funct. Bioinf. 78 (8)(2010) 1950–1958. doi:10.1002/prot.22711 .[30] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, M. L.Klein, Comparison of simple potential functions for simulating liquidwater, J. Chem. Phys. 79 (2) (1983) 926–935. doi:10.1063/1.445869 .[31] M. Parrinello, A. Rahman, Crystal structure and pair potentials: Amolecular dynamics study, Phys. Rev. Lett. 45 (14) (1980) 1196–1199. doi:10.1103/PhysRevLett.45.1196 .[32] G. Bussi, D. Donadio, M. Parrinello, Canonical sampling throughvelocity rescaling, J. Chem. Phys. 126 (1) (2007) 014101.[33] M. Di Pierro, R. Elber, B. Leimkuhler, A stochastic algorithm for theisobaric-isothermal ensemble with Ewald summations for all long rangeforces, ”J. Chem. Theory Comput. 11 (12) (2015) 5624–5637. doi:10.1021/acs.jctc.5b00648 .[34] D. Janeˇziˇc, F. Merzel, An efficient symplectic integration algorithmfor molecular dynamics simulations, J. Chem. Inf. Comput. Sci. 35 (2)(1995) 321–326. doi:10.1021/ci00024a022 .[35] P. Cossio, D. Granata, A. Laio, F. Seno, A. Trovato, A Simpleand Efficient Statistical Potential for Scoring Ensembles of ProteinStructures, Sci. Rep. 2 (2012) 1–8. doi:10.1038/srep00351 .[36] E. Sarti, S. Zamuner, P. Cossio, A. Laio, F. Seno, A. Trovato, Bachscore.a tool for evaluating efficiently and reliably the quality of large sets ofprotein structures, Comput. Phys. Commun. 184 (12) (2013) 2860–2865.[37] E. Sarti, D. Granata, F. Seno, A. Trovato, A. Laio, Native Foldand Docking Pose Discrimination by the Same Residue-based Scoring18unction, Proteins: Struct. Funct. Bioinf. 83 (4) (2015) 621–630. doi:10.1002/prot.24764 .[38] E. Krissinel, K. Henrick, Inference of Macromolecular Assemblies fromCrystalline State, J. Mol. Biol. 372 (3) (2007) 774–797. arXiv:arXiv:1011.1669v3 , doi:10.1016/j.jmb.2007.05.022 .[39] N. Andrusier, R. Nussinov, H. J. Wolfson, FireDock: Fast InteractionRefinement in Molecular Docking, Proteins: Struct. Funct. Bioinf. 69 (1)(2007) 139–159. arXiv:0605018 , doi:10.1002/prot.21495 .[40] T. Vreven, H. Hwang, Z. Weng, Integrating Atom-based and Residue-based Scoring Functions for Protein-Protein Docking, Protein Sci. 20 (9)(2011) 1576–1586. doi:10.1002/pro.687 .[41] B. Pierce, Z. Weng, ZRANK: Reranking Protein Docking Predictionswith an Optimized Energy Function, Proteins: Struct. Funct. Bioinf.67 (4) (2007) 1078–1086. arXiv:0605018 , doi:10.1002/prot.21373 .URL http://doi.wiley.com/10.1002/prot.21373 [42] M. Berrera, H. Molinari, F. Fogolari, Amino Acid Empirical ContactEnergy Definitions for Fold Recognition in the Space of Contact Maps,BMC Bioinformatics 4 (1) (2003) 8. doi:10.1186/1471-2105-4-8 .[43] F. Fogolari, A. Corazza, V. Yarra, A. Jalaru, P. Viglino, G. Esposito,Bluues: a Program for the Analysis of the Electrostatic Propertiesof Proteins Based on Generalized Born Radii, BMC Bioinformatics13 (Suppl 4) (2012) S18. doi:10.1186/1471-2105-13-S4-S18 .[44] Y. Yang, Y. Zhou, Specific interactions for ab initio folding of proteinterminal regions with secondary structures, Proteins: Struct. Funct.Genet. 72 (2) (2008) 793–803. doi:10.1002/prot.21968 .[45] H. Zhou, J. Skolnick, GOAP: A generalized orientation-dependent, all-atom statistical potential for protein structure prediction, Biophys. J.101 (8) (2011) 2043–2052. doi:10.1016/j.bpj.2011.09.012 .[46] W. An-Zhi, I. Mayr, W. Bode, The refined 2.3 ˚a crystal structure ofhuman leukocyte elastase in a complex with a valine chloromethyl ketoneinhibitor, FEBS Lett. 234 (2) (1988) 367–373.1947] N. D. Rawlings, A. J. Barrett, P. D. Thomas, X. Huang, A. Bateman,R. D. Finn, The MEROPS database of proteolytic enzymes, theirsubstrates and inhibitors in 2017 and a comparison with peptidases inthe PANTHER database, Nucleic Acids Res. 46 (D1) (2018) D624–D632. doi:10.1093/nar/gkx1134 .[48] I. Schechter, A. Berger, On the size of the active site in proteases. I.Papain, Biochem. Biophys. Res. Commun. 27 (2) (1967) 157–162. doi:10.1016/S0006-291X(67)80055-X .[49] M. A. Marti-Renom, A. C. Stuart, R. Sanchez, F. Melo, A. Sali,Comparative Protein Structure Modeling of Genes and Genomes, Annu.Rev. Biophys. 29 (2000) 291–325.[50] Y. Sedan, O. Marcu, S. Lyskov, O. Schueler-Furman, Peptiderive server:derive peptide inhibitors from proteinprotein interactions, Nucleic AcidsRes. 44 (W1) (2016) W536–W541. doi:10.1093/nar/gkw385 .[51] A. Obarska-Kosinska, A. Iacoangeli, R. Lepore, A. Tramontano,PepComposer: computational design of peptides binding to a givenprotein surface, Nucleic Acids Res. 44 (W1) (2016) W522–W528. doi:10.1093/nar/gkw366 .[52] C. A. King, P. Bradley, Structure-based prediction of protein–peptidespecificity in rosetta, Proteins: Struct. Funct. Bioinf. 78 (16) (2010)3437–3449.[53] C. A. Smith, T. Kortemme, Backrub-like backbone simulationrecapitulates natural protein conformational variability and improvesmutant side-chain prediction, J. Mol. Biol. 380 (4) (2008) 742–756.[54] B. Raveh, N. London, O. Schueler-Furman, Sub-angstrom modeling ofcomplexes between flexible peptides and globular proteins, Proteins:Struct. Funct. Bioinf. 78 (9) (2010) 2029–2040. doi:10.1002/prot.22716 .[55] D. Spiliotopoulos, P. L. Kastritis, A. S. J. Melquiond, A. M. J. J.Bonvin, G. Musco, W. Rocchia, A. Spitaleri, dMM-PBSA: A NewHADDOCK Scoring Function for Protein-Peptide Docking, Front. Mol.Biosci. 3 (August) (2016) 46. doi:10.3389/fmolb.2016.00046 .2056] G. Bhardwaj, V. K. Mulligan, C. D. Bahl, J. M. Gilmore, Accuratede novo design of hyperstable constrained peptides, Nature 538 (7625)(2016) 329–335. arXiv:NIHMS150003 , doi:10.1038/nature19791 .[57] A. Chevalier, D.-a. Silva, G. J. Rocklin, D. R. Hicks, R. Vergara,P. Murapa, S. M. Bernard, L. Zhang, K.-H. Lam, G. Yao, C. D.Bahl, S.-i. Miyashita, I. Goreshnik, J. T. Fuller, M. T. Koday, C. M.Jenkins, T. Colvin, L. Carter, A. Bohn, C. M. Bryan, D. A. Fern´andez-Velasco, L. Stewart, M. Dong, X. Huang, R. Jin, I. A. Wilson,D. H. Fuller, D. Baker, Massively parallel de novo protein design fortargeted therapeutics, Nature 550 (7674) (2017) 74–79. doi:10.1038/nature23912 . Appendix A. Supplementary tables able A1: Parameters provided by the user in the configuration file. Parameter Explanationfolder Name of the folder that has all the input and output files ofthe protocolmode The design mode, which has three possible options: start(start the protocol from zero), restart (start from a particulariteration of a previous run) and *nothing* (just run withoutmodifying existing files)peptide reference The sequence of the peptide, or protein fragment that will bemodifiedpdbID Name of the structure that is used as input, which containsthe protein, the peptide/protein and the solvent moleculeschain Chain id of the peptide/protein in the structural complexsim time Time in nanoseconds that will be used to sample the complexafter each mutationnum mutations Number of mutations that will be attemptedtry mutations Number of mutations tried after having minimization orequilibration problemsresidues mod These are the specific positions of the residues that want tobe modified. This depends on the peptide/protein length andthe numbering in the PDB filemd route Path to the folder containing the input files, which are thefiles used during the previous MD sampling of the systemmd original Name of the system file located in the folder containing theprevious MD samplingscore list List of the scoring functions that will be used to calculate theconsensus. Currently the package only has available the codefor six of them: BACH, Pisa, ZRANK, IRAD, BMF-BLUUESand FireDockhalf flag Flag that controls which part of the trajectory is used toobtain the average score. If 0, the full trajectory is used,if 1, only the last halfthreshold Threshold used for the consensus. If the number of scoringfunctions in agreement are equal or greater than the threshold,then the mutation is accepted.scwrl path Provide the path to Scwrl4 in case it is not installed in aPATH folder.gmxrc path Provide the path to GMXRC for Gromacs.22 able A2: Example of PARCE report using as input a peptide binder with sequenceAAPFAAPP. The report includes the iteration number, the mutation, the acceptance, theaverage scores and the attempted peptide sequence.
Iteration Mutation Status BACH Pisa Zrank Irad BMF-BLUUES Firedock Sequence1 AB4F Accepted -3.514 -0.209 -59.637 -101.285 -30.542 -44.041 AAPFAAPP2 AB2G Rejected -4.705 -0.198 -56.245 -101.161 -34.482 -37.784 AGPFAAPP3 PB3Q Accepted -4.128 -0.273 -63.155 -108.444 -34.122 -46.402 AAQFAAPP4 PB8R Rejected -3.568 -0.237 -56.869 -100.154 -32.223 -37.734 AAQFAAPR5 AB1Q Accepted -3.463 -0.245 -65.987 -116.340 -35.730 -47.378 QAQFAAPP