[PDF] PARCE: Protocol for Amino acid Refinement through Computational Evolution

Abstract

The in silico design of peptides and proteins as binders is useful for diagnosis and therapeutics due to their low adverse effects and major specificity. To select the most promising candidates, a key matter is to understand their interactions with protein targets. In this work, we present PARCE, an open source Protocol for Amino acid Refinement through Computational Evolution that implements an advanced and promising method for the design of peptides and proteins. The protocol performs a random mutation in the binder sequence, then samples the bound conformations using molecular dynamics simulations, and evaluates the protein-protein interactions from multiple scoring. Finally, it accepts or rejects the mutation by applying a consensus criterion based on binding scores. The procedure is iterated with the aim to explore efficiently novel sequences with potential better affinities toward their targets. We also provide a tutorial for running and reproducing the methodology.

Full PDF

PPARCE: Protocol for Amino acid Reﬁnement throughComputational Evolution

Rodrigo Ochoa a , Miguel A. Soler b , Alessandro Laio c,d , Pilar Cossio a,e, ∗ a Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia,050010 Medellin, Colombia b Italian Institute of Technology (IIT), Via Melen 83, B Block, 16152, Genova, Italy c International School for Advanced Studies (SISSA), Via Bonomea 265, I-34136 Trieste,Italy d The Abdus Salam International Centre for Theoretical Physics (ICTP), Strada Costiera11, 34151 Trieste, Italy e Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438Frankfurt am Main, Germany

Abstract

The in silico design of peptides and proteins as binders is useful fordiagnosis and therapeutics due to their low adverse eﬀects and majorspeciﬁcity. To select the most promising candidates, a key matter is tounderstand their interactions with protein targets. In this work, we presentPARCE, an open source Protocol for Amino acid Reﬁnement throughComputational Evolution that implements an advanced and promisingmethod for the design of peptides and proteins. The protocol performs arandom mutation in the binder sequence, then samples the boundconformations using molecular dynamics simulations, and evaluates theprotein-protein interactions from multiple scoring. Finally, it accepts orrejects the mutation by applying a consensus criterion based on bindingscores. The procedure is iterated with the aim to explore eﬃciently novelsequences with potential better aﬃnities toward their targets. We alsoprovide a tutorial for running and reproducing the methodology.

Keywords:

Amino acids; Bioinformatics; Molecular simulations; Mutation

PROGRAM SUMMARY ∗ Corresponding author.

E-mail address: [email protected]; [email protected]

Preprint submitted to Computer Physics Communications April 29, 2020 a r X i v : . [ q - b i o . B M ] A p r rogram Title: PARCELicensing provisions: MIT LicenseProgramming language: Python 3Operating system: Linux, and docker container available to run the protocolunder other operating systems.Subprograms used: Gromacs 5.1.4, Scwrl 4, PDB2PQR, Scoring functions sourcecode.Nature of problem: Computational design of peptides and proteins as binders fordiagnosis and therapeutics.Solution method: A protocol that performs random mutations in the bindersequence, samples the bound conformations using molecular dynamicssimulations, and evaluates the protein-protein interactions from multiple scoringpredictions in order to accept or reject the mutations.Additional comments: This article describes version 1.0. PARCE is available at:https://github.com/PARCE-project/PARCE-1, and a Docker container can bedownloaded from: https://hub.docker.com/r/rochoa85/parce-1

1. Introduction

The rational design of proteins and peptides is a ﬁeld that has beenexplored through experimental and computational approaches [1]. Whendetailed information of a particular system is available, conventionalknowledge-based methods provide tools for the design of more activeanalogs [2]. However, detailed information is not always available, and thus, de novo design strategies are required.Due to the signiﬁcant amount of information related to the properties ofnatural amino acids, one strategy is to design proteins and peptides bymodifying directly the amino acid sequence, without taking into accountthe structure. This strategy is commonly used in the design ofantimicrobial peptides, where the mechanism of action has been elucidatedfor some families [3]. Physico-chemical properties such as hydrophobicity,charge, and molecular weight, have been used to create libraries of analogswith potentially similar activities [4]. However, these strategies lackinformation about the interacting partners, and on how the conformationallandscape of the residues can aﬀect their activity [5]. This motivates theuse of structure-based design [6, 7].The most used structure-based strategies are molecular docking and the2se of structural libraries. Approaches based on experimental structuraldatabases take advantage on the huge quantity of protein structuralinformation to predict and optimize the most probable bindingconformations [8, 9, 10]. Molecular docking identiﬁes the most probableposes of potential active molecules ranked by scoring functions that roughlyapproximate binding free energies or discriminate between active moleculesand decoys [11, 12]. In spite of these scoring-function improvements,protein and peptide design is still a computational challenge due to thelarge sequence space that has to be explored ( e.g. , for a peptide of 10 aminoacids in length, 20 sequences are possible). In addition, the intrinsicﬂexibility of non-structured protein regions, such as peptides or thecomplementary determining regions of antibodies, and the constraintsassociated to the protein interface region aﬀect the reliability of theprediction [13]. To solve some of these problems, it has been proposed tosample the bound conformations using Molecular Dynamics (MD) [14].The prediction of the binding aﬃnities is another challenge associatedto the design. Diﬀerent alternatives exist based on the biological systemand the available computational infrastructure. One option is the use ofrepresentative structures from MD trajectories to calculate energy termsbased on molecular mechanics assumptions, entropy terms and free energiesof solvation using continuum-solvent models [15, 16, 17]. However, theseapproaches are computationally expensive. An alternative is averaging thescoring over conformational ensembles to estimate the aﬃnity [18, 19, 20].We here describe a new tool for performing peptide and protein designusing MD. We called this tool Protocol for Amino acid Reﬁnement throughComputational Evolution (PARCE). The algorithm implemented inPARCE is inspired by previous projects related to the exploration of thesequence space of peptides bound to proteins or small molecules[21, 14, 22]. The algorithm includes a single-point mutation protocol, whichhas been optimized for predicting the most common rotamers of peptideamino acids [5], an all-atom MD simulation of the mutated complex andthe estimate of the average score of the conformations from the trajectoryto assess the impact of the mutation in the binding [14, 23]. This approachhas been successfully used to design peptides and protein fragments, whosebinding aﬃnity was afterward conﬁrmed experimentally [24, 18, 25].PARCE implements multiple scoring functions that are evaluated using aconsensus metric for accepting or rejecting the mutations [25]. The processis iterated with the aim to evolve the original sequence and explore3ﬃciently novel sequences with potential better aﬃnities toward theirtargets [7]. Figure 1 shows a graphical summary of PARCE. Figure 1: Summary of PARCE, which is composed of diﬀerent phases. First, a startingprotein-peptide or protein-protein complex is used. Then, the MC cycle starts bygenerating a single-point mutation of a random amino acid in the peptide structure. Then,the conformational sampling of the mutated system is done with MD simulations. Scoringof the trajectory snapshots and the calculation of averages using multiple scoring functionsis performed. Finally, a consensus criterion accepts or rejects the mutation and an update(or not) of the sequence is performed. The process is iterated.

The computational protocol presented in this work is open source. Themanuscript is organized as follows. First, we explain the code architecture,the simulation parameters and the system used to test the PARCEfunctionalities. Then, we include details about the code requirements andthe performance of the protocol using a protein-peptide system as areference. Finally we discuss PARCE features in comparison with those ofsimilar software. 4 . Methods

The code is written in Python 3, with calls to Unix command lines formanipulation of ﬁles and external programs. Therefore, the currentprotocol depends on the availability of a bash environment to run. Toguarantee reproducibility, a docker container is provided with all therequirements, including the installation of the third-party open sourcesoftware, and the validation of the PARCE functionalities. However, a usercan conﬁgure her/his own machine following a simple setup validated withthe Travis CI framework (https://travis-ci.com). The documentationillustrates the deﬁnition of the parameters, the input options and ﬁles (seesupplementary Table A1). Figure 2 shows a workﬂow of the protocol basedon the tasks, scripts and software implemented at each step. In thefollowing, we present a description of each stage of PARCE.

Before applying the design protocol it is necessary to perform an MDsimulation of the starting complex to equilibrate the system, and also toobtain MD ﬁles required to start the protocol. When the starting system isobtained from a docked structural complex or a crystal structure, a longerprevious sampling of the complex is suggested. The equilibration of thecomplex can be detected using the RMSD or observables such as the numberof hydrogen bonds between the peptide and protein. We note that for mostsystems 200 ns is suitable for this purpose. If the protocol starts from acrystallized protein-peptide complex, the initial sampling time could be lower[24].The workﬂow requires as input: • A PDB and GRO (Gromacs format) ﬁle of the protein-peptide complexsolvated and equilibrated with MD using the Gromacs package [26]. • Topology ﬁles and MD parameters deﬁnitions.The protocol starts with an initial NPT simulation of the system usingthe Gromacs conﬁguration ﬁles ( i.e. mdp ﬁles). Advanced users have thepossibility of modifying the Gromacs conﬁguration ﬁles for adapting these tosimulate the speciﬁc systems. 5 igure 2: Workﬂow of PARCE. The protocol is managed by the script run protocol.py ,which accepts as input a conﬁguration ﬁle with all the parameters required to run thedesign, and a folder with the structural input obtained from a previous MD trajectory.After conﬁguring all the ﬁles and folders, and runing MD simulations of the initial complex,the ﬁrst step is to perform a single-point mutation with the mutation.py script, whichselects a random position on the peptide and modiﬁes its side chain by a random aminoacid. From there, the second step is to call the general.py script, responsible to samplethe new system. The third step is to score the trajectory using a set of scoring functionsavailable in the scoring.py ﬁle. The fourth step is to verify if the score diﬀerence betweenthe current state and the previous one satisﬁes a consensus threshold. If that happensthen the system is updated. The process is repeated during a number of attempts deﬁnedby the user.

The code has the main script run protocol.py , which calls three mainmodules: general functions, scoring functions and mutation functions. Thegeneral module creates an object responsible to run the simulations,perform the random mutations and score the trajectories to accept or rejectthe mutations. In a folder called design output , all the results are storedstep-by-step, including the diﬀerent molecular entities (in complex orisolated), as well as the MD trajectories, the log ﬁles and the scores6alculated for each frame. In the end, a report ﬁle summarizes themutations attempted, if accepted or rejected, the average scores of eachfunction and the updated peptide sequence. An example is provided insupplementary Table A2. The number of MC steps and the MD simulationtime can be modiﬁed for making the protocol computationally eﬃcient.

The script mutation.py randomly performs a single-point mutation overthe peptide. Currently, the code randomly selects a residue on the peptidechain and performs a random mutation using a uniform distribution for theaminoacids. However, this procedure can be customized, for example, byavoiding certain problematic amino acids, such cysteines, or including a non-uniform probability distribution for the location or aminoacid-type mutation.The prediction of the new amino acid rotamer is made with Scwrl4[27, 28], which selects the side chain conformations based on a library ofbackbone-dependent rotamers previously extracted from crystallized proteinstructures. The program was chosen based on a previous assessment ofdiﬀerent mutation protocols to predict amino acid rotamers most similar tothose frequently explored by MD trajectories of protein-peptide systems [5].The mutation is ﬁrst generated over the complex without solvent. Then aﬁrst minimization is ran with the predicted side chain alone. Then themutated complex is combined with the equilibrated solvent box from theinput structure ﬁle, and a second minimization is ran with the new aminoacid and the water molecules surrounding it within 2˚A. This is done toavoid clashes between the new side chain and water molecules, which couldcrash the simulation. If the minimization crashes, the protocol attempts anew mutation, which is be repeated based on a number of times deﬁned atthe beginning by the user. If the protocol continues presentingminimization problems, the new mutation is generated using as referencethe last accepted sequence. If the problems persists, the protocol stops. Inaddition, it is possible to minimize the input structure before mutating toavoid protein-solvent overlaps. Finally, the complete system is minimizedand subjected to NVT equilibration of 100 picosencods (ps). Details of theMD simulations are described next.

The script general.py runs the MD simulations using Gromacs [26]. Thecurrent PARCE code was tested using version 5.1.4, but any higher version7an be also selected by providing the path in the conﬁguration ﬁle (seesupplementary Table A1). The MD simulation time is deﬁned by the user,which by default is 10 nanoseconds (ns) per run. This production run isperformed in the NPT ensemble after minimizing and equilibrating thesystem. The Amber99SB-ILDN protein force ﬁeld [29] is selected, with aTIP3P water model [30], a Parrinelo-Rahman barostat [31] and a modiﬁedBerendsen thermostat [32]. The Particle Mesh Ewald method (PME) isused to calculate electrostatic interactions with a threshold of 1.0 nm [33].The leap-frog integrator [34] is used to integrate the equations of motionwith a timestep of 2 femtoseconds (fs). A standard temperature of 310K isselected by default to run the simulations. We note that these variables canbe modiﬁed directly in the Gromacs conﬁguration ﬁles depending on thesystem and the preferred conditions. After each run, the trajectory and thesimulation-log ﬁles are stored for further analysis.

The script scoring.py implements a scoring approach over theconformations from the complex trajectory. Its objective is to rank themutated complex by comparing the predicted activity to that of theprevious complex. A set of scoring functions used for protein-protein andprotein-peptide aﬃnity predictions are implemented to score each snapshotand obtain an average score for each function. The average can becalculated over all the trajectory frames, or just the last half of thesimulation. The code includes six scoring functions: BACH [35, 36, 37],Pisa [38], FireDock [39], Irad [40], Zrank [41], BMF-Bluues [42, 43] andDFIRE-GOAP [44, 45] combinations. These software packages are opensource and are distributed with the PARCE code.

The mutation is accepted or rejected based on a consensus criterionusing multiple scoring functions. The sign of the score diﬀerence betweenthe new sequence and the previous one indicates if the mutation isfavorable or not for each scoring function. In this scenario, a negative signof the diﬀerence means a potentially better aﬃnity when the mutation isperformed. The consensus criterion states that if the number of scoringfunctions that consider the mutation favorable is higher than a deﬁnedthreshold, then mutation is accepted [25]. By default the consensusthreshold is three, but this value can be changed by the user. The8mplemented scores can be customized by the user, and additional scoringfunctions can be added if necessary. After that, the protocol iterates overthe three main phases (mutating, sampling and scoring) and a number ofattempted modiﬁcations is achieved (Figure 2).

After running PARCE with a system of reference, a set of folderscontaining structures of each iteration, the MD trajectories, the calculatedscores and the sampling log ﬁles are generated locally. The explanation ofthe folders is provided in supplementary Table A3. The ﬁles and ﬁnalreport can be used for further analysis of the best sequence candidates.

3. Results

PARCE can be installed under any Linux operating system. However, thecode was optimized for Debian and Ubuntu OS distributions. PARCE canbe managed also through a docker container provided in the repository. Weshould note that the computational resources required for running PARCEare determined by the complexity of the system, since the design is based onrunning MD. A conﬁguration ﬁle describes the path and characteristics of theinput ﬁles, as well as the necessary parameters to run the design protocol. Anexplanation of the parameters and output ﬁles is provided in Supplementarytables. The code contains instructions to conﬁgure the system and launchthe protocol. We note that all the dependencies required to run PARCE areopen source software, but some of them, such as Scwrl4, require academiclicences. In such cases, it is recommended to install these packages followingthe developer’s documentation to integrate their paths to the code. A listof all the required software, including the scoring functions, is provided inTable 1. All the steps associated to the mutation protocol, the sampling ofthe peptide-protein system and the application of scoring functions have beenoptimized in previous works of speciﬁc systems, supporting the choice of thedefault parameters. [5, 18, 14]. PARCE has an MIT licence that allows forthe distribution of the code and its improvement through new functionalities.

As a tutorial, we use a protease in complex with a modelled peptide.The protease is obtained from the crystallized structure (PDB code 1ppg).9 able 1: List of third-party tools and scoring functions required to run the PARCE.

Name Version/YearGromacs 5.1.4Scwrl 4.0GromacsWrapper (Python3) 0.8BioPython (Python3) 1.76PDB2PQR 2.1.1Bluues 2.0BMF 3.0Pisa 2011Zrank 2007BACH 6.0DFIRE-GOAP 2011FireDock 2007Irad 2011This is a neutrophil elastase, which is a serine protease part of thechymotrypsin family [46]. The enzyme is bound to a peptidomimeticinhibitor, so we modiﬁed its sequence to model a reported peptide-substrateavailable in the MEROPS database [47]. Speciﬁcally, the protease bindingpockets were annotated according to its catalytic site. Then, the modiﬁedamino acids were replaced by natural amino acids found in the substratesequence using Scwrl4. The peptide was completed to reach a ﬁnal size ofeight amino acids (AAPAAAPP), which is characteristic of these proteas epeptide binders [48]. These missing amino acids were predicted with theModeller software [49]. The created complex was subjected to 100 ns of MDsimulations using the same MD conﬁguration as explained previously. Weran the PARCE protocol to perform 30 random mutation attempts, usingsix scoring functions with a consensus threshold of three. The evolution ofthe scores was monitored to check if the scores -on average- decreased as afunction of the MC iteration step.The protocol was ran using 5 ns of conformational sampling permutation for 30 attempts. For this speciﬁc run, a total of 6 mutations wereaccepted based on the consensus criterion explained in Methods ( i.e., ifthree or more scores consider the mutation as favorable then it is accepted).During the PARCE run it is important to monitor if the scoring functionsare -on average- lowering their values. An example is shown in Figure 3 for10he scoring function BMF-Bluues. The sequence of the peptide shouldevolve towards a sequence with potentially better aﬃnity.

Figure 3: Example of a PARCE run showing one (BMF-Bluues) of the 6 scoring functionswith the protease-peptide system (see the Methods). The dots represent the mutationsthat were accepted, with the structure of the protein in orange, the peptide in purple, andthe accepted mutations in green.

Despite the ideal behavior of BMF-Bluues, some of the scoring functionshave slightly diﬀerent behaviours. An example of three other scoringfunctions behaviour is shown in Figure 4. For three of the four plottedfunctions, the scores get lower in the last steps. In the case of theDFIRE-GOAP combination (Figure 4D), the score ﬂuctuates up and downwithout a deﬁned trend. The consensus methodology ensures that onaverage most -but not all- of the scoring functions decrease. Therefore, ifthere is a poor-performing scoring function the method does not force it to11inimize ( e.g. , DFIRE-GOAP in Figure 4). This is an advantage becausesome scoring functions might perform well for some particular systems butfail for others, and knowing beforehand which scoring functions are the bestfor a particular system is challenging. We note that the evolution isstochastic (due to the random mutations) therefore for diﬀerent runs therecould be diﬀerent behaviors. However, if most of the scoring functions havean erratic performance, then the scoring function conﬁguration should bere-evaluated. The user has the possibility to modify the selection of thescoring functions and the threshold to deﬁne if a mutation is accepted ornot. Moreover, if the system has available binding data, it might be usefulto benchmark the MD/scoring methodology with it. We applied the latestin the case of the Major Histocompatibility Complex class II (MHC-II),where we benchmarked a set of scoring functions and MD conﬁgurations torank bound peptides in agreement with available experimental data [24].We note that we have tested protocol extensively over other protein-peptide systems [7, 18].

Because the methodology is computationally expensive, theimplementation can take weeks of calculations for running 50 or 100mutation-attempts, depending on the system and the availablecomputational resources. The PARCE computational cost is dominated bythe MD simulation time ( i.e. , number of ns) multiplied by the number ofmutation attempts. Other relevant factors are the system size, and thenumber of frames used to score the trajectories. We recommend toconﬁgure the protocol to have a suﬃciently good MD sampling and anumber of mutation-attempts suﬃcient to explore eﬃciently the peptidesequence space.

4. Discussion

The PARCE code is an open source software to design peptides or proteinscapable of binding with higher aﬃnity to a protein target. The methodcombines computational biophysics tools with bioinformatics, in order toachieve an equilibrium between accuracy and computational eﬃciency. Oneadvantage is the possibility to obtain in an stochastic way peptide candidates,which can be diﬀerent independent runs. This increases the pool of sequencesavailable for further ﬁltering and validation. This is an advantage against12 igure 4: Evolution of four of the six scoring functions used in PARCE for the examplerun with the protease-peptide system. The large and red dots correspond to the acceptedand rejected mutations, respectively. The results are shown for (A) BMF-Bluues (samedata as in Figure 3), (B) Pisa, (C) Firedock and (D) DFIRE-GOAP. more deterministic or brute force alternatives. Moreover, since the code isopen source, it is possible to improve it according to the research-projectneeds.Most of the available protocols of peptide design have been built alsowith open source software, and some are available to users through webservers. This is the case of Peptiderive, which identiﬁes the linearpolypeptide segment that contributes most to the binding energy given aprotein-protein interaction structure [50]. Another server, PepComposer,requires only the target protein structure in order to retrieve a set ofpeptide-backbone scaﬀolds derived from monomeric proteins. Then, thepeptides are optimized using a set of iterative mutations through controlled13ackbone movements [51]. Finally, other initiatives about the sampling andscoring peptides as binders are also available as web servers or codeprotocols that can be customized [52, 53, 54].Despite the eﬀorts, the balance between the computational eﬃciencyand biological accuracy is still a major challenge, motivating thedevelopment and validation of novel pipelines as the one proposed here.PARCE can accelerate the discovery of novel peptides as potentialtherapeutics, biomarkers or vaccine sub-units. The code is ﬂexible. Itallows adding protocols to modify diﬀerent types of molecular targets, suchas small molecules, contributing to the engineering of peptides and proteinsfor a broad spectrum of applications.

5. Acknowledgements

R.O. and P.C. were supported by Colciencias, University of Antioquia,Ruta N, Colombia, and the Max Planck Society, Germany. Thecomputations were performed in a local server with an NVIDIA Titan XGPU. P.C. gratefully acknowledges the support of NVIDIA Corporation forthe donation of this GPU. Authors acknowledge Prof. Fogolari fromUniversity of Udine for providing the scoring functions Bluues and BMF.

References [1] P. Sormanni, F. A. Aprile, M. Vendruscolo, Third generation antibodydiscovery methods: in silico rational design, Chem. Soc. Rev. 47 (24)(2018) 9137–9157.[2] D. Jureti´c, D. Vukiˇcevi´c, D. Petrov, M. Novkovi´c, V. Bojovi´c,B. Luˇci´c, N. Ili´c, A. Tossi, Knowledge-based computational methodsfor identifying or designing novel, non-homologous antimicrobialpeptides, Eur. Biophys. J. 40 (4) (2011) 371–385. doi:10.1007/s00249-011-0674-7 .[3] W. Porto, O. Silva, O. Franco, Prediction and Rational Design ofAntimicrobial Peptides, in: Protein Structure, no. April 2012, InTech,2012, pp. 377–396. doi:10.5772/38023 .[4] K. Boone, K. Camarda, P. Spencer, C. Tamerler, Antimicrobialpeptide similarity and classiﬁcation through rough set theory using14hysicochemical boundaries, BMC Bioinformatics 19 (1) (2018) 469. doi:10.1186/s12859-018-2514-6 .[5] R. Ochoa, M. A. Soler, A. Laio, P. Cossio, Assessing the capabilityof in silico mutation protocols for predicting the ﬁnite temperatureconformation of amino acids, Phys. Chem. Chem. Phys. 20 (40) (2018)25901–25909. doi:10.1039/C8CP03826K .[6] S. Hansen, J. D. Kiefer, C. Madhurantakam, P. R. Mittl, A. Pl¨uckthun,Structures of designed armadillo repeat proteins binding to peptidesfused to globular domains, Protein Sci. 26 (10) (2017) 1942–1952. doi:10.1002/pro.3229 .[7] F. Guida, A. Battisti, I. Gladich, M. Buzzo, E. Marangon, L. Giodini,G. Toﬀoli, A. Laio, F. Berti, Peptide biosensors for anticancerdrugs: Design in silico to work in denaturizing environment, Biosens.Bioelectron. 100 (2018) 298–303. doi:10.1016/j.bios.2017.09.012 .[8] P. Sormanni, F. A. Aprile, M. Vendruscolo, Rational design of antibodiestargeting speciﬁc epitopes within intrinsically disordered proteins,PNAS 112 (32) (2015) 9902–9907.[9] W. J. Van Patten, R. Walder, A. Adhikari, S. R. Okoniewski,R. Ravichandran, C. E. Tinberg, D. Baker, T. T. Perkins, Improvedfree-energy landscape quantiﬁcation illustrated with a computationallydesigned protein–ligand interaction, Chem Phys Chem 19 (1) (2018)19–23.[10] J. Adolf-Bryfogle, O. Kalyuzhniy, M. Kubitz, B. D. Weitzner, X. Hu,Y. Adachi, W. R. Schief, R. L. Dunbrack Jr, Rosettaantibodydesign(rabd): A general framework for computational antibody design, PLOSComput. Biol. 14 (4) (2018) e1006112.[11] I. H. Moal, M. Torchala, P. a. Bates, J. Fern´andez-Recio, Thescoring of poses in protein-protein docking: current capabilities andfuture directions., BMC bioinformatics 14 (2013) 286. doi:10.1186/1471-2105-14-286 .[12] H.-J. B¨ohm, D. W. Banner, L. Weber, Combinatorial docking andcombinatorial chemistry: Design of potent non-peptide thrombin15nhibitors, J. Comput. Aided Mol. Des. 13 (1) (1999) 51–56. doi:10.1023/A:1008040531766 .[13] M. Kurcinski, M. Jamroz, M. Blaszczyk, A. Kolinski, S. Kmiecik, CABS-dock web server for the ﬂexible docking of peptides to proteins withoutprior knowledge of the binding site, Nucleic Acids Res. 43 (W1) (2015)W419–W424. arXiv:1503.02032 , doi:10.1093/nar/gkv456 .[14] I. Gladich, A. Rodriguez, R. P. Hong Enriquez, F. Guida, F. Berti,A. Laio, Designing High-Aﬃnity Peptides for Organic Molecules byExplicit Solvent Molecular Dynamics, J. Phys. Chem. B 119 (41) (2015)12963–12969. doi:10.1021/acs.jpcb.5b06227 .[15] S. Genheden, U. Ryde, The MM/PBSA and MM/GBSA methods toestimate ligand-binding aﬃnities, Expert Opin. Drug Dis. 10 (5) (2015)449–461. doi:10.1517/17460441.2015.1032936 .[16] C. Obiol-Pardo, J. Rubio-Martinez, Comparative evaluation ofMMPBSA and XSCORE to compute binding free energy in XIAP-peptide complexes, J. Chem. Inf. Model. 47 (1) (2007) 134–142. doi:10.1021/ci600412z .[17] K. Wichapong, J. E. Alard, A. Ortega-Gomez, C. Weber, T. M.Hackeng, O. Soehnlein, G. A. F. Nicolaes, Structure-Based Design ofPeptidic Inhibitors of the Interaction between CC Chemokine Ligand5 (CCL5) and Human Neutrophil Peptides 1 (HNP1), J. Med. Chem.59 (9) (2016) 4289–4301. doi:10.1021/acs.jmedchem.5b01952 .[18] M. A. Soler, S. Fortuna, A. de Marco, A. Laio, Binding aﬃnityprediction of nanobodyprotein complexes by scoring of moleculardynamics trajectories, Phys. Chem. Chem. Phys. 20 (5) (2018) 3438–3444. doi:10.1039/C7CP08116B .[19] R. Ochoa, S. J. Watowich, A. Fl´orez, C. V. Mesa, S. M. Robledo,C. Muskus, Drug search for leishmaniasis: a virtual screening approachby grid computing, J. Comput. Aided Mol. Des. 30 (7) (2016) 541–552. doi:10.1007/s10822-016-9921-4 .URL http://link.springer.com/10.1007/s10822-016-9921-4 [20] E. Sarti, I. Gladich, S. Zamuner, B. E. Correia, A. Laio, Protein–protein structure prediction by scoring molecular dynamics trajectories16f putative poses, Proteins: Struct. Funct. Bioinf. 84 (9) (2016) 1312–1320.[21] R. P. Hong Enriquez, S. Pavan, F. Benedetti, A. Tossi, A. Savoini,F. Berti, A. Laio, Designing short peptides with high aﬃnity for organicmolecules: A combined docking, molecular dynamics, and Monte Carloapproach, J. Chem. Theory Comput. 8 (3) (2012) 1121–1128. doi:10.1021/ct200873y .[22] A. Russo, P. L. Scognamiglio, R. P. H. Enriquez, C. Santambrogio,R. Grandori, D. Marasco, A. Giordano, G. Scoles, S. Fortuna, In silicogeneration of peptides by replica exchange monte carlo: Docking-basedoptimization of maltose-binding-protein ligands, PLOS ONE 10 (8)(2015) 1–16. doi:10.1371/journal.pone.0133571 .[23] M. A. Soler, A. Rodriguez, A. Russo, A. F. Adedeji, C. J. DongmoFoumthuim, C. Cantarutti, E. Ambrosetti, L. Casalis, A. Corazza,G. Scoles, D. Marasco, A. Laio, S. Fortuna, Computational design ofcyclic peptides for the customized oriented immobilization of globularproteins, Phys. Chem. Chem. Phys. 19 (4) (2017) 2740–2748. doi:10.1039/C6CP07807A .[24] R. Ochoa, A. Laio, P. Cossio, Predicting the Aﬃnity of Peptidesto Major Histocompatibility Complex Class II by Scoring MolecularDynamics Simulations, J. Chem. Inf. Model. 59 (8) (2019) 3464–3473. doi:10.1021/acs.jcim.9b00403 .[25] M. A. Soler, B. Medagli, M. S. Semrau, P. Storici, G. Bajc, A. de Marco,A. Laio, S. Fortuna, A consensus protocol for the in silico optimisationof antibody fragments, Chem. Commun. 55 (93) (2019) 14043–14046. doi:10.1039/C9CC06182G .[26] B. Hess, C. Kutzner, D. van der Spoel, E. Lindahl, GROMACS 4:Algorithms for highly eﬃcient, load balanced, and scalable molecularsimulations, J. Chem. Theory Comput. 4 (2008) 435–447. doi:10.1021/ct700301q .[27] G. G. Krivov, M. V. Shapovalov, R. L. Dunbrack, Improved Predictionof Protein Side-Chain Conformations with SCWRL4, Proteins: Struct.Funct. Bioinf. 77 (4) (2009) 778–795. doi:10.1002/prot.22488 .1728] L. X. Peterson, X. Kang, D. Kihara, Assessment of protein side-chainconformation prediction methods in diﬀerent residue environments,Proteins: Struct. Funct. Bioinf. 82 (9) (2014) 1971–1984. doi:10.1002/prot.24552 .[29] K. Lindorﬀ-Larsen, S. Piana, K. Palmo, P. Maragakis, J. L. Klepeis,R. O. Dror, D. E. Shaw, Improved side-chain torsion potentials for theAmber ﬀ99SB protein force ﬁeld, Proteins: Struct. Funct. Bioinf. 78 (8)(2010) 1950–1958. doi:10.1002/prot.22711 .[30] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, M. L.Klein, Comparison of simple potential functions for simulating liquidwater, J. Chem. Phys. 79 (2) (1983) 926–935. doi:10.1063/1.445869 .[31] M. Parrinello, A. Rahman, Crystal structure and pair potentials: Amolecular dynamics study, Phys. Rev. Lett. 45 (14) (1980) 1196–1199. doi:10.1103/PhysRevLett.45.1196 .[32] G. Bussi, D. Donadio, M. Parrinello, Canonical sampling throughvelocity rescaling, J. Chem. Phys. 126 (1) (2007) 014101.[33] M. Di Pierro, R. Elber, B. Leimkuhler, A stochastic algorithm for theisobaric-isothermal ensemble with Ewald summations for all long rangeforces, ”J. Chem. Theory Comput. 11 (12) (2015) 5624–5637. doi:10.1021/acs.jctc.5b00648 .[34] D. Janeˇziˇc, F. Merzel, An eﬃcient symplectic integration algorithmfor molecular dynamics simulations, J. Chem. Inf. Comput. Sci. 35 (2)(1995) 321–326. doi:10.1021/ci00024a022 .[35] P. Cossio, D. Granata, A. Laio, F. Seno, A. Trovato, A Simpleand Eﬃcient Statistical Potential for Scoring Ensembles of ProteinStructures, Sci. Rep. 2 (2012) 1–8. doi:10.1038/srep00351 .[36] E. Sarti, S. Zamuner, P. Cossio, A. Laio, F. Seno, A. Trovato, Bachscore.a tool for evaluating eﬃciently and reliably the quality of large sets ofprotein structures, Comput. Phys. Commun. 184 (12) (2013) 2860–2865.[37] E. Sarti, D. Granata, F. Seno, A. Trovato, A. Laio, Native Foldand Docking Pose Discrimination by the Same Residue-based Scoring18unction, Proteins: Struct. Funct. Bioinf. 83 (4) (2015) 621–630. doi:10.1002/prot.24764 .[38] E. Krissinel, K. Henrick, Inference of Macromolecular Assemblies fromCrystalline State, J. Mol. Biol. 372 (3) (2007) 774–797. arXiv:arXiv:1011.1669v3 , doi:10.1016/j.jmb.2007.05.022 .[39] N. Andrusier, R. Nussinov, H. J. Wolfson, FireDock: Fast InteractionReﬁnement in Molecular Docking, Proteins: Struct. Funct. Bioinf. 69 (1)(2007) 139–159. arXiv:0605018 , doi:10.1002/prot.21495 .[40] T. Vreven, H. Hwang, Z. Weng, Integrating Atom-based and Residue-based Scoring Functions for Protein-Protein Docking, Protein Sci. 20 (9)(2011) 1576–1586. doi:10.1002/pro.687 .[41] B. Pierce, Z. Weng, ZRANK: Reranking Protein Docking Predictionswith an Optimized Energy Function, Proteins: Struct. Funct. Bioinf.67 (4) (2007) 1078–1086. arXiv:0605018 , doi:10.1002/prot.21373 .URL http://doi.wiley.com/10.1002/prot.21373 [42] M. Berrera, H. Molinari, F. Fogolari, Amino Acid Empirical ContactEnergy Deﬁnitions for Fold Recognition in the Space of Contact Maps,BMC Bioinformatics 4 (1) (2003) 8. doi:10.1186/1471-2105-4-8 .[43] F. Fogolari, A. Corazza, V. Yarra, A. Jalaru, P. Viglino, G. Esposito,Bluues: a Program for the Analysis of the Electrostatic Propertiesof Proteins Based on Generalized Born Radii, BMC Bioinformatics13 (Suppl 4) (2012) S18. doi:10.1186/1471-2105-13-S4-S18 .[44] Y. Yang, Y. Zhou, Speciﬁc interactions for ab initio folding of proteinterminal regions with secondary structures, Proteins: Struct. Funct.Genet. 72 (2) (2008) 793–803. doi:10.1002/prot.21968 .[45] H. Zhou, J. Skolnick, GOAP: A generalized orientation-dependent, all-atom statistical potential for protein structure prediction, Biophys. J.101 (8) (2011) 2043–2052. doi:10.1016/j.bpj.2011.09.012 .[46] W. An-Zhi, I. Mayr, W. Bode, The reﬁned 2.3 ˚a crystal structure ofhuman leukocyte elastase in a complex with a valine chloromethyl ketoneinhibitor, FEBS Lett. 234 (2) (1988) 367–373.1947] N. D. Rawlings, A. J. Barrett, P. D. Thomas, X. Huang, A. Bateman,R. D. Finn, The MEROPS database of proteolytic enzymes, theirsubstrates and inhibitors in 2017 and a comparison with peptidases inthe PANTHER database, Nucleic Acids Res. 46 (D1) (2018) D624–D632. doi:10.1093/nar/gkx1134 .[48] I. Schechter, A. Berger, On the size of the active site in proteases. I.Papain, Biochem. Biophys. Res. Commun. 27 (2) (1967) 157–162. doi:10.1016/S0006-291X(67)80055-X .[49] M. A. Marti-Renom, A. C. Stuart, R. Sanchez, F. Melo, A. Sali,Comparative Protein Structure Modeling of Genes and Genomes, Annu.Rev. Biophys. 29 (2000) 291–325.[50] Y. Sedan, O. Marcu, S. Lyskov, O. Schueler-Furman, Peptiderive server:derive peptide inhibitors from proteinprotein interactions, Nucleic AcidsRes. 44 (W1) (2016) W536–W541. doi:10.1093/nar/gkw385 .[51] A. Obarska-Kosinska, A. Iacoangeli, R. Lepore, A. Tramontano,PepComposer: computational design of peptides binding to a givenprotein surface, Nucleic Acids Res. 44 (W1) (2016) W522–W528. doi:10.1093/nar/gkw366 .[52] C. A. King, P. Bradley, Structure-based prediction of protein–peptidespeciﬁcity in rosetta, Proteins: Struct. Funct. Bioinf. 78 (16) (2010)3437–3449.[53] C. A. Smith, T. Kortemme, Backrub-like backbone simulationrecapitulates natural protein conformational variability and improvesmutant side-chain prediction, J. Mol. Biol. 380 (4) (2008) 742–756.[54] B. Raveh, N. London, O. Schueler-Furman, Sub-angstrom modeling ofcomplexes between ﬂexible peptides and globular proteins, Proteins:Struct. Funct. Bioinf. 78 (9) (2010) 2029–2040. doi:10.1002/prot.22716 .[55] D. Spiliotopoulos, P. L. Kastritis, A. S. J. Melquiond, A. M. J. J.Bonvin, G. Musco, W. Rocchia, A. Spitaleri, dMM-PBSA: A NewHADDOCK Scoring Function for Protein-Peptide Docking, Front. Mol.Biosci. 3 (August) (2016) 46. doi:10.3389/fmolb.2016.00046 .2056] G. Bhardwaj, V. K. Mulligan, C. D. Bahl, J. M. Gilmore, Accuratede novo design of hyperstable constrained peptides, Nature 538 (7625)(2016) 329–335. arXiv:NIHMS150003 , doi:10.1038/nature19791 .[57] A. Chevalier, D.-a. Silva, G. J. Rocklin, D. R. Hicks, R. Vergara,P. Murapa, S. M. Bernard, L. Zhang, K.-H. Lam, G. Yao, C. D.Bahl, S.-i. Miyashita, I. Goreshnik, J. T. Fuller, M. T. Koday, C. M.Jenkins, T. Colvin, L. Carter, A. Bohn, C. M. Bryan, D. A. Fern´andez-Velasco, L. Stewart, M. Dong, X. Huang, R. Jin, I. A. Wilson,D. H. Fuller, D. Baker, Massively parallel de novo protein design fortargeted therapeutics, Nature 550 (7674) (2017) 74–79. doi:10.1038/nature23912 . Appendix A. Supplementary tables able A1: Parameters provided by the user in the conﬁguration ﬁle. Parameter Explanationfolder Name of the folder that has all the input and output ﬁles ofthe protocolmode The design mode, which has three possible options: start(start the protocol from zero), restart (start from a particulariteration of a previous run) and *nothing* (just run withoutmodifying existing ﬁles)peptide reference The sequence of the peptide, or protein fragment that will bemodiﬁedpdbID Name of the structure that is used as input, which containsthe protein, the peptide/protein and the solvent moleculeschain Chain id of the peptide/protein in the structural complexsim time Time in nanoseconds that will be used to sample the complexafter each mutationnum mutations Number of mutations that will be attemptedtry mutations Number of mutations tried after having minimization orequilibration problemsresidues mod These are the speciﬁc positions of the residues that want tobe modiﬁed. This depends on the peptide/protein length andthe numbering in the PDB ﬁlemd route Path to the folder containing the input ﬁles, which are theﬁles used during the previous MD sampling of the systemmd original Name of the system ﬁle located in the folder containing theprevious MD samplingscore list List of the scoring functions that will be used to calculate theconsensus. Currently the package only has available the codefor six of them: BACH, Pisa, ZRANK, IRAD, BMF-BLUUESand FireDockhalf ﬂag Flag that controls which part of the trajectory is used toobtain the average score. If 0, the full trajectory is used,if 1, only the last halfthreshold Threshold used for the consensus. If the number of scoringfunctions in agreement are equal or greater than the threshold,then the mutation is accepted.scwrl path Provide the path to Scwrl4 in case it is not installed in aPATH folder.gmxrc path Provide the path to GMXRC for Gromacs.22 able A2: Example of PARCE report using as input a peptide binder with sequenceAAPFAAPP. The report includes the iteration number, the mutation, the acceptance, theaverage scores and the attempted peptide sequence.

Iteration Mutation Status BACH Pisa Zrank Irad BMF-BLUUES Firedock Sequence1 AB4F Accepted -3.514 -0.209 -59.637 -101.285 -30.542 -44.041 AAPFAAPP2 AB2G Rejected -4.705 -0.198 -56.245 -101.161 -34.482 -37.784 AGPFAAPP3 PB3Q Accepted -4.128 -0.273 -63.155 -108.444 -34.122 -46.402 AAQFAAPP4 PB8R Rejected -3.568 -0.237 -56.869 -100.154 -32.223 -37.734 AAQFAAPR5 AB1Q Accepted -3.463 -0.245 -65.987 -116.340 -35.730 -47.378 QAQFAAPP