Peptipedia: a comprehensive database for peptide research supported by Assembled predictive models and Data Mining approaches

Abstract

Motivation: Peptides have attracted considerable attention in this century because of their remarkable therapeutic properties. Computational tools are being developed to take advantage of existing information, encapsulating knowledge and making it available in a simple way for general public use. However, these tools are property-specific, redundant data systems that usually do not display the data clearly. In some cases, the information cannot even be downloaded. These data need to be available in a simple form for drug design and other biotechnological applications.

Results: We developed Peptipedia, a user-friendly database and web application to search, characterise and analyse peptide sequences. Our tool integrates the information from thirty previously reported databases, making it the largest repository of peptides with recorded activities so far. In addition, we implemented a variety of services to increase our tool's usability. The significant differences between our tool and existing alternatives make it a substantial contribution to the development of biotechnological and bioengineering applications for peptides.

Availability: Peptipedia is available for non-commercial use as open-access software, licensed under the GNU General Public License, version 3.0 (GPL 3.0). The web platform is publicly available at pesb2.cl/peptipedia. Both the source code and sample datasets are available in the GitHub repository https://github.com/CristoferQ/PeptideDatabase.

Contact: [email protected], [email protected]


Cristofer Quiroz, Yasna Barrera Saavedra, Benjamín Armijo-Galdames, Juan Amado-Hinojosa, Álvaro Olivera-Nappa, Anamaria Sanchez-Daza ∗, and David Medina-Ortiz †

Facultad de Ingeniería, Universidad Autónoma de Chile, Cinco Pte. 1670, Talca, 3467987, Chile.
Escuela de Ingeniería en Bioinformática, Universidad de Talca, Avenida Lircay SN, 3460000, Talca, Chile.
Centre for Biotechnology and Bioengineering, University of Chile, Beauchef 851, Santiago, 8370456, Chile.
Department of Chemical Engineering, Biotechnology and Materials, University of Chile, Beauchef 851, Santiago, 8370456, Chile.


Keywords: Protein Engineering · predictive models · peptide databases · machine-learning algorithms · digital signal processing · assembled models

INTRODUCTION

Peptides play a crucial role as signaling molecules and encompass diverse therapeutic activities, such as antimicrobial, antitumoral, hormone replacement, anti-inflammatory and antihypertensive activity (Lau and Dunn, 2018; Lien and Lowman, 2003). Peptides are polymers that can be obtained from natural sources or synthetically; they consist of at least 2 amino acids, and their maximum length is usually set to 50-100 amino acids. However, there seems to be no consensus on the maximum number of amino acids at which a sequence is considered a peptide rather than a protein (Morrison and Boyd, 1973; Latham, 1999; Lien and Lowman, 2003; Uhlig et al., 2014).

∗ [email protected]; † [email protected] (arXiv preprint, q-bio.GN)

As therapeutic agents, peptides are especially attractive because they exhibit high biological activity and specificity, reduced side effects and low toxicity. Nevertheless, peptides have some disadvantages compared with other molecules, such as high synthesis cost and low stability due to the lack of tertiary structure, which makes them particularly susceptible to enzymatic degradation, as well as difficulties in crossing biological membranes due to their high polarity, molecular weight, and hydrophilicity (Vlieghe et al., 2010; Uhlig et al., 2014).

Despite these disadvantages, research interest in peptides has increased, resulting in a significant accumulation of new peptide sequences together with their related activities and properties. This has brought to the market over 70 peptides approved in the US, Europe, and Japan as therapeutics, with more than 200 in clinical trials and more than 600 in pre-clinical tests (Srivastava, 2019; Usmani et al., 2017; Lau and Dunn, 2018).

One of the most significant trends in recent times is "drug discovery", which aims to identify new drugs or new functionalities for specific targets. In this context, computational approaches are continually being developed as support tools for biological fields, where methodologies based on Machine Learning and Data Mining have become relevant (Wu et al., 2019; Basith et al., 2020). However, these techniques require prior knowledge, which can be obtained from biological databases that accumulate information on molecules and their characteristics. These data can be collected and processed to develop a tool for solving a specific problem.

Several dedicated databases have emerged that group peptides mostly according to their activities (e.g., antimicrobial: APD3 (Wang et al., 2016); antituberculosis: AntiTbPdb (Usmani et al., 2018); antihypertensive: AHTPDB (Kumar et al., 2015)) or source of origin (e.g., plant: PlantPepDB (Das et al., 2020); bacterial: BACTIBASE (Hammami et al., 2010); anuran: DADP (Novković et al., 2012)). The first web-based databases including peptides were reported in 1998 by Tossi and Sandri (2002), followed by SYFPEITHI, JenPep, FIMM and the HIV database (Rammensee et al., 1999; Blythe et al., 2002; Schönbach et al., 2000; Korber et al., 1998). Then, in 2003, the Antimicrobial Peptide Database (APD) appeared; it has been continuously updated, although its link is currently down (Wang and Wang, 2004; Wang et al., 2016). Since then, around 40 peptide databases have arisen.

Each database is useful in its specific context, but a comprehensive and integrated database focused on peptides has not been available so far. Moreover, many of the databases present issues that hinder their usability. Most of them do not indicate their last update and, on review, appear not to have been updated since their launch, except for DRAMP, AllergenOnline, BactPepDB, DBAASP, ConoServer and APD. Other sites can no longer be found: PenBase, ANTIMIC. Almost all databases have redundancy in their sequences (see section 1 of Supplementary Information). Others require a background in informatics, making them unfriendly for users with no advanced computational skills. Many others do not provide a download tool: YADAMP, Quorumpeps database, DADP, BIOPEP, BioDADpep, Peptaibol; for others, the download tool is not working: PepBank, StraPep, PeptideDB, BactPepDB, MHCBN, ForPep, CancerPPD.

Peptipedia was developed to meet the needs that each database cannot address separately. We have implemented a user-friendly web application with a new database that encompasses the highest number of peptide sequences with reported activity, curated from 30 existing peptide databases. Peptipedia classifies the reported activity of each peptide into categories and subcategories defined according to our analysis and the literature (Kastin, 2013).

Our application is more than a database compilation: it is the most extensive integrated persistent-storage system for peptides to date. This user-friendly platform also includes an estimator of useful physicochemical and statistical properties of peptides, amino acid sequence characterisation, and a tool for Machine Learning-based activity prediction for a query peptide.

Methods

Collection, preprocessing, characterisation, and database generation

We consolidated the information for Peptipedia by integrating data from different previously reported computational tools and databases, such as APD (Wang et al., 2016), LAMP (Zhao et al., 2013), and UniProt (Consortium, 2015), among others (see section 1 in Supplementary Information for more details). First, we manually downloaded the sequences from each tool and processed them independently, generating separate CSV files to facilitate their manipulation. We filtered the sequences according to their length, considering a minimum of 2 residues and a maximum of 150. Second, we generated a single file with all sequences, eliminating redundancy between them. For each sequence, we searched for its activities using the information in all the databases employed to develop our web information system. Taxonomic and structural information, as well as information specific to particular activities, such as IC50 measurements and experiments, among others, was also included in Peptipedia. Furthermore, the sequences were categorised according to whether they present modifications or non-canonical residues. Then, we used the modlAMP library (Müller et al., 2017) to characterise the peptides based on physicochemical and thermodynamic properties. Statistical properties were obtained for each sequence using the DMAKit-Lib library (Medina-Ortiz et al., 2020b). The amino acid frequency for each sequence was obtained through scripts implemented in Python v3. Finally, we stored the processed information in a NoSQL database, using MongoDB as the database manager because of its data manipulation features, information extraction speed and scalability.
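The length filter, redundancy removal and residue-frequency steps described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the upper-casing of sequences and the example input are assumptions made only for illustration; the limits of 2 and 150 residues are the ones stated in the text.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 2, 150  # length limits stated in the text


def filter_and_deduplicate(sequences):
    """Keep sequences within the length limits, dropping exact duplicates.

    Upper-casing before comparison is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for raw in sequences:
        seq = raw.strip().upper()
        if MIN_LEN <= len(seq) <= MAX_LEN and seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept


def residue_frequency(seq):
    """Return the relative frequency of each residue in one sequence."""
    counts = Counter(seq)
    return {residue: n / len(seq) for residue, n in counts.items()}


# Hypothetical pooled input standing in for the per-database CSV files.
pooled = ["GIGKFLHSAKKF", "gigkflhsakkf", "A", "RRRRRRRR", "K" * 151]
unique = filter_and_deduplicate(pooled)
print(unique)
print(residue_frequency("RRKK"))
```

In this toy input, the lower-case duplicate, the single-residue entry and the 151-residue entry are discarded, leaving two sequences.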

Strategies for classification systems

Most sequences report a specific activity in terms of their biochemical roles and/or biological effects, especially in humans. We noted that a significant number of peptides are used or were designed for therapeutic purposes, but there were another seven types of peptide activity that cannot be classified as therapeutic. Consequently, we classified all peptides into eight categories: (1) 'therapeutic', (2) 'immunological', (3) 'sensorial', (4) 'neurological', (5) 'drug delivery vehicle', (6) 'transit', (7) 'propeptide' and (8) 'signal'. Each category has subclassifications within it. However, there is a small group of peptides with particular activities, which we placed in category (9) 'other activity'. All peptides with no reported activity are in category (10) 'no activity reported'.

One of the essential services of Peptipedia is the activity classification system for peptide sequences, based on Machine Learning strategies. Model training was based on supervised learning algorithms combined with sequence coding approaches that use physicochemical properties and Digital Signal Processing, following the strategies proposed by Medina-Ortiz et al. (2020a). In this way, we generated assembled binary models to recognise activities for peptide sequences using the categories proposed in this work. The training process was based on binary data sets that evaluate two classes: presence or absence of the activity. Additionally, we generated each data set using the one-vs-rest strategy, keeping class imbalance to a minimum. Finally, for models with low performance, we used recursive binary partition strategies, according to the method proposed by Medina-Ortiz et al. (2020c), to improve the performance of the assembled classification models.
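The one-vs-rest construction of binary data sets can be sketched as follows. The text does not say how class imbalance was limited, so random undersampling of the larger class is an assumption of this sketch, as are the function name `one_vs_rest_dataset`, the record layout and the example sequences.

```python
import random


def one_vs_rest_dataset(records, target, seed=0):
    """Build a balanced binary data set for one activity category.

    records: list of (sequence, set_of_categories) pairs.
    Returns shuffled (sequence, label) pairs; label 1 means the activity is present.
    """
    rng = random.Random(seed)
    positives = [seq for seq, cats in records if target in cats]
    negatives = [seq for seq, cats in records if target not in cats]
    # Assumption: undersample the larger class so both classes have equal size.
    n = min(len(positives), len(negatives))
    sample = [(s, 1) for s in rng.sample(positives, n)]
    sample += [(s, 0) for s in rng.sample(negatives, n)]
    rng.shuffle(sample)
    return sample


# Hypothetical annotated records.
records = [
    ("GIGKFLHSAKKF", {"therapeutic"}),
    ("RRRRRRRR", {"drug delivery vehicle"}),
    ("MKTLLLTLVV", {"signal", "propeptide"}),
    ("ACDEFGHIK", {"therapeutic", "sensorial"}),
    ("KLAKLAKKLAKLAK", {"sensorial"}),
]
dataset = one_vs_rest_dataset(records, "therapeutic")
print(dataset)
```

Repeating the call once per category yields one binary data set per activity, which matches the "presence or absence" framing described above.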

Implementation and Availability

Peptipedia was designed using the Model View Controller (MVC) design pattern. The view component and the controllers were implemented in JavaScript using the Express framework. Display components were optimised using Bootstrap 4. All the model components, including all the services provided by the proposed tool, were developed in Python v3, supported by the libraries DMAKit-Lib (Medina-Ortiz et al., 2020b) and Scikit-Learn (Pedregosa et al., 2011). Both the proposed software architecture and the implementation features are detailed in section 2 of the Supplementary Information.

Results and Discussion

Peptipedia is a user-friendly web application to search, analyse, evaluate and characterise peptide sequences using different strategies based on Machine Learning and Data Mining techniques. This web tool has a NoSQL database system with 92,055 peptides registered and described, making it the most extensive database of peptide sequences with reported activities to date. The tool reports different types of information for each sequence, considering structural, physicochemical, and phylogenetic properties. Additionally, the activities previously identified for each peptide are reported, together with the databases or repositories from which they were extracted. Finally, statistical properties related to the percentage of residues for each sequence and the average per category are included in the database, providing useful and easy-to-understand information for scientists and researchers (see Figure 1).

Figure 1: Representative scheme of building and characteristics of Peptipedia. Peptipedia is a computational tool for peptide sequence analysis. The information presented by our tool was consolidated from 30 databases, considering information on the sequence, taxonomy, and different properties of the stored peptides. Searching for sequences and relevant information in our web application is easy, personalised and intuitive, and the information can be downloaded in multiple formats. Peptipedia provides different tools that help characterise and analyse sequences, as well as functionalities supported by Machine Learning methods that facilitate the development of predictive models and an activity predictor system.

Relevant tools and services available in Peptipedia

Searches, Visualization and Downloads

Different types of searches can be performed in Peptipedia, either with the sequence or through information related to its activity, physicochemical properties, or frequency of residues, among other relevant information. In addition, different filters can be applied to tailor the exploration to the user's interest. We generate a general summary for each search, showing statistical descriptions and various visualizations of the information. Furthermore, we present specific details for each peptide, including thermodynamic properties, taxonomy, phylogeny, activity and sequence descriptors; we also show the databases where the peptide sequence was previously reported. Notably, Peptipedia offers specific information such as IC50, assay information, the organism evaluated and other relevant characteristics for particular activities, such as the antihypertensive, anti-HIV, and antiviral subcategories.

Peptipedia has general and specific modules for downloading data in CSV, FASTA, and JSON formats, making it easier to obtain information. In addition, our tool enables the complete database to be downloaded in easily manipulable forms, including both the sequences and their reported information.
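To make the three download formats concrete, the sketch below serialises the same records as FASTA, JSON and CSV using only the Python standard library. The field names (`id`, `sequence`, `activity`) and the example record are hypothetical; the actual Peptipedia export schema is not described in the text.

```python
import csv
import io
import json

FIELDS = ["id", "sequence", "activity"]  # hypothetical field names


def to_fasta(records):
    """One header line and one sequence line per record."""
    return "".join(f">{r['id']}\n{r['sequence']}\n" for r in records)


def to_json(records):
    """Pretty-printed JSON array of record objects."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """CSV text with a header row followed by one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical record used only to exercise the three writers.
records = [{"id": "pep_1", "sequence": "GIGKFLHSAKKF", "activity": "therapeutic"}]
print(to_fasta(records))
print(to_json(records))
print(to_csv(records))
```

FASTA keeps only the identifier and sequence, whereas JSON and CSV can carry the reported activity alongside them, which is why a database of this kind would plausibly offer all three.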

Services

Different services were implemented in Peptipedia to facilitate the analysis and characterisation of peptide sequences. We provide services that characterise sequences through physicochemical and thermodynamic properties, using the modlAMP library (Müller et al., 2017). We also provide modules that estimate statistical properties for peptide sequences.

Bioinformatic tools such as sequence alignments are available in our web tool: using the Edlib library (Sosic and Sikic, 2017), it is possible to align any sequence against those registered in our database.

Another relevant service is the peptide activity classification system supported by assembled predictive models: the user can upload a list of amino acid sequences, and our tool evaluates each of them and classifies it according to the categories proposed in this work. Furthermore, a peptide encoding service is implemented using common strategies such as One Hot Encoder and more sophisticated ones such as embeddings obtained through the Tape library.

Finally, Peptipedia allows users to generate predictive models for sets of peptides with specific requirements through supervised learning algorithms and cross-validation techniques. The hyperparameter configuration, coding strategy and validation method are selectable. The tool reports the performance of the model generated by the user and allows it to be downloaded for local use. In addition, this service supports the interpretation of the results by giving different recommendations about them.
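The One Hot Encoder strategy mentioned among the encoding services can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the Peptipedia service itself: the function name, the zero-padding to a fixed length and the handling of non-canonical residues (left as all-zero positions) are choices made here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def one_hot_encode(seq, max_len):
    """Encode a sequence as a flat 0/1 vector of length max_len * 20.

    Positions past the end of the sequence are zero-padded, and residues
    outside the canonical alphabet are left as all-zero positions.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    width = len(AMINO_ACIDS)
    vector = [0] * (max_len * width)
    for pos, residue in enumerate(seq[:max_len]):
        if residue in index:
            vector[pos * width + index[residue]] = 1
    return vector


encoded = one_hot_encode("AC", max_len=3)
print(len(encoded), sum(encoded))
```

A fixed `max_len` gives every peptide a vector of the same size, which supervised learning algorithms generally require.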

Peptides registered, categories, and relevant information in Peptipedia

We developed the largest database of peptides with reported activity to date, with a total of 92,055 records. Considering the previously reported activities and their specific properties, we propose a system of ten categories, each with sub-categories according to the features of the activities that constitute them. Using these categories, we analysed the peptide sequences and found that therapeutic peptides, signal peptides, and sensorial activity have the highest prevalence in our records, whereas immunological, transit, and neurological activities have the fewest records (see Figure 2 A).

It is important to highlight the moonlighting characteristics of peptides, that is, the ability of a peptide to present different activities at the same time (Jeffery, 1999). The main moonlighting tendencies found are between therapeutic and sensorial peptides, and between propeptides and signal peptides. This last overlap makes sense because propeptides generally contain a signal peptide in their sequence (Wang et al., 2018), which they lose once processed (see Figure 2 B). This type of property reflects the potential of a peptide when acting as a drug or in different biotechnological uses, which makes these peptides interesting to study.

Residue frequency analysis allows the evaluation of amino acid trends for particular activities. We compared trends for the main reported categories and found a clear preference for arginine residues in drug delivery peptides. This can be explained because this kind of peptide is usually designed to cross membranes, so it needs a chemical affinity for negatively charged membranes, which is provided by the positive charge of arginine. In contrast, signal peptides, transit peptides and propeptides generally show similar trends. However, no major visible patterns were identified (see Section 4 in Supplementary Information).

Figure 2: Visualisation of registered peptides on Peptipedia. Representation of the information contained in Peptipedia. A: distribution of peptides according to the categories proposed in this work (Other, Therapeutic, No activity, Signal, Sensorial, Propeptide, Drug delivery vehicle, Immunological, Transit, Neurological). B: analysis of the relationship of simultaneous activities for the same type of peptide; the most significant trends are seen between therapeutic and sensorial peptides, and between propeptides and signal peptides.
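The kind of co-occurrence analysis summarised in panel B can be approximated by counting how often two categories are annotated on the same peptide. The sketch below is only an illustration: the function name `category_cooccurrence`, the annotation layout and the example peptides are hypothetical and do not come from the Peptipedia database.

```python
from collections import Counter
from itertools import combinations


def category_cooccurrence(annotations):
    """Count peptides annotated with each unordered pair of categories.

    annotations: dict mapping a peptide identifier to its activity categories.
    """
    counts = Counter()
    for categories in annotations.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(categories)), 2):
            counts[pair] += 1
    return counts


# Hypothetical annotations.
annotations = {
    "pep_1": ["therapeutic", "sensorial"],
    "pep_2": ["propeptide", "signal"],
    "pep_3": ["signal", "propeptide"],
    "pep_4": ["therapeutic"],
}
for pair, n in category_cooccurrence(annotations).most_common():
    print(pair, n)
```

Peptides with a single category contribute no pairs, so the counts reflect only peptides that show more than one activity.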

Binary classification categories supported by Assembled Models

Using coding of physicochemical properties and their representation in frequency space (Medina-Ortiz et al., 2020a), and employing recursive binary division strategies to optimise performance measures (Medina-Ortiz et al., 2020c), we developed 44 assembled binary models for classifying the activity of peptide sequences, considering the categories and subcategories proposed in this work. We used k-fold cross-validation to avoid model overfitting. Remarkably, all the models generated presented an accuracy of over 83% (see Table 1 and section 5 of the Supplementary Information for details). We previously compared the results obtained with this type of strategy against classical sequence coding methods, demonstrating better results (Medina-Ortiz et al., 2020a). Furthermore, we compared our results with previously developed classification models for peptide sequences. Xiao et al. (2013) proposed a classification system for antimicrobial peptides with 86% accuracy; for the same task, our model achieves 88.7%. Similarly, Yi et al. (2019) proposed a classification system for anticancer peptides using a Deep Learning Long Short-Term Memory model, achieving an accuracy of 81.48%, while our model achieves 83.54%. Another relevant example is the identification of quorum sensing peptides (QSPs): Rajput et al. (2015) proposed an identification system for QSPs based on sequence features in combination with support vector machine algorithms, obtaining 93% accuracy; our accuracy is lower for these peptides, reaching 86.4%. Although our performance is lower than that of previously developed methods in particular situations, the proposed strategy is generic and can be applied to activity classification of peptide sequences, prediction of properties, and multiple problems in protein engineering (Medina-Ortiz et al., 2020a). Notably, we validated all our models using statistical methods. Each data set was created by selecting random samples, and this process was repeated 100 times, providing statistical support and demonstrating the robustness of the activity classification models implemented in Peptipedia.

No.  Category             (unlabelled)  Weighted performance
1    Sensorial Peptides   19982         85.27
2    Drug Delivery        4912          86.02
3    Therapeutic          50000         87.32
4    Neurological         2712          89.33
5    Immunological        2178          86.12
6    Other Activity       490           82.98
7    Transit Peptide      1350          88.48
8    Signal Peptide       26794         86.41
9    Propeptide           17768         88.63

Table 1: Weighted performance for binary classification models for the nine main categories proposed in this work.
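The frequency-space coding of physicochemical properties that underlies these models can be illustrated with a minimal sketch: each residue is replaced by a property value, the resulting signal is zero-padded, and a discrete Fourier transform gives its magnitude spectrum. This is not the authors' implementation. The Kyte-Doolittle hydropathy scale is chosen here only as a familiar example (the text does not name it), the function names are hypothetical, and a plain O(n²) DFT is used to keep the example free of external dependencies.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here purely as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_signal(seq, scale, length):
    """Map residues to property values and zero-pad to a fixed length.

    Raises KeyError for residues that are missing from the scale.
    """
    values = [scale[aa] for aa in seq[:length]]
    return values + [0.0] * (length - len(values))


def magnitude_spectrum(signal):
    """Magnitudes of the first half of the discrete Fourier transform."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff))
    return spectrum


signal = property_signal("GIGKFLHSAKKF", KYTE_DOOLITTLE, length=16)
features = magnitude_spectrum(signal)
print(len(features))
```

The resulting spectrum has a fixed length regardless of peptide length, so it can be passed to a supervised learning algorithm as a feature vector.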

Using Peptipedia to develop predictive models

The study of anti-HIV peptides is relevant because of their potential therapeutic applications. They interact with a specific domain of glycoprotein 41, which is their pharmacological target for inhibiting virus fusion and entry into the host cell. Different efforts have focused on designing new sequences, either through traditional techniques such as directed evolution or through rational design strategies. Both strategies currently benefit from the application of Machine Learning, since it facilitates the simulation of the effects of new variants. To demonstrate the usability of Peptipedia, we implemented an IC50 predictive model for anti-HIV peptides, because IC50 is a crucial parameter for assessing the performance of antimicrobial and antiviral drugs. First, using the sequence search engine, we identified all the peptides in this category. We manually downloaded them and kept those with a quantitative IC50 measurement, discarding the cases in which it was expressed as a low, medium or high effect, and standardised the measured values to the same units. Subsequently, we used the Peptipedia predictive model training tool, selecting coding by digital signal processing with the alpha-structure property as the coding strategy, Random Forest as the supervised learning algorithm, and k-fold cross-validation with k = 10 as the validation strategy. The tool reported the model's performance, achieving a Pearson coefficient of 0.8 (see Figure 3 A). Furthermore, Peptipedia allows us to analyse the randomness of the prediction error to determine whether there are biases in the generated predictions (see Figure 3 B). In this way, we are able to predict the therapeutic potency of a new anti-HIV peptide without performing lab assays, which, combined with the coding module, provides powerful support for designing peptides with desirable activities.

Conclusions

We designed and implemented Peptipedia, a web application supported by machine learning algorithms and data mining strategies to characterise and analyse peptide sequences. Additionally, our tool has the most extensive database of peptides with reported activity so far, with a total of 92,055 amino acid sequences integrated from thirty databases or repositories of previously reported peptides. Peptipedia provides different tools that help with characterisation, statistical property estimation and bioinformatics analysis supported by sequence alignments, as well as services that facilitate the development of predictive models.

Additionally, the sequence and the reported activity information of the registered peptides are integrated into a robust binary classification system, implemented through Machine Learning strategies, which allows putative peptide activities to be predicted. These services are useful as a step prior to experimental work, for screening the activity of novel peptides with unknown activity. Peptide design also benefits, since this tool helps to find residue patterns based on activity.

Both the usability and the wide range of services available on Peptipedia, as well as the robustness of the predictive systems implemented, considerably improve the current state of the art, making it an attractive alternative to existing traditional applications and good support for research in peptide engineering and its biotechnological applications.

Figure 3: Predictive modeling of IC50 for Anti-HIV peptides using Peptipedia. A: scatter plot of prediction versus reality, denoting the performance of the predictive model. In general, there is no tendency to over-adjust or under-adjust in any particular range, which shows that the cross-validation strategies were correctly applied. B: histogram of the error distribution. The error probability analysis indicates no tendency towards significant errors that adversely alter the model predictions. The errors are mainly concentrated between -5 and 5, which is quite acceptable considering the nature of the entered values, where the largest reach 100 and the smallest are close to zero.
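The two evaluation views reported for the IC50 model, the Pearson coefficient between predicted and measured values and the distribution of prediction errors, can be computed as in the following sketch. The function names and the numeric values are hypothetical and serve only to show the calculation; they are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def prediction_errors(observed, predicted):
    """Signed errors (predicted minus observed) used for the error histogram."""
    return [p - o for o, p in zip(observed, predicted)]


# Hypothetical measured and predicted IC50 values in the same units.
observed = [0.5, 2.0, 10.0, 45.0, 100.0]
predicted = [1.0, 3.5, 8.0, 50.0, 96.0]
print(round(pearson(observed, predicted), 3))
print(prediction_errors(observed, predicted))
```

A coefficient close to 1 indicates that predictions follow the measured values, while errors centred on zero with no trend suggest that the model is not systematically biased in any range.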

CODE AVAILABILITY

All code is available at the authors' GitHub repository https://github.com/CristoferQ/PeptideDatabase.

ACKNOWLEDGEMENTS

This work was supported mainly by the Centre for Biotechnology and Bioengineering - CeBiB (PIA project FB0001, ANID, Chile), Fondecyt 1180882 project, and Universidad de Magallanes for MAG1895 project. DM-O gratefully acknowledges ANID, Chile, for Ph.D. fellowship 21181435. JA-H gratefully acknowledges ANID, for Ph.D. fellowship 21182109. AS-D thanks PAI Programme (I7818010006).

Conflict of interest statement.

None declared.

References

Basith, S., Manavalan, B., Hwan Shin, T., and Lee, G. (2020). Machine intelligence in peptide therapeutics: A next-generation tool for rapid disease screening. Medicinal research reviews, 40(4):1276–1314.
Blythe, M. J., Doytchinova, I. A., and Flower, D. R. (2002). Jenpep: a database of quantitative functional peptide data for immunology. Bioinformatics, 18(3):434–439.
Consortium, U. (2015). Uniprot: a hub for protein information. Nucleic acids research, 43(D1):D204–D212.
Das, D., Jaiswal, M., Khan, F. N., Ahamad, S., and Kumar, S. (2020). Plantpepdb: A manually curated plant peptide database. Scientific Reports, 10(1):1–8.
Hammami, R., Zouhir, A., Le Lay, C., Hamida, J. B., and Fliss, I. (2010). Bactibase second release: a database and tool platform for bacteriocin characterization. BMC Microbiology, 10(1):1–5.
Jeffery, C. J. (1999). Moonlighting proteins. Trends in biochemical sciences, 24(1):8–11.
Kastin, A. (2013). Handbook of biologically active peptides. Academic press.
Korber, B., Moore, J., Brander, C., Walker, B., Haynes, B., and Koup, R. (1998). Hiv molecular immunology compendium. Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, NM.
Kumar, R., Chaudhary, K., Sharma, M., Nagpal, G., Chauhan, J. S., Singh, S., Gautam, A., and Raghava, G. P. (2015). Ahtpdb: a comprehensive platform for analysis and presentation of antihypertensive peptides. Nucleic acids research, 43(D1):D956–D962.
Latham, P. W. (1999). Therapeutic peptides revisited. Nature biotechnology, 17(8):755–757.
Lau, J. L. and Dunn, M. K. (2018). Therapeutic peptides: Historical perspectives, current development trends, and future directions. Bioorganic & medicinal chemistry, 26(10):2700–2707.
Lien, S. and Lowman, H. B. (2003). Therapeutic peptides. Trends in biotechnology, 21(12):556–562.
Medina-Ortiz, D., Contreras, S., Amado-Hinojosa, J., Torres-Almonacid, J., Asenjo, J. A., Navarrete, M., and Olivera-Nappa, Á. (2020a). Combination of digital signal processing and assembled predictive models facilitates the rational design of proteins. arXiv preprint arXiv:2010.03516.
Medina-Ortiz, D., Contreras, S., Quiroz, C., Asenjo, J. A., and Olivera-Nappa, Á. (2020b). Dmakit: A user-friendly web platform for bringing state-of-the-art data analysis techniques to non-specific users. Information Systems, page 101557.
Medina-Ortiz, D., Contreras, S., Quiroz, C., and Olivera-Nappa, Á. (2020c). Development of supervised learning predictive models for highly non-linear biological, biomedical, and general datasets. Frontiers in Molecular Biosciences, 7.
Morrison, R. and Boyd, R. (1973). Organic Chemistry 3rd Ed., 1973. Allyn and Bacon.
Müller, A. T., Gabernet, G., Hiss, J. A., and Schneider, G. (2017). modlAMP: Python for antimicrobial peptides. Bioinformatics, 33(17):2753–2755.
Novković, M., Simunić, J., Bojović, V., Tossi, A., and Juretić, D. (2012). Dadp: the database of anuran defense peptides. Bioinformatics, 28(10):1406–1407.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in python. The Journal of Machine Learning Research, 12:2825–2830.
Rajput, A., Gupta, A. K., and Kumar, M. (2015). Prediction and analysis of quorum sensing peptides based on sequence features. PLoS One, 10(3):e0120066.
Rammensee, H.-G., Bachmann, J., Emmerich, N. P. N., Bachor, O. A., and Stevanović, S. (1999). Syfpeithi: database for mhc ligands and peptide motifs. Immunogenetics, 50(3-4):213–219.
Schönbach, C., Koh, J. L., Sheng, X., Wong, L., and Brusic, V. (2000). Fimm, a database of functional molecular immunology. Nucleic acids research, 28(1):222–224.
Sosic, M. and Sikic, M. (2017). Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics, 33(9):1394–1395.
Srivastava, V., editor (2019). Peptide Therapeutics. Drug Discovery. The Royal Society of Chemistry.
Tossi, A. and Sandri, L. (2002). Molecular diversity in gene-encoded, cationic antimicrobial polypeptides. Current pharmaceutical design, 8(9):743–761.
Uhlig, T., Kyprianou, T., Martinelli, F. G., Oppici, C. A., Heiligers, D., Hills, D., Calvo, X. R., and Verhaert, P. (2014). The emergence of peptides in the pharmaceutical business: From exploration to exploitation. EuPA Open Proteomics, 4:58–69.
Usmani, S. S., Bedi, G., Samuel, J. S., Singh, S., Kalra, S., Kumar, P., Ahuja, A. A., Sharma, M., Gautam, A., and Raghava, G. P. (2017). Thpdb: Database of fda-approved peptide and protein therapeutics. PloS one, 12(7):e0181748.
Usmani, S. S., Kumar, R., Kumar, V., Singh, S., and Raghava, G. P. (2018). Antitbpdb: a knowledgebase of anti-tubercular peptides. Database, 2018.
Vlieghe, P., Lisowski, V., Martinez, J., and Khrestchatisky, M. (2010). Synthetic therapeutic peptides: science and market. Drug discovery today, 15(1-2):40–56.
Wang, G., Li, X., and Wang, Z. (2016). Apd3: the antimicrobial peptide database as a tool for research and education. Nucleic acids research, 44(D1):D1087–D1093.
Wang, J., Yin, T., Xiao, X., He, D., Xue, Z., Jiang, X., and Wang, Y. (2018). StraPep: a structure database of bioactive peptides. Database, 2018(bay038).
Wang, Z. and Wang, G. (2004). Apd: the antimicrobial peptide database. Nucleic acids research, 32(suppl 1):D590–D592.
Wu, Q., Ke, H., Li, D., Wang, Q., Fang, J., and Zhou, J. (2019). Recent progress in machine learning-based prediction of peptide activity for drug discovery. Current topics in medicinal chemistry, 19(1):4–16.
Xiao, X., Wang, P., Lin, W.-Z., Jia, J.-H., and Chou, K.-C. (2013). iamp-2l: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Analytical biochemistry, 436(2):168–177.
Yi, H.-C., You, Z.-H., Zhou, X., Cheng, L., Li, X., Jiang, T.-H., and Chen, Z.-H. (2019). Acp-dl: a deep learning long short-term memory model to predict anticancer peptides using high-efficiency feature representation. Molecular Therapy-Nucleic Acids, 17:1–9.
Zhao, X., Wu, H., Lu, H., Li, G., and Huang, Q. (2013). Lamp: a database linking antimicrobial peptides.