Sherry Bhalla
Indraprastha Institute of Information Technology
Publication
Featured research published by Sherry Bhalla.
Nucleic Acids Research | 2016
Piyush Agrawal; Sherry Bhalla; Salman Sadullah Usmani; Sandeep Singh; Kumardeep Chaudhary; Gajendra P. S. Raghava; Ankur Gautam
CPPsite 2.0 (http://crdd.osdd.net/raghava/cppsite/) is an updated version of CPPsite, a manually curated database of cell-penetrating peptides (CPPs). The current version holds around 1850 peptide entries, nearly twice as many as the previous version. The new data were curated from research papers and patents published in the last three years. We observed that most of the CPPs discovered or tested in the last three years carry diverse chemical modifications (e.g. non-natural residues, linkers, lipid moieties). We have compiled this information on chemical modifications systematically in the updated version of the database. To understand the structure-function relationship of these peptides, we predicted the tertiary structures of CPPs containing both modified and natural residues using state-of-the-art techniques. CPPsite 2.0 also maintains information about the model systems (in vitro/in vivo) used for CPP evaluation and the different types of cargo (e.g. nucleic acids, proteins, nanoparticles) delivered by these peptides. To serve a wide range of users, we developed a user-friendly, responsive website with various tools, suitable for smartphone, tablet and desktop users. In conclusion, CPPsite 2.0 provides significant improvements over the previous version in terms of data content.
Nucleic Acids Research | 2016
Sandeep Singh; Kumardeep Chaudhary; Sandeep Kumar Dhanda; Sherry Bhalla; Salman Sadullah Usmani; Ankur Gautam; Abhishek Tuknait; Piyush Agrawal; Deepika Mathur; Gajendra P. S. Raghava
SATPdb (http://crdd.osdd.net/raghava/satpdb/) is a database of structurally annotated therapeutic peptides, curated from 22 public domain peptide databases/datasets, including 9 of our own. The current version holds 19192 unique, experimentally validated therapeutic peptide sequences with lengths between 2 and 50 amino acids. It covers peptides with natural, non-natural and modified residues. These peptides were systematically grouped into 10 categories based on their major function or therapeutic property, e.g. 1099 anticancer, 10585 antimicrobial, 1642 drug delivery and 1698 antihypertensive peptides. We assigned or annotated the structures of these therapeutic peptides using structural databases (Protein Data Bank) and state-of-the-art structure prediction methods such as I-TASSER, HHsearch and PEPstrMOD. In addition, SATPdb lets users perform various tasks, including: (i) structure and sequence similarity search, (ii) peptide browsing based on function and properties, (iii) identification of moonlighting peptides and (iv) searching for peptides with a desired structure and therapeutic activity. We hope this database will be useful for researchers working in the field of peptide-based therapeutics.
Frontiers in Microbiology | 2018
Piyush Agrawal; Sherry Bhalla; Kumardeep Chaudhary; Rajesh Kumar; Meenu Sharma; Gajendra P. S. Raghava
This paper describes in silico models developed using a wide range of peptide features for predicting antifungal peptides (AFPs). Our analyses indicate that certain residues (e.g., C, G, H, K, R, Y) are more abundant in AFPs. Positional residue preference analysis reveals that particular residues are prominent at the N-terminus (e.g., R, V, K) and others at the C-terminus (e.g., C, H). Models for predicting AFPs were developed using features such as residue composition, binary profiles and terminal residues. The support vector machine based model developed using compositional features achieved a maximum accuracy of 88.78% on the training dataset and 83.33% on the independent (validation) dataset. Our model developed using binary patterns of terminal residues achieved a maximum accuracy of 84.88% on the training dataset and 84.64% on the validation dataset. We benchmarked the models developed in this study, together with existing methods, on a dataset containing compositionally similar antifungal and non-AFPs; the binary-profile-based model developed in this study performed better than all other models and methods. To assist the scientific community, we developed a mobile app, a standalone version and a user-friendly web server, ‘Antifp’ (http://webs.iiitd.edu.in/raghava/antifp).
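The composition and terminal binary-profile features named above can be illustrated with a short Python sketch. It assumes the 20 standard amino acids, expresses composition as a percentage, and one-hot encodes the first or last k residues; the function names, the default k=5 and the example peptide are hypothetical, and this is not the Antifp feature-extraction code.

```python
# Hypothetical sketch of two peptide feature types: amino acid composition
# and a binary (one-hot) profile of terminal residues. Not the Antifp code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(peptide: str) -> list[float]:
    """Return the percentage of each standard residue, in AMINO_ACIDS order."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    return [100.0 * peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def terminal_binary_profile(peptide: str, k: int = 5, terminus: str = "N") -> list[int]:
    """One-hot encode the first (N) or last (C) k residues.

    Each residue becomes a 20-element 0/1 vector, giving 20 * k values in total.
    """
    peptide = peptide.upper()
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    if k < 1 or len(peptide) < k:
        raise ValueError(f"need 1 <= k <= peptide length, got k={k}")
    segment = peptide[:k] if terminus == "N" else peptide[-k:]
    profile: list[int] = []
    for residue in segment:
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return profile


if __name__ == "__main__":
    peptide = "GKRCHYG"  # hypothetical example peptide
    print(round(sum(aa_composition(peptide)), 6))  # composition sums to 100
    print(len(terminal_binary_profile(peptide, k=5, terminus="N")))  # 20 * 5
```

In a workflow like the one described, such vectors would be passed to a classifier such as a support vector machine; that step is left out here to keep the sketch dependency-free.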
Scientific Reports | 2017
Sherry Bhalla; Kumardeep Chaudhary; Ritesh Kumar; Manika Sehgal; Harpreet Kaur; Suresh C. Sharma; Gajendra P. S. Raghava
In this study, an attempt has been made to identify expression-based gene biomarkers that can discriminate early-stage from late-stage clear cell renal cell carcinoma (ccRCC) patients. We analyzed the gene expression of 523 samples to identify genes that are differentially expressed between early and late stage ccRCC. First, a threshold-based method was developed, which attained a maximum accuracy of 71.12% with an AUROC of 0.67 using a single gene, NR3C2. To improve the performance of the threshold-based method, we combined two or more genes and achieved a maximum accuracy of 70.19% with an AUROC of 0.74 using eight genes on the validation dataset. These eight genes include four underexpressed (NR3C2, ENAM, DNASE1L3, FRMPD2) and four overexpressed (PLEKHA9, MAP6D1, SMPD4, C11orf73) genes in late-stage ccRCC. Second, models were developed using state-of-the-art techniques and achieved a maximum accuracy of 72.64% and an AUROC of 0.81 using 64 genes on the validation dataset. Similar accuracy was obtained with 38 genes selected from a subset of genes involved in cancer hallmark biological processes. Our analysis further implies a need to develop gender-specific models for stage classification. A web server, CancerCSP, has been developed to predict the stage of ccRCC using gene expression data derived from RNA-seq experiments.
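A single-gene threshold classifier of the kind described above can be sketched as follows. The sketch assumes labels 1 = late stage and 0 = early stage, tries every observed expression value as a cut-off and keeps the one with the highest accuracy; for an underexpressed gene such as NR3C2 the direction is reversed so that low expression predicts late stage. The function name and toy values are hypothetical, and the CancerCSP implementation may choose thresholds differently.

```python
def best_threshold(values, labels, overexpressed=True):
    """Pick the expression cut-off that maximizes accuracy on the given samples.

    values: expression of one gene per sample.
    labels: 1 for late stage, 0 for early stage (assumed encoding).
    overexpressed: if True, predict late stage when value >= threshold;
                   otherwise predict late stage when value <= threshold.
    Returns (threshold, accuracy).
    """
    if not values or len(values) != len(labels):
        raise ValueError("values and labels must be non-empty and equal in length")
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        correct = 0
        for v, y in zip(values, labels):
            predicted = int(v >= t) if overexpressed else int(v <= t)
            correct += int(predicted == y)
        accuracy = correct / len(values)
        if accuracy > best_acc:  # keep the lowest threshold on ties
            best_t, best_acc = t, accuracy
    return best_t, best_acc


if __name__ == "__main__":
    # Toy data (hypothetical): late-stage samples have lower expression.
    expression = [4.0, 3.0, 2.0, 1.0]
    stage = [0, 0, 1, 1]
    print(best_threshold(expression, stage, overexpressed=False))
```

Because the cut-off is tuned on the same samples it is scored on, this accuracy is optimistic; a fair estimate requires applying the chosen threshold to held-out samples.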
Scientific Reports | 2017
Sherry Bhalla; Ruchi Verma; Harpreet Kaur; Rajesh Kumar; Salman Sadullah Usmani; Suresh Kumar Sharma; Gajendra P. S. Raghava
CancerPDF (Cancer Peptidome Database of bioFluids) is a comprehensive database of endogenous peptides detected in human biofluids. Peptidome patterns reflect the synthesis, processing and degradation of proteins in the tissue environment and can therefore serve as a rich source for probing peptide-based cancer biomarkers. Although extensive cancer peptidome data have been generated in recent years, the lack of a comprehensive resource limits the ability to query this growing body of community knowledge. We have developed a cancer peptidome resource, CancerPDF, to collect and compile all the endogenous peptides isolated from human biofluids in various cancer profiling studies. CancerPDF has 14,367 entries with 9,692 unique peptide sequences corresponding to 2,230 unique precursor proteins from 56 high-throughput studies covering ~27 cancer conditions. We provide an interactive interface to query the endogenous peptides along with primary information such as m/z, precursor protein, cancer type and regulation status in cancer. In addition, several web-based tools have been incorporated, including search, browse and similarity identification modules. We believe CancerPDF will be an invaluable resource for unlocking the potential of peptidome-based cancer biomarkers. CancerPDF is available at http://crdd.osdd.net/raghava/cancerpdf/.
Frontiers in Microbiology | 2018
Vinod Kumar; Piyush Agrawal; Rajesh Kumar; Sherry Bhalla; Salman Sadullah Usmani; Grish C. Varshney; Gajendra P. S. Raghava
Designing drug delivery vehicles using cell-penetrating peptides is an active area of research in medicine. In the past, a number of in silico methods have been developed for predicting the cell-penetrating property of peptides containing natural residues. In this study, a first attempt has been made to predict the cell-penetrating property of peptides containing natural and modified residues. The dataset used to develop the prediction models includes the structures and sequences of 732 chemically modified cell-penetrating peptides and an equal number of non-cell-penetrating peptides. We analyzed the structures of both classes of peptides and observed that positively charged groups, atoms and residues are preferred in cell-penetrating peptides. Models were developed to predict cell-penetrating peptides from their tertiary structures using a wide range of descriptors (2D and 3D descriptors, and fingerprints). A Random Forest model developed using PaDEL descriptors (a combination of 2D, 3D and fingerprint descriptors) achieved a maximum accuracy of 95.10%, an MCC of 0.90 and an AUROC of 0.99 on the main dataset. The model was also evaluated on a validation/independent dataset, where it achieved an AUROC of 0.98. To assist the scientific community, we have developed a web server, “CellPPDMod”, for predicting the cell-penetrating property of modified peptides (http://webs.iiitd.edu.in/raghava/cellppdmod/).
Archive | 2018
Salman Sadullah Usmani; Rajesh Kumar; Sherry Bhalla; Vinod Kumar; Gajendra P. S. Raghava
Conventional approaches to drug screening and vaccine design are lengthy and require patience, vigorous effort, high cost and additional manpower. Screening and experimentally validating thousands of molecules for a specific therapeutic property has never been an easy task. Similarly, the traditional approach to vaccination involves administering either a whole or an attenuated pathogen, which raises toxicity and safety issues. The emergence of sequencing and recombinant DNA technology led to the concept of epitope-based vaccination, i.e., that small peptides (epitopes) can stimulate a specific immune response. Bioinformatics has proved to be a useful adjunct in vaccine and drug design; genomic studies of pathogens aid in identifying and analyzing protective epitopes. A number of in silico tools have been developed in the last two decades to design immunotherapies as well as peptide-based drugs, and these tools have acted as catalysts in drug and vaccine design. This review surveys therapeutic peptide databases as well as in silico tools developed for designing peptide-based vaccines and drugs.
bioRxiv | 2018
Sherry Bhalla; Kumardeep Chaudhary; Ankur Gautam; Suresh Kumar Sharma; Gajendra P. S. Raghava
Urine-based cancer biomarkers offer numerous advantages over other biomarkers and play a crucial role in cancer management. In this study, an attempt has been made to develop proteomics-based prediction models that discriminate patients with oncological disorders of the urinary tract from healthy controls using their urine samples. The dataset used in this study was obtained from the human urinary peptide database, which contains urine proteomics data for 1525 oncological samples and 1503 healthy controls, with the spectral intensities of 5605 peptides. First, using various feature selection techniques, we identified peptide spectra that differ in intensity and occurrence between oncological samples and healthy controls. Based on 173 selected peptide biomarkers, we developed models for predicting oncological samples and achieved a maximum accuracy of 91.94% with an MCC of 0.84. Prediction models were also developed based on the spectral intensities of peptides with known sequences. We also quantified the amount of each protein in a sample based on the intensities of its fragments/peptides and developed prediction models based on protein expression. We observed that certain proteins and their peptides, such as fragments of collagen, are more abundant in oncological samples. Based on this study, we also developed a web bench, CancerUBM, for mining proteomics data, which is freely available at http://webs.iiitd.edu.in/raghava/cancerubm/.
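Estimating a protein's amount from its peptide intensities can be illustrated by summing the intensities of all peptides mapped to each precursor protein. Summation is an assumption chosen for simplicity (the aggregation rule is not stated above), and the peptide and protein identifiers below are hypothetical.

```python
from collections import defaultdict


def protein_abundance(peptide_intensities, peptide_to_protein):
    """Sum peptide intensities per precursor protein for one sample.

    peptide_intensities: dict mapping peptide sequence -> spectral intensity.
    peptide_to_protein: dict mapping peptide sequence -> precursor protein ID.
    Peptides without a known precursor protein are skipped.
    """
    totals = defaultdict(float)
    for peptide, intensity in peptide_intensities.items():
        protein = peptide_to_protein.get(peptide)
        if protein is None:
            continue  # no known precursor: cannot attribute this intensity
        totals[protein] += intensity
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample: two peptides from PROT1, one from PROT2, one unmapped.
    sample = {"PEPTIDEA": 10.0, "PEPTIDEB": 5.0, "PEPTIDEC": 2.0, "PEPTIDED": 1.0}
    mapping = {"PEPTIDEA": "PROT1", "PEPTIDEB": "PROT1", "PEPTIDEC": "PROT2"}
    print(protein_abundance(sample, mapping))
```

Applied to every sample, the per-protein totals form a protein-expression matrix that can be used as model input in the same way as the peptide-level intensities.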
bioRxiv | 2018
Sherry Bhalla; Harpreet Kaur; A. Dhall; G. Raghava
The metastatic state of Skin Cutaneous Melanoma (SKCM) has led to a high mortality rate worldwide. Previous studies have associated metastatic melanoma with a lower survival rate than primary tumors. Thus, predicting melanoma at the primary tumor stage is crucial for choosing the optimal therapeutic strategy and prolonging patient survival. We comprehensively analysed the RNA, miRNA and methylation data of the SKCM cohort from The Cancer Genome Atlas (TCGA) to identify key genomic features that distinguish metastatic tumors from primary tumors with high precision. Subsequently, prediction models were developed from the filtered genomic features using various machine learning techniques to classify primary versus metastatic tumors. The SVC model (with class weights and an RBF kernel) developed using 17 mRNA features achieved a maximum MCC of 0.73, with sensitivity, specificity and accuracy of 89.19%, 90.48% and 89.47%, respectively, on the independent validation dataset. Our study reveals that gene expression based features perform better than features obtained from miRNA profiling and epigenomic profiling. Our analysis shows that the expression of the genes C7, MMP3, KRT14, KRT17 and MASP1, and of the miRNAs hsa-mir-205 and hsa-mir-203a, are among the key genomic features that may contribute substantially to the oncogenesis of melanoma, as seen even with a simple expression-threshold analysis. The major prediction models and analysis modules to predict metastatic and primary tumor samples of SKCM are available from a web server, CancerSPP (http://webs.iiitd.edu.in/raghava/cancerspp/).
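The sensitivity, specificity, accuracy and MCC figures quoted above are all derived from a binary confusion matrix. The helper below shows the standard formulas; treating metastatic tumors as the positive class and the example counts are assumptions for illustration, not the study's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts.

    tp/fn: positives (e.g. metastatic samples) predicted correctly/incorrectly.
    tn/fp: negatives (e.g. primary samples) predicted correctly/incorrectly.
    Assumes each class has at least one sample.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    print(binary_metrics(tp=8, fn=2, tn=9, fp=1))  # hypothetical counts
```

Unlike accuracy, MCC remains informative when the two classes are imbalanced, which is one reason it is often reported alongside sensitivity and specificity.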
bioRxiv | 2018
Sherry Bhalla; Harpreet Kaur; R. Kaur; Suresh Kumar Sharma; G. Raghava
In this study, we describe the key transcripts and machine learning models developed for classifying early and late stage samples of Papillary Thyroid Cancer (PTC), using transcript expression data from The Cancer Genome Atlas (TCGA). First, we ranked all transcripts by their ability to discriminate early from late stage with an expression threshold, measured as the area under the receiver operating characteristic curve (AUROC). Using the expression of a single transcript, DCN, stage samples can be classified with 68.5% accuracy and an AUROC of 0.66. We then evaluated various multi-gene panels selected using standard feature selection techniques. The model based on the expression of 36 transcripts (protein coding and non-coding) selected using SVC-L1 achieves a maximum accuracy of 74.51% with an AUROC of 0.75 on the independent validation dataset, with balanced sensitivity and specificity. These signatures also performed well on external microarray data obtained from GEO, correctly predicting about 70% (12 of 17) of the early stage samples. In addition, a multiclass model classifying normal, early and late stage samples achieves an accuracy of 75.43% with an AUROC of 0.80 on the independent validation dataset. Correlation analysis showed that the transcripts whose expression correlation changes most between the two stages are significantly enriched in the neuroactive ligand-receptor interaction pathway. We also propose a panel of five protein coding transcripts whose expression can segregate cancer and normal samples with 97.32% accuracy and an AUROC of 0.99 on the independent validation dataset. All models and datasets used in this study are available from the web server CancerTSP (http://webs.iiitd.edu.in/raghava/cancertsp/).
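Ranking transcripts by single-feature AUROC, as in the first step above, can be sketched with the rank-sum (Mann-Whitney) formulation: the AUROC equals the probability that a randomly chosen late-stage sample has higher expression than a randomly chosen early-stage one, with ties counted as one half. Scoring each transcript by max(AUROC, 1 - AUROC), so that strongly underexpressed transcripts also rank highly, is an assumption of this sketch, as are the transcript names.

```python
def auroc(scores, labels):
    """Single-feature AUROC via pairwise comparison (Mann-Whitney statistic).

    labels: 1 for the positive class (e.g. late stage), 0 for the negative class.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_transcripts(expression, labels):
    """Rank transcripts by direction-agnostic AUROC, highest first.

    expression: dict mapping transcript name -> expression values per sample.
    """
    scored = []
    for name, values in expression.items():
        a = auroc(values, labels)
        scored.append((name, max(a, 1.0 - a)))  # under- or over-expression
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    stage = [0, 0, 1, 1]  # hypothetical early/late labels
    data = {"T1": [1, 2, 3, 4], "T2": [4, 3, 2, 1], "T3": [1, 3, 2, 4]}
    print(rank_transcripts(data, stage))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred samples; a sort-based rank-sum computation scales better for larger cohorts.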