
Ruchi Verma

Oklahoma State University–Stillwater


Publication

Featured research published by Ruchi Verma.

Journal of Biological Chemistry | 2006

Prediction of mitochondrial proteins using support vector machine and hidden Markov model

Manish Kumar; Ruchi Verma; Gajendra P. S. Raghava

Mitochondria are among the core organelles of eukaryotic cells; hence, prediction of mitochondrial proteins is one of the major challenges in genome annotation. This study describes a method, MitPred, developed for predicting mitochondrial proteins with high accuracy. The data set used in this study was obtained from Guda, C., Fahy, E. & Subramaniam, S. (2004) Bioinformatics 20, 1785–1794. First, support vector machine-based modules were developed using the amino acid and dipeptide composition of proteins, achieving accuracies of 78.37 and 79.38%, respectively. The accuracy improved further to 83.74% when the split amino acid composition (25 N-terminal residues, 25 C-terminal residues, and the remaining residues) of proteins was used. Then a BLAST search and the support vector machine-based method were combined to reach 88.22% accuracy. Finally, we developed a hybrid approach that combined hidden Markov model profiles of domains (found exclusively in mitochondrial proteins) with the support vector machine-based method. We were able to predict mitochondrial proteins with 100% specificity at 56.36% sensitivity and with 80.50% specificity at 98.95% sensitivity. The method estimated 9.01, 6.35, 4.84, 3.95, and 4.25% of proteins as mitochondrial in the Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, mouse, and human proteomes, respectively. MitPred is based on this hybrid approach.
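The composition features named in this abstract can be illustrated with a minimal sketch. The function names, the percentage scaling, and the handling of sequences shorter than 50 residues are assumptions made for illustration; the abstract only specifies the 25/25/remainder split, not the implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the percentage of each standard residue in seq (20 values)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        # Empty segment (or no standard residues): all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * c / total for c in counts]


def split_composition(seq, n_term=25, c_term=25):
    """Concatenate the compositions of the N-terminal, C-terminal,
    and remaining residues (3 x 20 = 60 values).

    Assumption: for short sequences the N-terminal segment is filled
    first, then the C-terminal one; the middle may be empty.
    """
    head = seq[:n_term]
    rest = seq[n_term:]
    tail = rest[-c_term:]
    middle = rest[:-c_term]
    return (amino_acid_composition(head)
            + amino_acid_composition(tail)
            + amino_acid_composition(middle))
```

A vector of this kind would be the input to a classifier such as an SVM; the classifier itself is outside the scope of this sketch.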

Journal of Microbiological Methods | 2013

E-probe Diagnostic Nucleic acid Analysis (EDNA): A theoretical approach for handling of next generation sequencing data for diagnostics

Anthony H. Stobbe; Jon Daniels; Andres Espindola; Ruchi Verma; Ulrich Melcher; Francisco M. Ochoa-Corona; Carla D. Garzón; Jacqueline Fletcher; William L. Schneider

Plant biosecurity requires rapid identification of pathogenic organisms. While there are many pathogen-specific diagnostic assays, the ability to test for large numbers of pathogens simultaneously is lacking. Next generation sequencing (NGS) allows one to detect all organisms within a given sample, but has computational limitations during assembly and similarity searching of sequence data, which extend the time needed to make a diagnostic decision. To minimize the amount of bioinformatic processing time needed, unique pathogen-specific sequences (termed e-probes) were designed to be used in searches of unassembled, non-quality-checked sequence data. E-probes have been designed and tested for several selected phytopathogens, including an RNA virus, a DNA virus, bacteria, fungi, and an oomycete, illustrating the ability to detect several diverse plant pathogens. E-probes of 80 or more nucleotides in length provided satisfactory levels of precision (75%). The number of e-probes designed for each pathogen varied with the genome size of the pathogen. To give confidence to diagnostic calls, a statistical method of determining the presence of a given pathogen was developed, in which target e-probe signals (detection signal) are compared to signals generated by a decoy set of e-probes (background signal). The E-probe Diagnostic Nucleic acid Analysis (EDNA) process provides the framework for a new sequence-based detection system that eliminates the need for assembly of NGS data.
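The idea of comparing a target e-probe signal with a decoy background signal can be sketched as follows. This is only an illustration under loud assumptions: exact substring matching stands in for the similarity search, and the `min_ratio` threshold is a hypothetical stand-in for the statistical method, which the abstract does not specify. All names are invented for this sketch.

```python
def hit_fraction(probes, reads):
    """Fraction of probes found as an exact substring of at least one read.

    Assumption: exact matching replaces the similarity search used in
    practice, purely to keep the sketch self-contained.
    """
    if not probes:
        return 0.0
    hits = sum(1 for probe in probes if any(probe in read for read in reads))
    return hits / len(probes)


def detection_call(e_probes, decoy_probes, reads, min_ratio=2.0):
    """Compare the detection signal with the decoy background signal.

    min_ratio is a hypothetical threshold, not the published statistic.
    Returns (target_signal, background_signal, pathogen_called).
    """
    target = hit_fraction(e_probes, reads)
    background = hit_fraction(decoy_probes, reads)
    called = target > 0 and target > min_ratio * background
    return target, background, called
```

Because the reads are searched directly, nothing in this flow requires assembling them first, which is the point the abstract makes about avoiding assembly.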

BMC Bioinformatics | 2008

Identification of Proteins Secreted by Malaria Parasite into Erythrocyte using SVM and PSSM profiles

Ruchi Verma; Ajit Tiwari; Sukhwinder Kaur; Grish C. Varshney; Gajendra P. S. Raghava

Background: The malaria parasite secretes various proteins into infected red blood cells (RBCs) for its growth and survival. Identifying these secretory proteins is therefore important for developing vaccines and drugs against malaria. Existing motif-based methods have had limited success because no universal motif is shared by all secretory proteins of the malaria parasite. Results: In this study, a systematic attempt was made to develop a general method for predicting secretory proteins of the malaria parasite. All models were trained and tested on a non-redundant dataset of 252 secretory and 252 non-secretory proteins. SVM models based on amino acid and dipeptide composition achieved a maximum MCC of 0.72 with 85.65% accuracy and an MCC of 0.74 with 86.45% accuracy, respectively. SVM models based on split-amino acid and split-dipeptide composition achieved a maximum MCC of 0.74 with 86.40% accuracy and an MCC of 0.77 with 88.22% accuracy, respectively. For the first time, PSSM profiles obtained from PSI-BLAST were used for predicting secretory proteins; the PSSM-based SVM model achieved a maximum MCC of 0.86 with 92.66% accuracy. All models developed in this study were evaluated using 5-fold cross-validation. Conclusion: This study demonstrates that secretory proteins differ in residue composition from non-secretory proteins, so secretory proteins can be predicted from their residue composition using machine learning techniques. A multiple sequence alignment provides more information than the sequence itself, so the method based on PSSM profiles is more accurate than the methods based on sequence composition. A web server, PSEApred, has been developed for predicting secretory proteins of malaria parasites; the URL can be found in the Availability and requirements section.
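The two metrics reported throughout this abstract, accuracy and the Matthews correlation coefficient (MCC), can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the convention of returning 0.0 when the MCC denominator is zero are assumptions of this example.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Percentage of correctly classified examples."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

On a balanced dataset such as the 252 versus 252 proteins used here, accuracy and MCC tend to move together; MCC is the more informative of the two when classes are imbalanced.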

BMC Bioinformatics | 2012

A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins

Ruchi Verma; Ulrich Melcher

Background: Members of the phylum Proteobacteria are the most prominent among bacteria causing plant diseases that reduce the quantity and quality of food produced by agriculture. To reduce these losses, infections need to be identified at an early stage. Recent developments in next generation nucleic acid sequencing and mass spectrometry open the door to screening plants by the sequences of their macromolecules. Such an approach requires the ability to recognize the organismal origin of unknown DNA or peptide fragments. There are many ways to approach this problem, but none has emerged as the best protocol. Here we attempt a systematic way to determine the organismal origins of peptides using a machine learning algorithm, the Support Vector Machine (SVM). Results: The amino acid compositions of proteobacterial proteins were found to differ from those of plant proteins. We developed SVM models based on amino acid and dipeptide compositions to distinguish between a proteobacterial protein and a plant protein. The amino acid composition (AAC) based SVM model had an accuracy of 92.44% with a Matthews correlation coefficient (MCC) of 0.85, while the dipeptide composition (DC) based SVM model had a maximum accuracy of 94.67% and an MCC of 0.89. We also developed SVM models based on a hybrid approach (AAC and DC), which gave a maximum accuracy of 94.86% and an MCC of 0.90. The models were tested on unseen datasets to assess their validity. Conclusion: The results indicate that the SVM based on the AAC and DC hybrid approach can be used to distinguish proteobacterial from plant protein sequences.
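A hybrid AAC and DC feature vector of the kind described here could be built as in the sketch below. It assumes that "hybrid" means simple concatenation of the 20 amino acid percentages with the 400 dipeptide percentages; the abstract does not state how the two feature sets were combined, and all names are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Percentage of each of the 20 standard residues (AAC)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * c / total for c in counts]


def dipeptide_composition(seq):
    """Percentage of each of the 400 overlapping residue pairs (DC)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


def hybrid_features(seq):
    """Assumed hybrid: AAC followed by DC (20 + 400 = 420 values)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Unlike AAC, the dipeptide composition retains some information about local residue order, which is one plausible reason the DC and hybrid models score slightly higher.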

Frontiers in Plant Science | 2014

Metagenomic search strategies for interactions among plants and multiple microbes.

Ulrich Melcher; Ruchi Verma; William L. Schneider

Plants harbor multiple microbes. Metagenomics can facilitate understanding of the significance, for the plant, of the microbes, and of the interactions among them. However, current approaches to metagenomic analysis of plants are computationally time consuming. Efforts to speed the discovery process include improvement of computational speed, condensing the sequencing reads into smaller datasets before BLAST searches, simplifying the target database of BLAST searches, and flipping the roles of metagenomic and reference datasets. The latter is exemplified by the e-probe diagnostic nucleic acid analysis approach originally devised for improving analysis during plant quarantine.

Scientific Reports | 2017

CancerPDF: A repository of cancer-associated peptidome found in human biofluids

Sherry Bhalla; Ruchi Verma; Harpreet Kaur; Rajesh Kumar; Salman Sadullah Usmani; Suresh Kumar Sharma; Gajendra P. S. Raghava

CancerPDF (Cancer Peptidome Database of bioFluids) is a comprehensive database of endogenous peptides detected in human biofluids. Peptidome patterns reflect the synthesis, processing, and degradation of proteins in the tissue environment and can therefore act as a gold mine for finding peptide-based cancer biomarkers. Although extensive data on the cancer peptidome have been generated in recent years, the lack of a comprehensive resource limits the ability to query this growing body of community knowledge. We have developed a cancer peptidome resource, CancerPDF, to collect and compile all the endogenous peptides isolated from human biofluids in various cancer profiling studies. CancerPDF has 14,367 entries with 9,692 unique peptide sequences corresponding to 2,230 unique precursor proteins from 56 high-throughput studies for ~27 cancer conditions. We provide an interactive interface to query the endogenous peptides along with primary information such as m/z, precursor protein, the type of cancer, and regulation status in cancer. In addition, several web-based tools have been incorporated, comprising search, browse, and similarity identification modules. We expect CancerPDF to be an invaluable resource for unlocking the potential of peptidome-based cancer biomarkers. CancerPDF is available at http://crdd.osdd.net/raghava/cancerpdf/.

BMC Bioinformatics | 2013

Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning

Rakesh Kaundal; Sitanshu S Sahu; Ruchi Verma; Tyler Weirick

Background: Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color, etc. They are essential organelles of the plant cell and are also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteomes of these genomes need annotation at a faster pace. One such annotation need is an automated system that can accurately distinguish between plastid and non-plastid proteins and further classify plastid types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity search and machine learning. Results: In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing plastids in two steps: first distinguishing plastid from non-plastid proteins, and then classifying the identified plastid proteins into their various types based on function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features were used to develop the SVM models: amino acid composition, dipeptide composition, pseudo amino acid composition, N-terminal-Center-C-terminal composition, and protein physicochemical properties. Overall, the dipeptide composition-based module shows the best performance, with an accuracy of 86.80% and a Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with an MCC of 0.44 in phase-II. On independent test data, this model also performs best, with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance, with about 50% prediction accuracy for distinguishing plastid from non-plastid proteins and only 20% in classifying the various plastid types, indicating the need for and importance of machine learning algorithms. Conclusion: The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred, at http://bioinfo.okstate.edu/PLpred/ for real-time identification and characterization. We believe this tool will be very useful in the functional annotation of various genomes.
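The two-phase design described in this abstract amounts to a simple cascade, sketched below. The two classifiers are passed in as plain callables; these, along with every name in the block, are hypothetical placeholders and not the PLpred implementation.

```python
PLASTID_TYPES = ("chloroplast", "chromoplast", "etioplast", "amyloplast")


def classify_two_phase(seq, is_plastid, plastid_type):
    """Run phase I (plastid vs. non-plastid), then phase II (plastid type).

    is_plastid:   hypothetical callable, seq -> bool (phase-I model)
    plastid_type: hypothetical callable, seq -> one of PLASTID_TYPES
                  (phase-II model), only consulted for phase-I positives
    """
    if not is_plastid(seq):
        return "non-plastid"
    label = plastid_type(seq)
    if label not in PLASTID_TYPES:
        raise ValueError(f"unexpected plastid type: {label!r}")
    return label
```

One consequence of the cascade is that phase-II is only ever applied to sequences accepted in phase-I, so phase-I errors propagate: a plastid protein rejected in the first step can never receive a type label.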

Amino Acids | 2010