Publication


Featured research published by Anuj R. Shah.


Bioinformatics | 2008

A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics

Bobbie-Jo M. Webb-Robertson; William R. Cannon; Christopher S. Oehmen; Anuj R. Shah; Vidhya Gurumoorthi; Mary S. Lipton; Katrina M. Waters

MOTIVATION The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to only search for those peptides that are detectable by MS (proteotypic). RESULTS We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM resulted in an average accuracy measure of 0.8 with a SD of <0.025. Furthermore, we demonstrate that these results are achievable with a small set of 12 variables and can achieve high proteome coverage. AVAILABILITY http://omics.pnl.gov/software/STEPP.php. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


Nature Methods | 2011

Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra

Ari Frank; Matthew E. Monroe; Anuj R. Shah; Jeremy J. Carver; Nuno Bandeira; Ronald J. Moore; Gordon A. Anderson; Richard D. Smith; Pavel A. Pevzner

Tandem mass spectrometry (MS/MS) experiments yield multiple, nearly identical spectra of the same peptide in various laboratories, but proteomics researchers typically do not leverage the unidentified spectra produced in other labs to decode spectra they generate. We propose a spectral archives approach that clusters MS/MS datasets, representing similar spectra by a single consensus spectrum. Spectral archives extend spectral libraries by analyzing both identified and unidentified spectra in the same way and maintaining information about peptide spectra that are common across species and conditions. Thus archives offer both traditional library spectrum similarity-based search capabilities along with new ways to analyze the data. By developing a clustering tool, MS-Cluster, we generated a spectral archive from ∼1.18 billion spectra that greatly exceeds the size of existing spectral repositories. We advocate that publicly available data should be organized into spectral archives rather than be analyzed as disparate datasets, as is mostly the case today.
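The idea of representing similar spectra by a single consensus spectrum can be illustrated with a minimal sketch. This is not the MS-Cluster algorithm; it is a simple greedy clustering under assumed parameters. The bin width, the similarity threshold, and all function names below are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

BIN_WIDTH = 1.0             # assumed m/z bin width in Da
SIMILARITY_THRESHOLD = 0.8  # assumed cosine cutoff for joining a cluster


def normalize(vec):
    """Scale a sparse vector to unit length (empty if all intensities are zero)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def bin_spectrum(peaks):
    """Convert (m/z, intensity) pairs into a sparse, unit-length binned vector."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / BIN_WIDTH)] += intensity
    return normalize(binned)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def consensus(members):
    """Consensus spectrum: normalized sum of the member vectors."""
    total = defaultdict(float)
    for vec in members:
        for k, v in vec.items():
            total[k] += v
    return normalize(total)


def cluster_spectra(spectra):
    """Greedily assign each spectrum to the most similar existing cluster."""
    clusters = []  # each cluster: {"members": [...], "consensus": {...}}
    for peaks in spectra:
        vec = bin_spectrum(peaks)
        best = max(clusters, key=lambda c: cosine(vec, c["consensus"]), default=None)
        if best is not None and cosine(vec, best["consensus"]) >= SIMILARITY_THRESHOLD:
            best["members"].append(vec)
            best["consensus"] = consensus(best["members"])
        else:
            clusters.append({"members": [vec], "consensus": vec})
    return clusters
```

Because the clustering never consults a peptide identification, identified and unidentified spectra are handled the same way, which is the property the abstract emphasizes.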


IEEE Symposium on Information Visualization | 2004

IN-SPIRE InfoVis 2004 Contest Entry

Pak Chung Wong; Christian Posse; Mark A. Whiting; Susan L. Havre; Nick Cramer; Anuj R. Shah; Mudita Singhal; Alan E. Turner; James J. Thomas

This is the first part (summary) of a three-part contest entry submitted to IEEE InfoVis 2004. The contest topic is visualizing InfoVis symposium papers from 1995 to 2002 and their references. The paper introduces the visualization tool IN-SPIRE, the visualization process and results, and presents lessons learned.


Bioinformatics | 2010

Machine learning based prediction for peptide drift times in ion mobility spectrometry

Anuj R. Shah; Khushbu Agarwal; Erin S. Baker; Mudita Singhal; Anoop Mayampurath; Yehia M. Ibrahim; Lars J. Kangas; Matthew E. Monroe; Rui Zhao; Mikhail E. Belov; Gordon A. Anderson; Richard D. Smith

MOTIVATION Ion mobility spectrometry (IMS) has gained significant traction over the past few years for rapid, high-resolution separations of analytes based upon gas-phase ion structure, with significant potential impacts in the field of proteomic analysis. IMS coupled with mass spectrometry (MS) affords multiple improvements over traditional proteomics techniques, such as in the elucidation of secondary structure information, identification of post-translational modifications, as well as higher identification rates with reduced experiment times. The high-throughput nature of this technique benefits from accurate calculation of cross sections, mobilities and associated drift times of peptides, thereby enhancing downstream data analysis. Here, we present a model that uses physicochemical properties of peptides to accurately predict a peptide's drift time directly from its amino acid sequence. This model is used in conjunction with two mathematical techniques, a partial least squares regression and a support vector regression setting. RESULTS When tested on an experimentally created high confidence database of 8675 peptide sequences with measured drift times, both techniques statistically significantly outperform the intrinsic size parameters-based calculations, the currently held practice in the field, on all charge states (+2, +3 and +4). AVAILABILITY The software executable, imPredict, is available for download from http://omics.pnl.gov/software/imPredict.php. CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


BMC Bioinformatics | 2013

MultiAlign: a multiple LC-MS analysis tool for targeted omics analysis

Brian L. Lamarche; Kevin L. Crowell; Navdeep Jaitly; Vladislav A. Petyuk; Anuj R. Shah; Ashoka D. Polpitiya; John D. Sandoval; Gary R. Kiebel; Matthew E. Monroe; Stephen J. Callister; Thomas O. Metz; Gordon A. Anderson; Richard D. Smith

BACKGROUND MultiAlign is a free software tool that aligns multiple liquid chromatography-mass spectrometry datasets to one another by clustering mass and chromatographic elution features across datasets. Applicable to both label-free proteomics and metabolomics comparative analyses, the software can be operated in several modes. For example, clustered features can be matched to a reference database to identify analytes, used to generate abundance profiles, linked to tandem mass spectra based on parent precursor masses, and culled for targeted liquid chromatography-tandem mass spectrometric analysis. MultiAlign is also capable of tandem mass spectral clustering to describe proteome structure and find similarity in subsequent sample runs. RESULTS MultiAlign was applied to two large proteomics datasets obtained from liquid chromatography-mass spectrometry analyses of environmental samples. Peptides in the datasets for a microbial community that had a known metagenome were identified by matching mass and elution time features to those in an established reference peptide database. Results compared favorably with those obtained using existing tools such as VIPER, but with the added benefit of being able to trace clusters of peptides across conditions to existing tandem mass spectra. MultiAlign was further applied to detect clusters across experimental samples derived from a reactor biomass community for which no metagenome was available. Several clusters were culled for further analysis to explore changes in the community structure. Lastly, MultiAlign was applied to liquid chromatography-mass spectrometry-based datasets obtained from a previously published study of wild type and mitochondrial fatty acid oxidation enzyme knockdown mutants of human hepatocarcinoma to demonstrate its utility for analyzing metabolomics datasets. CONCLUSION MultiAlign is an efficient software package for finding similar analytes across multiple liquid chromatography-mass spectrometry feature maps, as demonstrated here for both proteomics and metabolomics experiments. The software is particularly useful for proteomic studies where little or no genomic context is known, such as with environmental proteomics.
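Clustering mass and elution time features across datasets can be sketched as follows. This is an illustrative greedy approach, not MultiAlign's actual algorithm. The tolerances, the use of a normalized elution time (NET) in [0, 1], and the `Feature` and `Cluster` types are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field

MASS_TOL_PPM = 10.0  # assumed mass tolerance in parts per million
NET_TOL = 0.02       # assumed normalized elution time tolerance


@dataclass
class Feature:
    dataset: str  # name of the LC-MS run the feature came from
    mass: float   # monoisotopic mass in Da
    net: float    # normalized elution time in [0, 1]


@dataclass
class Cluster:
    members: list = field(default_factory=list)

    @property
    def mass(self):
        """Mean mass of the member features."""
        return sum(f.mass for f in self.members) / len(self.members)

    @property
    def net(self):
        """Mean normalized elution time of the member features."""
        return sum(f.net for f in self.members) / len(self.members)


def matches(cluster, feature):
    """True if the feature lies within both tolerances of the cluster centroid."""
    ppm = abs(feature.mass - cluster.mass) / cluster.mass * 1e6
    return ppm <= MASS_TOL_PPM and abs(feature.net - cluster.net) <= NET_TOL


def cluster_features(features):
    """Greedily group features from all datasets by mass and elution time."""
    clusters = []
    for feature in sorted(features, key=lambda f: (f.mass, f.net)):
        target = next((c for c in clusters if matches(c, feature)), None)
        if target is None:
            target = Cluster()
            clusters.append(target)
        target.members.append(feature)
    return clusters
```

Each resulting cluster can then be compared against a reference database entry by the same mass and elution time criteria, or traced back to the datasets that contributed its members.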


Bioinformatics | 2006

SEBINI: Software Environment for BIological Network Inference

Ronald C. Taylor; Anuj R. Shah; Charles C. Treatman; Meredith L. Blevins

The Software Environment for BIological Network Inference (SEBINI) has been created to provide an interactive environment for the deployment and evaluation of algorithms used to reconstruct the structure of biological regulatory and interaction networks. SEBINI can be used to compare and train network inference methods on artificial networks and simulated gene expression perturbation data. It also allows the analysis within the same framework of experimental high-throughput expression data using the suite of (trained) inference methods; hence SEBINI should be useful to software developers wishing to evaluate, compare, refine or combine inference techniques, and to bioinformaticians analyzing experimental data. SEBINI provides a platform that aids in more accurate reconstruction of biological networks, with less effort, in less time. AVAILABILITY A demonstration website is located at https://www.emsl.pnl.gov/NIT/NIT.html. The Java source code and PostgreSQL database schema are available freely for non-commercial use.


Journal of the American Society for Mass Spectrometry | 2010

An Efficient Data Format for Mass Spectrometry-Based Proteomics

Anuj R. Shah; Jennifer L. Davidson; Matthew E. Monroe; Anoop Mayampurath; William F. Danielson; Yan Shi; Aaron C. Robinson; Brian H. Clowers; Mikhail E. Belov; Gordon A. Anderson; Richard D. Smith

The diverse range of mass spectrometry (MS) instrumentation along with corresponding proprietary and nonproprietary data formats has generated a proteomics community-driven call for a standardized format to facilitate management, processing, storing, visualization, and exchange of both experimental and processed data. To date, significant efforts have been extended towards standardizing XML-based formats for mass spectrometry data representation, despite the recognized inefficiencies associated with storing large numeric datasets in XML. The proteomics community has periodically entertained alternate strategies for data exchange, e.g., using a common application programming interface or a database-derived format. However, these efforts have yet to gain significant attention, mostly because they have not demonstrated significant performance benefits over existing standards, but also due to issues such as extensibility to multidimensional separation systems, robustness of operation, and incomplete or mismatched vocabulary. Here, we describe a format based on standard database principles that offers multiple benefits over existing formats in terms of storage size, ease of processing, data retrieval times, and extensibility to accommodate multidimensional separation systems.
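To make the notion of a database-derived format concrete, here is a minimal sketch using SQLite from the Python standard library. The abstract does not give the schema, so the tables, columns, and function names below are hypothetical. The sketch only illustrates the general pattern: indexed scan metadata, numeric arrays stored as binary blobs rather than text, and an optional column for an additional separation dimension.

```python
import sqlite3
from array import array

# Hypothetical schema: metadata in one table, peak arrays as binary blobs.
SCHEMA = """
CREATE TABLE scan (
    scan_id        INTEGER PRIMARY KEY,
    retention_time REAL NOT NULL,
    drift_time     REAL,              -- optional extra separation dimension
    ms_level       INTEGER NOT NULL
);
CREATE TABLE spectrum (
    scan_id   INTEGER PRIMARY KEY REFERENCES scan(scan_id),
    mz        BLOB NOT NULL,          -- packed 64-bit floats
    intensity BLOB NOT NULL           -- packed 64-bit floats
);
CREATE INDEX idx_scan_rt ON scan(retention_time);
"""


def open_store(path=":memory:"):
    """Create a new store with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_scan(conn, scan_id, retention_time, ms_level, mz, intensity, drift_time=None):
    """Insert one scan and its peak list."""
    conn.execute(
        "INSERT INTO scan (scan_id, retention_time, drift_time, ms_level) VALUES (?, ?, ?, ?)",
        (scan_id, retention_time, drift_time, ms_level),
    )
    conn.execute(
        "INSERT INTO spectrum (scan_id, mz, intensity) VALUES (?, ?, ?)",
        (scan_id, array("d", mz).tobytes(), array("d", intensity).tobytes()),
    )


def unpack(blob):
    """Decode a blob of packed doubles back into a list of floats."""
    values = array("d")
    values.frombytes(blob)
    return list(values)


def scans_in_window(conn, rt_min, rt_max):
    """Return (scan_id, mz, intensity) for scans in a retention time window."""
    rows = conn.execute(
        "SELECT s.scan_id, p.mz, p.intensity FROM scan AS s "
        "JOIN spectrum AS p ON p.scan_id = s.scan_id "
        "WHERE s.retention_time BETWEEN ? AND ? ORDER BY s.scan_id",
        (rt_min, rt_max),
    )
    return [(scan_id, unpack(mz), unpack(inten)) for scan_id, mz, inten in rows]
```

Storing peak arrays as packed doubles avoids the text encoding overhead the abstract attributes to XML, and the retention time index lets a window query skip scans outside the range instead of parsing the whole file.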


IEEE International Conference on eScience | 2008

An Extensible, Scalable Architecture for Managing Bioinformatics Data and Analyses

Anuj R. Shah; Mudita Singhal; Tara D. Gibson; Chandrika Sivaramakrishnan; Katrina M. Waters; Ian Gorton

Systems biology research demands the availability of tools and technologies that span a comprehensive range of computational capabilities, including data management, transfer, processing, integration, and interpretation. To address these needs, we have created the bioinformatics resource manager (BRM), a scalable, flexible, and easy to use tool for biologists to undertake complex analyses. This paper describes the underlying software architecture of the BRM that integrates multiple commodity platforms to provide a highly extensible and scalable software infrastructure for bioinformatics. The architecture integrates a J2EE 3-tier application with an archival experimental data management system, the GAGGLE framework for desktop tool integration, and the MeDICi integration framework for high-throughput data analysis workflows. This architecture facilitates a systems biology software solution that enables the entire spectrum of scientific activities, from experimental data access to high throughput processing and analysis of data for biologists and experimental scientists.


Visualization and Data Analysis | 2006

Diverse Information Integration and Visualization

Susan L. Havre; Anuj R. Shah; Christian Posse; Bobbie-Jo M. Webb-Robertson

This paper presents and explores a technique for visually integrating and exploring diverse information. Researchers and analysts seeking knowledge and understanding of complex systems have increasing access to related, but diverse, data. These data provide an opportunity to consider entities of interest from multiple informational perspectives not available from any single data or information type. These multiple perspectives are derived from diverse, but related data and integrated for simultaneous analysis. Our approach visualizes multiple entities across multiple perspectives where each perspective, or dimension, is an alternate partitioning of the entities. The partitioning may be based on inherent or assigned attributes such as meta-data or prior knowledge captured in annotations. The partitioning may also be directly derived from entity data; for example, clustering, or unsupervised classification, can be applied to multi-dimensional vector entity data to partition the entities into groups, or clusters. The same entities may be clustered on data from different experiment types or processing approaches. This reduction of diverse data/information on an entity to a series of partitions, or discrete (and unit-less) categories, allows the user to view the entities across diverse data without concern for data types and units. Parallel coordinate plots typically visualize continuous data across multiple dimensions. We adapt parallel coordinate plots for discrete values such as partition names to allow the comparison of entity patterns across multiple dimensions for identifying trends and outlier entities. We illustrate this approach through a prototype, Juxter (short for Juxtaposer).


Computational Biology and Chemistry | 2008

Brief Communication: A feature vector integration approach for a generalized support vector machine pairwise homology algorithm

Bobbie-Jo M. Webb-Robertson; Christopher S. Oehmen; Anuj R. Shah

Due to the exponential growth of sequenced genomes, the need to quickly provide accurate annotation for existing and new sequences is paramount to facilitate biological research. Current sequence comparison approaches fail to detect homologous relationships when sequence similarity is low. Support vector machine (SVM) algorithms approach this problem by transforming all proteins into a feature space of equal dimension based on protein properties, such as sequence similarity scores against a basis set of proteins or motifs. This multivariate representation of the protein space is then used to build a classifier specific to a pre-defined protein family. However, this approach is not well suited to large-scale annotation. We have developed an SVM approach that formulates remote homology as a single classifier that answers the pairwise comparison problem by integrating the two feature vectors for a pair of sequences into a single vector representation that can be used to build a classifier that separates sequence pairs into homologs and non-homologs. This pairwise SVM approach significantly improves the task of remote homology detection on the benchmark dataset, quantified as the area under the receiver operating characteristic curve: 0.97 versus 0.73 and 0.70 for PSI-BLAST and Basic Local Alignment Search Tool (BLAST), respectively.
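The abstract does not state how the two feature vectors are integrated, so the following is only one plausible sketch. It assumes an order-independent combination (element-wise absolute difference followed by element-wise product), so that the pair (u, v) and the pair (v, u) map to the same vector. The function name and the choice of combination are hypothetical and may differ from the method in the paper.

```python
def integrate_pair(u, v):
    """Combine two equal-length feature vectors into one symmetric pair vector.

    Assumed combination: |u_i - v_i| for every feature, then u_i * v_i for
    every feature. Swapping u and v yields the same result.
    """
    if len(u) != len(v):
        raise ValueError("feature vectors must have equal dimension")
    diffs = [abs(a - b) for a, b in zip(u, v)]
    prods = [a * b for a, b in zip(u, v)]
    return diffs + prods
```

Vectors built this way, each labeled as homolog or non-homolog, could then be passed to any standard SVM implementation to train the single pairwise classifier described above.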

Collaboration


Dive into Anuj R. Shah's collaborations.

Top Co-Authors

Bobbie-Jo M. Webb-Robertson

Pacific Northwest National Laboratory

Christopher S. Oehmen

Pacific Northwest National Laboratory

Matthew E. Monroe

Pacific Northwest National Laboratory

Mudita Singhal

Pacific Northwest National Laboratory

Gordon A. Anderson

Pacific Northwest National Laboratory

Richard D. Smith

Pacific Northwest National Laboratory

Joshua N. Adkins

Pacific Northwest National Laboratory

Katrina M. Waters

Pacific Northwest National Laboratory

Christian Posse

Pacific Northwest National Laboratory

Ian Gorton

Pacific Northwest National Laboratory
