Publication


Featured research published by Valmik Desai.


Proteins | 2009

Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases.

Chenggang Yu; Nela Zavaljevski; Valmik Desai; Jaques Reifman

In this article, we present a new method termed CatFam (Catalytic Families) to automatically infer the functions of catalytic proteins, which account for 20–40% of all proteins in living organisms and play a critical role in a variety of biological processes. CatFam is a sequence‐based method that generates sequence profiles to represent and infer protein catalytic functions. CatFam generates profiles through a stepwise procedure that carefully controls profile quality and employs nonenzymes as negative samples to establish profile‐specific thresholds associated with a predefined nominal false‐positive rate (FPR) of predictions. The adjustable FPR allows for fine precision control of each profile and enables the generation of profile databases that meet different needs: function annotation with high precision and hypothesis generation with moderate precision but better recall. Multiple tests of CatFam databases (generated with distinct nominal FPRs) against enzyme and nonenzyme datasets show that the method's predictions have consistently high precision and recall. For example, a 1% FPR database predicts protein catalytic functions for a dataset of enzymes and nonenzymes with 98.6% precision and 95.0% recall. Comparisons of CatFam databases against other established profile‐based methods for the functional annotation of 13 bacterial genomes indicate that CatFam consistently achieves higher precision and (in most cases) higher recall, and that (on average) CatFam provides 21.9% additional catalytic functions not inferred by the other similarly reliable methods. These results strongly suggest that the proposed method provides a valuable contribution to the automated prediction of protein catalytic functions. The CatFam databases and the database search program are freely available at http://www.bhsai.org/downloads/catfam.tar.gz. Proteins 2009.
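The idea of a profile-specific threshold tied to a nominal FPR can be illustrated with a minimal sketch. This is not the CatFam implementation: the function names, the plain score lists, and the "strictly above the cutoff" rule are all assumptions made for illustration. It picks a cutoff from the scores of negative samples (nonenzymes) so that at most the nominal fraction of them would be called positive, then reports precision and recall at that cutoff.

```python
import math


def threshold_for_fpr(negative_scores, nominal_fpr):
    """Pick a cutoff so that at most nominal_fpr of negatives score above it.

    Hypothetical helper: scores are plain numbers, higher means a better match.
    """
    if not negative_scores:
        raise ValueError("need at least one negative score")
    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives we may let through as false positives.
    allowed = math.floor(nominal_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return float("-inf")  # every negative may pass
    # A hit counts as positive only if it scores strictly above the cutoff.
    return ranked[allowed]


def precision_recall(positive_scores, negative_scores, threshold):
    """Precision and recall when scores strictly above threshold are positive."""
    tp = sum(s > threshold for s in positive_scores)
    fp = sum(s > threshold for s in negative_scores)
    fn = len(positive_scores) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    negatives = list(range(1, 101))      # toy nonenzyme scores
    positives = [150, 120, 99, 50]       # toy enzyme scores
    cutoff = threshold_for_fpr(negatives, 0.01)
    print(cutoff, precision_recall(positives, negatives, cutoff))
```

Lowering the nominal FPR raises the cutoff, which tends to trade recall for precision; this is the trade-off the abstract describes between annotation and hypothesis generation.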


PLOS ONE | 2011

AGeS: A Software System for Microbial Genome Sequence Annotation

Kamal Kumar; Valmik Desai; Li Cheng; Maxim Y. Khitrov; Deepak Grover; Ravi Vijaya Satya; Chenggang Yu; Nela Zavaljevski; Jaques Reifman

Background The annotation of genomes from next-generation sequencing platforms needs to be rapid, high-throughput, and fully integrated and automated. Although a few Web-based annotation services have recently become available, they may not be the best solution for researchers that need to annotate a large number of genomes, possibly including proprietary data, and store them locally for further analysis. To address this need, we developed a standalone software application, the Annotation of microbial Genome Sequences (AGeS) system, which incorporates publicly available and in-house-developed bioinformatics tools and databases, many of which are parallelized for high-throughput performance. Methodology The AGeS system supports three main capabilities. The first is the storage of input contig sequences and the resulting annotation data in a central, customized database. The second is the annotation of microbial genomes using an integrated software pipeline, which first analyzes contigs from high-throughput sequencing by locating genomic regions that code for proteins, RNA, and other genomic elements through the Do-It-Yourself Annotation (DIYA) framework. The identified protein-coding regions are then functionally annotated using the in-house-developed Pipeline for Protein Annotation (PIPA). The third capability is the visualization of annotated sequences using GBrowse. To date, we have implemented these capabilities for bacterial genomes. AGeS was evaluated by comparing its genome annotations with those provided by three other methods. Our results indicate that the software tools integrated into AGeS provide annotations that are in general agreement with those provided by the compared methods. This is demonstrated by a >94% overlap in the number of identified genes, a significant number of identical annotated features, and a >90% agreement in enzyme function predictions.


PLOS ONE | 2009

PSPP: a protein structure prediction pipeline for computing clusters.

Michael S. Lee; Rajkumar Bondugula; Valmik Desai; Nela Zavaljevski; In-Chul Yeh; Anders Wallqvist; Jaques Reifman

Background Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster. Methodology/Principal Findings The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.
Conclusions The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.


BMC Bioinformatics | 2012

QuartetS-DB: a large-scale orthology database for prokaryotes and eukaryotes inferred by evolutionary evidence

Chenggang Yu; Valmik Desai; Li Cheng; Jaques Reifman

Background The concept of orthology is key to decoding evolutionary relationships among genes across different species using comparative genomics. QuartetS is a recently reported algorithm for large-scale orthology detection. Based on the well-established evolutionary principle that gene duplication events discriminate paralogous from orthologous genes, QuartetS has been shown to improve orthology detection accuracy while maintaining computational efficiency. Description QuartetS-DB is a new orthology database constructed using the QuartetS algorithm. The database provides orthology predictions among 1621 complete genomes (1365 bacterial, 92 archaeal, and 164 eukaryotic), covering more than seven million proteins and four million pairwise orthologs. It is a major source of orthologous groups, containing more than 300,000 groups of orthologous proteins and 236,000 corresponding gene trees. The database also provides over 500,000 groups of inparalogs. In addition to its size, a distinguishing feature of QuartetS-DB is the ability to allow users to select a cutoff value that modulates the balance between prediction accuracy and coverage of the retrieved pairwise orthologs. The database is accessible at https://applications.bioanalysis.org/quartetsdb. Conclusions QuartetS-DB is one of the largest orthology resources available to date. Because its orthology predictions are underpinned by evolutionary evidence obtained from sequenced genomes, we expect its accuracy to continue to increase in future releases as the genomes of additional species are sequenced.


PLOS Neglected Tropical Diseases | 2017

Dengue virus antibody database: Systematically linking serotype-specificity with epitope mapping in dengue virus

Sidhartha Chaudhury; Gregory D. Gromowski; Daniel R. Ripoll; Ilja V. Khavrutskii; Valmik Desai; Anders Wallqvist

Background A majority of infections caused by dengue virus (DENV) are asymptomatic, but a higher incidence of severe illness, such as dengue hemorrhagic fever, is associated with secondary infections, suggesting that pre-existing immunity plays a central role in dengue pathogenesis. Primary infections are typically associated with a largely serotype-specific antibody response, while secondary infections show a shift to a broadly cross-reactive antibody response. Methods/Principal findings We hypothesized that the basis for the shift in serotype-specificity between primary and secondary infections can be found in a change in the antibody fine-specificity. To investigate the link between epitope- and serotype-specificity, we assembled the Dengue Virus Antibody Database, an online repository containing over 400 DENV-specific mAbs, each annotated with information on 1) its origin, including the immunogen, host immune history, and selection methods, 2) binding/neutralization data against all four DENV serotypes, and 3) epitope mapping at the domain or residue level to the DENV E protein. We combined epitope mapping and activity information to determine a residue-level index of epitope propensity and cross-reactivity and generated detailed composite epitope maps of primary and secondary antibody responses. We found differing patterns of epitope-specificity between primary and secondary infections, where secondary responses target a distinct subset of epitopes found in the primary response. We found that secondary infections were marked with an enhanced response to cross-reactive epitopes, such as the fusion-loop and E-dimer region, as well as increased cross-reactivity in what are typically more serotype-specific epitope regions, such as the domain I-II interface and domain III.
Conclusions/Significance Our results support the theory that pre-existing cross-reactive memory B cells form the basis for the secondary antibody response, resulting in a broadening of the response in terms of cross-reactivity, and a focusing of the response to a subset of epitopes, including some, such as the fusion-loop region, that are implicated in poor neutralization and antibody-dependent enhancement of infection.


Frontiers in Pharmacology | 2017

vNN Web Server for ADMET Predictions

Patric Schyman; Ruifeng Liu; Valmik Desai; Anders Wallqvist

In drug development, early assessments of pharmacokinetic and toxic properties are important stepping stones to avoid costly and unnecessary failures. Considerable progress has recently been made in the development of computer-based (in silico) models to estimate such properties. Nonetheless, such models can be further improved in terms of their ability to make predictions more rapidly, easily, and with greater reliability. To address this issue, we have used our vNN method to develop 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction models. These models quickly assess some of the most important properties of potential drug candidates, including their cytotoxicity, mutagenicity, cardiotoxicity, drug-drug interactions, microsomal stability, and likelihood of causing drug-induced liver injury. Here we summarize the ability of each of these models to predict such properties and discuss their overall performance. All of these ADMET models are publicly available on our website (https://vnnadmet.bhsai.org/), which also offers the capability of using the vNN method to customize and build new models.


IEEE International Conference on High Performance Computing Data and Analytics | 2010

Large-Scale Orthology Predictions for Inferring Gene Functions across Multiple Species

Chenggang Yu; Valmik Desai; Nela Zavaljevski; Jaques Reifman

An effective approach to infer the functions of genes is to use the concept of gene orthology. Because orthologous genes are likely to share similar functions, the functions of genes in an unstudied species can be inferred through the functions of their orthologs in a studied model species. To infer gene functions for a multitude of species, we developed a high-throughput orthology prediction method, termed PhyloTrace. PhyloTrace is both highly accurate and computationally efficient for large-scale applications, having the ability to infer orthologous genes across thousands of species. This is accomplished through three major steps: 1) all-against-all gene comparisons for every pair of genes, 2) pair-wise orthology predictions for every two genomes, and 3) the generation of orthologous clusters that contain orthologous genes across multiple genomes. We employed the previously developed Pipe man parallelization program to break down a set of millions of input sequences into small chunks and then processed them in parallel. We successfully predicted orthologs for over 900 bacterial genomes, achieving a false-positive prediction rate of 2.0%, which was a significant improvement compared with the widely used bidirectional best-hit method, which yielded a false-positive rate of 5.5%.
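The bidirectional best-hit method that the abstract uses as its baseline can be sketched briefly. This is an illustration of that baseline only, not of PhyloTrace; the function names and the dictionary layout of similarity scores are hypothetical. Two genes are called orthologs when each is the other's highest-scoring match across the two genomes.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict of (query, subject) -> similarity score (assumed layout).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy scores for two genomes with two genes each.
    a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 40,
              ("a2", "b1"): 70, ("a2", "b2"): 30}
    b_vs_a = {("b1", "a1"): 88, ("b1", "a2"): 69,
              ("b2", "a2"): 35, ("b2", "a1"): 20}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In the toy data, a2's best hit is b1, but b1's best hit is a1, so only the pair (a1, b1) is reported. The rule uses nothing beyond pairwise similarity scores.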


Computing in Science and Engineering | 2010

Accelerating Biomedical Research in Designing Diagnostic Assays, Drugs, and Vaccines

Anders Wallqvist; Nela Zavaljevski; R Vijaya Satya; Rajkumar Bondugula; Valmik Desai; Xin Hu; Kamal Kumar; Michael S. Lee; In-Chul Yeh; Chenggang Yu; Jaques Reifman

The US Department of Defense Biotechnology High-Performance Computing Software Applications Institute for Force Health Protection develops state-of-the-art high-performance computing applications that accelerate biomedical research in the development of diagnostic assays, drugs, and vaccines. The BHSAI works together with DoD life scientists to develop and integrate HPC software applications into DoD biomedical research programs.


IEEE International Conference on High Performance Computing Data and Analytics | 2009

A Web-Accessible Protein Structure Prediction Pipeline

Michael S. Lee; Rajkumar Bondugula; Valmik Desai; Nela Zavaljevski; In-Chul Yeh; Anders Wallqvist; Jaques Reifman

Proteins are the molecular basis of nearly all structural, catalytic, sensory, and regulatory functions in living organisms. The biological function of a protein is inextricably linked to its three-dimensional (3D) atomic structure. Traditional structure determination methods, such as X-ray and nuclear magnetic resonance techniques, are time-consuming, expensive, and infeasible for the millions of proteins that have been sequenced so far from various organisms. Computational structure prediction methods provide a faster and more cost-effective, albeit approximate, alternative to experimental structure determination. We present a high-throughput protein structure prediction pipeline (dubbed “PSPP”), which, given input protein sequences, infers their 3D atomic structures. The pipeline was designed to be used with high performance computing clusters and to scale with the number of processors. The pipeline encompasses a core Perl module, a parallel job manager, and a Web browser graphical user interface accessible at our Website (www.bhsai.org). The software is currently installed at the Department of Defense (DoD) Maui High Performance Computing Center, and it is available for download along with its associated databases from our site. Currently, DoD scientists are using the pipeline in basic science and drug and vaccine development projects.


BMC Bioinformatics | 2008

The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation

Chenggang Yu; Nela Zavaljevski; Valmik Desai; Seth Johnson; Fred J. Stevens; Jaques Reifman

Collaboration


Valmik Desai's collaborations.

Top Co-Authors

Nela Zavaljevski

Argonne National Laboratory

Chenggang Yu

United States Department of Defense

Anders Wallqvist

United States Army Medical Research and Materiel Command

In-Chul Yeh

National Institutes of Health

Li Cheng

United States Department of Defense

Michael S. Lee

United States Army Medical Research Institute of Infectious Diseases

Xin Hu

National Institutes of Health

Fred J. Stevens

Argonne National Laboratory

Gregory D. Gromowski

Walter Reed Army Institute of Research
