Richa Agarwala | Researchain

Publication

Featured research published by Richa Agarwala.

FEBS Journal | 2005

Protein Database Searches Using Compositionally Adjusted Substitution Matrices

Stephen F. Altschul; John C. Wootton; E. Michael Gertz; Richa Agarwala; Aleksandr Morgulis; Alejandro A. Schäffer; Yi-Kuo Yu

Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long‐standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions.

Bioinformatics | 2007

COBALT: constraint-based alignment tool for multiple protein sequences

Jason S. Papadopoulos; Richa Agarwala

Motivation: A tool that simultaneously aligns multiple protein sequences, automatically utilizes information about protein domains, and has a good compromise between speed and accuracy will have practical advantages over current tools. Results: We describe COBALT, a constraint-based alignment tool that implements a general framework for multiple alignment of protein sequences. COBALT finds a collection of pairwise constraints derived from database searches, sequence similarity and user input, combines these pairwise constraints, and then incorporates them into a progressive multiple alignment. We show that using constraints derived from the conserved domain database (CDD) and PROSITE protein-motif database improves COBALT's alignment quality. We also show that COBALT has reasonable runtime performance and alignment accuracy comparable to or exceeding that of other tools for a broad range of problems. Availability: COBALT is included in the NCBI C++ toolkit. A Linux executable for COBALT and the CDD and PROSITE data used are available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt

Bioinformatics | 2008

Database indexing for production MegaBLAST searches

Aleksandr Morgulis; George Coulouris; Yan Raytselis; Thomas L. Madden; Richa Agarwala; Alejandro A. Schäffer

Motivation: The BLAST software package for sequence comparison speeds up homology search by preprocessing a query sequence into a lookup table. Numerous research studies have suggested that preprocessing the database instead would give better performance. However, production usage of sequence comparison methods that preprocess the database has been limited to programs such as BLAT and SSAHA that are designed to find matches when query and database subsequences are highly similar. Results: We developed a new version of the MegaBLAST module of BLAST that does the initial phase of finding short seeds for matches by searching a database index. We also developed a program makembindex that preprocesses the database into a data structure for rapid seed searching. We show that the new ‘indexed MegaBLAST’ is faster than the ‘non-indexed’ version for most practical uses. We show that indexed MegaBLAST is faster than miBLAST, another implementation of BLAST nucleotide searching with a preprocessed database, for most of the 200 queries we tested. To deploy indexed MegaBLAST as part of NCBI's Web BLAST service, the storage of databases and the queueing mechanism were modified, so that some machines are now dedicated to serving queries for a specific database. The response time for such Web queries is now faster than it was when each computer handled queries for multiple databases. Availability: The code for indexed MegaBLAST is part of the blastn program in the NCBI C++ toolkit. The preprocessor program makembindex is also in the toolkit. Indexed MegaBLAST has been used in production on NCBI's Web BLAST service to search one version of the human and mouse genomes since October 2007. The Linux command-line executables for blastn and makembindex, documentation, and some query sets used to carry out the tests described below are available in the directory: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
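
The idea of finding seeds by searching a preprocessed database, rather than a preprocessed query, can be illustrated with a toy k-mer index. This is only a minimal sketch under stated assumptions: the function names, the dictionary-based index and the example sequences are hypothetical, and it does not reproduce the data structure built by makembindex or the seed logic of indexed MegaBLAST.

```python
from collections import defaultdict

def build_kmer_index(database, k):
    """Map every k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index

def find_seeds(query, index, k):
    """Return (query offset, sequence id, database offset) for each exact k-mer hit."""
    seeds = []
    for q_off in range(len(query) - k + 1):
        # .get avoids adding empty entries to the defaultdict on a miss
        for seq_id, db_off in index.get(query[q_off:q_off + k], []):
            seeds.append((q_off, seq_id, db_off))
    return seeds

# Hypothetical toy database; the index is built once and reused for every query.
database = {"chr_a": "ACGTACGTGA", "chr_b": "TTGACGTA"}
index = build_kmer_index(database, k=4)
seeds = find_seeds("GACGT", index, k=4)
print(seeds)
```

In this sketch the cost of scanning the database is paid once when the index is built, so each query only needs one lookup per k-mer.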

PLOS Biology | 2009

Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse

Deanna M. Church; Leo Goodstadt; LaDeana W. Hillier; Michael C. Zody; Steve Goldstein; Xinwei She; Richa Agarwala; Joshua L. Cherry; Michael DiCuccio; Wratko Hlavina; Yuri Kapustin; Peter Meric; Donna Maglott; Zoë Birtle; Ana C. Marques; Tina Graves; Shiguo Zhou; Brian Teague; Konstantinos Potamousis; Chris Churas; Michael Place; Jill Herschleb; Ron Runnheim; Dan Forrest; James M. Amos-Landgraf; David C. Schwartz; Ze Cheng; Kerstin Lindblad-Toh; Evan E. Eichler; Chris P. Ponting

A finished clone-based assembly of the mouse genome reveals extensive recent sequence duplication during recent evolution and rodent-specific expansion of certain gene families. Newly assembled duplications contain protein-coding genes that are mostly involved in reproductive function.

Nature Biotechnology | 2012

Assuring the quality of next-generation sequencing in clinical laboratory practice

Amy S. Gargis; Lisa Kalman; Meredith W Berry; David P. Bick; David Dimmock; Tina Hambuch; Fei Lu; Elaine Lyon; Karl V. Voelkerding; Barbara A. Zehnbauer; Richa Agarwala; Sarah F. Bennett; Bin Chen; Ephrem L.H. Chin; John Compton; Soma Das; Daniel H. Farkas; Matthew J. Ferber; Birgit Funke; Manohar R. Furtado; Lilia Ganova-Raeva; Ute Geigenmüller; Sandra J Gunselman; Madhuri Hegde; Philip L. F. Johnson; Andrew Kasarskis; Shashikant Kulkarni; Thomas Lenk; Cs Jonathan Liu; Megan Manion

Amy S Gargis, Centers for Disease Control and Prevention; Lisa Kalman, Centers for Disease Control and Prevention; Meredith W Berry, SeqWright Inc; David P Bick, Medical College of Wisconsin; David P Dimmock, Medical College of Wisconsin; Tina Hambuch, Illumina Clinical Services; Fei Lu, SeqWright Inc; Elaine Lyon, University of Utah; Karl V Voelkerding, University of Utah; Barbara Zehnbauer, Emory University

Biology Direct | 2012

Domain enhanced lookup time accelerated BLAST.

Grzegorz M Boratyn; Alejandro A. Schäffer; Richa Agarwala; Stephen F. Altschul; David J. Lipman; Thomas L. Madden

Background: BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch. Results: We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI’s Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST. Conclusions: DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the “Protein BLAST” link at http://blast.ncbi.nlm.nih.gov. Reviewers: This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.

American Journal of Human Genetics | 2000

A Novel Nemaline Myopathy in the Amish Caused by a Mutation in Troponin T1

Jennifer J. Johnston; Richard I. Kelley; Thomas O. Crawford; D. Holmes Morton; Richa Agarwala; Thorsten Koch; Alejandro A. Schäffer; Clair A. Francomano; Leslie G. Biesecker

The nemaline myopathies are characterized by weakness and eosinophilic, rodlike (nemaline) inclusions in muscle fibers. Amish nemaline myopathy is a form of nemaline myopathy common among the Old Order Amish. In the first months of life, affected infants have tremors with hypotonia and mild contractures of the shoulders and hips. Progressive worsening of the proximal contractures, weakness, and a pectus carinatum deformity develop before the children die of respiratory insufficiency, usually in the second year. The disorder has an incidence of approximately 1 in 500 among the Amish, and it is inherited in an autosomal recessive pattern. Using a genealogy database, automated pedigree software, and linkage analysis of DNA samples from four sibships, we identified an approximately 2-cM interval on chromosome 19q13.4 that was homozygous in all affected individuals. The gene for the sarcomeric thin-filament protein, slow skeletal muscle troponin T (TNNT1), maps to this interval and was sequenced. We identified a stop codon in exon 11, predicted to truncate the protein at amino acid 179, which segregates with the disease. We conclude that Amish nemaline myopathy is a distinct, heritable, myopathic disorder caused by a mutation in TNNT1.

Nucleic Acids Research | 2009

The National Center for Biotechnology Information's Protein Clusters Database

William Klimke; Richa Agarwala; Azat Badretdin; Slava Chetvernin; Stacy Ciufo; Boris Fedorov; Boris Kiryutin; Kathleen O’Neill; Wolfgang Resch; Sergei Resenchuk; Susan C. Schafer; Igor Tolstoy; Tatiana Tatusova

Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to efficiently maintain and keep the deluge of data up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nt sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.

BMC Biology | 2006

Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

E. Michael Gertz; Yi-Kuo Yu; Richa Agarwala; Alejandro A. Schäffer; Stephen F. Altschul

Background: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. Results: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. Conclusion: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms.
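
Translating a nucleotide sequence in all six frames, as mentioned in the abstract above, can be sketched as follows. This is an illustrative example only, assuming the standard genetic code and an unambiguous DNA alphabet; the function names are hypothetical and the code is not taken from TBLASTN.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frame_translations(dna):
    """Return translations keyed by frame: '+1', '+2', '+3', '-1', '-2', '-3'."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

frames = six_frame_translations("ATGGCCTAA")
print(frames)
```

A protein query can then be compared against each of the six translated strings, which is why a translated search does several times the work of a protein-protein search over the same number of residues.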

PLOS Biology | 2011

Modernizing Reference Genome Assemblies

Deanna M. Church; Valerie Schneider; Tina Graves; Katherine Auger; Fiona Cunningham; Nathan Bouk; Hsiu Chuan Chen; Richa Agarwala; William M. McLaren; Graham R. S. Ritchie; Derek Albracht; Milinn Kremitzki; Susan Rock; Holland Kotkiewicz; Colin Kremitzki; Aye Wollam; Lee Trani; Lucinda Fulton; Robert S. Fulton; Lucy Matthews; S. Whitehead; William Chow; James Torrance; Matthew Dunn; Glenn Harden; Glen Threadgold; Jonathan Wood; Joanna Collins; Paul Heath; Guy Griffiths

I have read the journal's policy and have the following conflicts: Paul Flicek is married to the deputy editor of PLoS Medicine, Melissa Norton. Evan Eichler is on the board of Pacific Biosciences. Support for this work came from the Intramural Research Program of the NIH, The National Library of Medicine, the European Molecular Biology Laboratory, the Wellcome Trust (grant number 077198), and the Howard Hughes Medical Institute (EEE). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
