Warren Gish | Researchain

Publication

Featured research published by Warren Gish.

Journal of Molecular Biology | 1990

Basic Local Alignment Search Tool

Stephen F. Altschul; Warren Gish; Webb Miller; Eugene W. Myers; David J. Lipman

A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
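The MSP score mentioned in this abstract can be illustrated with a small exhaustive computation. The sketch below is not the BLAST heuristic itself (BLAST approximates this quantity far faster); it scans every ungapped diagonal of two sequences and keeps the best-scoring segment pair. The function name `msp_score` and the match/mismatch values are assumptions chosen for illustration.

```python
def msp_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Brute-force maximal segment pair (MSP) score of two sequences.

    An MSP is the highest-scoring pair of equal-length segments, one from
    each sequence, aligned without gaps. The scoring values are illustrative
    assumptions, not taken from the abstract.
    """
    best = 0
    # Each diagonal d = j - i is one fixed relative offset of the sequences.
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        j = i + d
        running = 0
        while i < len(a) and j < len(b):
            running += match if a[i] == b[j] else mismatch
            if running < 0:
                # A segment never benefits from a negative-scoring prefix.
                running = 0
            best = max(best, running)
            i += 1
            j += 1
    return best


if __name__ == "__main__":
    print(msp_score("GGACGTGG", "TTACGTTT"))
```

This runs in time proportional to the product of the sequence lengths, which is exactly the cost that a heuristic such as BLAST is designed to avoid on large databases.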

Methods in Enzymology | 1996

Local alignment statistics.

Stephen F. Altschul; Warren Gish

This chapter discusses the study of local alignment statistics, the distribution of optimal gapped subalignment scores, and the evidence that two parameters are sufficient to describe both the form of this distribution and its dependence on sequence length. Using a random protein model, the relevant statistical parameters are calculated for a variety of substitution matrices and gap costs. An analysis of these parameters elucidates the relative effectiveness of affine as opposed to length-proportional gap costs. Thus, sum statistics provide a method for evaluating sequence similarity that treats short and long gaps differently. By example, the chapter shows how this method has the potential to increase search sensitivity. The statistics described can be applied to the results of FASTA searches or to those from a variation of the basic local alignment search tool (BLAST) programs.
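The two-parameter description mentioned above is commonly written in the Karlin-Altschul form, where the expected number of chance alignments scoring at least S is E = K·m·n·exp(-λS) for sequences of lengths m and n. The sketch below assumes that form; the function names are hypothetical, and the values of `lam` and `k` in the usage lines are placeholders for illustration, not parameters taken from the chapter.

```python
import math


def expected_hits(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance alignments with score >= `score` (E-value)."""
    return k * m * n * math.exp(-lam * score)


def p_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Probability of at least one chance alignment with score >= `score`."""
    # 1 - exp(-E), computed with expm1 for accuracy when E is tiny.
    return -math.expm1(-expected_hits(score, m, n, lam, k))


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score in bits, so that E = m * n * 2 ** (-bits)."""
    return (lam * score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Placeholder parameters (assumptions); substitute values estimated
    # for the substitution matrix and gap costs actually in use.
    lam, k = 0.27, 0.04
    print(expected_hits(80, 300, 1_000_000, lam, k))
    print(bit_score(80, lam, k))
```

Because the bit score absorbs both parameters, scores obtained under different scoring systems become directly comparable once converted.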

Nature Genetics | 1994

Issues in searching molecular sequence databases

Stephen F. Altschul; Mark S. Boguski; Warren Gish; John C. Wootton

Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention has been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services.

Methods | 1991

Improved Sensitivity of Nucleic Acid Database Searches Using Application-Specific Scoring Matrices

David J. States; Warren Gish; Stephen F. Altschul

Scoring matrices for nucleic acid sequence comparison that are based on models appropriate to the analysis of molecular sequencing errors or biological mutation processes are presented. In mammalian genomes, transition mutations occur significantly more frequently than transversions, and the optimal scoring of sequence alignments based on this substitution model differs from that derived assuming a uniform mutation model. The information from sequence alignments potentially available using an optimal scoring system is compared with that obtained using the BLASTN default scoring. A modified BLAST database search tool allows these, or other explicitly specified scoring matrices, to be utilized in computationally efficient queries of nucleic acid databases with nucleic acid query sequences. Results of searches performed using BLASTN's default score matrix are compared with those using scores based on a mutational model in which transitions are more prevalent than transversions.
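As a rough illustration of how a transition-biased model changes the scores, the sketch below builds a log-odds nucleotide matrix from a simple substitution model. It is a minimal example under stated assumptions (uniform base composition, a hypothetical `identity` level, and a hypothetical `ts_fraction` giving the share of substitutions that are transitions), not the matrices derived in the paper.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return x != y and ((x in PURINES) == (y in PURINES))


def log_odds_matrix(identity: float, ts_fraction: float) -> dict:
    """Log-odds scores (in bits) for aligned base pairs.

    identity:    assumed probability that an aligned position is unchanged.
    ts_fraction: assumed share of substitutions that are transitions
                 (1/3 corresponds to a uniform mutation model).
    """
    if not (0 < identity < 1 and 0 < ts_fraction < 1):
        raise ValueError("identity and ts_fraction must lie strictly in (0, 1)")
    background = 0.25  # assumption: uniform base composition
    scores = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                conditional = identity
            elif is_transition(x, y):
                conditional = (1 - identity) * ts_fraction
            else:
                # Each base has two transversion partners.
                conditional = (1 - identity) * (1 - ts_fraction) / 2
            target = background * conditional
            scores[(x, y)] = math.log2(target / (background * background))
    return scores


if __name__ == "__main__":
    biased = log_odds_matrix(identity=0.75, ts_fraction=2 / 3)
    print(biased[("A", "A")], biased[("A", "G")], biased[("A", "C")])
```

With `ts_fraction` set to 1/3 every mismatch receives the same score, which mirrors a uniform mutation model; raising it penalizes transitions less than transversions.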

Nucleic Acids Research | 2003

WU-Blast2 server at the European Bioinformatics Institute

Rodrigo Lopez; Ville Silventoinen; Stephen Robinson; Asif Kibria; Warren Gish

Since 1995, the WU-BLAST programs (http://blast.wustl.edu) have provided a fast, flexible and reliable method for similarity searching of biological sequence databases. The software is in use at many locales and web sites. The European Bioinformatics Institute's WU-Blast2 (http://www.ebi.ac.uk/blast2/) server has been providing free access to these search services since 1997 and today supports many features that both enhance the usability and expand on the scope of the software.

Journal of Computational Biology | 1994

QGB: Combined Use of Sequence Similarity and Codon Bias for Coding Region Identification

David J. States; Warren Gish

A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity. A rationale for combining these diverse information sources was derived, and analyses of the information available from codon utilization in several species were performed, with wide variation seen. Codon bias information was found on average to improve the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources on the interpretation of positive findings are discussed.
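One simple way to picture combining the two information sources is to express codon usage as a log-odds score in bits and add it to a similarity score that is also in bits. The sketch below assumes the two sources are independent; this is an illustrative assumption and not necessarily the rationale derived in the paper. The function names and the toy frequency table are hypothetical.

```python
import math

# Hypothetical toy table: relative frequency of a few codons in coding
# regions. Real tables are species-specific and cover all 64 codons.
TOY_CODING_FREQ = {"CTG": 0.040, "CTA": 0.007}


def codon_log_odds(seq: str, coding_freq: dict, background_freq: float = 1 / 64) -> float:
    """Sum of log2(coding frequency / background frequency) over in-frame codons.

    Codons missing from the table are treated as uninformative (0 bits),
    and a trailing partial codon is ignored.
    """
    seq = seq.upper()
    total = 0.0
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        freq = coding_freq.get(codon, background_freq)
        total += math.log2(freq / background_freq)
    return total


def combined_bits(similarity_bits: float, codon_bits: float) -> float:
    """Add the two scores, assuming the information sources are independent."""
    return similarity_bits + codon_bits


if __name__ == "__main__":
    codon_bits = codon_log_odds("CTGCTGCTG", TOY_CODING_FREQ)
    print(combined_bits(similarity_bits=12.0, codon_bits=codon_bits))
```

Under this view, a short region whose similarity score alone falls below a significance threshold can cross it when its codon usage is strongly typical of coding sequence.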

Nature Genetics | 1993