Publication


Featured research published by Stephen F. Altschul.


Journal of Molecular Biology | 1990

Basic Local Alignment Search Tool

Stephen F. Altschul; Warren Gish; Webb Miller; Eugene W. Myers; David J. Lipman

A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
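The MSP score mentioned in this abstract can be illustrated with a brute-force computation. The following is a minimal sketch, not the BLAST heuristic itself: it scans every diagonal exhaustively, whereas BLAST approximates the same quantity much faster. The function name `msp_score` and the match/mismatch values are assumptions chosen for illustration.

```python
def msp_score(a, b, match=5, mismatch=-4):
    """Return the best ungapped local alignment (segment pair) score of a and b.

    The match/mismatch scores are illustrative defaults, not values
    taken from the paper.
    """
    best = 0
    # Diagonal d pairs a[i] with b[i + d]; an ungapped segment pair
    # always lies on a single diagonal.
    for d in range(-(len(a) - 1), len(b)):
        running = 0
        i_start = max(0, -d)
        i_end = min(len(a), len(b) - d)
        for i in range(i_start, i_end):
            s = match if a[i] == b[i + d] else mismatch
            # Restart the segment whenever the running score drops below zero.
            running = max(0, running + s)
            best = max(best, running)
    return best


if __name__ == "__main__":
    print(msp_score("GGACGTCC", "TTACGTAA"))  # shared segment ACGT
```

This exhaustive scan takes time proportional to the product of the two sequence lengths, which is the cost that a heuristic search is designed to avoid.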


Proceedings of the National Academy of Sciences of the United States of America | 2002

Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences.

Robert L. Strausberg; Elise A. Feingold; Lynette H. Grouse; Jeffery G. Derge; Richard D. Klausner; Francis S. Collins; Lukas Wagner; Carolyn M. Shenmen; Gregory D. Schuler; Stephen F. Altschul; Barry R. Zeeberg; Kenneth H. Buetow; Carl F. Schaefer; Narayan K. Bhat; Ralph F. Hopkins; Heather Jordan; Troy Moore; Steve I. Max; Jun Wang; Florence Hsieh; Luda Diatchenko; Kate Marusina; Andrew A. Farmer; Gerald M. Rubin; Ling Hong; Mark Stapleton; M. Bento Soares; Maria F. Bonaldo; Tom L. Casavant; Todd E. Scheetz

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).


Methods in Enzymology | 1996

Local alignment statistics.

Stephen F. Altschul; Warren Gish

This chapter discusses the study of local alignment statistics, the distribution of optimal gapped subalignment scores, and the evidence that two parameters are sufficient to describe both the form of this distribution and its dependence on sequence length. Using a random protein model, the relevant statistical parameters are calculated for a variety of substitution matrices and gap costs. An analysis of these parameters elucidates the relative effectiveness of affine as opposed to length-proportional gap costs. Thus, sum statistics provide a method for evaluating sequence similarity that treats short and long gaps differently. By example, the chapter shows how this method has the potential to increase search sensitivity. The statistics described can be applied to the results of fast alignment (FASTA) searches or to those from a variation of the basic local alignment search tool (BLAST) programs.
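The two parameters are conventionally written λ and K. Assuming the standard extreme-value form E = K·m·n·exp(−λS) for the expected number of chance alignments with score at least S, a short sketch of how they are used follows. The numeric values of `lam` and `k` below are placeholders for illustration, not values taken from the chapter, and the function names are hypothetical.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a normalized score in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, lam, k, m, n):
    """Expected number of chance alignments scoring at least raw_score.

    m is the query length and n the database length (both assumed to be
    already corrected for edge effects, which this sketch ignores).
    """
    return k * m * n * math.exp(-lam * raw_score)


if __name__ == "__main__":
    lam, k = 0.267, 0.041      # placeholder parameters (assumption)
    m, n = 250, 10_000_000     # hypothetical query and database lengths
    for s in (60, 80, 100):
        print(s, bit_score(s, lam, k), expect_value(s, lam, k, m, n))
```

Expressed in bits, the same quantity is E = m·n·2^(−S′), which is why bit scores can be compared across scoring systems.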


Nature Genetics | 1994

Issues in searching molecular sequence databases

Stephen F. Altschul; Mark S. Boguski; Warren Gish; John C. Wootton

Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention has been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services.


The FASEB Journal | 1997

A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins.

Peer Bork; Kay Hofmann; Philipp Bucher; Andrew F. Neuwald; Stephen F. Altschul; Eugene V. Koonin

Computer analysis of a conserved domain, BRCT, first described at the carboxyl terminus of the breast cancer protein BRCA1, a p53 binding protein (53BP1), and the yeast cell cycle checkpoint protein RAD9 revealed a large superfamily of domains that occur predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA damage. The BRCT domain consists of ~95 amino acid residues and occurs as a tandem repeat at the carboxyl terminus of numerous proteins, but has been observed also as a tandem repeat at the amino terminus or as a single copy. The BRCT superfamily presently includes ~40 nonorthologous proteins, namely, BRCA1, 53BP1, and RAD9; a protein family that consists of the fission yeast replication checkpoint protein Rad4, the oncoprotein ECT2, the DNA repair protein XRCC1, and yeast DNA polymerase subunit DPB11; DNA binding enzymes such as terminal deoxynucleotidyltransferases, deoxycytidyl transferase involved in DNA repair, and DNA-ligases III and IV; yeast multifunctional transcription factor RAP1; and several uncharacterized gene products. Another previously described domain that is shared by bacterial NAD-dependent DNA-ligases, the large subunits of eukaryotic replication factor C, and poly(ADP-ribose) polymerases appears to be a distinct version of the BRCT domain. The retinoblastoma protein (a universal tumor suppressor) and related proteins may contain a distant relative of the BRCT domain. Despite the functional diversity of all these proteins, participation in DNA damage-responsive checkpoints appears to be a unifying theme. Thus, the BRCT domain is likely to perform critical, yet uncharacterized, functions in the cell cycle control of organisms from bacteria to humans. The carboxy-terminal BRCT domain of BRCA1 corresponds precisely to the recently identified minimal transcription activation domain of this protein, indicating one such function. Bork, P., Hofmann, K., Bucher, P., Neuwald, A. F., Altschul, S. F., Koonin, E. V. A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. FASEB J. 11, 68-76 (1997)


FEBS Journal | 2005

Protein Database Searches Using Compositionally Adjusted Substitution Matrices

Stephen F. Altschul; John C. Wootton; E. Michael Gertz; Richa Agarwala; Aleksandr Morgulis; Alejandro A. Schäffer; Yi-Kuo Yu

Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long‐standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions.


Journal of Molecular Biology | 1991

Amino Acid Substitution Matrices from an Information Theoretic Perspective

Stephen F. Altschul

Protein sequence alignments have become an important tool for molecular biologists. Local alignments are frequently constructed with the aid of a “substitution score matrix” that specifies a score for aligning each pair of amino acid residues. Over the years, many different substitution matrices have been proposed, based on a wide variety of rationales. Statistical results, however, demonstrate that any such matrix is implicitly a “log-odds” matrix, with a specific target distribution for aligned pairs of amino acid residues. In the light of information theory, it is possible to express the scores of a substitution matrix in bits and to see that different matrices are better adapted to different purposes. The most widely used matrix for protein sequence comparison has been the PAM-250 matrix. It is argued that for database searches the PAM-120 matrix generally is more appropriate, while for comparing two specific proteins with suspected homology the PAM-200 matrix is indicated. Examples discussed include the lipocalins, human α1B-glycoprotein, the cystic fibrosis transmembrane conductance regulator and the globins.
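The log-odds view can be made concrete with a toy example. This is a minimal sketch over a hypothetical two-letter alphabet: the target frequencies `q`, the background frequencies `p`, and the function names are all invented for illustration and are not PAM values.

```python
import math


def log_odds_bits(q_ij, p_i, p_j):
    """Score in bits for aligning residues i and j.

    q_ij is the target frequency of the aligned pair; p_i and p_j are the
    background frequencies of the individual residues.
    """
    return math.log2(q_ij / (p_i * p_j))


def relative_entropy_bits(q, p):
    """Average information per aligned position, in bits."""
    return sum(
        f * log_odds_bits(f, p[a], p[b])
        for (a, b), f in q.items()
        if f > 0
    )


if __name__ == "__main__":
    # Toy alphabet {X, Y}; all frequencies are made up for illustration.
    p = {"X": 0.5, "Y": 0.5}
    q = {("X", "X"): 0.4, ("Y", "Y"): 0.4, ("X", "Y"): 0.1, ("Y", "X"): 0.1}
    for pair, f in q.items():
        print(pair, log_odds_bits(f, p[pair[0]], p[pair[1]]))
    print("bits per position:", relative_entropy_bits(q, p))
```

A pair that occurs in true alignments more often than chance predicts gets a positive score, and the relative entropy indicates how long an alignment must be before it carries enough information to stand out from chance.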


Trends in Biochemical Sciences | 1998

ITERATED PROFILE SEARCHES WITH PSI-BLAST: A TOOL FOR DISCOVERY IN PROTEIN DATABASES

Stephen F. Altschul; Eugene V. Koonin

We thank the developers of PSI-BLAST, who include D. J. Lipman, T. L. Madden, W. Miller, A. A. Schaffer, J. Zhang and Z. Zhang. We also thank L. Aravind for his collaboration on the application of PSI-BLAST to the detection of subtle relationships among proteins.


Biology Direct | 2012

Domain enhanced lookup time accelerated BLAST.

Grzegorz M Boratyn; Alejandro A. Schäffer; Richa Agarwala; Stephen F. Altschul; David J. Lipman; Thomas L. Madden

Background: BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch. Results: We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI’s Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST. Conclusions: DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the “Protein BLAST” link at http://blast.ncbi.nlm.nih.gov. Reviewers: This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.


Bioinformatics | 1999

IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices

Alejandro A. Schäffer; Yuri I. Wolf; Chris P. Ponting; Eugene V. Koonin; L. Aravind; Stephen F. Altschul

MOTIVATION: Many studies have shown that database searches using position-specific score matrices (PSSMs) or profiles as queries are more effective at identifying distant protein relationships than are searches that use simple sequences as queries. One popular program for constructing a PSSM and comparing it with a database of sequences is Position-Specific Iterated BLAST (PSI-BLAST). RESULTS: This paper describes a new software package, IMPALA, designed for the complementary procedure of comparing a single query sequence with a database of PSI-BLAST-generated PSSMs. We illustrate the use of IMPALA to search a database of PSSMs for protein folds, and one for protein domains involved in signal transduction. IMPALA's sensitivity to distant biological relationships is very similar to that of PSI-BLAST. However, IMPALA employs a more refined analysis of statistical significance and, unlike PSI-BLAST, guarantees the output of the optimal local alignment by using the rigorous Smith-Waterman algorithm. Also, it is considerably faster when run with a large database of PSSMs than is BLAST or PSI-BLAST when run against the complete non-redundant protein database.
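The idea of aligning a single sequence against a PSSM with Smith-Waterman can be sketched as follows. This is an illustration under simplifying assumptions, not IMPALA's implementation: it uses a linear gap penalty rather than affine gap costs, represents the PSSM as a list of per-column score dictionaries, and returns only the optimal score. The function name, the `gap` and `default` values, and the toy PSSM are hypothetical.

```python
def smith_waterman_pssm(query, pssm, gap=-4, default=-1):
    """Return the optimal local alignment score of query against a PSSM.

    pssm[j] maps a residue to its score at profile column j; residues
    missing from a column receive the `default` score (an assumption made
    to keep the toy PSSM small).
    """
    cols = len(pssm)
    prev = [0] * (cols + 1)
    best = 0
    for i in range(1, len(query) + 1):
        cur = [0] * (cols + 1)
        for j in range(1, cols + 1):
            s = pssm[j - 1].get(query[i - 1], default)
            cur[j] = max(
                0,                    # start a new local alignment
                prev[j - 1] + s,      # align residue i with column j
                prev[j] + gap,        # gap in the profile
                cur[j - 1] + gap,     # gap in the query
            )
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    toy_pssm = [{"A": 3}, {"C": 4}, {"D": 5}]  # hypothetical 3-column profile
    print(smith_waterman_pssm("XXACDXX", toy_pssm))
```

Because the recurrence fills the full dynamic-programming matrix, the score it returns is optimal for the chosen scoring scheme, which is the guarantee that distinguishes this approach from a heuristic search.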

Collaboration


Dive into Stephen F. Altschul's collaborations.

Top Co-Authors

David J. Lipman
National Institutes of Health

Yi-Kuo Yu
National Institutes of Health

Webb Miller
Pennsylvania State University

Eugene V. Koonin
National Institutes of Health

Warren Gish
National Institutes of Health

John C. Wootton
National Institutes of Health

Thomas L. Madden
National Institutes of Health

E. Michael Gertz
National Institutes of Health