Gary D. Stormo
Washington University in St. Louis
Publications
Featured research published by Gary D. Stormo.
Bioinformatics | 2000
Gary D. Stormo
The purpose of this article is to provide a brief history of the development and application of computer algorithms for the analysis and prediction of DNA binding sites. This problem can be conveniently divided into two subproblems. The first is, given a collection of known binding sites, develop a representation of those sites that can be used to search new sequences and reliably predict where additional binding sites occur. The second is, given a set of sequences known to contain binding sites for a common factor, but not knowing where the sites are, discover the location of the sites in each sequence and a representation for the specificity of the protein.
International Conference on Bioinformatics | 1999
Gerald Z. Hertz; Gary D. Stormo
MOTIVATION: Molecular biologists can frequently obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring (or scoring) the relatedness of the aligned sequences. If the alignment is not known, one can be determined by finding an alignment that optimizes the scoring scheme. RESULTS: We describe four components to our approach for determining alignments of multiple sequences. First, we review a log-likelihood scoring scheme we call information content. Second, we describe two methods for estimating the P value of an individual information content score: (i) a method that combines a technique from large-deviation statistics with numerical calculations; (ii) a method that is exclusively numerical. Third, we describe how we count the number of possible alignments given the overall amount of sequence data. This count is multiplied by the P value to determine the expected frequency of an information content score and, thus, the statistical significance of the corresponding alignment. Statistical significance can be used to compare alignments having differing widths and containing differing numbers of sequences. Fourth, we describe a greedy algorithm for determining alignments of functionally related sequences. Finally, we test the accuracy of our P value calculations, and give an example of using our algorithm to identify binding sites for the Escherichia coli CRP protein. AVAILABILITY: Programs were developed under the UNIX operating system and are available by anonymous ftp from ftp://beagle.colorado.edu/pub/consensus.
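The scoring idea in this abstract can be illustrated with a minimal sketch. It assumes the common log-likelihood form of information content, the sum over alignment columns of f_b * ln(f_b / p_b), where f_b is the observed frequency of base b in a column and p_b is its background frequency. The function names, the uniform default background, and the restriction to gap-free DNA sites are assumptions made for illustration; this is not the authors' program, and the P value estimation methods are not reproduced here.

```python
import math

BASES = "ACGT"

def information_content(sites, background=None):
    """Log-likelihood score of a gap-free alignment of equal-length DNA sites.

    Assumed form: sum over columns and bases of f * ln(f / p).
    """
    if background is None:
        # Assumption: equiprobable bases unless a background is supplied.
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = 0.0
    for pos in range(width):
        column = [s[pos] for s in sites]
        for b in BASES:
            f = column.count(b) / len(sites)
            if f > 0:  # a base that never occurs contributes nothing
                total += f * math.log(f / background[b])
    return total

def expected_frequency(p_value, num_alignments):
    """Expected number of alignments scoring at least this well by chance:
    the P value multiplied by the count of possible alignments."""
    return p_value * num_alignments
```

A perfectly conserved column contributes ln(4) under a uniform background, and a column with all four bases equally represented contributes zero, so the score rewards positions where the aligned sequences agree.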
Journal of Molecular Biology | 1986
Thomas D. Schneider; Gary D. Stormo; Larry Gold; Andrzej Ehrenfeucht
Repressors, polymerases, ribosomes and other macromolecules bind to specific nucleic acid sequences. They can find a binding site only if the sequence has a recognizable pattern. We define a measure of the information (R_sequence) in the sequence patterns at binding sites. It allows one to investigate how information is distributed across the sites and to compare one site to another. One can also calculate the amount of information (R_frequency) that would be required to locate the sites, given that they occur with some frequency in the genome. Several Escherichia coli binding sites were analyzed using these two independent empirical measurements. The two amounts of information are similar for most of the sites we analyzed. In contrast, bacteriophage T7 RNA polymerase binding sites contain about twice as much information as is necessary for recognition by the T7 polymerase, suggesting that a second protein may bind at T7 promoters. The extra information can be accounted for by a strong symmetry element found at the T7 promoters. This element may be an operator. If this model is correct, these promoters and operators do not share much information. The comparisons between R_sequence and R_frequency suggest that the information at binding sites is just sufficient for the sites to be distinguished from the rest of the genome.
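The two measures compared in this abstract can be sketched as follows. The sketch assumes the usual Shannon-information formulation: the information at each position is 2 bits minus the entropy of the observed base frequencies, and the information needed to locate sites is -log2 of the fraction of genome positions that are sites. The function names are hypothetical, bases are assumed equiprobable in the genome, and no small-sample correction is applied, so values for small collections of sites will be overestimates.

```python
import math
from collections import Counter

def r_sequence(sites):
    """Information (bits) in aligned, equal-length DNA sites.

    Assumption: equiprobable bases, so each position can carry at most 2 bits.
    No small-sample correction is applied in this sketch.
    """
    n = len(sites)
    total = 0.0
    for pos in range(len(sites[0])):
        counts = Counter(s[pos] for s in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total

def r_frequency(num_sites, num_positions):
    """Information (bits) needed to pick num_sites out of num_positions
    possible binding positions in the genome."""
    return -math.log2(num_sites / num_positions)

# Usage with made-up numbers: 8 sites among 2**20 possible positions
# require 17 bits to locate.
bits_needed = r_frequency(8, 2**20)
```

Comparing `r_sequence` for a set of sites with `r_frequency` for the same sites mirrors the comparison the abstract describes: similar values suggest the pattern carries about as much information as is needed to find the sites.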
Archive | 2002
Alex Bateman; William R. Pearson; Lincoln Stein; Gary D. Stormo; John R. Yates
Cell | 2004
Jin Billy Li; Jantje M. Gerdes; Courtney J. Haycraft; Yanli Fan; Tanya M. Teslovich; Helen May-Simera; Haitao Li; Oliver E. Blacque; Linya Li; Carmen C. Leitch; Ra Lewis; Jane Green; Patrick S. Parfrey; Michel R. Leroux; William S. Davidson; Philip L. Beales; Lisa M. Guay-Woodford; Bradley K. Yoder; Gary D. Stormo; Nicholas Katsanis; Susan K. Dutcher
Cilia and flagella are microtubule-based structures nucleated by modified centrioles termed basal bodies. These biochemically complex organelles have more than 250 and 150 polypeptides, respectively. To identify the proteins involved in ciliary and basal body biogenesis and function, we undertook a comparative genomics approach that subtracted the nonflagellated proteome of Arabidopsis from the shared proteome of the ciliated/flagellated organisms Chlamydomonas and human. We identified 688 genes that are present exclusively in organisms with flagella and basal bodies and validated these data through a series of in silico, in vitro, and in vivo studies. We then applied this resource to the study of human ciliation disorders and have identified BBS5, a novel gene for Bardet-Biedl syndrome. We show that this novel protein localizes to basal bodies in mouse and C. elegans, is under the regulatory control of daf-19, and is necessary for the generation of both cilia and flagella.
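The subtraction step described in this abstract amounts to a set operation: keep what the two ciliated organisms share, then remove what the nonflagellated organism also has. The sketch below assumes each proteome has already been reduced to a set of shared gene-family identifiers; the function name and the identifiers are hypothetical, and the sequence-similarity comparisons a real analysis needs to establish those families are not shown.

```python
def flagellar_candidates(chlamydomonas, human, arabidopsis):
    """Gene families shared by the two ciliated organisms but absent
    from the nonflagellated one."""
    shared = set(chlamydomonas) & set(human)
    return shared - set(arabidopsis)

# Hypothetical family identifiers, for illustration only.
chlamy = {"fam1", "fam2", "fam3", "fam4"}
human = {"fam2", "fam3", "fam4", "fam5"}
arabidopsis = {"fam3", "fam6"}

candidates = flagellar_candidates(chlamy, human, arabidopsis)
```

In this toy input, `fam3` is dropped because it also occurs in the nonflagellated proteome, leaving `fam2` and `fam4` as candidates.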
Nature | 2009
Barbara U. Schraml; Kai Hildner; Wataru Ise; Wan-Ling Lee; Whitney A.-E. Smith; Ben Solomon; Gurmukh Sahota; Julia Sim; Ryuta Mukasa; Saso Cemerski; Robin D. Hatton; Gary D. Stormo; Casey T. Weaver; John H. Russell; Theresa L. Murphy; Kenneth M. Murphy
Activator protein 1 (AP-1, also known as JUN) transcription factors are dimers of JUN, FOS, MAF and activating transcription factor (ATF) family proteins characterized by basic region and leucine zipper domains. Many AP-1 proteins contain defined transcriptional activation domains, but BATF and the closely related BATF3 (refs 2, 3) contain only a basic region and leucine zipper, and are considered to be inhibitors of AP-1 activity. Here we show that Batf is required for the differentiation of IL17-producing T helper (TH17) cells. TH17 cells comprise a CD4+ T-cell subset that coordinates inflammatory responses in host defence but is pathogenic in autoimmunity. Batf-/- mice have normal TH1 and TH2 differentiation, but show a defect in TH17 differentiation, and are resistant to experimental autoimmune encephalomyelitis. Batf-/- T cells fail to induce known factors required for TH17 differentiation, such as RORγt (encoded by Rorc) and the cytokine IL21 (refs 14–17). Neither the addition of IL21 nor the overexpression of RORγt fully restores IL17 production in Batf-/- T cells. The Il17 promoter is BATF-responsive, and after TH17 differentiation, BATF binds conserved intergenic elements in the Il17a–Il17f locus and to the Il17, Il21 and Il22 (ref. 18) promoters. These results demonstrate that the AP-1 protein BATF has a critical role in TH17 differentiation.
Trends in Biochemical Sciences | 1998
Gary D. Stormo; Dana S. Fields
Site-specific DNA-protein interactions can be studied using experimental and computational methods. Experimental approaches typically analyze a protein-DNA interaction by measuring the free energy of binding under a variety of conditions. Computational methods focus on alignments of known binding sites for a protein, and, from these alignments, make estimates of the binding energy. Understanding the relationship between these two perspectives, and finding ways to improve both, is a major challenge of modern molecular biology.
Bioinformatics | 1990
Gerald Z. Hertz; George W. Hartzell; Gary D. Stormo
We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site, and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix (i.e. the one with the lowest probability of occurring by chance) out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
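The matrix representation described in this abstract can be sketched in a few lines. The code builds a count matrix with one row per base and one column per site position from sites that are already aligned, converts it to log-odds weights, and scans a sequence for its best-scoring window. The search over unaligned sequences for the most significant matrix is not implemented here. The function names, the pseudocount of 1, and the uniform background of 0.25 are assumptions made for illustration.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Rows are bases, columns are site positions, elements are counts.
    Expects equal-length, uppercase ACGT strings."""
    width = len(sites[0])
    return {b: [sum(1 for s in sites if s[pos] == b) for pos in range(width)]
            for b in BASES}

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(frequency / background) weights.
    The pseudocount (an assumption) avoids log(0) for unseen bases."""
    n = sum(counts[b][0] for b in BASES)  # number of sites
    denom = n + 4 * pseudocount
    return {b: [math.log2(((c + pseudocount) / denom) / background)
                for c in counts[b]]
            for b in BASES}

def best_hit(sequence, weights):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(weights["A"])
    scores = [(sum(weights[sequence[i + j]][j] for j in range(width)), i)
              for i in range(len(sequence) - width + 1)]
    return max(scores)

# Usage with made-up sites: the embedded CTGT at offset 3 scores best.
sites = ["CTGT", "CTGT", "CTGA"]
weights = log_odds_matrix(count_matrix(sites))
score, start = best_hit("AAACTGTAAA", weights)
```

Scoring windows against log-odds weights is one common way to use such a matrix to separate likely sites from other DNA; the significance calculation that ranks competing matrices is a separate step not shown here.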
Molecular Microbiology | 1992
Steven Ringquist; Sidney Shinedling; Doug Barrick; Louis S. Green; Jonathan Binkley; Gary D. Stormo; Larry Gold
The translational roles of the Shine-Dalgarno sequence, the initiation codon, the space between them, and the second codon have been studied. The Shine-Dalgarno sequence UAAGGAGG initiated translation roughly four times more efficiently than did the shorter AAGGA sequence. Each Shine-Dalgarno sequence required a minimum distance to the initiation codon in order to drive translation; spacing, however, could be rather long. Initiation at AUG was more efficient than at GUG or UUG at each spacing examined; initiation at GUG was only slightly better than UUG. Translation was also affected by residues 3′ to the initiation codon. The second codon can influence the rate of initiation, with the magnitude depending on the initiation codon. The data are consistent with a simple kinetic model in which a variety of rate constants contribute to the process of translation initiation.
Cell | 2008
Marcus Blaine Noyes; Atsuya Wakabayashi; Gary D. Stormo; Michael H. Brodsky; Scot A. Wolfe
We describe the comprehensive characterization of homeodomain DNA-binding specificities from a metazoan genome. The analysis of all 84 independent homeodomains from D. melanogaster reveals the breadth of DNA sequences that can be specified by this recognition motif. The majority of these factors can be organized into 11 different specificity groups, where the preferred recognition sequence between these groups can differ at up to four of the six core recognition positions. Analysis of the recognition motifs within these groups led to a catalog of common specificity determinants that may cooperate or compete to define the binding site preference. With these recognition principles, a homeodomain can be reengineered to create factors where its specificity is altered at the majority of recognition positions. This resource also allows prediction of homeodomain specificities from other organisms, which is demonstrated by the prediction and analysis of human homeodomain specificities.