Dirk Holste
Massachusetts Institute of Technology
Publication
Featured research published by Dirk Holste.
PLOS Biology | 2004
William G. Fairbrother; Dirk Holste; Christopher B. Burge; Phillip A. Sharp
Because deleterious alleles arising from mutation are filtered by natural selection, mutations that create such alleles will be underrepresented in the set of common genetic variation existing in a population at any given time. Here, we describe an approach based on this idea called VERIFY (variant elimination reinforces functionality), which can be used to assess the extent of natural selection acting on an oligonucleotide motif or set of motifs predicted to have biological activity. As an application of this approach, we analyzed a set of 238 hexanucleotides previously predicted to have exonic splicing enhancer (ESE) activity in human exons using the relative enhancer and silencer classification by unanimous enrichment (RESCUE)-ESE method. Aligning the single nucleotide polymorphisms (SNPs) from the public human SNP database to the chimpanzee genome allowed inference of the direction of the mutations that created present-day SNPs. Analyzing the set of SNPs that overlap RESCUE-ESE hexamers, we conclude that nearly one-fifth of the mutations that disrupt predicted ESEs have been eliminated by natural selection (odds ratio = 0.82 ± 0.05). This selection is strongest for the predicted ESEs that are located near splice sites. Our results demonstrate a novel approach for quantifying the extent of natural selection acting on candidate functional motifs and also suggest certain features of mutations/SNPs, such as proximity to the splice site and disruption or alteration of predicted ESEs, that should be useful in identifying variants that might cause a biological phenotype.
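The odds ratio quoted in the abstract above is a statistic computed from a 2x2 contingency table. The following is a minimal, generic sketch of that computation. The table layout, the function name `odds_ratio`, and the use of Woolf's standard error for the log odds ratio are assumptions made for illustration; the abstract does not say how the VERIFY table was constructed or how the ± 0.05 uncertainty was obtained.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio and standard error of its logarithm for the table [[a, b], [c, d]].

    Generic illustration only: the rows and columns are hypothetical categories
    (for example, motif-disrupting vs. other changes in two SNP classes), not
    the actual VERIFY design.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Woolf's approximation for the standard error of log(odds ratio).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, se_log
```

An odds ratio below 1 indicates that the first-row category is underrepresented relative to the second, which is the direction of the effect the abstract reports for ESE-disrupting mutations.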
Nucleic Acids Research | 2006
Dirk Holste; George Huo; Vivian Tung; Christopher B. Burge
RNA splicing is an essential step in gene expression, and is often variable, giving rise to multiple alternatively spliced mRNA and protein isoforms from a single gene locus. The design of effective databases to support experimental and computational investigations of alternative splicing (AS) is a significant challenge. In an effort to integrate accurate exon and splice site annotation with current knowledge about splicing regulatory elements and predicted AS events, and to link information about the splicing of orthologous genes in different species, we have developed the Hollywood system. This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags, and links features such as splice site sequence and strength, exonic splicing enhancers and silencers, conserved and non-conserved patterns of splicing, and cDNA library information for inferred alternative exons. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene. A streamlined graphical representation of gene splicing patterns is provided, and these patterns can alternatively be layered onto existing information in the UCSC Genome Browser. The database is accessible at .
Physical Review E | 2005
Wentian Li; Dirk Holste
Spatial fluctuations of guanine and cytosine base content (GC%) are studied by spectral analysis for the complete set of human genomic DNA sequences. We find that (i) 1/f^α decay is universally observed in the power spectra of all 24 chromosomes, and (ii) the exponent α ≈ 1 extends to about 10^7 bases, one order of magnitude longer than has previously been observed. We further find that (iii) almost all human chromosomes exhibit a crossover from α₁ ≈ 1 (1/f^α₁) at lower frequency to α₂ < 1 (1/f^α₂) at higher frequency, typically occurring at around 30,000-100,000 bases, while (iv) the crossover in this frequency range is virtually absent in human chromosome 22. In addition to the universal 1/f^α noise in power spectra, we find (v) several lines of evidence for chromosome-specific correlation structures, including a 500,000 base long oscillation in human chromosome 21. The universal 1/f^α spectrum in the human genome is further substantiated by a resistance to reduction in variance of guanine and cytosine content when the window size is increased.
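The spectral analysis described above can be illustrated with a small sketch: compute GC% in consecutive windows, take a periodogram of the resulting series, and fit the slope of log S(f) against log f to estimate α. This is not the authors' pipeline. The function names, the naive discrete Fourier transform, the non-overlapping windows, and the single least-squares fit over all frequencies are assumptions chosen to keep the example short and dependency-free.

```python
import cmath
import math


def gc_percent_windows(seq, window):
    """GC% in consecutive non-overlapping windows of `window` bases."""
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        gc = sum(1 for base in chunk if base in "GC")
        values.append(100.0 * gc / window)
    return values


def power_spectrum(series):
    """Periodogram at frequencies k/N (cycles per window) for k = 1 .. N//2 - 1.

    Uses a naive O(N^2) DFT, which is fine for short illustrative series.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2 / n))
    return spectrum


def spectral_exponent(spectrum):
    """Estimate alpha in S(f) ~ 1/f^alpha by least squares on log-log axes."""
    points = [(math.log(f), math.log(s)) for f, s in spectrum if s > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx
```

To detect a crossover between two exponents, as in finding (iii), one would fit the low-frequency and high-frequency ranges separately rather than fitting a single slope as done here.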
Journal of Molecular Evolution | 2000
Dirk Holste; Olaf Weiss; Ivo Grosse; Hanspeter Herzel
It has been hypothesized that a large fraction of the 24% noncoding DNA in R. prowazekii consists of degraded genes. This hypothesis has been based on the relatively high G+C content of noncoding DNA. However, a comparison with other genomes that also have a low overall G+C content shows that this argument would apply to other bacteria as well. To test this hypothesis, we study the coding potential in sets of genes, pseudogenes, and intergenic regions. We find that the correlation function and the χ²-measure are clearly indicative of the coding function of genes and pseudogenes. However, both coding potentials give almost no indication of a preexisting reading frame in the remaining 23% of noncoding DNA. We simulate the degradation of genes due to single-nucleotide substitutions and insertions/deletions and quantify the number of mutations required to remove indications of the reading frame. We discuss a reduced selection pressure as another possible origin of this comparatively large fraction of noncoding sequences.
Pacific Symposium on Biocomputing | 1999
Ivo Grosse; Sergey V. Buldyrev; H. E. Stanley; Dirk Holste; Hanspeter Herzel
One basic problem in the analysis of DNA sequences is the recognition of protein-coding genes. Computer algorithms to facilitate gene identification have become important as genome sequencing projects have turned from mapping to large-scale sequencing, resulting in an exponentially growing number of sequenced nucleotides that await their annotation. Many statistical patterns have been discovered that are different in coding and noncoding DNA, but most of them vary from species to species, and hence require prior training on organism-specific data sets. Here, we investigate if there exist species-independent statistical patterns that are different in coding and noncoding DNA. We introduce an information-theoretic quantity, the average mutual information (AMI), and we find that the probability distribution functions of the AMI are significantly different in coding and noncoding DNA, while they are almost identical for different species. This finding suggests that the AMI might be useful for the recognition of protein-coding regions in genomes for which training sets do not exist.
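As a rough illustration of the quantity named above, the sketch below computes the mutual information between nucleotides separated by k positions and then averages it over a range of distances. The abstract does not give the exact definition of the AMI, so the plain average over k, the function names, and the default range of distances are assumptions, not the authors' formula.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (in bits) between nucleotides k positions apart."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence must be longer than k")
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to limit rounding error.
        total += p_ab * math.log2(count * n / (first[a] * second[b]))
    return total


def average_mutual_information(seq, k_min=1, k_max=30):
    """Hypothetical AMI: plain mean of mutual_information over k_min..k_max."""
    values = [mutual_information(seq, k) for k in range(k_min, k_max + 1)]
    return sum(values) / len(values)
```

For a perfectly periodic sequence such as `"ACGT" * 50`, `mutual_information(seq, 4)` returns 2.0 bits, the maximum for a four-letter alphabet, while a sequence consisting of a single repeated base gives 0.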
Fluctuation and Noise Letters | 2004
Wentian Li; Dirk Holste
We study global fluctuations of the guanine and cytosine base content (GC%) in mouse genomic DNA using spectral analyses. Power spectra S(f) of GC% fluctuations in all nineteen autosomal and two sex chromosomes are observed to have the universal functional form S(f) ~ 1/f^α (α ≈ 1) over several orders of magnitude in the frequency range 10^-7 < f < 10^-5 cycle/base. At higher frequencies (f > 10^-5 cycle/base), S(f) shows a flattened power-law function with α < 1 across all twenty-one chromosomes. The substitution of about 38% interspersed repeats does not affect the functional form of S(f), indicating that these are not predominantly responsible for the long-ranged multi-scale GC% fluctuations in mammalian genomes. Several biological implications of the large-scale GC% fluctuation are discussed, including neutral evolutionary history by DNA duplication, chromosomal bands, spatial distribution of transcription units (genes), replication timing, and recombination hot spots.
Computational Biology and Chemistry | 2004
Wentian Li; Dirk Holste
An oscillation with a period of around 500 kb in guanine and cytosine content (GC%) is observed in the DNA sequence of human chromosome 21. This oscillation is localized in the rightmost one-eighth region of the chromosome, from 43.5 Mb to 46.5 Mb. Five cycles of oscillation are observed in this region, with six GC-rich peaks and five GC-poor valleys. The GC-poor valleys comprise regions with low density of CpG islands and, alternating between the two DNA strands, low gene density regions. Consequently, the long-range oscillation of GC% results in spacing patterns of CpG island density and, to a lesser extent, of gene density.
Genome Biology | 2004
Gene W. Yeo; Dirk Holste; Gabriel Kreiman; Christopher B. Burge
Proceedings of the National Academy of Sciences of the United States of America | 2005
Gene W. Yeo; Eric L. Van Nostrand; Dirk Holste; Tomaso Poggio; Christopher B. Burge
Physical Review E | 2003
Dirk Holste; Ivo Grosse; Stephan Beirer; Patrick Schieg; Hanspeter Herzel