Douglas L. Brutlag | Researchain

Publication

Featured research published by Douglas L. Brutlag.

Proceedings of the National Academy of Sciences of the United States of America | 2006

A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters

Serge Saxonov; Paul Berg; Douglas L. Brutlag

A striking feature of the human genome is the dearth of CpG dinucleotides (CpGs) interrupted occasionally by CpG islands (CGIs), regions with relatively high content of the dinucleotide. CGIs are generally associated with promoters; genes, whose promoters are especially rich in CpG sequences, tend to be expressed in most tissues. However, all working definitions of what constitutes a CGI rely on ad hoc thresholds. Here we adopt a direct and comprehensive survey to identify the locations of all CpGs in the human genome and find that promoters segregate naturally into two classes by CpG content. Seventy-two percent of promoters belong to the class with high CpG content (HCG), and 28% are in the class whose CpG content is characteristic of the overall genome (low CpG content). The enrichment of CpGs in the HCG class is symmetric and peaks around the core promoter. The broad-based expression of the HCG promoters is not a consequence of a correlation with CpG content because within the HCG class the breadth of expression is independent of the CpG content. The overall depletion of CpGs throughout the genome is thought to be a consequence of the methylation of some germ-line CpGs and their susceptibility to mutation. A comparison of the frequencies of inferred deamination mutations at CpG and GpC dinucleotides in the two classes of promoters using SNPs in human-chimpanzee sequence alignments shows that CpGs mutate at a lower frequency in the HCG promoters, suggesting that CpGs in the HCG class are hypomethylated in the germ line.
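The CpG content discussed above is often summarized as an observed/expected ratio. The following is a minimal sketch of that normalization, assuming the common formula (number of CpG × length) / (number of C × number of G); the function name `cpg_obs_exp` is hypothetical, and the abstract does not state which normalization or window size the authors used.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Assumed formula: (#CpG * N) / (#C * #G), where N is the sequence length.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cpg = seq.count("CG")
    return cpg * n / (c * g)
```

Under this assumption, a promoter window with a ratio near 1 contains about as many CpGs as its base composition predicts, while a CpG-depleted window scores well below 1.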

Pacific Symposium on Biocomputing | 2000

BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes.

X. Liu; Douglas L. Brutlag; Jun S. Liu

The development of genome sequencing and DNA microarray analysis of gene expression gives rise to the demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. The significance of each motif found is judged based on a motif score distribution estimated by a Monte Carlo method. In addition, BioProspector modifies the motif model used in the earlier Gibbs samplers to allow for the modeling of gapped motifs and motifs with palindromic patterns. All these modifications greatly improve the performance of the program. Although testing and development are still in progress, the program has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP. We are currently working on combining BioProspector with a clustering program to explore gene expression networks and regulatory mechanisms.
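To illustrate what a k-th order Markov background model estimated from a sequence file might look like, here is a small sketch. It is not BioProspector's code (which the abstract describes as a C program); the names `fit_markov` and `log_prob` and the use of a pseudocount are assumptions made for this example.

```python
import math
from collections import defaultdict

def fit_markov(seqs, order, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences.

    Returns a function prob(context, base). The pseudocount is an assumption
    of this sketch so that unseen contexts still get nonzero probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(alphabet)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob

def log_prob(seq, prob, order):
    """Log-probability of seq under the model, conditioning on its first `order` bases."""
    return sum(math.log(prob(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))
```

A candidate motif site can then be compared against the background by contrasting its probability under a motif model with `log_prob` under this background model.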

Nature Biotechnology | 2002

An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments

X. Shirley Liu; Douglas L. Brutlag; Jun S. Liu

Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP–array) has become a popular procedure for studying genome-wide protein–DNA interactions and transcription regulation. However, it can only map the probable protein–DNA interaction loci within 1–2 kilobases resolution. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP–array-selected sequences and searches for DNA sequence motifs representing the protein–DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP–array ranking information to accelerate searches and enhance their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP–array experiments in yeast (STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms. MDscan can be used to find DNA motifs not only in ChIP–array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
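The combination of word enumeration and weight-matrix building can be sketched roughly as follows. This is a simplified illustration, not the MDscan algorithm itself: a seed word is taken, every word of the same width that agrees with it at a minimum number of positions is collected, and a position-specific count matrix is built from those words. All function names and the `min_match` threshold are assumptions of this sketch.

```python
def positions_matching(a, b):
    """Number of positions at which two equal-length words agree."""
    return sum(x == y for x, y in zip(a, b))

def collect_similar_words(seed, seqs, min_match):
    """Enumerate every window of len(seed) in seqs that agrees with the seed
    at `min_match` or more positions."""
    w = len(seed)
    return [s[i:i + w]
            for s in seqs
            for i in range(len(s) - w + 1)
            if positions_matching(seed, s[i:i + w]) >= min_match]

def count_matrix(words, alphabet="ACGT"):
    """Position-specific base counts for a list of equal-length words."""
    width = len(words[0])
    return [{b: sum(word[j] == b for word in words) for b in alphabet}
            for j in range(width)]
```

In a ranking-aware setting, seeds would be drawn from the highest-ranked sequences first, and the resulting matrix would then be refined against the remaining sequences.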

Cell | 1980

ATP-dependent DNA topoisomerase from D. melanogaster reversibly catenates duplex DNA rings

Tao-shih Hsieh; Douglas L. Brutlag

Extracts of Drosophila embryos contain an enzymatic activity that converts circular DNAs into huge networks of catenated rings in an ATP-dependent fashion. The catenation activity is resolved into two protein components during purification. One component is a novel DNA topoisomerase that requires the presence of ATP in order to relax supercoiled DNA. We have shown that the ATP-dependent DNA topoisomerase relaxes DNA by a mechanism distinct from that of nicking-closing enzymes. The Drosophila ATP-dependent topoisomerase allows one segment of a circular DNA to pass through transient breaks in both strands at another site on the DNA circle without any relative rotation between the ends at the transient break. This mechanism can convert negative supertwists to positive twists and vice versa until a relaxed equilibrium state is reached. The formation of catenated rings is mediated by an analogous bimolecular reaction which can occur between two nonhomologous DNA circles. The catenation reaction is fully reversible: in the presence of the second protein component, circular DNA is converted quantitatively into catenated forms; in its absence, the ATP-dependent topoisomerase resolves catenated networks back into monomer circles. The Drosophila ATP-dependent topoisomerase appears to be closely related to E. coli DNA gyrase in that both use a similar mechanism to change the topology of DNA, both require ATP and both are inhibited by the antibiotic novobiocin. The presence of an enzyme that allows one DNA helix to pass freely through another could not only be useful in relaxation of topological constraints, but also may be involved in the folding and unfolding of eucaryotic chromosomes.

Proceedings of the National Academy of Sciences of the United States of America | 2006

Genotypic predictors of human immunodeficiency virus type 1 drug resistance

Soo-Yon Rhee; Jonathan Taylor; Gauhar Wadhera; Asa Ben-Hur; Douglas L. Brutlag; Robert W. Shafer

Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large numbers of mutation patterns associated with cross-resistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. Learning methods were trained and tested on a public data set of genotype–phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations in ≥2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low/intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype–phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, cross-resistance patterns differ for different mutation subsets and that cross-resistance has been underestimated.
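The 5-fold cross-validation mentioned above partitions the data so that every sample is held out exactly once. The sketch below shows only the splitting step, under the assumption of a deterministic interleaved assignment with no shuffling; the function name `k_fold_splits` is hypothetical and nothing here reproduces the authors' learning methods or data.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are assigned to folds in interleaved order (0, k, 2k, ... go to
    fold 0, and so on). Shuffling beforehand is left to the caller.
    """
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test
```

Each learning method would be fit on `train` and scored on `test`, and the k scores averaged to estimate predictive accuracy.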

Biochemical and Biophysical Research Communications | 1969

An active fragment of DNA polymerase produced by proteolytic cleavage.

Douglas L. Brutlag; Maurice R. Atkinson; Peter Setlow; Arthur Kornberg

DNA polymerase from Escherichia coli was cleaved by limited proteolytic action into two fragments of 76,000 and 34,000 molecular weight. The cleaved enzyme is still an active polymerase but has a reduced 5′→3′ nuclease activity. The fragments were separated by gel filtration. The isolated larger fragment retains the polymerizing activity and the 3′→5′ nuclease activity present in the native enzyme, but not the 5′→3′ nuclease. This specific proteolytic cleavage was catalyzed most effectively by an extract from Bacillus subtilis or by trypsin.

Nucleic Acids Research | 2001

The EMOTIF database

Jimmy Y. Huang; Douglas L. Brutlag

The EMOTIF database is a collection of more than 170 000 highly specific and sensitive protein sequence motifs representing conserved biochemical properties and biological functions. These protein motifs are derived from 7697 sequence alignments in the BLOCKS+ database (released on June 23, 2000) and all 8244 protein sequence alignments in the PRINTS database (version 27.0) using the emotif-maker algorithm developed by Nevill-Manning et al. (Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865-5871; Nevill-Manning,C.G., Sethi,K.S., Wu,T.D. and Brutlag,D.L. (1997) ISMB-97, 5, 202-209). Since the amino acids and the groups of amino acids in these sequence motifs represent critical positions conserved in evolution, search algorithms employing the EMOTIF patterns can identify and classify more widely divergent sequences than methods based on global sequence similarity. The EMOTIF protein pattern database is available at http://motif.stanford.edu/emotif/.
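A motif made of single amino acids and groups of allowed amino acids can be searched with an ordinary regular expression. The sketch below assumes a regex-like pattern syntax (`[ILV]` for a group, `.` for any residue); the pattern and protein used in the usage note are made up for illustration and are not taken from the EMOTIF database.

```python
import re

def find_motif(pattern, protein):
    """Return (start, matched_text) for every occurrence of the pattern,
    including overlapping ones.

    The pattern is assumed to be a regular expression over one-letter
    amino acid codes, with bracketed groups for substitutable residues.
    """
    # A lookahead with a capture group reports overlapping matches.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(protein)]
```

For example, `find_motif("[ILV]G.GK[ST]", "MAVGSGKSTLL")` reports one hit starting at index 2, because the group positions tolerate substitutions that an exact string search would miss.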

Journal of Computational Biology | 2000

Bayesian Segmentation of Protein Secondary Structure

Scott C. Schmidler; Jun S. Liu; Douglas L. Brutlag

We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for alpha-helices, beta-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
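Because the model is Markovian in the segments, the search over segmentations can be organized as dynamic programming over segment end points. The sketch below computes only the single best-scoring segmentation for user-supplied log-scores; it does not compute posterior marginals, and the scoring functions, the `max_len` cap, and the rule that adjacent segments differ in type are assumptions of this example rather than details of the authors' model.

```python
import math

def best_segmentation(n, types, seg_score, trans_score, max_len):
    """Best segmentation of positions [0, n) into typed segments (n >= 1).

    seg_score(i, j, t): log-score of a segment of type t covering [i, j).
    trans_score(u, t):  log-score of a type-u segment followed by type t.
    Returns (score, [(start, end, type), ...]) with end exclusive.
    """
    # best[j][t]: best score for [0, j) when the last segment has type t.
    best = [dict.fromkeys(types, -math.inf) for _ in range(n + 1)]
    back = [dict.fromkeys(types) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for t in types:
            for i in range(max(0, j - max_len), j):
                s = seg_score(i, j, t)
                if i == 0:
                    candidates = [(s, None)]
                else:
                    candidates = [(best[i][u] + trans_score(u, t) + s, u)
                                  for u in types if u != t]
                for score, prev in candidates:
                    if score > best[j][t]:
                        best[j][t] = score
                        back[j][t] = (i, prev)
    # Trace back from the best final segment type.
    t = max(types, key=lambda x: best[n][x])
    score, segments, j = best[n][t], [], n
    while j > 0:
        i, prev = back[j][t]
        segments.append((i, j, t))
        j, t = i, prev
    segments.reverse()
    return score, segments
```

Replacing the maximization with a log-sum over candidates would turn the same recursion into a forward pass, which is the kind of computation needed for marginal probabilities at each position.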

Bioinformatics | 1990

Improved sensitivity of biological sequence database searches.

Douglas L. Brutlag; Jean-Pierre Dautricourt; S. Maulik; J. Relph

We have increased the sensitivity of DNA and protein sequence database searches by allowing similar but non-identical amino acids or nucleotides to match. In addition, one can match k-tuples or words instead of matching individual residues in order to speed the search. A matching matrix specifies which k-tuples match each other. The matching matrix can be calculated from a similarity matrix of amino acids and a threshold of similarity required for matching. This permits amino acid similarity matrices or replacement matrices (PAM matrices) to be used in the first step of a sequence comparison rather than in a secondary scoring phase. The concept of matching non-identical k-tuples also increases the power of DNA database searches. For example, a matrix that specifies that any 3-tuple in a DNA sequence can match any other 3-tuple encoding the same amino acid permits a DNA database search using a DNA query sequence for regions that would encode a similar amino acid sequence.
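The 3-tuple example can be sketched with the standard genetic code: two DNA 3-tuples match when they encode the same amino acid. The code below is an illustration of that matching rule only, not the authors' search program; the function names are hypothetical, and input is assumed to be uppercase DNA with the query length a multiple of three.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def tuples_match(a, b):
    """True when two DNA 3-tuples encode the same amino acid (or both stop)."""
    return CODON_TABLE[a] == CODON_TABLE[b]

def find_synonymous(query, target):
    """Start offsets in target where successive 3-tuples match the query's
    codons under the same-amino-acid rule."""
    codons = [query[i:i + 3] for i in range(0, len(query) - 2, 3)]
    span = 3 * len(codons)
    hits = []
    for start in range(len(target) - span + 1):
        window = [target[start + 3 * k:start + 3 * k + 3]
                  for k in range(len(codons))]
        if all(tuples_match(a, b) for a, b in zip(codons, window)):
            hits.append(start)
    return hits
```

With this rule the query `CTGAAA` (Leu-Lys) matches the stretch `TTAAAG` in a target even though the two share only three of six nucleotides.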

Cell | 1977

Synthesis of Hybrid Bacterial Plasmids Containing Highly Repeated Satellite DNA

Douglas L. Brutlag; Kirk E. Fry; Timothy Knight Nelson; Peggy Hung

Hybrid plasmid molecules containing tandemly repeated Drosophila satellite DNA were constructed using a modification of the (dA)-(dT) homopolymer procedure of Lobban and Kaiser (1973). Recombinant plasmids recovered after transformation of recA bacteria contained 10% of the amount of satellite DNA present in the transforming molecules. The cloned plasmids were not homogeneous in size. Recombinant plasmids isolated from a single colony contained populations of circular molecules which varied both in the length of the satellite region and in the poly(dA)-(dT) regions linking satellite and vector. While subcloning reduced the heterogeneity of these plasmid populations, continued cell growth caused further variations in the size of the repeated regions. Two different simple sequence satellites of Drosophila melanogaster (1.672 and 1.705 g/cm³) were unstable in both recA and recBC hosts and in both pSC101 and pCR1 vectors. We propose that this recA-independent instability of tandemly repeated sequences is due to unequal intramolecular recombination events in replicating DNA molecules, a mechanism analogous to sister chromatid exchange in eucaryotes.