Jeremy Buhler

Washington University in St. Louis

Publication

Featured research published by Jeremy Buhler.

Research in Computational Molecular Biology | 2001

Finding motifs using random projections

Jeremy Buhler; Martin Tompa

Pevzner and Sze [23] considered a precise version of the motif discovery problem and simultaneously issued an algorithmic challenge: find a motif M of length 15, where each planted instance differs from M in 4 positions. Whereas previous algorithms all failed to solve this (15,4)-motif problem, Pevzner and Sze introduced algorithms that succeeded. However, their algorithms failed to solve the considerably more difficult (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif discovery algorithm based on the use of random projections of the input's substrings. Experiments on simulated data demonstrate that this algorithm performs better than existing algorithms and, in particular, typically solves the difficult (14,4)-, (16,5)-, and (18,6)-motif problems quite efficiently. A probabilistic estimate shows that the small values of d for which the algorithm fails to recover the planted (l, d)-motif are in all likelihood inherently impossible to solve. We also present experimental results on realistic biological data by identifying ribosome binding sites in prokaryotes as well as a number of known transcriptional regulatory motifs in eukaryotes.
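
The projection idea in this abstract can be illustrated with a minimal sketch. It assumes that "random projection" means reading each length-l substring at k randomly chosen positions and grouping substrings that agree there, so that planted instances of a motif tend to share a bucket. The function names, the parameter k, and the bucket-size threshold are hypothetical choices made for illustration, and the refinement step that a full motif finder would need is omitted.

```python
import random
from collections import defaultdict


def project_lmers(sequences, l, k, rng):
    """Hash every l-mer by its letters at k randomly chosen positions.

    Returns the sampled positions and a dict mapping each projected key
    to the list of (sequence index, start offset) pairs that produced it.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            lmer = seq[start:start + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((seq_index, start))
    return positions, buckets


def enriched_buckets(buckets, threshold):
    """Keep only buckets holding at least `threshold` l-mers (illustrative cutoff)."""
    return {key: hits for key, hits in buckets.items() if len(hits) >= threshold}


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTACGT"]
    positions, buckets = project_lmers(seqs, l=4, k=2, rng=random.Random(0))
    for key, hits in enriched_buckets(buckets, threshold=4).items():
        print(key, hits)
```

Because two l-mers that differ in only a few positions agree on a random subset of positions with reasonable probability, repeating the projection with fresh random positions raises the chance that the planted instances collide at least once.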

Bioinformatics | 2001

Efficient large-scale sequence comparison by locality-sensitive hashing

Jeremy Buhler

MOTIVATION: Comparison of multimegabase genomic DNA sequences is a popular technique for finding and annotating conserved genome features. Performing such comparisons entails finding many short local alignments between sequences up to tens of megabases in length. To process such long sequences efficiently, existing algorithms find alignments by expanding around short runs of matching bases with no substitutions or other differences. Unfortunately, exact matches that are short enough to occur often in significant alignments also occur frequently by chance in the background sequence. Thus, these algorithms must trade off between efficiency and sensitivity to features without long exact matches. RESULTS: We introduce a new algorithm, LSH-ALL-PAIRS, to find ungapped local alignments in genomic sequence with up to a specified fraction of substitutions. The length and substitution rate of these alignments can be chosen so that they appear frequently in significant similarities yet still remain rare in the background sequence. The algorithm finds ungapped alignments efficiently using a randomized search technique, locality-sensitive hashing. We have found LSH-ALL-PAIRS to be both efficient and sensitive for finding local similarities with as little as 63% identity in mammalian genomic sequences up to tens of megabases in length.
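
As a rough illustration of how locality-sensitive hashing can surface ungapped matches with substitutions, the sketch below hashes every length-d window of a sequence on k randomly sampled positions and then checks colliding windows by Hamming distance. It is not the LSH-ALL-PAIRS implementation: the function names, the parameters d, k, max_subs, and trials, and the brute-force verification inside each bucket are all assumptions made for this example.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def lsh_similar_windows(seq, d, k, max_subs, trials, seed=0):
    """Return pairs (i, j), i < j, of length-d windows with at most max_subs substitutions.

    Each trial samples k of the d window positions, buckets windows by the
    letters at those positions, and verifies every colliding pair directly.
    """
    rng = random.Random(seed)
    found = set()
    for _ in range(trials):
        positions = sorted(rng.sample(range(d), k))
        buckets = defaultdict(list)
        for start in range(len(seq) - d + 1):
            window = seq[start:start + d]
            buckets[tuple(window[p] for p in positions)].append(start)
        for starts in buckets.values():
            for i, j in combinations(starts, 2):
                if hamming(seq[i:i + d], seq[j:j + d]) <= max_subs:
                    found.add((i, j))
    return found


if __name__ == "__main__":
    print(lsh_similar_windows("ACGTTGCAGGACGTTGCA", d=8, k=4, max_subs=1, trials=5))
```

Identical windows always collide, whereas a pair with a few substitutions collides only when none of the sampled positions falls on a mismatch, which is why several independent trials are used. Smaller k increases sensitivity at the cost of larger buckets and more verification work.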

CBE-Life Sciences Education | 2010

The Genomics Education Partnership: Successful Integration of Research into Laboratory Classes at a Diverse Group of Undergraduate Institutions

Christopher D. Shaffer; Consuelo J. Alvarez; Cheryl Bailey; Daron C. Barnard; Satish C. Bhalla; Chitra Chandrasekaran; Vidya Chandrasekaran; Hui-Min Chung; Douglas R Dorer; Chunguang Du; Todd T. Eckdahl; Jeff L Poet; Donald Frohlich; Anya Goodman; Yuying Gosser; Charles Hauser; Laura L. Mays Hoopes; Diana Johnson; Christopher J. Jones; Marian Kaehler; Nighat P. Kokan; Olga R Kopp; Gary Kuleck; Gerard P. McNeil; Robert Moss; Jennifer L Myka; Alexis Nagengast; Robert W. Morris; Paul Overvoorde; Elizabeth Shoop

Genomics is not only essential for students to understand biology but also provides unprecedented opportunities for undergraduate research. The goal of the Genomics Education Partnership (GEP), a collaboration between a growing number of colleges and universities around the country and the Department of Biology and Genome Center of Washington University in St. Louis, is to provide such research opportunities. Using a versatile curriculum that has been adapted to many different class settings, GEP undergraduates undertake projects to bring draft-quality genomic sequence up to high quality and/or participate in the annotation of these sequences. GEP undergraduates have improved more than 2 million bases of draft genomic sequence from several species of Drosophila and have produced hundreds of gene models using evidence-based manual annotation. Students appreciate their ability to make a contribution to ongoing research, and report increased independence and a more active learning approach after participation in GEP projects. They show knowledge gains on pre- and postcourse quizzes about genes and genomes and in bioinformatic analysis. Participating faculty also report professional gains, increased access to genomics-related technology, and an overall positive experience. We have found that using a genomics research project as the core of a laboratory course is rewarding for both faculty and students.

Application-Specific Systems, Architectures, and Processors | 2004

Biosequence similarity search on the Mercury system

Praveen Krishnamurthy; Jeremy Buhler; Roger D. Chamberlain; Mark A. Franklin; M. Gyang; Joseph M. Lancaster

Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described.

Science | 2008

Genomics Education Partnership

David Lopatto; Consuelo J. Alvarez; Daron C. Barnard; Chitra Chandrasekaran; Hui-Min Chung; Chunguang Du; Todd T. Eckdahl; Anya Goodman; Charles Hauser; Christopher J. Jones; Olga R Kopp; Gary Kuleck; Gerard P. McNeil; Robert W. Morris; J. L. Myka; Alexis Nagengast; Paul Overvoorde; Jeffrey L. Poet; Kelynne E. Reed; G. Regisford; Dennis Revie; Anne G. Rosenwald; Kenneth Saville; Mary Shaw; Gary R. Skuse; Christopher D. Smith; Mary A. Smith; Mary Spratt; Joyce Stamm; Jeffrey S. Thompson

The Genomics Education Partnership offers an inclusive model for undergraduate research experiences, with students pooling their work to contribute to international databases.

Bioinformatics | 2005

Operon prediction without a training set

Benjamin P. Westover; Jeremy Buhler; Justin L. Sonnenburg; Jeffrey I. Gordon

MOTIVATION: Annotation of operons in a bacterial genome is an important step in determining an organism's transcriptional regulatory program. While extensive studies of operon structure have been carried out in a few species such as Escherichia coli, fewer resources exist to inform operon prediction in newly sequenced genomes. In particular, many extant operon finders require a large body of training examples to learn the properties of operons in the target organism. For newly sequenced genomes, such examples are generally not available; moreover, a model of operons trained on one species may not reflect the properties of other, distantly related organisms. We encountered these issues in the course of predicting operons in the genome of Bacteroides thetaiotaomicron (B. theta), a common anaerobe that is a prominent component of the normal adult human intestinal microbial community. RESULTS: We describe an operon predictor designed to work without extensive training data. We rely on a small set of a priori assumptions about the properties of the genome being annotated that permit estimation of the probability that two adjacent genes lie in a common operon. Predictions integrate several sources of information, including intergenic distance, common functional annotation and a novel formulation of conserved gene order. We validate our predictor both on the known operons of E. coli and on the genome of B. theta, using expression data to evaluate our predictions in the latter.

ACM Transactions on Reconfigurable Technology and Systems | 2008

Mercury BLASTP: Accelerating Protein Sequence Alignment

Arpith C. Jacob; Joseph M. Lancaster; Jeremy Buhler; Brandon Harris; Roger D. Chamberlain

Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this article, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11 to 15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results.

Microprocessors and Microsystems | 2009

Acceleration of ungapped extension in Mercury BLAST

Joseph M. Lancaster; Jeremy Buhler; Roger D. Chamberlain

The amount of biosequence data being produced each year is growing exponentially. Extracting useful information from this massive amount of data efficiently is becoming an increasingly difficult task. There are many available software tools that molecular biologists use for comparing genomic data. This paper focuses on accelerating the most widely used such tool, BLAST. Mercury BLAST takes a streaming approach to the BLAST computation by offloading the performance-critical sections to specialized hardware. This hardware is then used in combination with the processor of the host system to deliver BLAST results in a fraction of the time of the general-purpose processor alone. This paper presents the design of the ungapped extension stage of Mercury BLAST. The architecture of the ungapped extension stage is described along with the context of this stage within the Mercury BLAST system. The design is compact and runs at 100 MHz on available FPGAs, making it an effective and powerful component for accelerating biosequence comparisons. The performance of this stage is 25× that of the standard software distribution, yielding close to 50× performance improvement on the complete BLAST application. The sensitivity is essentially equivalent to that of the standard distribution.
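
For readers unfamiliar with the stage being accelerated, the sketch below shows a generic software-style ungapped extension with an X-drop stopping rule, which is the textbook form of this step. It is not the hardware design described in the abstract. The scoring values (match +1, mismatch -3), the x_drop threshold, and all function names are assumptions chosen for illustration.

```python
def extend_one_direction(a, b, i, j, step, match, mismatch, x_drop):
    """Extend without gaps from (i, j) in direction `step` (+1 or -1).

    Returns (best_score, best_length): the highest running score seen and
    the number of aligned positions needed to reach it. Extension stops at a
    sequence boundary or once the score falls more than x_drop below the best.
    """
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(a) and 0 <= j < len(b):
        score += match if a[i] == b[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
        i += step
        j += step
    return best, best_len


def ungapped_extend(a, b, i, j, w, match=1, mismatch=-3, x_drop=5):
    """Extend a length-w seed at a[i:i+w] / b[j:j+w] in both directions.

    Returns (score, start, end) where a[start:end] is the extended segment.
    """
    seed = sum(match if a[i + t] == b[j + t] else mismatch for t in range(w))
    right, right_len = extend_one_direction(a, b, i + w, j + w, 1, match, mismatch, x_drop)
    left, left_len = extend_one_direction(a, b, i - 1, j - 1, -1, match, mismatch, x_drop)
    return seed + left + right, i - left_len, i + w + right_len


if __name__ == "__main__":
    a = "TTACGTACGTGG"
    b = "CCACGTACGTAA"
    print(ungapped_extend(a, b, 4, 4, 4))  # (8, 2, 10)
```

In this example the seed `GTAC` grows by two matching bases on each side and stops once mismatches pull the running score more than x_drop below its best value, so the reported segment is `a[2:10]`, that is, `ACGTACGT`.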

Genome Biology | 2006

Comparison of dot chromosome sequences from D. melanogaster and D. virilis reveals an enrichment of DNA transposon sequences in heterochromatic domains

Elizabeth Slawson; Christopher D. Shaffer; Colin D Malone; Wilson Leung; Elmer Kellmann; Rachel B Shevchek; Carolyn A Craig; Seth M Bloom; James W Bogenpohl; James Dee; Emiko Ta Morimoto; Jenny Myoung; Andrew S. Nett; Fatih Ozsolak; Mindy E Tittiger; Andrea Zeug; Mary Lou Pardue; Jeremy Buhler; Elaine R. Mardis; Sarah C. R. Elgin

Background: Chromosome four of Drosophila melanogaster, known as the dot chromosome, is largely heterochromatic, as shown by immunofluorescent staining with antibodies to heterochromatin protein 1 (HP1) and histone H3K9me. In contrast, the absence of HP1 and H3K9me from the dot chromosome in D. virilis suggests that this region is euchromatic. D. virilis diverged from D. melanogaster 40 to 60 million years ago. Results: Here we describe finished sequencing and analysis of 11 fosmids hybridizing to the dot chromosome of D. virilis (372,650 base-pairs) and seven fosmids from major euchromatic chromosome arms (273,110 base-pairs). Most genes from the dot chromosome of D. melanogaster remain on the dot chromosome in D. virilis, but many inversions have occurred. The dot chromosomes of both species are similar to the major chromosome arms in gene density and coding density, but the dot chromosome genes of both species have larger introns. The D. virilis dot chromosome fosmids have a high repeat density (22.8%), similar to homologous regions of D. melanogaster (26.5%). There are, however, major differences in the representation of repetitive elements. Remnants of DNA transposons make up only 6.3% of the D. virilis dot chromosome fosmids, but 18.4% of the homologous regions from D. melanogaster; DINE-1 and 1360 elements are particularly enriched in D. melanogaster. Euchromatic domains on the major chromosomes in both species have very few DNA transposons (less than 0.4%). Conclusion: Combining these results with recent findings about RNAi, we suggest that specific repetitive elements, as well as density, play a role in determining higher-order chromatin packaging.

International Conference on Supercomputing | 2006

Accelerator design for protein sequence HMM search

Rahul P. Maddimsetty; Jeremy Buhler; Roger D. Chamberlain; Mark A. Franklin; Brandon Harris

Profile Hidden Markov models (HMMs) are a powerful approach to describing biologically significant functional units, or motifs, in protein sequences. Entire databases of such models are regularly compared to large collections of proteins to recognize motifs in them. Exponentially increasing rates of genome sequencing have caused both protein and model databases to explode in size, placing an ever-increasing computational burden on users of these systems. Here, we describe an accelerated search system that exploits parallelism in a number of ways. First, the application is functionally decomposed into a pipeline, with distinct compute resources executing each pipeline stage. Second, the first pipeline stage is deployed on a systolic array, which yields significant fine-grained parallelism. Third, for some instantiations of the design, parallel copies of the first pipeline stage are used, further increasing the level of coarse-grained parallelism. A naïve parallelization of the first stage computation has serious repercussions for the sensitivity of the search. We present a pair of remedies to this dilemma and quantify the regions of interest within which each approach is most effective. Analytic performance models are used to assess the overall speedup that can be attained relative to a single-processor software solution. Performance improvements of 1 to 2 orders of magnitude are predicted.
