Clark M. Mobarry | Researchain

Publication

Featured research published by Clark M. Mobarry.

Proceedings of the National Academy of Sciences of the United States of America | 2004

Whole-genome shotgun assembly and comparison of human genome assemblies

Sorin Istrail; Granger Sutton; Liliana Florea; Aaron L. Halpern; Clark M. Mobarry; Ross A. Lippert; Brian Walenz; Hagit Shatkay; Ian M. Dew; Jason R. Miller; Michael Flanigan; Nathan Edwards; Randall Bolanos; Daniel Fasulo; Bjarni V. Halldórsson; Sridhar Hannenhalli; Russell Turner; Shibu Yooseph; Fu Lu; Deborah Nusskern; Bixiong Shue; Xiangqun Holly Zheng; Fei Zhong; Arthur L. Delcher; Daniel H. Huson; Saul Kravitz; Laurent Mouchard; Knut Reinert; Karin A. Remington; Andrew G. Clark

We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304–1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860–921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.

Journal of Computational Biology | 2005

A Space-Efficient Construction of the Burrows–Wheeler Transform for Genomic Data

Ross A. Lippert; Clark M. Mobarry; Brian Walenz

Algorithms for exact string matching have substantial application in computational biology. Time-efficient data structures which support a variety of exact string matching queries, such as the suffix tree and the suffix array, have been applied to such problems. As sequence databases grow, more space-efficient approaches to exact matching are becoming more important. One such data structure, the compressed suffix array (CSA), based on the Burrows–Wheeler transform, has been shown to require memory which is nearly equal to the memory requirements of the original database, while supporting common sorts of query problems time efficiently. However, building a CSA from a sequence in efficient space and time is challenging. In 2002, the first space-efficient CSA construction algorithm was presented. That implementation used (1 + 2 log2 |Σ|)(1 + ε) bits per character (where |Σ| is the alphabet size and ε is a small fraction). The construction algorithm ran in as much as twice that space, in O(|Σ| n log(n)) time. We have created an implementation which can also achieve these asymptotic bounds, but for small alphabets, and only uses 1/2 (1 + |Σ|)(1 + ε) bits per character, a factor of 2 less space for nucleotide alphabets. We present time and space results for the CSA construction and querying of our implementation on publicly available genome data which demonstrate the practicality of this approach.
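As a rough illustration of the transform and of the two space bounds quoted in this abstract, the sketch below builds the Burrows–Wheeler transform naively from sorted rotations and evaluates both bits-per-character formulas for a nucleotide alphabet. It is not the space-efficient construction from the paper; the function names, the `$` sentinel, and the default `eps = 0` are assumptions made for the example.

```python
import math


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Illustration only: this takes quadratic space, unlike the
    space-efficient construction the abstract describes.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Recover the original text from its BWT."""
    n = len(last)
    # A stable sort of positions by character links each entry of the
    # (sorted) first column to its occurrence in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    row = last.index(sentinel)  # row holding the unrotated text
    out = []
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return "".join(out)[:-1]  # drop the trailing sentinel


def bits_first_construction(sigma: int, eps: float = 0.0) -> float:
    """(1 + 2 log2 |Sigma|)(1 + eps), the 2002 bound quoted in the abstract."""
    return (1 + 2 * math.log2(sigma)) * (1 + eps)


def bits_small_alphabet(sigma: int, eps: float = 0.0) -> float:
    """1/2 (1 + |Sigma|)(1 + eps), the bound quoted for small alphabets."""
    return 0.5 * (1 + sigma) * (1 + eps)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, inverse_bwt(transformed))
    # Nucleotide alphabet: |Sigma| = 4
    print(bits_first_construction(4), bits_small_alphabet(4))
```

With |Σ| = 4 and ε = 0 the two formulas evaluate to 5 and 2.5 bits per character, which matches the factor of 2 stated for nucleotide alphabets.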

Research in Computational Molecular Biology | 2004

Finding anchors for genomic sequence comparison

Ross A. Lippert; Xiaoyue Zhao; Liliana Florea; Clark M. Mobarry; Sorin Istrail

Journal of Computational Biology | 2005

Finding anchors for genomic sequence comparison.

Ross A. Lippert; Xiaoyue Zhao; Liliana Florea; Clark M. Mobarry; Sorin Istrail

Journal of Computational Biology | 2004

ThurGood: Evaluating Assembly-to-Assembly Mapping

Hagit Shatkay; Jason R. Miller; Clark M. Mobarry; Michael Flanigan; Shibu Yooseph; Granger Sutton

Science | 2000

A whole-genome assembly of Drosophila

Eugene W. Myers; Granger Sutton; Arthur L. Delcher; Ian M. Dew; Dan P. Fasulo; Michael Flanigan; Saul Kravitz; Clark M. Mobarry; Knut Reinert; Karin A. Remington; Eric L. Anson; Randall Bolanos; Hui Hsien Chou; Catherine Jordan; Aaron L. Halpern; Stefano Lonardi; Ellen M. Beasley; Rhonda Brandon; Lin Chen; Patrick Dunn; Zhongwu Lai; Yong Liang; Deborah Nusskern; Ming Zhan; Qing Zhang; Xiangqun Zheng; Gerald M. Rubin; Mark D. Adams; J. Craig Venter

Recent sequencing of the human and other mammalian genomes has brought about the necessity to align them, to identify and characterize their commonalities and differences. Programs that align whole genomes generally use a seed-and-extend technique, starting from exact or near-exact matches and selecting a reliable subset of these, called anchors, and then filling in the remaining portions between the anchors using a combination of local and global alignment algorithms, but their choices for the parameters so far have been primarily heuristic. We present a statistical framework and practical methods for selecting a set of matches that is both sensitive and specific and can constitute a reliable set of anchors for a one-to-one mapping of two genomes from which a whole-genome alignment can be built. Starting from exact matches, we introduce a novel per-base repeat annotation, the Z-score, from which noise and repeat filtering conditions are explored. Dynamic programming-based chaining algorithms are also evaluated as context-based filters. We apply the methods described here to the comparison of two progressive assemblies of the human genome, NCBI build 28 and build 34 (www.genome.ucsc.edu), and show that a significant portion of the two genomes can be found in selected exact matches, with very limited amount of sequence duplication.
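The anchoring idea in this abstract (start from exact matches, filter out repeats, then keep a colinear chain) can be sketched in a few lines. The example below is only a toy under stated assumptions: it keeps k-mers that occur exactly once in each sequence as a crude stand-in for repeat filtering (it does not compute the Z-score introduced in the paper), and it chains matches with a simple quadratic dynamic program. The function names and the choice of `k` are hypothetical.

```python
from collections import defaultdict


def unique_kmer_matches(a: str, b: str, k: int) -> list[tuple[int, int]]:
    """Return (pos_in_a, pos_in_b) for k-mers occurring exactly once in each sequence.

    Uniqueness is a crude repeat filter used here for illustration;
    it is not the per-base Z-score annotation described in the abstract.
    """
    def index(seq: str) -> dict[str, list[int]]:
        positions = defaultdict(list)
        for i in range(len(seq) - k + 1):
            positions[seq[i:i + k]].append(i)
        return positions

    index_a, index_b = index(a), index(b)
    return sorted(
        (pos_a[0], index_b[kmer][0])
        for kmer, pos_a in index_a.items()
        if len(pos_a) == 1 and len(index_b.get(kmer, ())) == 1
    )


def chain(matches: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Longest chain of matches strictly increasing in both coordinates (O(n^2) DP)."""
    matches = sorted(matches)
    n = len(matches)
    if n == 0:
        return []
    best = [1] * n   # best[j]: length of the longest chain ending at match j
    prev = [-1] * n  # back-pointers for reconstructing the chain
    for j in range(n):
        for i in range(j):
            colinear = matches[i][0] < matches[j][0] and matches[i][1] < matches[j][1]
            if colinear and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    j = max(range(n), key=best.__getitem__)
    out = []
    while j != -1:
        out.append(matches[j])
        j = prev[j]
    return out[::-1]


if __name__ == "__main__":
    seeds = unique_kmer_matches("ACGTTGCA", "TTACGTAA", k=4)
    print(seeds)
    print(chain([(0, 0), (5, 50), (10, 10), (20, 20)]))
```

In the second call the match `(5, 50)` is dropped because it is not colinear with the matches that follow it, which is the sense in which chaining acts as a context-based filter.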

Genome Research | 2005

Gene and alternative splicing annotation with AIR

Liliana Florea; Valentina Di Francesco; Jason R. Miller; Russell Turner; Alison Yao; Michael Harris; Brian Walenz; Clark M. Mobarry; Gennady V. Merkulov; Rosane Charlab; Ian M. Dew; Zuoming Deng; Sorin Istrail; Peter Li; Granger Sutton

The alignment and mapping of large genomic sequences is the focus of much recent research. However, relatively little has been done so far about testing and validating alignment methods. We introduce criteria and new tools we have developed for alignment evaluation. These tools have already proved useful in the evaluation and ranking of several methods for assembly-to-assembly mapping, which were recently used to map multiple versions of the human genome to each other (Istrail et al., 2004).

Intelligent Systems in Molecular Biology | 2001

Design of a compartmentalized shotgun assembler for the human genome.

Daniel H. Huson; Knut Reinert; Saul Kravitz; Karin A. Remington; Arthur L. Delcher; Ian M. Dew; Michael Flanigan; Aaron L. Halpern; Zhongwu Lai; Clark M. Mobarry; Granger Sutton; Eugene W. Myers

Archive | 2000