Publication


Featured research published by Michael Roberts.


Genome Biology | 2009

A Whole-Genome Assembly of the Domestic Cow, Bos taurus

Aleksey V. Zimin; Arthur L. Delcher; Liliana Florea; David R. Kelley; Michael C. Schatz; Daniela Puiu; Finnian Hanrahan; Geo Pertea; Curtis P. Van Tassell; Tad S. Sonstegard; Guillaume Marçais; Michael Roberts; Poorani Subramanian; James A. Yorke

Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. Conclusions: By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.


Genome Research | 2012

GAGE: A critical evaluation of genome assemblies and assembly algorithms

Adam M. Phillippy; Aleksey V. Zimin; Daniela Puiu; Tanja Magoc; Sergey Koren; Todd J. Treangen; Michael C. Schatz; Arthur L. Delcher; Michael Roberts; Guillaume Marçais; Mihai Pop; James A. Yorke

New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.


Genome Biology | 2014

Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies

David B. Neale; Jill L. Wegrzyn; Kristian A. Stevens; Aleksey V. Zimin; Daniela Puiu; Marc W. Crepeau; Charis Cardeno; Maxim Koriabine; Ann Holtz-Morris; John D. Liechty; Pedro J. Martínez-García; Hans A. Vasquez-Gross; Brian Y. Lin; Jacob J. Zieve; William M. Dougherty; Sara Fuentes-Soriano; Le Shin Wu; Don Gilbert; Guillaume Marçais; Michael Roberts; Carson Holt; Mark Yandell; John M. Davis; Katherine E. Smith; Jeffrey F. D. Dean; W. Walter Lorenz; Ross W. Whetten; Ronald R. Sederoff; Nicholas Wheeler; Patrick E. McGuire

Background: The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results: We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions: In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.


Genetics | 2014

Sequencing and assembly of the 22-Gb loblolly pine genome

Aleksey V. Zimin; Kristian A. Stevens; Marc W. Crepeau; Ann Holtz-Morris; Maxim Koriabine; Guillaume Marçais; Daniela Puiu; Michael Roberts; Jill L. Wegrzyn; Pieter J. de Jong; David B. Neale; James A. Yorke; Charles H. Langley

Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.


Biology Direct | 2014

A new rhesus macaque assembly and annotation for next-generation sequencing analyses

Aleksey V. Zimin; Adam Cornish; Mnirnal D Maudhoo; Robert M Gibbs; Xiongfei Zhang; Sanjit Pandey; Daniel Meehan; Kristin Wipfler; Steven E. Bosinger; Zachary P. Johnson; Gregory K. Tharp; Guillaume Marçais; Michael Roberts; Betsy Ferguson; Howard S. Fox; Todd J. Treangen; James A. Yorke; Robert B. Norgren

Background: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. Results: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies. Conclusions: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates. Reviewers: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
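
The N50 statistic quoted in several of these abstracts is a length-weighted measure of contiguity: the length L such that contigs (or scaffolds) of length L or longer contain at least half of the assembled bases. The following is a minimal sketch of that standard definition, not the code used in any of the listed papers; the function name `n50` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 300.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Because the measure is weighted by length, a few long contigs can dominate it, which is one reason the GAGE abstract above notes that contiguity statistics are not well correlated with correctness.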


PLOS ONE | 2012

Mis-Assembled “Segmental Duplications” in Two Versions of the Bos taurus Genome

Aleksey V. Zimin; David R. Kelley; Michael Roberts; Guillaume Marçais; James A. Yorke

We analyzed the whole genome sequence coverage in two versions of the Bos taurus genome and identified all regions longer than five kilobases (Kbp) that are duplicated within chromosomes with >99% sequence fidelity in both copies. We call these regions High Fidelity Duplications (HFDs). The two assemblies were Btau 4.2, produced by the Human Genome Sequencing Center at Baylor College of Medicine, and UMD Bos taurus 3.1 (UMD 3.1), produced by our group at the University of Maryland. We found that Btau 4.2 has a far greater number of HFDs, 3111 versus only 69 in UMD 3.1. Read coverage analysis shows that 39 million base pairs (Mbp) of sequence in HFDs in Btau 4.2 appear to be a result of a mis-assembly and therefore cannot be qualified as segmental duplications. UMD 3.1 has only 0.41 Mbp of sequence in HFDs that are due to a mis-assembly.


PLOS ONE | 2008

Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

Michael Roberts; Aleksey V. Zimin; Wayne B. Hayes; Brian R. Hunt; Cevat Ustun; James Robert White; Paul Havlak; James A. Yorke

The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
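
The "seven times better" figure can be checked from the numbers in the abstract, reading "sequence missed" as 100% minus the coverage of finished sequence. This is a sketch of that arithmetic under that assumption; the function name `quality_ratio` is hypothetical and does not come from the paper.

```python
def quality_ratio(cov_old, err_old, cov_new, err_new):
    """Ratio of assembly quality (new / old), where quality is taken as
    1 / (error rate * percent of finished sequence missed)."""
    missed_old = 100.0 - cov_old  # percent of finished sequence not covered
    missed_new = 100.0 - cov_new
    return (err_old * missed_old) / (err_new * missed_new)


# Values from the abstract: coverage 93.4% -> 96.3%,
# errors per 10,000 bases 4.5 -> 1.1.
ratio = quality_ratio(93.4, 4.5, 96.3, 1.1)
print(round(ratio, 1))  # (4.5 * 6.6) / (1.1 * 3.7) = 29.7 / 4.07, prints 7.3
```

The ratio of about 7.3 is consistent with the abstract's rounded claim of a sevenfold improvement.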


Bioinformatics | 2013

The MaSuRCA genome assembler

Aleksey V. Zimin; Guillaume Marçais; Daniela Puiu; Michael Roberts; James A. Yorke


Bioinformatics | 2004

Reducing storage requirements for biological sequence comparison

Michael Roberts; Wayne B. Hayes; Brian R. Hunt; Stephen M. Mount; James A. Yorke


Bioinformatics | 2008

Figaro: a novel statistical method for vector sequence removal

James Robert White; Michael Roberts; James A. Yorke; Mihai Pop

Collaboration


Michael Roberts's most frequent collaborators.

Top Co-Authors

James A. Yorke
Johns Hopkins University School of Medicine

Daniela Puiu
Johns Hopkins University

Jill L. Wegrzyn
University of Connecticut

Wayne B. Hayes
University of California

Adam M. Phillippy
National Institutes of Health

Ann Holtz-Morris
Children's Hospital Oakland Research Institute

Arthur L. Delcher
Loyola University Maryland