Arthur L. Delcher

University of Maryland, College Park

Publication

Featured research published by Arthur L. Delcher.

Bioinformatics | 2007

Identifying bacterial genes and endosymbiont DNA with Glimmer

Arthur L. Delcher; Kirsten A. Bratke; Edwin C. Powers

MOTIVATION: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host. RESULTS: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella. AVAILABILITY: Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer.

Nature | 2009

The genome of the blood fluke Schistosoma mansoni

Matthew Berriman; Brian J. Haas; Philip T. LoVerde; R. Alan Wilson; Gary P. Dillon; Gustavo C. Cerqueira; Susan T. Mashiyama; Bissan Al-Lazikani; Luiza F. Andrade; Peter D. Ashton; Martin Aslett; Daniella Castanheira Bartholomeu; Gaëlle Blandin; Conor R. Caffrey; Avril Coghlan; Richard M. R. Coulson; Tim A. Day; Arthur L. Delcher; Ricardo DeMarco; Appolinaire Djikeng; Tina Eyre; John Gamble; Elodie Ghedin; Yong-Hong Gu; Christiane Hertz-Fowler; Hirohisa Hirai; Yuriko Hirai; Robin Houston; Alasdair Ivens; David A. Johnston

Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.

Nature | 2008

The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus)

Ray Ming; Shaobin Hou; Yun Feng; Qingyi Yu; Alexandre Dionne-Laporte; Jimmy H. Saw; Pavel Senin; Wei Wang; Benjamin V. Ly; Kanako L. T. Lewis; Lu Feng; Meghan R. Jones; Rachel L. Skelton; Jan E. Murray; Cuixia Chen; Wubin Qian; Junguo Shen; Peng Du; Moriah Eustice; Eric J. Tong; Haibao Tang; Eric Lyons; Robert E. Paull; Todd P. Michael; Kerr Wall; Danny W. Rice; Henrik H. Albert; Ming Li Wang; Yun J. Zhu; Michael C. Schatz

Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3× draft genome sequence of ‘SunUp’ papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica’s distinguishing morpho-physiological, medicinal and nutritional properties.

Genome Biology | 2009

A Whole-Genome Assembly of the Domestic Cow, Bos taurus

Aleksey V. Zimin; Arthur L. Delcher; Liliana Florea; David R. Kelley; Michael C. Schatz; Daniela Puiu; Finnian Hanrahan; Geo Pertea; Curtis P. Van Tassell; Tad S. Sonstegard; Guillaume Marçais; Michael Roberts; Poorani Subramanian; James A. Yorke

Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. Conclusions: By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.

Nature Genetics | 2011

The genome of woodland strawberry (Fragaria vesca)

Vladimir Shulaev; Daniel J. Sargent; Ross N. Crowhurst; Todd C. Mockler; Otto Folkerts; Arthur L. Delcher; Pankaj Jaiswal; Keithanne Mockaitis; Aaron Liston; Shrinivasrao P. Mane; Paul D. Burns; Thomas M. Davis; Janet P. Slovin; Nahla Bassil; Roger P. Hellens; Clive Evans; Tim Harkins; Chinnappa D. Kodira; Brian Desany; Oswald Crasta; Roderick V. Jensen; Andrew C. Allan; Todd P. Michael; João C. Setubal; Jean Marc Celton; Kelly P. Williams; Sarah H. Holt; Juan Jairo Ruiz Rojas; Mithu Chatterjee; Bo Liu

The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.

Journal of Bacteriology | 2002

Whole-Genome Comparison of Mycobacterium tuberculosis Clinical and Laboratory Strains

Robert D. Fleischmann; D. Alland; Jonathan A. Eisen; L. Carpenter; Owen White; Jeremy Peterson; Robert T. DeBoy; Robert J. Dodson; Michelle L. Gwinn; Daniel H. Haft; Erin Hickey; James F. Kolonay; William C. Nelson; Lowell Umayam; Maria D. Ermolaeva; Arthur L. Delcher; Terry Utterback; Janice Weidman; Hoda Khouri; John Gill; A. Mikula; W. Bishai; W. R. Jacobs; J. C. Venter; Claire M. Fraser

Virulence and immunity are poorly understood in Mycobacterium tuberculosis. We sequenced the complete genome of the M. tuberculosis clinical strain CDC1551 and performed a whole-genome comparison with the laboratory strain H37Rv in order to identify polymorphic sequences with potential relevance to disease pathogenesis, immunity, and evolution. We found large-sequence and single-nucleotide polymorphisms in numerous genes. Polymorphic loci included a phospholipase C, a membrane lipoprotein, members of an adenylate cyclase gene family, and members of the PE/PPE gene family, some of which have been implicated in virulence or the host immune response. Several gene families, including the PE/PPE gene family, also had significantly higher synonymous and nonsynonymous substitution frequencies compared to the genome as a whole. We tested a large sample of M. tuberculosis clinical isolates for a subset of the large-sequence and single-nucleotide polymorphisms and found widespread genetic variability at many of these loci. We performed phylogenetic and epidemiological analysis to investigate the evolutionary relationships among isolates and the origins of specific polymorphic loci. A number of these polymorphisms appear to have occurred multiple times as independent events, suggesting that these changes may be under selective pressure. Together, these results demonstrate that polymorphisms among M. tuberculosis strains are more extensive than initially anticipated, and genetic variation may have an important role in disease pathogenesis and immunity.

Science | 2007

Draft Genome of the Filarial Nematode Parasite Brugia malayi

Elodie Ghedin; Shiliang Wang; David J. Spiro; Elisabet Caler; Qi Zhao; Jonathan Crabtree; Jonathan E. Allen; Arthur L. Delcher; David B. Guiliano; Diego Miranda-Saavedra; Samuel V. Angiuoli; Todd Creasy; Paolo Amedeo; Brian J. Haas; Najib M. El-Sayed; Jennifer R. Wortman; Tamara Feldblyum; Luke J. Tallon; Michael C. Schatz; Martin Shumway; Hean Koo; Seth Schobel; Mihaela Pertea; Mihai Pop; Owen White; Geoffrey J. Barton; Clotilde K. S. Carlow; Michael J. Crawford; Jennifer Daub; Matthew W. Dimmic

Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the ∼90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict ∼11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during ∼350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.

Bioinformatics | 2008

Aggressive assembly of pyrosequencing reads with mates

Jason Miller; Arthur L. Delcher; Sergey Koren; Eli Venter; Brian Walenz; Anushka Brownley; Justin Johnson; Kelvin Li; Clark M. Mobarry; Granger Sutton

Motivation: DNA sequence reads from Sanger and pyrosequencing platforms differ in cost, accuracy, typical coverage, average read length and the variety of available paired-end protocols. Both read types can complement one another in a ‘hybrid’ approach to whole-genome shotgun sequencing projects, but assembly software must be modified to accommodate their different characteristics. This is true even of pyrosequencing mated and unmated read combinations. Without special modifications, assemblers tuned for homogeneous sequence data may perform poorly on hybrid data. Results: Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline called CABOG (Celera Assembler with the Best Overlap Graph) is robust to homopolymer run length uncertainty, high read coverage and heterogeneous read lengths. In tests on four genomes, it generated the longest contigs among all assemblers tested. It exploited the mate constraints provided by paired-end reads from either platform to build larger contigs and scaffolds, which were validated by comparison to a finished reference sequence. A low rate of contig mis-assembly was detected in some CABOG assemblies, but this was reduced in the presence of sufficient mate pair data. Availability: The software is freely available as open-source from http://wgs-assembler.sf.net under the GNU Public License. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Genome Research | 2010

Assembly of large genomes using second-generation sequencing

Michael C. Schatz; Arthur L. Delcher

Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.

BMC Bioinformatics | 2007

High-throughput sequence alignment using Graphics Processing Units

Michael C. Schatz; Cole Trapnell; Arthur L. Delcher; Amitabh Varshney

Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
