Publication


Featured research published by James Gurtowski.


Genome Research | 2015

Oxford Nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome

Sara Goodwin; James Gurtowski; S. Ethe-Sayers; P. Deshpande; Michael C. Schatz; William R. McCombie

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm Nanocorr specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between ∼5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: The contig N50 length is more than ten times greater than an Illumina-only assembly (678 kb versus 59.9 kbp) and has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
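The abstract above compares assemblies by contig N50. As a minimal sketch of how that statistic is usually computed, the Python below returns the largest length L such that contigs of length L or longer account for at least half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions; neither comes from the paper or from Nanocorr.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the summed length of all contigs.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")

    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating length until
    # at least half of the total assembly length is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical contig lengths, not data from the paper).
example_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_contigs)
```

A higher N50 means that fewer, longer contigs cover half the assembly, which is why the abstract uses it as a measure of contiguity.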


Nature Genetics | 2015

The pineapple genome and the evolution of CAM photosynthesis

Ray Ming; Robert VanBuren; Ching Man Wai; Haibao Tang; Michael C. Schatz; John E. Bowers; Eric Lyons; Ming Li Wang; Jung Chen; Eric Biggers; Jisen Zhang; Lixian Huang; Lingmao Zhang; Wenjing Miao; Jian Zhang; Zhangyao Ye; Chenyong Miao; Zhicong Lin; Hao Wang; Hongye Zhou; Won Cheol Yim; Henry D. Priest; Chunfang Zheng; Margaret R. Woodhouse; Patrick P. Edger; Romain Guyot; Hao Bo Guo; Hong Guo; Guangyong Zheng; Ratnesh Singh

Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues. CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.


Genome Biology | 2014

Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica

Michael C. Schatz; Lyza G. Maron; Joshua C. Stein; Alejandro Hernandez Wences; James Gurtowski; Eric Biggers; Hayan Lee; Melissa Kramer; Eric Antoniou; Elena Ghiban; Mark H. Wright; Jer-Ming Chia; Doreen Ware; Susan R. McCouch; W. Richard McCombie

Background: The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Results: Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the ‘pan-genome’ of three divergent rice varieties and document several megabases of each genome absent in the other two. Conclusions: Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.


bioRxiv | 2014

Error correction and assembly complexity of single molecule sequencing reads

Hayan Lee; James Gurtowski; Shinjae Yoo; Shoshana Marcus; W. Richard McCombie; Michael C. Schatz

Third generation single molecule sequencing technology is poised to revolutionize genomics by enabling the sequencing of long, individual molecules of DNA and RNA. These technologies now routinely produce reads exceeding 5,000 basepairs, and can achieve reads as long as 50,000 basepairs. Here we evaluate the limits of single molecule sequencing by assessing the impact of long read sequencing in the assembly of the human genome and 25 other important genomes across the tree of life. From this, we develop a new data-driven model using support vector regression that can accurately predict assembly performance. We also present a novel hybrid error correction algorithm for long PacBio sequencing reads that uses pre-assembled Illumina sequences for the error correction. We apply it to several prokaryotic and eukaryotic genomes, and show it can achieve near-perfect assemblies of small genomes (<100 Mbp) and substantially improved assemblies of larger ones. All source code and the assembly model are available open-source.


Bioinformatics | 2017

GenomeScope: fast reference-free genome profiling from short reads

Gregory W. Vurture; Fritz J. Sedlazeck; Maria Nattestad; Charles J. Underwood; Han Fang; James Gurtowski; Michael C. Schatz

Summary: GenomeScope is an open‐source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels and error rates. Availability and Implementation: http://genomescope.org, https://github.com/schatzlab/genomescope.git. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
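To give a rough sense of how genome size can be read off a k-mer profile of unprocessed short reads, here is a minimal Python sketch of a simple peak-based heuristic: divide the total number of k-mer occurrences by the coverage at the main histogram peak. This is a deliberately simplified stand-in and is not GenomeScope's method, which the summary above does not describe. The function name, the `min_coverage` cutoff, and the toy histogram are all assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_coverage=5):
    """Estimate genome size from a k-mer coverage histogram.

    histogram maps a k-mer multiplicity (how many times a k-mer was
    seen in the reads) to the number of distinct k-mers with that
    multiplicity. K-mers below min_coverage are treated as likely
    sequencing errors and ignored (an assumed, illustrative cutoff).
    """
    usable = {cov: n for cov, n in histogram.items() if cov >= min_coverage}
    if not usable:
        raise ValueError("no k-mers at or above min_coverage")

    # The main peak approximates the average k-mer coverage of the genome.
    peak_coverage = max(usable, key=usable.get)
    # Total k-mer occurrences divided by per-genome coverage gives an
    # approximate count of k-mer positions in the genome.
    total_kmers = sum(cov * n for cov, n in usable.items())
    return total_kmers / peak_coverage


# Toy histogram (hypothetical values, not data from the paper).
toy_histogram = {1: 1000, 2: 300, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50, 40: 10}
toy_estimate = estimate_genome_size(toy_histogram)
```

The heuristic ignores heterozygosity and treats repeats only crudely, which is part of why a model-based profile such as the one GenomeScope reports is preferable for real data.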


bioRxiv | 2015

Oxford Nanopore Sequencing and de novo Assembly of a Eukaryotic Genome

Sara Goodwin; James Gurtowski; Scott Ethe-Sayers; Panchajanya Deshpande; Michael C. Schatz; W. Richard McCombie

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used it to sequence the S. cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr (https://github.com/jgurtowski/nanocorr), specifically for Oxford Nanopore reads, as existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between ∼5% and 40% error). With this new method we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than an Illumina-only assembly (678 kb versus 59.9 kbp), and has greater than 99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly. Reviewer link to data: http://schatzlab.cshl.edu/data/nanocorr/


Proceedings of the National Academy of Sciences of the United States of America | 2015

Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano

Kaja A. Wasik; James Gurtowski; Xin Zhou; Olivia Mendivil Ramos; M Joaquina Delás; Giorgia Battistoni; Osama El Demerdash; Ilaria Falciatori; Dita B. Vizoso; Andrew D. Smith; Peter Ladurner; Lukas Schärer; W. Richard McCombie; Gregory J. Hannon; Michael C. Schatz

Significance: The availability of high-quality genome and transcriptome assemblies is critical for enabling full exploitation of any model organism. Here we present genome and transcriptome assemblies for Macrostomum lignano, a free-living flatworm that can regenerate nearly its entire body following injury. The resources we present here will promote not only the studies of mechanisms of stem cell self-renewal, but also of regeneration and differentiation.

The free-living flatworm Macrostomum lignano has an impressive regenerative capacity. Following injury, it can regenerate almost an entirely new organism because of the presence of an abundant somatic stem cell population, the neoblasts. This set of unique properties makes many flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell-fate specification, and regeneration. The use of these organisms as models, however, is hampered by the lack of well-assembled and annotated genome sequences, fundamental to modern genetic and molecular studies. Here we report the genomic sequence of M. lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ∼75% of its sequence being comprised of simple repeats and transposon sequences. This has made high-quality assembly from Illumina reads alone impossible (N50 = 222 bp). We therefore generated 130× coverage by long sequencing reads from the Pacific Biosciences platform to create a substantially improved assembly with an N50 of 64 Kbp. We complemented the reference genome with an assembled and annotated transcriptome, and used both of these datasets in combination to probe gene-expression patterns during regeneration, examining pathways important to stem cell function.


bioRxiv | 2016

Third-generation sequencing and the future of genomics

Hayan Lee; James Gurtowski; Shinjae Yoo; Maria Nattestad; Shoshana Marcus; Sara Goodwin; W. Richard McCombie; Michael C. Schatz

Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base-pairs long, third-generation single-molecule technologies generate over 10,000 bp reads or map over 100,000 bp molecules. We analyze how increased read lengths can be used to address longstanding problems in de novo genome assembly, structural variation analysis and haplotype phasing.


Current Protocols in Human Genetics | 2012

Genotyping in the Cloud with Crossbow

James Gurtowski; Michael C. Schatz; Ben Langmead

Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high‐coverage, short‐read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations respectively, and have demonstrated capabilities within Crossbow of analyzing approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit will demonstrate the use of Crossbow for identifying variations in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service. Curr. Protoc. Bioinform. 39:15.3.1‐15.3.15.


bioRxiv | 2016

The DOE Systems Biology Knowledgebase (KBase)

Adam P. Arkin; Rick Stevens; Robert W. Cottingham; Sergei Maslov; Christopher S. Henry; Paramvir Dehal; Doreen Ware; Fernando Perez; Nomi L. Harris; Shane Canon; Michael W Sneddon; Matthew L Henderson; William J Riehl; Dan Gunter; Dan Murphy-Olson; Stephen Chan; Roy T Kamimura; Thomas S Brettin; Folker Meyer; Dylan Chivian; David J. Weston; Elizabeth M. Glass; Brian H. Davison; Sunita Kumari; Benjamin H Allen; Jason K. Baumohl; Aaron A. Best; Ben Bowen; Steven E. Brenner; Christopher C Bun

The U.S. Department of Energy Systems Biology Knowledgebase (KBase) is an open-source software and data platform designed to meet the grand challenge of systems biology — predicting and designing biological function from the biomolecular (small scale) to the ecological (large scale). KBase is available for anyone to use, and enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; perform large-scale analyses on scalable computing infrastructure; and combine experimental evidence and conclusions that lead to accurate models of plant and microbial physiology and community dynamics. The KBase platform has (1) extensible analytical capabilities that currently include genome assembly, annotation, ontology assignment, comparative genomics, transcriptomics, and metabolic modeling; (2) a web-browser-based user interface that supports building, sharing, and publishing reproducible and well-annotated analyses with integrated data; (3) access to extensive computational resources; and (4) a software development kit allowing the community to add functionality to the system.

Collaboration


Explore James Gurtowski's collaborations.

Top Co-Authors

W. Richard McCombie, Cold Spring Harbor Laboratory

Sara Goodwin, Cold Spring Harbor Laboratory

Hayan Lee, Cold Spring Harbor Laboratory

Maria Nattestad, Cold Spring Harbor Laboratory

Doreen Ware, Cold Spring Harbor Laboratory

Melissa Kramer, Cold Spring Harbor Laboratory

Adam P. Arkin, Lawrence Berkeley National Laboratory

Ben Bowen, Lawrence Berkeley National Laboratory