Publication


Featured research published by Jason G. Underwood.


PLOS ONE | 2012

Improving PacBio Long Read Accuracy by Short Read Alignment

Kin Fai Au; Jason G. Underwood; Lawrence Lee; Wing Hung Wong

The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show LSC can correct PacBio long reads to reduce the error rate by more than 3-fold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq studies. Compared with another hybrid correction tool, LSC can achieve over double the sensitivity and similar specificity.
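The homopolymer compression idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the LSC implementation: the function names and the example reads are hypothetical, and the sketch only shows the general transformation, in which every run of identical bases is collapsed to a single base while the run lengths are kept so the original sequence can be restored.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse each run of identical bases to one base.

    Returns the compressed sequence and the length of every run,
    so the original sequence can be rebuilt later.
    """
    compressed = []
    run_lengths = []
    for base, run in groupby(seq):
        compressed.append(base)
        run_lengths.append(sum(1 for _ in run))
    return "".join(compressed), run_lengths


def homopolymer_expand(compressed, run_lengths):
    """Invert homopolymer_compress using the stored run lengths."""
    return "".join(base * n for base, n in zip(compressed, run_lengths))


# Hypothetical reads: the long read miscounts two homopolymer runs
# relative to the short read.
long_read = "AAACCGTTTT"
short_read = "AACCGTTT"

lr_hc, lr_runs = homopolymer_compress(long_read)
sr_hc, sr_runs = homopolymer_compress(short_read)

# After compression the run-length differences disappear, so the two
# reads match exactly in compressed space.
print(lr_hc, sr_hc, lr_hc == sr_hc)
```

In compressed space, a miscounted homopolymer run no longer produces insertions or deletions between the two reads, which is one plausible reading of why the abstract says the transformation raises alignment sensitivity.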


Proceedings of the National Academy of Sciences of the United States of America | 2013

Characterization of the human ESC transcriptome by hybrid sequencing

Kin Fai Au; Vittorio Sebastiano; Pegah Tootoonchi Afshar; Jens Durruthy Durruthy; Lawrence Lee; Brian A. Williams; Harm van Bakel; Eric E. Schadt; Renee Reijo-Pera; Jason G. Underwood; Wing Hung Wong

Significance: Isoform identification and discovery are important goals for transcriptome analysis because the majority of human genes express multiple isoforms with context- and tissue-specific functions. Better annotation of isoforms will also benefit downstream analysis such as expression quantification. Current RNA-Seq methods based on short-read sequencing are not reliable for isoform discovery. In this study we developed a new method based on the combined analysis of short reads and long reads generated, respectively, by second- and third-generation sequencing and applied this method to obtain a comprehensive characterization of the transcriptome of the human embryonic stem cell. The results showed that a large gain in sensitivity and specificity can be achieved with this strategy.

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.


Nature Biotechnology | 2014

Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study.

Sheng Li; Scott Tighe; Charles M. Nicolet; Deborah S. Grove; Shawn Levy; William G. Farmerie; Agnes Viale; Chris L. Wright; Peter A. Schweitzer; Yuan Gao; Dewey Kim; Joe Boland; Belynda Hicks; Ryan Kim; Sagar Chhangawala; Nadereh Jafari; Nalini Raghavachari; Jorge Gandara; Natàlia Garcia-Reyero; Cynthia Hendrickson; David Roberson; Jeffrey Rosenfeld; Todd Smith; Jason G. Underwood; May Wang; Paul Zumbo; Don Baldwin; George Grills; Christopher E. Mason

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A–selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intraplatform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
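The concordance figures in this abstract are Spearman rank correlations. As a rough illustration of what that statistic measures, the sketch below computes it in plain Python as the Pearson correlation of average ranks. The function names and the expression values are made up for the example and are not data from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression estimates for four genes on two platforms.
platform_a = [5.0, 120.0, 33.0, 0.5]
platform_b = [7.0, 60.0, 90.0, 1.0]
rho = spearman(platform_a, platform_b)
print(round(rho, 3))
```

Because only the ordering of genes matters, the statistic is insensitive to platform-specific scaling of expression values, which makes it a natural choice for cross-platform comparisons of this kind.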


PLOS ONE | 2015

Widespread Polycistronic Transcripts in Fungi Revealed by Single-Molecule mRNA Sequencing.

Sean P. Gordon; Elizabeth Tseng; Asaf Salamov; Jiwei Zhang; Xiandong Meng; Zhiying Zhao; Dongwan Kang; Jason G. Underwood; Igor V. Grigoriev; Melania Figueroa; Jonathan S. Schilling; Feng Chen; Zhong Wang

Genes in prokaryotic genomes are often arranged into clusters and co-transcribed into polycistronic RNAs. Isolated examples of polycistronic RNAs were also reported in some higher eukaryotes but their presence was generally considered rare. Here we developed a long-read sequencing strategy to identify polycistronic transcripts in several mushroom forming fungal species including Plicaturopsis crispa, Phanerochaete chrysosporium, Trametes versicolor, and Gloeophyllum trabeum. We found genome-wide prevalence of polycistronic transcription in these Agaricomycetes, involving up to 8% of the transcribed genes. Unlike polycistronic mRNAs in prokaryotes, these co-transcribed genes are also independently transcribed. We show that polycistronic transcription may interfere with expression of the downstream tandem gene. Further comparative genomic analysis indicates that polycistronic transcription is conserved among a wide range of mushroom forming fungi. In summary, our study revealed, for the first time, the genome prevalence of polycistronic transcription in a phylogenetic range of higher fungi. Furthermore, we systematically show that our long-read sequencing approach and combined bioinformatics pipeline is a generic powerful tool for precise characterization of complex transcriptomes that enables identification of mRNA isoforms not recovered via short-read assembly.


Nucleic Acids Research | 2015

Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing

Jason L. Weirather; Pegah Tootoonchi Afshar; Tyson A. Clark; Elizabeth Tseng; Linda S. Powers; Jason G. Underwood; Joseph Zabner; Jonas Korlach; Wing Hung Wong; Kin Fai Au

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.


PLOS ONE | 2014

Long-Read Sequencing of Chicken Transcripts and Identification of New Transcript Isoforms

Sean Thomas; Jason G. Underwood; Elizabeth Tseng; Alisha K. Holloway

The chicken has long served as an important model organism in many fields, and continues to aid our understanding of animal development. Functional genomics studies aimed at probing the mechanisms that regulate development require high-quality genomes and transcript annotations. The quality of these resources has improved dramatically over the last several years, but many isoforms and genes have yet to be identified. We hope to contribute to the process of improving these resources with the data presented here: a set of long cDNA sequencing reads, and a curated set of new genes and transcript isoforms not currently represented in the most up-to-date genome annotation currently available to the community of researchers who rely on the chicken genome.


Science | 2018

High-resolution comparative analysis of great ape genomes

Zev N. Kronenberg; Ian T Fiddes; David Gordon; Shwetha Murali; Stuart Cantsilieris; Olivia S. Meyerson; Jason G. Underwood; Bradley J. Nelson; Mark Chaisson; Max Dougherty; Katherine M. Munson; Alex Hastie; Mark Diekhans; Fereydoun Hormozdiari; Nicola Lorusso; Kendra Hoekzema; Ruolan Qiu; Karen Clark; Archana Raja; AnneMarie E. Welch; Melanie Sorensen; Carl Baker; Robert S. Fulton; Joel Armstrong; Tina A. Graves-Lindsay; Ahmet M. Denli; Emma R. Hoppe; Pinghsun Hsieh; Christopher M. Hill; Andy Wing Chun Pang

A spotlight on great ape genomes

Most nonhuman primate genomes generated to date have been “humanized” owing to their many gaps and the reliance on guidance by the reference human genome. To remove this humanizing effect, Kronenberg et al. generated and assembled long-read genomes of a chimpanzee, an orangutan, and two humans and compared them with a previously generated gorilla genome. This analysis recognized genomic structural variation specific to humans and particular ape lineages. Comparisons between human and chimpanzee cerebral organoids showed down-regulation of the expression of specific genes in humans, relative to chimpanzees, related to noncoding variation identified in this analysis.

Science, this issue p. eaar6343

Analysis of long-read great ape and human genomes identifies human-specific changes affecting brain gene expression.

INTRODUCTION: Understanding the genetic differences that make us human is a long-standing endeavor that requires the comprehensive discovery and comparison of all forms of genetic variation within great ape lineages.

RATIONALE: The varied quality and completeness of ape genomes have limited comparative genetic analyses. To eliminate this contiguity and quality disparity, we generated human and nonhuman ape genome assemblies without the guidance of the human reference genome. These new genome assemblies enable both coarse and fine-scale comparative genomic studies.

RESULTS: We sequenced and assembled two human, one chimpanzee, and one orangutan genome using high-coverage (>65x) single-molecule, real-time (SMRT) long-read sequencing technology. We also sequenced more than 500,000 full-length complementary DNA samples from induced pluripotent stem cells to construct de novo gene models, increasing our knowledge of transcript diversity in each ape lineage. The new nonhuman ape genome assemblies improve gene annotation and genomic contiguity (by 30- to 500-fold), resulting in the identification of larger synteny blocks (by 22- to 74-fold) when compared to earlier assemblies. Including the latest gorilla genome, we now estimate that 83% of the ape genomes can be compared in a multiple sequence alignment. We observe a modest increase in single-nucleotide variant divergence compared to previous genome analyses and estimate that 36% of human autosomal DNA is subject to incomplete lineage sorting. We fully resolve most common repeat differences, including full-length retrotransposons such as the African ape-specific endogenous retroviral element PtERV1. We show that the spread of this element independently in the gorilla and chimpanzee lineage likely resulted from a founder element that failed to segregate to the human lineage because of incomplete lineage sorting. The improved sequence contiguity allowed a more systematic discovery of structural variation (>50 base pairs in length) (see the figure). We detected 614,186 ape deletions, insertions, and inversions, assigning each to specific ape lineages. Unbiased genome scaffolding (optical maps, bacterial artificial chromosome sequencing, and fluorescence in situ hybridization) led to the discovery of large, unknown complex inversions in gene-rich regions. Of the 17,789 fixed human-specific insertions and deletions, we focus on those of potential functional effect. We identify 90 that are predicted to disrupt genes and an additional 643 that likely affect regulatory regions, more than doubling the number of human-specific deletions that remove regulatory sequence in the human lineage. We investigate the association of structural variation with changes in human-chimpanzee brain gene expression using cerebral organoids as a proxy for expression differences. Genes associated with fixed structural variants (SVs) show a pattern of down-regulation in human radial glial neural progenitors, whereas human-specific duplications are associated with up-regulated genes in human radial glial and excitatory neurons (see the figure).

CONCLUSION: The improved ape genome assemblies provide the most comprehensive view to date of intermediate-size structural variation and highlight several dozen genes associated with structural variation and brain-expression differences between humans and chimpanzees. These new references will provide a stepping stone for the completion of great ape genomes at a quality commensurate with the human reference genome and, ultimately, an understanding of the genetic differences that make us human.

Figure: SMRT assemblies and SV analyses. (Top) Contiguity of the de novo assemblies. (Bottom, left to right) For each ape, SV detection was done against the human reference genome (as represented by a dot plot of an inversion). Human-specific SVs, identified by comparing ape SVs and population genotyping (0/0, homozygous reference), were compared to single-cell gene expression differences [range: low (dark blue) to high (dark red)] in primary and organoid tissues. Each heatmap row is a gene that intersects an insertion or deletion (green), duplication (cyan), or inversion (light green).

Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single– to mega–base pair–sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.


bioRxiv | 2014

Widespread polycistronic transcripts in mushroom-forming fungi revealed by single-molecule long-read mRNA sequencing

Sean P. Gordon; Elizabeth Tseng; Asaf Salamov; Jiwei Zhang; Xiandong Meng; Zhiying Zhao; Dongwan Don Kang; Jason G. Underwood; Igor V. Grigoriev; Melania Figueroa; Jonathan S. Schilling; Feng Chen; Zhong Wang

Genes in prokaryotic genomes are often arranged into clusters and co-transcribed into polycistronic RNAs. Isolated examples of polycistronic RNAs were also reported in some eukaryotes but their presence was generally considered rare. Here we developed a long-read sequencing strategy to identify polycistronic transcripts in several mushroom-forming fungal species including Plicaturopsis crispa, Phanerochaete chrysosporium, Trametes versicolor and Gloeophyllum trabeum. We found genome-wide prevalence of polycistronic transcription in these Agaricomycetes, and it involves up to 8% of the transcribed genes. Unlike polycistronic mRNAs in prokaryotes, these co-transcribed genes are also independently transcribed, and upstream transcription may interfere with downstream transcription. Further comparative genomic analysis indicates that polycistronic transcription is likely a feature unique to these fungi. In addition, we also systematically demonstrated that short-read assembly is insufficient for mRNA isoform discovery, especially for isoform-rich loci. In summary, our study revealed, for the first time, the genome prevalence of polycistronic transcription in a subset of fungi. Furthermore, our long-read sequencing approach combined with a bioinformatics pipeline is a generic powerful tool for precise characterization of complex transcriptomes.


Genome Research | 2018

Comparative Annotation Toolkit (CAT) - Simultaneous Clade and Personal Genome Annotation

Ian T Fiddes; Joel Armstrong; Mark Diekhans; Stefanie Nachtweide; Zev N. Kronenberg; Jason G. Underwood; David Gordon; Dent Earl; Thomas Keane; Evan E. Eichler; David Haussler; Mario Stanke; Benedict Paten

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants (even in genomes as well studied as rat and the great apes) and how these annotations improve cross-species RNA expression experiments.


Methods in Molecular Biology | 2016

High-Throughput Nuclease Probing of RNA Structures Using FragSeq

Andrew V. Uzilov; Jason G. Underwood

High-throughput sequencing of cDNA (RNA-Seq) can be used to generate nuclease accessibility data for many distinct transcripts in the same mixture simultaneously. Such assays accelerate RNA structure analysis and provide researchers with new technologies to tackle biological questions on a transcriptome-wide scale. FragSeq is an experimental assay for transcriptome-wide RNA structure probing using RNA-Seq, coupled with data analysis tools that allow quantitative determination of nuclease accessibility at single-base resolution. We provide a practical guide to designing and carrying out FragSeq experiments and data analysis.

Collaboration


Top Co-Authors

Agnes Viale

Memorial Sloan Kettering Cancer Center

Asaf Salamov

United States Department of Energy

Charles M. Nicolet

University of Wisconsin-Madison
