Jeremy R. Wang
University of North Carolina at Chapel Hill
Publication
Featured research published by Jeremy R. Wang.
Nature Genetics | 2011
Hyuna Yang; Jeremy R. Wang; John P. Didion; Ryan J. Buus; Timothy A. Bell; Catherine E. Welsh; François Bonhomme; Alex Hon-Tsen Yu; Michael W. Nachman; Jaroslav Piálek; Priscilla K. Tucker; Pierre Boursot; Leonard McMillan; Gary A. Churchill; Fernando Pardo-Manuel de Villena
Here we provide a genome-wide, high-resolution map of the phylogenetic origin of the genome of most extant laboratory mouse inbred strains. Our analysis is based on the genotypes of wild-caught mice from three subspecies of Mus musculus. We show that classical laboratory strains are derived from a few fancy mice with limited haplotype diversity. Their genomes are overwhelmingly Mus musculus domesticus in origin, and the remainder is mostly of Japanese origin. We generated genome-wide haplotype maps based on identity by descent from fancy mice and show that classical inbred strains have limited and non-randomly distributed genetic diversity. In contrast, wild-derived laboratory strains represent a broad sampling of diversity within M. musculus. Intersubspecific introgression is pervasive in these strains, and contamination by laboratory stocks has played a role in this process. The subspecific origin, haplotype diversity and identity by descent maps can be visualized using the Mouse Phylogeny Viewer (see URLs).
Nature Genetics | 2015
James J. Crowley; Vasyl Zhabotynsky; Wei Sun; Shunping Huang; Isa Kemal Pakatci; Yunjung Kim; Jeremy R. Wang; Andrew P. Morgan; John D. Calaway; David L. Aylor; Zaining Yun; Timothy A. Bell; Ryan J. Buus; Mark Calaway; John P. Didion; Terry J. Gooch; Stephanie D. Hansen; Nashiya N. Robinson; Ginger D. Shaw; Jason S. Spence; Corey R. Quackenbush; Cordelia J. Barrick; Randal J. Nonneman; Kyungsu Kim; James Xenakis; Yuying Xie; William Valdar; Alan B. Lenarcic; Wei Wang; Catherine E. Welsh
Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Because regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in further characterizing these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. Greater than 80% of mouse genes have cis regulatory variation. Effects from these variants influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a new global allelic imbalance in expression favoring the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice and provide a new resource toward understanding the genetic control of transcription in mammals.
Proceedings of the National Academy of Sciences of the United States of America | 2015
Thomas C. Boothby; Jennifer R. Tenlen; Frank W. Smith; Jeremy R. Wang; Kiera A. Patanella; Erin Osborne Nishimura; Sophia C. Tintori; Qing Li; Corbin D. Jones; Mark Yandell; David N. Messina; Jarret Glasscock; Bob Goldstein
Significance: Despite fascinating scientists for over 200 years, little at the molecular level is known about tardigrades, microscopic animals resistant to extreme stresses. We present the genome of a tardigrade. Approximately one-sixth of the genes in the tardigrade genome were found to have been acquired through horizontal transfer, a proportion nearly double that of previously known cases of extreme horizontal gene transfer (HGT) in animals. Foreign genes have impacted the composition of the tardigrade genome: supplementing, expanding, and replacing endogenous gene families, including those families implicated in stress tolerance. Our results extend recent findings that HGT is more prevalent in animals than previously suspected, and they suggest that organisms that survive extreme stresses might be predisposed to acquiring foreign genes.
Horizontal gene transfer (HGT), or the transfer of genes between species, has been recognized recently as more pervasive than previously suspected. Here, we report evidence for an unprecedented degree of HGT into an animal genome, based on a draft genome of a tardigrade, Hypsibius dujardini. Tardigrades are microscopic eight-legged animals that are famous for their ability to survive extreme conditions. Genome sequencing, direct confirmation of physical linkage, and phylogenetic analysis revealed that a large fraction of the H. dujardini genome is derived from diverse bacteria as well as plants, fungi, and Archaea. We estimate that approximately one-sixth of tardigrade genes entered by HGT, nearly double the fraction found in the most extreme cases of HGT into animals known to date. Foreign genes have supplemented, expanded, and even replaced some metazoan gene families within the tardigrade genome. Our results demonstrate that an unexpectedly large fraction of an animal genome can be derived from foreign sources. We speculate that animals that can survive extremes may be particularly prone to acquiring foreign genes.
Genetics | 2012
Jeremy R. Wang; Fernando Pardo-Manuel de Villena; Heather A. Lawson; James M. Cheverud; Gary A. Churchill; Leonard McMillan
We present full-genome genotype imputations for 100 classical laboratory mouse strains, using a novel method. Using genotypes at 549,683 SNP loci obtained with the Mouse Diversity Array, we partitioned the genome of 100 mouse strains into 40,647 intervals that exhibit no evidence of historical recombination. For each of these intervals we inferred a local phylogenetic tree. We combined these data with 12 million loci with sequence variations recently discovered by whole-genome sequencing in a common subset of 12 classical laboratory strains. For each phylogenetic tree we identified strains sharing a leaf node with one or more of the sequenced strains. We then imputed high- and medium-confidence genotypes for each of 88 nonsequenced genomes. Among inbred strains, we imputed 92% of SNPs genome-wide, with 71% in high-confidence regions. Our method produced 977 million new genotypes with an estimated per-SNP error rate of 0.083% in high-confidence regions and 0.37% genome-wide. Our analysis identified which of the 88 nonsequenced strains would be the most informative for improving full-genome imputation, as well as which additional strain sequences will reveal more new genetic variants. Imputed sequences and quality scores can be downloaded and visualized online.
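The leaf-sharing step described in this abstract can be sketched roughly as follows for a single recombination-free interval. This is only an illustration under stated assumptions: the data layout (a mapping from strain to a leaf identifier, and one genotype string per sequenced strain), the function name, and the use of "N" for sites where donors disagree are all hypothetical and not taken from the paper. The high- versus medium-confidence distinction is omitted.

```python
from collections import defaultdict


def impute_interval(leaf_of, sequenced_genotypes):
    """Impute genotypes within one interval (hypothetical data layout).

    leaf_of: dict mapping strain name -> leaf id in the interval's local tree.
    sequenced_genotypes: dict mapping sequenced strain -> genotype string.
    Returns a dict mapping each nonsequenced strain that shares a leaf with
    at least one sequenced strain to its imputed genotype string.
    """
    # Group the sequenced genotypes by the leaf their strain sits on.
    donors = defaultdict(list)
    for strain, genotype in sequenced_genotypes.items():
        donors[leaf_of[strain]].append(genotype)

    imputed = {}
    for strain, leaf in leaf_of.items():
        if strain in sequenced_genotypes:
            continue  # already sequenced, nothing to impute
        candidates = donors.get(leaf)
        if not candidates:
            continue  # no sequenced strain shares this leaf
        # Copy a call where all donors agree; otherwise mark the site "N".
        imputed[strain] = "".join(
            calls[0] if len(set(calls)) == 1 else "N"
            for calls in zip(*candidates)
        )
    return imputed
```

In this toy form, a nonsequenced strain simply inherits the calls of the sequenced strains on its leaf, and strains on leaves with no sequenced member are left unimputed.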
BMC Bioinformatics | 2012
Jeremy R. Wang; Fernando Pardo-Manuel de Villena; Leonard McMillan
Background: Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results: We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions: Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains.
PLOS Genetics | 2013
John D. Calaway; Alan B. Lenarcic; John P. Didion; Jeremy R. Wang; Jeremy B. Searle; Leonard McMillan; William Valdar; Fernando Pardo-Manuel de Villena
X chromosome inactivation (XCI) is the mammalian mechanism of dosage compensation that balances X-linked gene expression between the sexes. Early during female development, each cell of the embryo proper independently inactivates one of its two parental X-chromosomes. In mice, the choice of which X chromosome is inactivated is affected by the genotype of a cis-acting locus, the X-chromosome controlling element (Xce). Xce has been localized to a 1.9 Mb interval within the X-inactivation center (Xic), yet its molecular identity and mechanism of action remain unknown. We combined genotype and sequence data for mouse stocks with detailed phenotyping of ten inbred strains and with the development of a statistical model that incorporates phenotyping data from multiple sources to disentangle sources of XCI phenotypic variance in natural female populations on X inactivation. We have reduced the Xce candidate 10-fold to a 176 kb region located approximately 500 kb proximal to Xist. We propose that structural variation in this interval explains the presence of multiple functional Xce alleles in the genus Mus. We have identified a new allele, Xce^e, present in Mus musculus and a possible sixth functional allele in Mus spicilegus. We have also confirmed a parent-of-origin effect on X inactivation choice and provide evidence that maternal inheritance magnifies the skewing associated with strong Xce alleles. Based on the phylogenetic analysis of 155 laboratory strains and wild mice we conclude that Xce^a is either a derived allele that arose concurrently with the domestication of fancy mice but prior to the derivation of most classical inbred strains or a rare allele in the wild. Furthermore, we have found that despite the presence of multiple haplotypes in the wild, Mus musculus domesticus has only one functional Xce allele, Xce^b. Lastly, we conclude that each mouse taxon examined has a different functional Xce allele.
BMC Bioinformatics | 2017
Jeremy R. Wang; Bryan Quach; Terrence S. Furey
Background: High-throughput sequence (HTS) data exhibit position-specific nucleotide biases that obscure the intended signal and reduce the effectiveness of these data for downstream analyses. These biases are particularly evident in HTS assays for identifying regulatory regions in DNA (DNase-seq, ChIP-seq, FAIRE-seq, ATAC-seq). Biases may result from many experiment-specific factors, including selectivity of DNA restriction enzymes and fragmentation method, as well as sequencing technology-specific factors, such as choice of adapters/primers and sample amplification methods. Results: We present a novel method to detect and correct position-specific nucleotide biases in HTS short read data. Our method calculates read-specific weights based on aligned reads to correct the over- or underrepresentation of position-specific nucleotide subsequences, both within and adjacent to the aligned read, relative to a baseline calculated in assay-specific enriched regions. Using HTS data from a variety of ChIP-seq, DNase-seq, FAIRE-seq, and ATAC-seq experiments, we show that our weight-adjusted reads reduce the position-specific nucleotide imbalance across reads and improve the utility of these data for downstream analyses, including identification and characterization of open chromatin peaks and transcription-factor binding sites. Conclusions: A general-purpose method to characterize and correct position-specific nucleotide sequence biases fills the need to recognize and deal with, in a systematic manner, binding-site preference for the growing number of HTS-based epigenetic assays. As the breadth and impact of these biases are better understood, the availability of a standard toolkit to correct them will be important.
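The idea of read-specific weights can be illustrated with a deliberately simplified sketch. It assumes a single k-mer position per read and a baseline supplied as plain k-mer counts; the function names, the parameters `k` and `offset`, and the ratio used for the weight are illustrative assumptions, not the published method, which considers subsequences both within and adjacent to the aligned read.

```python
from collections import Counter


def kmer_weights(reads, baseline_counts, k=3, offset=0):
    """Weight each k-mer by baseline frequency / observed frequency.

    reads: list of read sequences (strings).
    baseline_counts: dict mapping k-mer -> count in a baseline set
        (assumed here to be precomputed; how it is derived is not shown).
    """
    observed = Counter(
        r[offset:offset + k] for r in reads if len(r) >= offset + k
    )
    n_obs = sum(observed.values())
    n_base = sum(baseline_counts.values())
    weights = {}
    for kmer, count in observed.items():
        base = baseline_counts.get(kmer, 0)
        if base == 0:
            continue  # no baseline information for this k-mer
        # Overrepresented k-mers get weights below 1, underrepresented above 1.
        weights[kmer] = (base / n_base) / (count / n_obs)
    return weights


def weight_reads(reads, weights, k=3, offset=0):
    """Look up a weight per read; reads with unknown k-mers keep weight 1.0."""
    return [weights.get(r[offset:offset + k], 1.0) for r in reads]
```

With these weights, the weighted count of reads starting with each k-mer becomes proportional to that k-mer's baseline frequency, which is the sense in which the imbalance is reduced in this toy setting.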
Nature Genetics | 2015
James J. Crowley; Vasyl Zhabotynsky; Wei Sun; Shunping Huang; Isa Kemal Pakatci; Yunjung Kim; Jeremy R. Wang; Andrew P. Morgan; John D. Calaway; David L. Aylor; Zaining Yun; Timothy A. Bell; Ryan J. Buus; Mark Calaway; John P. Didion; Terry J. Gooch; Stephanie D. Hansen; Nashiya N. Robinson; Ginger D. Shaw; Jason S. Spence; Corey R. Quackenbush; Cordelia J. Barrick; Randal J. Nonneman; Kyungsu Kim; James Xenakis; Yuying Xie; William Valdar; Alan B. Lenarcic; Wei Wang; Catherine E. Welsh
Nat. Genet. 47, 353–360 (2015); published online 2 March 2015; corrected after print 16 April 2015. In the version of this article initially published, an accession number was not provided for RNA-seq data sets. The RNA-seq data sets that passed quality control are available at the Sequence Read Archive (SRA) under accession SRP056236.
BMC Bioinformatics | 2018
Jeremy R. Wang; James Holt; Leonard McMillan; Corbin D. Jones
Background: Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. Results: We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Conclusion: Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
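The abstract does not give implementation details, so the following is only a toy sketch of the underlying primitive: a Burrows-Wheeler Transform over a set of short reads with an FM-index that counts how often a k-mer occurs in them. The naive rotation sort, the function names, and the `weak_kmers` helper with its `k` and `min_count` thresholds are assumptions made for illustration; the actual correction step is not shown.

```python
def build_fm_index(reads):
    """Build a naive FM-index over reads joined with '$' terminators."""
    text = "".join(r + "$" for r in reads)
    n = len(text)
    # Sort all cyclic rotations (quadratic memory; fine for a toy example).
    rotations = sorted(range(n), key=lambda i: text[i:] + text[:i])
    bwt = "".join(text[i - 1] for i in rotations)  # last column
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (n + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first, occ, n


def count(pattern, index):
    """Count occurrences of pattern in the reads via backward search."""
    first, occ, n = index
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


def weak_kmers(long_read, index, k=4, min_count=2):
    """Return start positions of k-mers with little short-read support."""
    return [
        i for i in range(len(long_read) - k + 1)
        if count(long_read[i:i + k], index) < min_count
    ]
```

Because every read ends in `$` and query patterns never contain it, a match cannot span two reads. Positions flagged by `weak_kmers` would be candidates for correction in a scheme of this kind.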
bioRxiv | 2016
James Holt; Jeremy R. Wang; Corbin D. Jones; Leonard McMillan
Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging “hybrid” assemblies that use long reads for scaffolding and short reads for accuracy. To this end, we describe a novel application of a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We show that our method efficiently produces significantly higher quality corrected sequence than existing hybrid error-correction methods. We demonstrate the effectiveness of our method compared to state-of-the-art hybrid and long-read only de novo assembly methods.
Collaboration
Collaborators of Jeremy R. Wang.
Fernando Pardo-Manuel de Villena
University of North Carolina at Chapel Hill