Publication


Featured research published by James Holt.


Nature Genetics | 2015

Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance

James J. Crowley; Vasyl Zhabotynsky; Wei Sun; Shunping Huang; Isa Kemal Pakatci; Yunjung Kim; Jeremy R. Wang; Andrew P. Morgan; John D. Calaway; David L. Aylor; Zaining Yun; Timothy A. Bell; Ryan J. Buus; Mark Calaway; John P. Didion; Terry J. Gooch; Stephanie D. Hansen; Nashiya N. Robinson; Ginger D. Shaw; Jason S. Spence; Corey R. Quackenbush; Cordelia J. Barrick; Randal J. Nonneman; Kyungsu Kim; James Xenakis; Yuying Xie; William Valdar; Alan B. Lenarcic; Wei Wang; Catherine E. Welsh

Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Because regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in further characterizing these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. Greater than 80% of mouse genes have cis regulatory variation. Effects from these variants influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a new global allelic imbalance in expression favoring the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice and provide a new resource toward understanding the genetic control of transcription in mammals.


Molecular Biology and Evolution | 2016

R2d2 Drives Selfish Sweeps in the House Mouse

John P. Didion; Andrew P. Morgan; Liran Yadgary; Timothy A. Bell; Rachel C. McMullan; Lydia Ortiz de Solorzano; Janice Britton-Davidian; Karl J. Campbell; Riccardo Castiglia; Yung-Hao Ching; Amanda J. Chunco; James J. Crowley; Elissa J. Chesler; Daniel W. Förster; John E. French; Sofia I. Gabriel; Daniel M. Gatti; Theodore Garland; Eva B. Giagia-Athanasopoulou; Mabel D. Giménez; Sofia A. Grize; İslam Gündüz; Andrew Holmes; Heidi C. Hauffe; Jeremy S. Herman; James Holt; Kunjie Hua; Wesley J. Jolley; Anna K. Lindholm; María José López-Fuster

A selective sweep is the result of strong positive selection driving newly occurring or standing genetic variants to fixation, and can dramatically alter the pattern and distribution of allelic diversity in a population. Population-level sequencing data have enabled discoveries of selective sweeps associated with genes involved in recent adaptations in many species. In contrast, much debate but little evidence addresses whether “selfish” genes are capable of fixation—thereby leaving signatures identical to classical selective sweeps—despite being neutral or deleterious to organismal fitness. We previously described R2d2, a large copy-number variant that causes nonrandom segregation of mouse Chromosome 2 in females due to meiotic drive. Here we show population-genetic data consistent with a selfish sweep driven by alleles of R2d2 with high copy number (R2d2HC) in natural populations. We replicate this finding in multiple closed breeding populations from six outbred backgrounds segregating for R2d2 alleles. We find that R2d2HC rapidly increases in frequency, and in most cases becomes fixed in significantly fewer generations than can be explained by genetic drift. R2d2HC is also associated with significantly reduced litter sizes in heterozygous mothers, making it a true selfish allele. Our data provide direct evidence of populations actively undergoing selfish sweeps, and demonstrate that meiotic drive can rapidly alter the genomic landscape in favor of mutations with neutral or even negative effects on overall Darwinian fitness. Further study will reveal the incidence of selfish sweeps, and will elucidate the relative contributions of selfish genes, adaptation and genetic drift to evolution.


Database | 2014

A novel multi-alignment pipeline for high-throughput sequencing data

Shunping Huang; James Holt; Chia Yu Kao; Leonard McMillan; Wei Wang

Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo
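The merging step described in this abstract can be illustrated with a minimal sketch. It assumes each read has already been aligned separately to every founder reference and that each alignment is summarized by a mismatch count; the function name `merge_alignments`, the input layout, and the `"ambiguous"` label are hypothetical and are not taken from the published pipeline, which also has to reconcile coordinates between references.

```python
def merge_alignments(per_reference):
    """Merge per-reference alignment results and annotate each read's origin.

    per_reference maps a reference name to {read_id: mismatch_count}.
    A read that failed to align to a reference is simply absent from that
    reference's dictionary (hypothetical input layout).
    """
    read_ids = set()
    for hits in per_reference.values():
        read_ids.update(hits)

    merged = {}
    for read_id in sorted(read_ids):
        # Collect the score of this read on every reference where it aligned.
        scores = {
            ref: hits[read_id]
            for ref, hits in per_reference.items()
            if read_id in hits
        }
        best = min(scores.values())
        winners = sorted(ref for ref, s in scores.items() if s == best)
        # A unique best reference is reported as the read's origin;
        # ties carry no origin information.
        origin = winners[0] if len(winners) == 1 else "ambiguous"
        merged[read_id] = (origin, best)
    return merged


example = {
    "founderA": {"r1": 0, "r2": 2},
    "founderB": {"r1": 3, "r2": 2, "r3": 1},
}
result = merge_alignments(example)
```

In this toy input, read `r3` aligns only to `founderB`, so it is kept in the merged output even though a single-reference run against `founderA` would have dropped it.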


Bioinformatics | 2014

Merging of Multi-String BWTs with Applications

James Holt; Leonard McMillan

Motivation: The throughput of genomic sequencing has increased to the point that it is overrunning the rate of downstream analysis. This, along with the desire to revisit old data, has led to a situation where large quantities of raw, and nearly impenetrable, sequence data are rapidly filling the hard drives of modern biology labs. These datasets can be compressed via a multi-string variant of the Burrows-Wheeler Transform (BWT), which provides the side benefit of searches for arbitrary k-mers within the raw data as well as the ability to reconstitute arbitrary reads as needed. We propose a method for merging such datasets for both increased compression and downstream analysis. Results: We present a novel algorithm that merges multi-string BWTs in [Formula: see text] time, where LCS is the length of the longest common substring between any of the inputs and N is the total length of all inputs combined (number of symbols), using [Formula: see text] bits, where F is the number of multi-string BWTs merged. This merged multi-string BWT is also shown to have a higher compressibility than the input multi-string BWTs taken separately. Additionally, we explore some uses of a merged multi-string BWT for bioinformatics applications.
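The k-mer search that a multi-string BWT supports can be illustrated with a small sketch. It builds the transform naively by sorting every rotation of every `$`-terminated read, which is quadratic and is not the merge algorithm from the paper, and then counts k-mer occurrences with the standard FM-index backward search. The function names `build_msbwt` and `count_kmer` are illustrative only.

```python
from collections import Counter


def build_msbwt(reads):
    """Naive multi-string BWT: last column of all sorted rotations.

    Each read gets a '$' terminator, which sorts before A, C, G and T.
    """
    rotations = []
    for read in reads:
        s = read + "$"
        rotations.extend(s[i:] + s[:i] for i in range(len(s)))
    rotations.sort()
    return "".join(rot[-1] for rot in rotations)


def count_kmer(bwt, kmer):
    """Count occurrences of kmer across all reads via backward search."""
    counts = Counter(bwt)
    # starts[c] = number of symbols in the BWT that sort before c,
    # i.e. the first row of the sorted rotations that begins with c.
    starts, total = {}, 0
    for c in sorted(counts):
        starts[c] = total
        total += counts[c]

    lo, hi = 0, len(bwt)  # current range of rotations, half-open
    for c in reversed(kmer):
        if c not in starts:
            return 0
        # Rank queries are done with str.count here; a real FM-index
        # would use sampled occurrence tables instead.
        lo = starts[c] + bwt.count(c, 0, lo)
        hi = starts[c] + bwt.count(c, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


reads = ["ACGT", "CGTA", "ACG"]
bwt = build_msbwt(reads)
cg_count = count_kmer(bwt, "CG")
```

The linear-scan rank queries keep the sketch short; they make each search step proportional to the BWT length, which a production index avoids.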


International Conference on Bioinformatics | 2013

Read Annotation Pipeline for High-Throughput Sequencing Data

James Holt; Shunping Huang; Leonard McMillan; Wei Wang

Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, which include reduced mapping ratios, shifts in read mappings, and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate our advantages of more aligned reads and a higher percentage of reads with assigned origins.


G3: Genes, Genomes, Genetics | 2016

Whole Genome Sequence of Two Wild-Derived Mus musculus domesticus Inbred Strains, LEWES/EiJ and ZALENDE/EiJ, with Different Diploid Numbers

Andrew P. Morgan; John P. Didion; Anthony G. Doran; James Holt; Leonard McMillan; Thomas M. Keane; Fernando Pardo-Manuel de Villena

Wild-derived mouse inbred strains are becoming increasingly popular for complex traits analysis, evolutionary studies, and systems genetics. Here, we report the whole-genome sequencing of two wild-derived mouse inbred strains, LEWES/EiJ and ZALENDE/EiJ, of Mus musculus domesticus origin. These two inbred strains were selected based on their geographic origin, karyotype, and use in ongoing research. We generated 14× and 18× coverage sequence, respectively, and discovered over 1.1 million novel variants, most of which are private to one of these strains. This report expands the number of wild-derived inbred genomes in the Mus genus from six to eight. The sequence variation can be accessed via an online query tool; variant calls (VCF format) and alignments (BAM format) are available for download from a dedicated ftp site. Finally, the sequencing data have also been stored in a lossless, compressed, and indexed format using the multi-string Burrows-Wheeler transform. All data can be used without restriction.


International Conference on Bioinformatics | 2014

Constructing Burrows-Wheeler Transforms of Large String Collections via Merging

James Holt; Leonard McMillan

The throughput of biological sequencing technologies has led to the necessity for compressed and accessible sequencing formats. Recently, the Multi-String Burrows-Wheeler Transform (MSBWT) has risen in prevalence as a method for transforming sequence data to improve compression while providing access to the reads through an auxiliary FM-index. While there are many algorithms for building the MSBWT for a collection of strings, they do not scale well as the length of those strings increases. We propose a new method for constructing the MSBWT for a collection of strings based on previous work for merging two or more MSBWTs. It requires O(N * LCPavg * log(m)) time and O(N) bits of memory for a collection of m strings composed of N symbols where the average longest common prefix of all suffixes is LCPavg. We evaluate the speed of the algorithm on multiple datasets that vary in both quantity of strings and string length. Availability: https://code.google.com/p/msbwt/source/browse/MUSCython/MultimergeCython.pyx


Bioinformatics and Biomedicine | 2015

Short read error correction using an FM-index

Seth Greenstein; James Holt; Leonard McMillan

Whole genome sequencing is becoming more affordable, but sequencing errors complicate the analysis and diminish the utility of the data. We present FMRC, a new tool for correcting errors in DNA short reads from high-throughput sequencing. It uses a Burrows-Wheeler Transform and FM-index to enable a k-mer counting approach for correcting substitution, insertion, and deletion errors. In general, it corrects errors more effectively than other error correction tools, leading to better alignments and de novo assemblies. FMRC is freely available at https://github.com/sgreenstein/fmrc.
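The k-mer counting idea behind this kind of correction can be sketched briefly. The sketch below is not FMRC's algorithm: it uses a plain `Counter` as a stand-in for the FM-index, handles substitutions only, and the threshold `min_count` and all function names are assumptions made for illustration. A base is treated as suspect when no trusted k-mer covers it, and it is replaced only when exactly one alternative base makes every covering k-mer trusted.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the read set (stand-in for FM-index queries)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_kmers(seq, pos, k):
    """Return all k-mers of seq that overlap position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first, last + 1)]


def correct_read(read, counts, k, min_count=2):
    """Correct isolated substitution errors using k-mer support."""
    seq = read
    for pos in range(len(seq)):
        # A base is kept if at least one trusted k-mer overlaps it.
        if any(counts[km] >= min_count for km in covering_kmers(seq, pos, k)):
            continue
        candidates = []
        for base in "ACGT":
            if base == seq[pos]:
                continue
            trial = seq[:pos] + base + seq[pos + 1:]
            if all(counts[km] >= min_count for km in covering_kmers(trial, pos, k)):
                candidates.append(trial)
        # Only apply the fix when it is unambiguous.
        if len(candidates) == 1:
            seq = candidates[0]
    return seq


truth = "ACGTTGCAAC"
noisy = "ACGTTACAAC"  # substitution at index 5 (G -> A)
counts = kmer_counts([truth, truth, truth, noisy], k=4)
fixed = correct_read(noisy, counts, k=4)
```

With three error-free copies in the toy read set, every k-mer containing the erroneous base appears only once, so it falls below the assumed threshold of 2 and the base is corrected.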


Nature Genetics | 2015

Erratum: Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance (Nature Genetics (2015) 47 (353-360))

James J. Crowley; Vasyl Zhabotynsky; Wei Sun; Shunping Huang; Isa Kemal Pakatci; Yunjung Kim; Jeremy R. Wang; Andrew P. Morgan; John D. Calaway; David L. Aylor; Zaining Yun; Timothy A. Bell; Ryan J. Buus; Mark Calaway; John P. Didion; Terry J. Gooch; Stephanie D. Hansen; Nashiya N. Robinson; Ginger D. Shaw; Jason S. Spence; Corey R. Quackenbush; Cordelia J. Barrick; Randal J. Nonneman; Kyungsu Kim; James Xenakis; Yuying Xie; William Valdar; Alan B. Lenarcic; Wei Wang; Catherine E. Welsh

Nat. Genet. 47, 353–360 (2015); published online 2 March 2015; corrected after print 16 April 2015. In the version of this article initially published, an accession number was not provided for RNA-seq data sets. The RNA-seq data sets that passed quality control are available at the Sequence Read Archive (SRA) under accession SRP056236.


bioRxiv | 2018

Programmatic Detection of Diploid-Triploid Mixoploidy via Whole Genome Sequencing

James Holt; Camille L Birch; Donna M Brown; Joy D. Cogan; Rizwan Hamid; Naghmeh Dorrani; Matthew R. Herzog; Hane Lee; Julian A. Martinez; Katrina M. Dipple; Eric Vilain; John A. Phillips; Elizabeth A. Worthey

Purpose: Mixoploidy is a type of mosaicism in which an organism is a mixture of cells with different numbers of chromosomes. There is a broad range of phenotypes associated with mixoploidy that vary greatly depending on the fraction of cells that are non-diploid, their chromosome number, their distribution, and presumably the specific variation present in the patient. Clinical detection of mixoploidy is important for diagnosis. Methods: We developed a method to detect mixoploidy from clinical whole genome sequencing (WGS) data through the identification of an excess of variant calls centered on unusual B-allele frequencies. Our method isolates the signal from these variants using trio calls and then solves a basic linear equation to estimate levels of diploid-triploid mixoploidy within the sample. Results: We show that our method reflects the results from a cytogenetic test. We provide examples detailing how our method has been used to identify diploid-triploid mixoploid individuals from within the NIH Undiagnosed Diseases Network. We present confirmatory findings obtained by clinical cytogenetic testing and show that our method can be used to identify the diploid-triploid ratio in these cases. Conclusion: WGS data from patients with rare diseases can be used to identify mixoploid individuals. Individuals with certain characteristics as discussed should be tested for mixoploidy as part of standard clinical pipeline procedures. Scripts that perform this calculation are publicly available at https://github.com/HudsonAlpha/mixoviz.
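How a B-allele frequency can be turned into a triploid fraction is shown below as a worked sketch under a simplified model, which is an assumption of this example and not necessarily the equation used in the mixoviz scripts. Suppose a fraction t of cells is triploid and the extra chromosome set carries the B allele at a site that is heterozygous in diploid cells. Diploid cells then contribute 1 B copy out of 2, triploid cells 2 out of 3, so the expected frequency is f = (1 + t) / (2 + t), which rearranges to the linear relation t (1 - f) = 2f - 1.

```python
from statistics import median


def expected_baf(triploid_fraction):
    """Expected B-allele frequency under the assumed model.

    Per cell: diploid = 1 B of 2 copies, triploid = 2 B of 3 copies.
    """
    t = triploid_fraction
    return (1 + t) / (2 + t)


def triploid_fraction_from_baf(baf):
    """Solve t * (1 - f) = 2f - 1 for t and clamp to the range [0, 1]."""
    if not 0.0 < baf < 1.0:
        raise ValueError("B-allele frequency must lie strictly between 0 and 1")
    t = (2 * baf - 1) / (1 - baf)
    return min(1.0, max(0.0, t))


def estimate_triploid_fraction(bafs):
    """Estimate t from many informative sites using their median frequency."""
    return triploid_fraction_from_baf(median(bafs))


# Hypothetical frequencies at sites where the duplicated haplotype carries B.
observed = [0.55, 0.57, 0.56, 0.58, 0.565]
estimate = estimate_triploid_fraction(observed)
```

Sites where the B allele sits on the non-duplicated haplotype would follow the mirrored relation f = 1 / (2 + t); the sketch leaves them out and relies on the caller to select informative sites, which the abstract describes doing with trio calls.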

Collaboration


Dive into James Holt's collaborations.

Top Co-Authors

Leonard McMillan

University of North Carolina at Chapel Hill

Andrew P. Morgan

University of North Carolina at Chapel Hill

John P. Didion

University of North Carolina at Chapel Hill

James J. Crowley

University of North Carolina at Chapel Hill

Jeremy R. Wang

University of North Carolina at Chapel Hill

Shunping Huang

University of North Carolina at Chapel Hill

Timothy A. Bell

University of North Carolina at Chapel Hill

Wei Wang

University of California

Alan B. Lenarcic

University of North Carolina at Chapel Hill
