John Huddleston | Researchain

Publication

Featured research published by John Huddleston.

Nature Methods | 2013

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data

Chen Shan Chin; David H. Alexander; Patrick Marks; Aaron Klammer; James P Drake; Cheryl Heiner; Alicia Clum; Alex Copeland; John Huddleston; Evan E. Eichler; Stephen Turner; Jonas Korlach

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph–based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
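As a rough illustration of the seed-selection idea in the HGAP abstract above, the sketch below picks the longest reads as seeds until they reach a target coverage of the genome. The abstract does not say how the length cutoff is chosen, so the target-coverage rule, the function name `select_seed_reads`, and the `seed_coverage` parameter are assumptions made for this example, not the published implementation.

```python
def select_seed_reads(read_lengths, genome_size, seed_coverage=30):
    """Choose the longest reads as seeds (hypothetical helper).

    Reads are taken from longest to shortest until their summed length
    reaches genome_size * seed_coverage bases. Returns a tuple of
    (length_cutoff, seed_indices), where length_cutoff is the length of
    the shortest read that was still selected as a seed.
    """
    target_bases = genome_size * seed_coverage
    # Indices of reads ordered from longest to shortest.
    order = sorted(range(len(read_lengths)),
                   key=lambda i: read_lengths[i],
                   reverse=True)
    seeds = []
    total = 0
    for i in order:
        if total >= target_bases:
            break
        seeds.append(i)
        total += read_lengths[i]
    cutoff = read_lengths[seeds[-1]] if seeds else 0
    return cutoff, seeds


if __name__ == "__main__":
    # Toy data: six reads, a 1,000-bp "genome", 15x seed coverage.
    lengths = [500, 8000, 1200, 6000, 300, 7000]
    print(select_seed_reads(lengths, genome_size=1000, seed_coverage=15))
```

In a pipeline of this kind, the remaining shorter reads would then be aligned to the seeds to build the consensus ("preassembled") reads; that alignment and consensus step is not shown here.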

Nature | 2015

An integrated map of structural variation in 2,504 human genomes

Peter H. Sudmant; Tobias Rausch; Eugene J. Gardner; Robert E. Handsaker; Alexej Abyzov; John Huddleston; Zhang Y; Kai Ye; Goo Jun; Markus Hsi-Yang Fritz; Miriam K. Konkel; Ankit Malhotra; Adrian M. Stütz; Xinghua Shi; Francesco Paolo Casale; Jieming Chen; Fereydoun Hormozdiari; Gargi Dayama; Ken Chen; Maika Malig; Mark Chaisson; Klaudia Walter; Sascha Meiers; Seva Kashin; Erik Garrison; Adam Auton; Hugo Y. K. Lam; Xinmeng Jasmine Mu; Can Alkan; Danny Antaki

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

Genome Research | 2014

Reconstructing complex regions of genomes using long-read sequencing technology

John Huddleston; Swati Ranade; Maika Malig; Francesca Antonacci; Mark Chaisson; Lawrence Hon; Peter H. Sudmant; Tina Graves; Can Alkan; Megan Y. Dennis; Richard Wilson; Stephen Turner; Jonas Korlach; Evan E. Eichler

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.

Nature | 2014

Gibbon genome and the fast karyotype evolution of small apes.

Lucia Carbone; R. Alan Harris; Sante Gnerre; Krishna R. Veeramah; Belen Lorente-Galdos; John Huddleston; Thomas J. Meyer; Javier Herrero; Christian Roos; Bronwen Aken; Fabio Anaclerio; Nicoletta Archidiacono; Carl Baker; Daniel Barrell; Mark A. Batzer; Kathryn Beal; Antoine Blancher; Craig Bohrson; Markus Brameier; Michael S. Campbell; Claudio Casola; Giorgia Chiatante; Andrew Cree; Annette Damert; Pieter J. de Jong; Laura Dumas; Marcos Fernandez-Callejo; Paul Flicek; Nina V. Fuchs; Ivo Gut

Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ∼5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.

Science | 2015

Global diversity, population stratification, and selection of human copy-number variation.

Peter H. Sudmant; Swapan Mallick; Bradley J. Nelson; Fereydoun Hormozdiari; Niklas Krumm; John Huddleston; Bradley P. Coe; Carl Baker; Michael J. Bamshad; Lynn B. Jorde; Olga L. Posukh; Hovhannes Sahakyan; W. Scott Watkins; Levon Yepiskoposyan; M. Syafiq Abdullah; Claudio M. Bravi; Cristian Capelli; Tor Hervig; Joseph Wee; Chris Tyler-Smith; George van Driem; Irene Gallego Romero; Aashish R. Jha; Sena Karachanak-Yankova; Draga Toncheva; David Comas; Brenna M. Henn; Toomas Kivisild; Andres Ruiz-Linares; Antti Sajantila

Duplications and deletions in the human genome
Duplications and deletions can lead to variation in copy number for genes and genomic loci among humans. Such variants can reveal evolutionary patterns and have implications for human health. Sudmant et al. examined copy-number variation across 236 individual genomes from 125 human populations. Deletions were under more selection, whereas duplications showed more population-specific structure. Interestingly, Oceanic populations retain large duplications postulated to have originated in an ancient Denisovan lineage.
Science, this issue 10.1126/science.aab3761
Copy-number variation reveals how selection affects the human genome across the globe.
INTRODUCTION: Most studies of human genetic variation have focused on single-nucleotide variants (SNVs). However, copy-number variants (CNVs) affect more base pairs of DNA among humans, and yet our understanding of CNV diversity among human populations is limited.
RATIONALE: We aimed to understand the pattern, selection, and diversity of copy-number variation by analyzing deeply sequenced genomes representing the diversity of all humans. We compared the selective constraints of deletions versus duplications to understand population stratification in the context of the ancestral human genome and to assess differences in CNV load between African and non-African populations.
RESULTS: We sequenced 236 individual genomes from 125 distinct human populations and identified 14,467 autosomal CNVs and 545 X-linked CNVs with a sequence read-depth approach. Deletions exhibit stronger selective pressure and are better phylogenetic markers of population relationships than duplication polymorphisms. We identified 1036 population-stratified copy-number–variable regions, 295 of which intersect coding regions and 199 of which exhibit extreme signatures of differentiation. Duplicated loci were 1.8-fold more likely to be stratified than deletions but were poorly correlated with flanking genetic diversity. Among these, we highlight a duplication polymorphism restricted to modern Oceanic populations yet also present in the genome of the archaic Denisova hominin. This 225–kilo–base pair (kbp) duplication includes two microRNA genes and is almost fixed among human Papuan-Bougainville genomes. The data allowed us to reconstruct the ancestral human genome and create a more accurate evolutionary framework for the gain and loss of sequences during human evolution. We identified 571 loci that segregate in the human population and another 2026 loci of fixed-copy 2 in all human genomes but absent from the reference genome. The total deletion and duplication load between African and non-African population groups showed no difference after we account for ancestral sequences missing from the human reference. However, we did observe that the relative number of base pairs affected by CNVs compared to single-nucleotide polymorphisms is higher among non-Africans than Africans.
CONCLUSION: Deletions, duplications, and CNVs have shaped, to different extents, the genetic diversity of human populations by the combined forces of mutation, selection, and demography.
Figure: Global human CNV diversity and archaic introgression of a chromosome 16 duplication. (Left) The geographic coordinates of populations sampled are indicated on a world map (colored dots). The pie charts show the continental population allele frequency of a single ~225-kbp duplication polymorphism found exclusively among Oceanic populations and an archaic Denisova. (Right) The ancestral structure of this duplication locus (1) and the Denisova duplication structure (2) are shown in relation to their position on chromosome 16. We estimate that the duplication emerged ~440 thousand years ago (ka) in the Denisova and then introgressed into ancestral Papuan populations ~40 ka.
ABSTRACT: In order to explore the diversity and selective signatures of duplication and deletion human copy-number variants (CNVs), we sequenced 236 individuals from 125 distinct human populations. We observed that duplications exhibit fundamentally different population genetic and selective signatures than deletions and are more likely to be stratified between human populations. Through reconstruction of the ancestral human genome, we identify megabases of DNA lost in different human lineages and pinpoint large duplications that introgressed from the extinct Denisova lineage now found at high frequency exclusively in Oceanic populations. We find that the proportion of CNV base pairs to single-nucleotide–variant base pairs is greater among non-Africans than it is among African populations, but we conclude that this difference is likely due to unique aspects of non-African population history as opposed to differences in CNV load.

Science | 2016

Long-read sequence assembly of the gorilla genome

David Gordon; John Huddleston; Mark Chaisson; Christopher M. Hill; Zev N. Kronenberg; Katherine M. Munson; Maika Malig; Archana Raja; Ian T Fiddes; LaDeana W. Hillier; Christopher P. Dunn; Carl Baker; Joel Armstrong; Mark Diekhans; Benedict Paten; Jay Shendure; Richard Wilson; David Haussler; Chen Shan Chin; Evan E. Eichler

Improving on the gorilla genome
Access to complete, high-quality genomes of nonhuman primates will also help us understand human biology. Gordon et al. used long-read sequencing technology to improve genome data on our close relative the gorilla. Sequencing from a single individual decreased assembly fragmentation and recovered previously missed genes and noncoding loci. Mapping short-read sequences from additional gorillas helped reconstruct a “pan” gorilla sequence documenting genetic variation. Comparison with human genomes revealed species-specific differences ranging in size from one to thousands of bases in length, including some that are likely to affect gene regulation.
Science, this issue 10.1126/science.aae0344
A new approach to looking at the gorilla genome improves estimates of the differences between humans and gorillas.
INTRODUCTION: The accurate sequence and assembly of genomes is critical to our understanding of evolution and genetic variation. Despite advances in short-read sequencing technology that have decreased cost and increased throughput, whole-genome assembly of mammalian genomes remains problematic because of the presence of repetitive DNA.
RATIONALE: The goal of this study was to sequence and assemble the genome of the western lowland gorilla by using primarily single-molecule, real-time (SMRT) sequencing technology and a novel assembly algorithm that takes advantage of long (>10 kbp) sequence reads. We specifically compare the properties of this assembly to gorilla genome assemblies that were generated by using more routine short sequence read approaches in order to determine the value and biological impact of a long-read genome assembly.
RESULTS: We generated 74.8-fold SMRT whole-genome shotgun sequence from peripheral blood DNA isolated from a western lowland gorilla (Gorilla gorilla gorilla) named Susie. We applied a string graph assembly algorithm, Falcon, and consensus algorithm, Quiver, to generate a 3.1-Gbp assembly with a contig N50 of 9.6 Mbp. Short-read sequence data from an additional six gorilla genomes was mapped so as to reduce indel errors and improve the accuracy of the final assembly. We estimate that 98.9% of the gorilla euchromatin has been assembled into 1854 sequence contigs. The assembly represents an improvement in contiguity: >800-fold with respect to the published gorilla genome assembly and >180-fold with respect to a more recently released upgrade of the gorilla assembly. Most of the sequence gaps are now closed, considerably increasing the yield of complete gene models. We estimate that 87% of the missing exons and 94% of the incomplete genes are recovered. We find that the sequence of most full-length common repeats is resolved, with the most significant gains occurring for the longest and most G+C–rich retrotransposons. Although complex regions such as the major histocompatibility locus are accurately sequenced and assembled, both heterochromatin and large, high-identity segmental duplications are not because read lengths are insufficiently long to traverse these repetitive structures. The long-read assembly produces a much finer map of structural variation down to 50 bp in length, facilitating the discovery of thousands of lineage-specific structural variant differences that have occurred since divergence from the human and chimpanzee lineages. This includes the disruption of specific genes and loss of predicted regulatory regions between the two species. We show that use of the new gorilla genome assembly changes estimates of divergence and diversity, resulting in subtle but substantial effects on previous population genetic inferences, such as the timing of species bottlenecks and changes in the effective population size over the course of evolution.
CONCLUSION: The genome assembly that results from using the long-read data provides a more complete picture of gene content, structural variation, and repeat biology, improving population genetic and evolutionary inferences. Long-read sequencing technology now makes it practical for individual laboratories to generate high-quality reference genomes for complex mammalian genomes.
Figure: Long-read sequence assembly of the gorilla genome. (A) Susie, a female Western lowland gorilla, was used as the reference sample for full-genome sequencing and assembly [photograph courtesy of Max Block]. (B and C) Treemaps representing the differences in fragmentation of the long-read and short-read gorilla genome assemblies. The rectangles are the largest contigs that cumulatively make up 300 Mbp (~10%) of the assembly.
ABSTRACT: Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.

American Journal of Human Genetics | 2013

Complete Haplotype Sequence of the Human Immunoglobulin Heavy-Chain Variable, Diversity, and Joining Genes and Characterization of Allelic and Copy-Number Variation

Corey T. Watson; Karyn Meltz Steinberg; John Huddleston; René L. Warren; Maika Malig; Jacqueline E. Schein; A. Jeremy Willsey; Jeffrey B. Joy; Jamie K. Scott; Tina Graves; Richard Wilson; Robert A. Holt; Evan E. Eichler; Felix Breden

The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (Fst = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.
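To make the stratification figure (Fst = 0.3-0.5) in the abstract above more concrete, here is a minimal sketch of Wright's fixation index for a single biallelic variant, computed from per-population allele frequencies. The abstract does not state which Fst estimator the authors used, so the formula choice (equal population weights, Fst = (H_T - H_S) / H_T) and the function name `fst_biallelic` are assumptions for illustration only.

```python
def fst_biallelic(allele_freqs):
    """Wright's Fst for one biallelic locus (hypothetical helper).

    allele_freqs: frequency of one allele in each population (0..1).
    Populations are weighted equally, which is a simplifying assumption.
    """
    if not allele_freqs:
        raise ValueError("need at least one population frequency")
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    # Expected heterozygosity in the pooled (total) population.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The locus is fixed everywhere, so there is no differentiation.
        return 0.0
    # Mean expected heterozygosity within the subpopulations.
    h_sub = sum(2 * p * (1 - p) for p in allele_freqs) / n
    return (h_total - h_sub) / h_total


if __name__ == "__main__":
    # Two toy populations with allele frequencies 0.2 and 0.8.
    print(round(fst_biallelic([0.2, 0.8]), 2))
```

With these toy frequencies the value falls inside the range the abstract reports; identical frequencies in every population give 0, and alternately fixed populations give 1.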

American Journal of Human Genetics | 2016

Genome Sequencing of Autism-Affected Families Reveals Disruption of Putative Noncoding Regulatory DNA.

Tychele N. Turner; Fereydoun Hormozdiari; Michael H. Duyzend; Sarah A. McClymont; Paul W. Hook; Ivan Iossifov; Archana Raja; Carl Baker; Kendra Hoekzema; Holly A.F. Stessman; Michael C. Zody; Bradley J. Nelson; John Huddleston; Richard Sandstrom; Joshua D. Smith; David S. Hanna; James M. Swanson; Elaine M. Faustman; Michael J. Bamshad; John A. Stamatoyannopoulos; Deborah A. Nickerson; Andrew S. McCallion; Robert Darnell; Evan E. Eichler

We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.

Genome Research | 2013

Evolution and diversity of copy number variation in the great ape lineage

Peter H. Sudmant; John Huddleston; Claudia Rita Catacchio; Maika Malig; LaDeana W. Hillier; Carl Baker; Kiana Mohajeri; Ivanela Kondova; Ronald E. Bontrop; Stephan Persengiev; Francesca Antonacci; Mario Ventura; Javier Prado-Martinez; Great Ape Genome Project; Tomas Marques-Bonet; Evan E. Eichler

Copy number variation (CNV) contributes to disease and has restructured the genomes of great apes. The diversity and rate of this process, however, have not been extensively explored among great ape lineages. We analyzed 97 deeply sequenced great ape and human genomes and estimate 16% (469 Mb) of the hominid genome has been affected by recent CNV. We identify a comprehensive set of fixed gene deletions (n = 340) and duplications (n = 405) as well as >13.5 Mb of sequence that has been specifically lost on the human lineage. We compared the diversity and rates of copy number and single nucleotide variation across the hominid phylogeny. We find that CNV diversity partially correlates with single nucleotide diversity (r² = 0.5) and recapitulates the phylogeny of apes with few exceptions. Duplications significantly outpace deletions (2.8-fold). The load of segregating duplications remains significantly higher in bonobos, Western chimpanzees, and Sumatran orangutans, populations that have experienced recent genetic bottlenecks (P = 0.0014, 0.02, and 0.0088, respectively). The rate of fixed deletion has been more clocklike with the exception of the chimpanzee lineage, where we observe a twofold increase in the chimpanzee-bonobo ancestor (P = 4.79 × 10⁻⁹) and increased deletion load among Western chimpanzees (P = 0.002). The latter includes the first genomic disorder in a chimpanzee with features resembling Smith-Magenis syndrome mediated by a chimpanzee-specific increase in segmental duplication complexity. We hypothesize that demographic effects, such as bottlenecks, have contributed to larger and more gene-rich segments being deleted in the chimpanzee lineage and that this effect, more generally, may account for episodic bursts in CNV during hominid evolution.

Nature Communications | 2016

Long-read sequencing and de novo assembly of a Chinese genome

Lingling Shi; Yunfei Guo; Chengliang Dong; John Huddleston; Hui Yang; Xiaolu Han; Aisi Fu; Quan Li; Na Li; Siyi Gong; Katherine E Lintner; Qiong Ding; Zou Wang; Jiang Hu; Depeng Wang; Feng Wang; Lin Wang; Gholson J. Lyon; Yongtao Guan; Yufeng Shen; Oleg V. Evgrafov; James A. Knowles; Françoise Thibaud-Nissen; Valerie Schneider; Chack Yung Yu; Libing Zhou; Evan E. Eichler; Kf So; Kai Wang

Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
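Several abstracts in this list report assembly contiguity as a contig or scaffold N50. For readers unfamiliar with the statistic, the sketch below computes it under the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative and are not taken from any of the papers.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least half
    of the summed length. Returns 0 for an empty list.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    # Toy assembly of five contigs totalling 20 units.
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misassembled contig can raise N50, which is why the abstracts pair it with accuracy or gene-completeness estimates.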
