John Novembre | Researchain

Publication

Featured research published by John Novembre.

Genome Research | 2009

Fast model-based estimation of ancestry in unrelated individuals

David H. Alexander; John Novembre; Kenneth Lange

Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
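For readers unfamiliar with the likelihood model the abstract refers to, it is commonly written as the binomial log-likelihood below. The notation is an assumption introduced here for illustration and does not appear in the abstract: g_ij is the count (0, 1 or 2) of a reference allele for individual i at marker j, q_ik is the fraction of individual i's ancestry from population k, and f_kj is the frequency of that allele in population k.

```latex
\mathcal{L}(Q, F) \;=\; \sum_{i}\sum_{j}
  \left\{ g_{ij}\,\ln\!\Big[\sum_{k} q_{ik} f_{kj}\Big]
        + (2 - g_{ij})\,\ln\!\Big[1 - \sum_{k} q_{ik} f_{kj}\Big] \right\},
\qquad q_{ik} \ge 0,\quad \sum_{k} q_{ik} = 1,\quad 0 \le f_{kj} \le 1 .
```

Under this reading, block relaxation would alternate between updating Q with F held fixed and updating F with Q held fixed; the abstract does not spell out the update details, so this is only a sketch of the objective.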

Nature | 2008

Genes mirror geography within Europe.

John Novembre; Toby Johnson; Katarzyna Bryc; Zoltán Kutalik; Adam R. Boyko; Adam Auton; Amit Indap; Karen S. King; Sven Bergmann; Matthew R. Nelson; Matthew Stephens; Carlos Bustamante

Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual’s DNA can be used to infer their geographic origin with surprising accuracy—often to within a few hundred kilometres.

Genome Research | 2009

Signals of recent positive selection in a worldwide sample of human populations

Joseph K. Pickrell; Graham Coop; John Novembre; Sridhar Kudaravalli; Jun Li; Devin Absher; Balaji S. Srinivasan; Gregory S. Barsh; Richard M. Myers; Marcus W. Feldman; Jonathan K. Pritchard

Genome-wide scans for recent positive selection in humans have yielded insight into the mechanisms underlying the extensive phenotypic diversity in our species, but have focused on a limited number of populations. Here, we present an analysis of recent selection in a global sample of 53 populations, using genotype data from the Human Genome Diversity-CEPH Panel. We refine the geographic distributions of known selective sweeps, and find extensive overlap between these distributions for populations in the same continental region but limited overlap between populations outside these groupings. We present several examples of previously unrecognized candidate targets of selection, including signals at a number of genes in the NRG-ERBB4 developmental pathway in non-African populations. Analysis of recently identified genes involved in complex diseases suggests that there has been selection on loci involved in susceptibility to type II diabetes. Finally, we search for local adaptation between geographically close populations, and highlight several examples.

Science | 2012

An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People

Matthew R. Nelson; Daniel Wegmann; Margaret G. Ehm; Darren Kessner; Pamela L. St. Jean; Claudio Verzilli; Judong Shen; Zhengzheng Tang; Silviu Alin Bacanu; Dana Fraser; Liling Warren; Jennifer L. Aponte; Matthew Zawistowski; Xiao Liu; Hao Zhang; Yong Zhang; Jun Li; Yun Li; Li Li; Peter Woollard; Simon Topp; Matthew D. Hall; Keith Nangle; Jun Wang; Gonçalo R. Abecasis; Lon R. Cardon; Sebastian Zöllner; John C. Whittaker; Stephanie L. Chissoe; John Novembre

A Deep Look Into Our Genes

Recent debates have focused on the degree of genetic variation and its impact upon health at the genomic level in humans (see the Perspective by Casals and Bertranpetit). Tennessen et al. (p. 64, published online 17 May), looking at all of the protein-coding genes in the human genome, and Nelson et al. (p. 100, published online 17 May), looking at genes that encode drug targets, address this question through deep sequencing efforts on samples from multiple individuals. The findings suggest that most human variation is rare, not shared between populations, and that rare variants are likely to play a role in human health. A pharmacogenomics analysis shows how challenging it will be to associate rare variants with phenotypes.

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.

Nature | 2010

Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication

Bridgett M. vonHoldt; John P. Pollinger; Kirk E. Lohmueller; Eunjung Han; Heidi G. Parker; Pascale Quignon; Jeremiah D. Degenhardt; Adam R. Boyko; Dent Earl; Adam Auton; Andrew R. Reynolds; Kasia Bryc; Abra Brisbin; James C. Knowles; Dana S. Mosher; Tyrone C. Spady; Abdel G. Elkahloun; Eli Geffen; Malgorzata Pilot; Włodzimierz Jędrzejewski; Claudia Greco; Ettore Randi; Danika L. Bannasch; Alan N. Wilton; Jeremy Shearman; Marco Musiani; Michelle Cargill; Paul Glyn Jones; Zuwei Qian; Wei Huang

Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication. To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that they are a dominant source of genetic diversity for dogs rather than wolves from east Asia, as suggested by mitochondrial DNA sequence data. Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.

Nature Genetics | 2008

Interpreting principal component analyses of spatial population genetic variation

John Novembre; Matthew Stephens

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
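The wave-like behavior described in this abstract can be illustrated with a small numerical sketch. It is not the authors' analysis: it assumes a hypothetical one-dimensional habitat of 50 sampling sites whose genetic covariance decays geometrically with distance (the decay rate `rho` and the function names are illustrative choices), and it omits the centering step of a full PCA for simplicity. The leading eigenvectors of such a covariance matrix oscillate along the habitat with steadily increasing frequency even though no migration event is modeled.

```python
import numpy as np

def spatial_pca_modes(n_sites=50, rho=0.9, n_modes=4):
    """Leading eigenvectors of a distance-decay covariance matrix.

    Assumption (not from the abstract): covariance between sites i and j
    on a line is rho ** |i - j|.
    """
    idx = np.arange(n_sites)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_modes]    # keep the largest n_modes
    return vals[order], vecs[:, order]

def sign_changes(v):
    """Count how many times a vector crosses zero along the habitat."""
    s = np.sign(v)
    return int(np.sum(s[:-1] * s[1:] < 0))

eigvals, modes = spatial_pca_modes()
# Zero crossings of each leading mode, from largest eigenvalue downward.
wave_counts = [sign_changes(modes[:, k]) for k in range(modes.shape[1])]
print(wave_counts)
```

Plotting the columns of `modes` against site index shows a flat-topped hump, then a gradient, then progressively shorter waves, which is the kind of pattern the abstract cautions against reading as a migration signal.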

PLOS Genetics | 2009

The Role of Geography in Human Adaptation

Graham Coop; Joseph K. Pickrell; John Novembre; Sridhar Kudaravalli; Jun Li; Devin Absher; Richard M. Myers; Luigi Luca Cavalli-Sforza; Marcus W. Feldman; Jonathan K. Pritchard

Various observations argue for a role of adaptation in recent human evolution, including results from genome-wide studies and analyses of selection signals at candidate genes. Here, we use genome-wide SNP data from the HapMap and CEPH-Human Genome Diversity Panel samples to study the geographic distributions of putatively selected alleles at a range of geographic scales. We find that the average allele frequency divergence is highly predictive of the most extreme FST values across the whole genome. On a broad scale, the geographic distribution of putatively selected alleles almost invariably conforms to population clusters identified using randomly chosen genetic markers. Given this structure, there are surprisingly few fixed or nearly fixed differences between human populations. Among the nearly fixed differences that do exist, nearly all are due to fixation events that occurred outside of Africa, and most appear in East Asia. These patterns suggest that selection is often weak enough that neutral processes—especially population history, migration, and drift—exert powerful influences over the fate and geographic distribution of selected alleles.

PLOS Biology | 2010

A Simple Genetic Architecture Underlies Morphological Variation in Dogs

Adam R. Boyko; Pascale Quignon; Lin Li; Jeffrey J. Schoenebeck; Jeremiah D. Degenhardt; Kirk E. Lohmueller; Keyan Zhao; Abra Brisbin; Heidi G. Parker; Bridgett M. vonHoldt; Michele Cargill; Adam Auton; Andrew R. Reynolds; Abdel G. Elkahloun; Marta Castelhano; Dana S. Mosher; Nathan B. Sutter; Gary S. Johnson; John Novembre; Melissa J. Hubisz; Adam Siepel; Robert K. Wayne; Carlos Bustamante; Elaine A. Ostrander

The largest genetic study to date of morphology in domestic dogs identifies genes controlling nearly 100 morphological traits and identifies important trends in phenotypic variation within this species.

PLOS Genetics | 2014

Genome Sequencing Highlights the Dynamic Early History of Dogs

Adam H. Freedman; Ilan Gronau; Rena M. Schweizer; Diego Ortega-Del Vecchyo; Eunjung Han; Pedro Miguel Silva; Marco Galaverni; Zhenxin Fan; Peter Marx; Belen Lorente-Galdos; Holly C. Beale; Oscar Ramirez; Farhad Hormozdiari; Can Alkan; Carles Vilà; Kevin Squire; Eli Geffen; Josip Kusak; Adam R. Boyko; Heidi G. Parker; Clarence Lee; Vasisht Tadigotla; Adam Siepel; Carlos Bustamante; Timothy T. Harkins; Stanley F. Nelson; Elaine A. Ostrander; Tomas Marques-Bonet; Robert K. Wayne; John Novembre

To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.

American Journal of Human Genetics | 2008

The Population Reference Sample, POPRES: A Resource for Population, Disease, and Pharmacological Genetics Research

Matthew R. Nelson; Katarzyna Bryc; Karen S. King; Amit Indap; Adam R. Boyko; John Novembre; Linda P. Briley; Yuka Maruyama; Dawn M. Waterworth; Gérard Waeber; Peter Vollenweider; Jorge R. Oksenberg; Stephen L. Hauser; Heide A. Stirnadel; Jaspal S. Kooner; John Chambers; Brendan Jones; Vincent Mooser; Carlos Bustamante; Allen D. Roses; Daniel K. Burns; Margaret G. Ehm; Eric Lai

Technological and scientific advances, stemming in large part from the Human Genome and HapMap projects, have made large-scale, genome-wide investigations feasible and cost effective. These advances have the potential to dramatically impact drug discovery and development by identifying genetic factors that contribute to variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and adverse drug reactions. In spite of the technological advancements, successful application in biomedical research would be limited without access to suitable sample collections. To facilitate exploratory genetics research, we have assembled a DNA resource from a large number of subjects participating in multiple studies throughout the world. This growing resource was initially genotyped with a commercially available genome-wide 500,000 single-nucleotide polymorphism panel. This project includes nearly 6,000 subjects of African-American, East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation identified via principal-component analysis (PCA) of these data confirm the overall integrity of the data and highlight important features of the genetic structure of diverse populations. The potential value of such extensively genotyped collections is illustrated by selection of genetically matched population controls in a genome-wide analysis of abacavir-associated hypersensitivity reaction. We find that matching based on country of origin, identity-by-state distance, and multidimensional PCA do similarly well to control the type I error rate. The genotype and demographic data from this reference sample are freely available through the NCBI database of Genotypes and Phenotypes (dbGaP).
