Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Noah Zaitlen is active.

Publication


Featured researches published by Noah Zaitlen.


Nature Genetics | 2010

Variance component model to account for sample structure in genome-wide association studies

Hyun Min Kang; Jae Hoon Sul; Noah Zaitlen; Sit Yee Kong; Nelson B. Freimer; Chiara Sabatti; Eleazar Eskin

Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; but such approaches are computationally impractical. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.


Genetics | 2008

Efficient Control of Population Structure in Model Organism Association Mapping

Hyun Min Kang; Noah Zaitlen; Claire M. Wade; Andrew Kirby; David Heckerman; Mark J. Daly; Eleazar Eskin

Genomewide association mapping in model organisms such as inbred mouse strains is a promising approach for the identification of risk factors related to human diseases. However, genetic association studies in inbred model organisms are confronted by the problem of complex population structure among strains. This induces inflated false positive rates, which cannot be corrected using standard approaches applied in human association studies such as genomic control or structured association. Recent studies demonstrated that mixed models successfully correct for the genetic relatedness in association mapping in maize and Arabidopsis panel data sets. However, the currently available mixed-model methods suffer from computational inefficiency. In this article, we propose a new method, efficient mixed-model association (EMMA), which corrects for population structure and genetic relatedness in model organism association mapping. Our method takes advantage of the specific nature of the optimization problem in applying mixed models for association mapping, which allows us to substantially increase the computational speed and reliability of the results. We applied EMMA to in silico whole-genome association mapping of inbred mouse strains involving hundreds of thousands of SNPs, in addition to Arabidopsis and maize data sets. We also performed extensive simulation studies to estimate the statistical power of EMMA under various SNP effects, varying degrees of population structure, and differing numbers of multiple measurements per strain. Despite the limited power of inbred mouse association mapping due to the limited number of available inbred strains, we are able to identify significantly associated SNPs, which fall into known QTL or genes identified through previous studies while avoiding an inflation of false positives. An R package implementation and webserver of our EMMA method are publicly available.


Nature Genetics | 2009

Genome-wide association analysis of metabolic traits in a birth cohort from a founder population.

Chiara Sabatti; Anna-Liisa Hartikainen; Anneli Pouta; Samuli Ripatti; Jae Brodsky; Christopher Jones; Noah Zaitlen; Teppo Varilo; Marika Kaakinen; Ulla Sovio; Aimo Ruokonen; Jaana Laitinen; Eveliina Jakkula; Lachlan Coin; Clive J. Hoggart; Andrew Collins; Hannu Turunen; Stacey Gabriel; Paul Elliot; Mark I. McCarthy; Mark J. Daly; Marjo-Riitta Järvelin; Nelson B. Freimer; Leena Peltonen

Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene–environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.


Nature Reviews Genetics | 2010

New approaches to population stratification in genome-wide association studies

Alkes L. Price; Noah Zaitlen; David Reich; Nick Patterson

Genome-wide association (GWA) studies are an effective approach for identifying genetic variants associated with disease risk. GWA studies can be confounded by population stratification — systematic ancestry differences between cases and controls — which has previously been addressed by methods that infer genetic ancestry. Those methods perform well in data sets in which population structure is the only kind of structure present but are inadequate in data sets that also contain family structure or cryptic relatedness. Here, we review recent progress on methods that correct for stratification while accounting for these additional complexities. systematic ancestry differences between cases and controls — which has previously been addressed by methods that infer genetic ancestry. Those methods perform well in data sets in which population structure is the only kind of structure present but are inadequate in data sets that also contain family structure or cryptic relatedness. Here, we review recent progress on methods that correct for stratification while accounting for these additional complexities.


Nature Genetics | 2014

Advantages and pitfalls in the application of mixed model association methods

Jian Yang; Noah Zaitlen; Michael E. Goddard; Peter M. Visscher; Alkes L. Price

Mixed linear models are emerging as a method of choice for conducting genetic association studies in humans and other organisms. The advantages of the mixed-linear-model association (MLMA) method include the prevention of false positive associations due to population or relatedness structure and an increase in power obtained through the application of a correction that is specific to this structure. An underappreciated point is that MLMA can also increase power in studies without sample structure by implicitly conditioning on associated loci other than the candidate locus. Numerous variations on the standard MLMA approach have recently been published, with a focus on reducing computational cost. These advances provide researchers applying MLMA methods with many options to choose from, but we caution that MLMA methods are still subject to potential pitfalls. Here we describe and quantify the advantages and pitfalls of MLMA methods as a function of study design and provide recommendations for the application of these methods in practical settings.


PLOS Genetics | 2013

Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits.

Noah Zaitlen; P. Kraft; Nick Patterson; Bogdan Pasaniuc; Gaurav Bhatia; Samuela Pollack; Alkes L. Price

Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays.


Nature Genetics | 2012

Extremely low-coverage sequencing and imputation increases power for genome-wide association studies

Bogdan Pasaniuc; Nadin Rohland; Paul J. McLaren; Kiran Garimella; Noah Zaitlen; Heng Li; Namrata Gupta; Benjamin M. Neale; Mark J. Daly; Pamela Sklar; Patrick F. Sullivan; Sarah E. Bergen; Jennifer L. Moran; Christina M. Hultman; Paul Lichtenstein; Patrik K. E. Magnusson; Shaun Purcell; David W. Haas; Liming Liang; Shamil R. Sunyaev; Nick Patterson; Paul I. W. de Bakker; David Reich; Alkes L. Price

Genome-wide association studies (GWAS) have proven to be a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here, we show that extremely low-coverage sequencing (0.1–0.5×) captures almost as much of the common (>5%) and low-frequency (1–5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r2 of 0.71 using off-target data (0.24× average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome-sequencing data sets, we show that association statistics obtained using extremely low-coverage sequencing data attain similar P values at known associated variants as data from genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in extremely low-coverage sequencing can yield several times the effective sample size of GWAS based on SNP array data and a commensurate increase in statistical power.


American Journal of Human Genetics | 2010

Leveraging Genetic Variability across Populations for the Identification of Causal Variants

Noah Zaitlen; Bogdan Pasaniuc; Tom Gur; Elad Ziv; Eran Halperin

Genome-wide association studies have been performed extensively in the last few years, resulting in many new discoveries of genomic regions that are associated with complex traits. It is often the case that a SNP found to be associated with the condition is not the causal SNP, but a proxy to it as a result of linkage disequilibrium. For the identification of the actual causal SNP, fine-mapping follow-up is performed, either with the use of dense genotyping or by sequencing of the region. In either case, if the causal SNP is in high linkage disequilibrium with other SNPs, the fine-mapping procedure will require a very large sample size for the identification of the causal SNP. Here, we show that by leveraging genetic variability across populations, we significantly increase the localization success rate (LSR) for a causal SNP in a follow-up study that involves multiple populations as compared to a study that involves only one population. Thus, the average power for detection of the causal variant will be higher in a joint analysis than that in studies in which only one population is analyzed at a time. On the basis of this observation, we developed a framework to efficiently search for a follow-up study design: our framework searches for the best combination of populations from a pool of available populations to maximize the LSR for detection of a causal variant. This framework and its accompanying software can be used to considerably enhance the power of fine-mapping studies.


Human Genetics | 2012

Heritability in the genome-wide association era

Noah Zaitlen; Peter Kraft

Heritability, the fraction of phenotypic variation explained by genetic variation, has been estimated for many phenotypes in a range of populations, organisms, and time points. The recent development of efficient genotyping and sequencing technology has led researchers to attempt to identify the genetic variants responsible for the genetic component of phenotype directly via GWAS. The gap between the phenotypic variance explained by GWAS results and those estimated from classical heritability methods has been termed the “missing heritability problem”. In this work, we examine modern methods for estimating heritability, which use the genotype and sequence data directly. We discuss them in the context of classical heritability methods, the missing heritability problem, and describe their implications for understanding the genetic architecture of complex phenotypes.


PLOS Genetics | 2011

Enhanced Statistical Tests for GWAS in Admixed Populations: Assessment using African Americans from CARe and a Breast Cancer Consortium

Bogdan Pasaniuc; Noah Zaitlen; Guillaume Lettre; Gary K. Chen; Arti Tandon; W.H. Linda Kao; Ingo Ruczinski; Myriam Fornage; David S. Siscovick; Xiaofeng Zhu; Emma K. Larkin; Leslie A. Lange; L. Adrienne Cupples; Qiong Yang; Ermeg L. Akylbekova; Solomon K. Musani; Jasmin Divers; Joe Mychaleckyj; Mingyao Li; George J. Papanicolaou; Robert C. Millikan; Christine B. Ambrosone; Esther M. John; Leslie Bernstein; Wei Zheng; Jennifer J. Hu; Regina G. Ziegler; Sarah J. Nyante; Elisa V. Bandera; Sue A. Ingles

While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.

Collaboration


Dive into the Noah Zaitlen's collaboration.

Top Co-Authors

Avatar

Eleazar Eskin

University of California

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Celeste Eng

University of California

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Eran Halperin

University of California

View shared research outputs
Top Co-Authors

Avatar

Donglei Hu

University of California

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge