Alkes L. Price | Researchain

Publication

Featured research published by Alkes L. Price.

Nature Genetics | 2006

Principal components analysis corrects for stratification in genome-wide association studies

Alkes L. Price; Nick Patterson; Robert M. Plenge; Michael E. Weinblatt; Nancy A. Shadick; David Reich

Population stratification (allele frequency differences between cases and controls due to systematic ancestry differences) can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.
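The idea in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' software: it assumes a toy data set (two populations, 500 null SNPs, all values hypothetical), takes the top principal components of the normalized genotype matrix as ancestry axes, regresses them out of both the candidate genotype and the phenotype, and compares a correlation-based statistic before and after adjustment. The function names and the exact form of the statistic are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two populations of 200 individuals, 500 null SNPs
# whose allele frequencies differ by 0.2 between the populations.
n_per_pop, n_snps = 200, 500
pop = np.repeat([0, 1], n_per_pop)
base = rng.uniform(0.2, 0.8, n_snps)
freqs = np.where(pop[:, None] == 0, base - 0.1, base + 0.1)
G = rng.binomial(2, freqs).astype(float)

# Disease status depends on population only (no causal SNP), and the
# candidate SNP is strongly differentiated but not causal.
y = rng.binomial(1, np.where(pop == 0, 0.2, 0.8)).astype(float)
g = rng.binomial(2, np.where(pop == 0, 0.2, 0.8)).astype(float)


def top_pcs(genotypes, k):
    """Top-k principal component scores of a samples x SNPs genotype matrix."""
    p = genotypes.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # drop monomorphic SNPs
    z = (genotypes[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k]  # orthonormal columns


def residualize(v, pcs):
    """Center v and remove its projection onto the (orthonormal) PCs."""
    v = v - v.mean()
    return v - pcs @ (pcs.T @ v)


def association_stat(g, y, pcs):
    """(n - k - 1) times the squared correlation of the adjusted vectors."""
    n, k = pcs.shape
    g_adj, y_adj = residualize(g, pcs), residualize(y, pcs)
    r2 = (g_adj @ y_adj) ** 2 / ((g_adj @ g_adj) * (y_adj @ y_adj))
    return (n - k - 1) * r2


uncorrected = association_stat(g, y, np.empty((len(y), 0)))
corrected = association_stat(g, y, top_pcs(G, k=2))
print(f"uncorrected: {uncorrected:.1f}, corrected: {corrected:.1f}")
```

In this toy setting the uncorrected statistic is inflated because both the candidate SNP and the phenotype track population membership; adjusting along the top components removes most of that shared signal.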

PLOS Genetics | 2006

Population structure and eigenanalysis.

Nick Patterson; Alkes L. Price; David Reich

Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general “phase change” phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.
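The closing claim about predicting dataset size can be made concrete with a short calculation. This sketch assumes the threshold takes the form FST ≈ 1/sqrt(n·m) for two equal-sized subpopulations, with n individuals in total and m markers; that form is not stated in the abstract above and should be checked against the paper before use. The function names are hypothetical.

```python
import math


def fst_threshold(n_individuals, n_markers):
    """Assumed detection threshold: FST ~ 1 / sqrt(n * m)."""
    return 1.0 / math.sqrt(n_individuals * n_markers)


def markers_needed(fst, n_individuals):
    """Smallest marker count m for which fst exceeds the assumed threshold."""
    return math.ceil(1.0 / (fst ** 2 * n_individuals))


print(fst_threshold(100, 10_000))
print(markers_needed(0.003, 1_000))
```

Under this assumption the threshold depends only on the product n·m, so halving the number of individuals can be offset by doubling the number of markers.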

Nature | 2010

Integrating common and rare genetic variation in diverse human populations.

David Altshuler; Richard A. Gibbs; Leena Peltonen; Emmanouil T. Dermitzakis; Stephen F. Schaffner; Fuli Yu; Penelope E. Bonnen; Paul I. W. de Bakker; Panos Deloukas; Stacey Gabriel; R. Gwilliam; Sarah Hunt; Michael Inouye; Xiaoming Jia; Aarno Palotie; Melissa Parkin; Pamela Whittaker; Kyle Chang; Alicia Hawes; Lora Lewis; Yanru Ren; David A. Wheeler; Donna M. Muzny; C. Barnes; Katayoon Darvishi; Joshua M. Korn; K. Kristiansson; Cin-Ty A. Lee; Steven A. McCarroll; James Nemesh

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.

Nature | 2009

Reconstructing Indian population history.

David Reich; Kumarasamy Thangaraj; Nick Patterson; Alkes L. Price; Lalji Singh

India has been underrepresented in genome-wide surveys of human variation. We analyse 25 diverse groups in India to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the ‘Ancestral North Indians’ (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the ‘Ancestral South Indians’ (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39–71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy. We therefore predict that there will be an excess of recessive diseases in India, which should be possible to screen and map genetically.

Intelligent Systems in Molecular Biology | 2005

De novo identification of repeat families in large genomes

Alkes L. Price; Neil C. Jones; Pavel A. Pevzner

Motivation: De novo repeat family identification is a challenging algorithmic problem of great practical importance. As the number of genome sequencing projects increases, there is a pressing need to identify the repeat families present in large, newly sequenced genomes. We develop a new method for de novo identification of repeat families via extension of consensus seeds; our method enables a rigorous definition of repeat boundaries, a key issue in repeat analysis. Results: Our RepeatScout algorithm is more sensitive and is orders of magnitude faster than RECON, the dominant tool for de novo repeat family identification in newly sequenced genomes. Using RepeatScout, we estimate that approximately 2% of the human genome and 4% of mouse and rat genomes consist of previously unannotated repetitive sequence. Availability: Source code is available for download at http://www-cse.ucsd.edu/groups/bioinformatics/software.html

Nature Reviews Genetics | 2010

New approaches to population stratification in genome-wide association studies

Alkes L. Price; Noah Zaitlen; David Reich; Nick Patterson

Genome-wide association (GWA) studies are an effective approach for identifying genetic variants associated with disease risk. GWA studies can be confounded by population stratification (systematic ancestry differences between cases and controls), which has previously been addressed by methods that infer genetic ancestry. Those methods perform well in data sets in which population structure is the only kind of structure present but are inadequate in data sets that also contain family structure or cryptic relatedness. Here, we review recent progress on methods that correct for stratification while accounting for these additional complexities.

Nature Genetics | 2015

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies

Brendan Bulik-Sullivan; Po-Ru Loh; Hilary Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J. Daly; Alkes L. Price; Benjamin M. Neale

Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.
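The relationship between test statistics and LD described above can be sketched on simulated data. This is an illustrative example only, not the authors' implementation: it assumes the expected chi-square statistic of a SNP is linear in its LD Score, with slope N·h2/M and an intercept of 1 plus any confounding, and it fits an unweighted least-squares line. Practical analyses use a weighted regression with resampling-based standard errors. All parameter values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical study: M SNPs, N samples, SNP heritability 0.2,
# and a true intercept of 1.05 (i.e. mild confounding).
M, N = 200_000, 10_000
true_h2, true_intercept = 0.2, 1.05

# Simulated LD Scores (arbitrary right-skewed distribution, all >= 1).
ld = 1.0 + rng.gamma(shape=2.0, scale=20.0, size=M)

# Assumed model: E[chi2 | LD Score] = N * h2 * ld / M + intercept.
expected = N * true_h2 * ld / M + true_intercept
chi2 = expected * rng.chisquare(1, size=M)  # scaled 1-df chi-square draws

# Unweighted least-squares fit of chi2 on LD Score.
slope, intercept = np.polyfit(ld, chi2, 1)
h2_hat = slope * M / N

print(f"mean chi2: {chi2.mean():.3f}")
print(f"intercept: {intercept:.3f}  (confounding estimate)")
print(f"h2 estimate from slope: {h2_hat:.3f}")
```

Here the mean chi-square is well above 1, but the fitted intercept stays close to the simulated confounding term, which is the separation of polygenic signal from bias that the abstract describes.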

Nature Genetics | 2007

Two independent alleles at 6q23 associated with risk of rheumatoid arthritis

Robert M. Plenge; Chris Cotsapas; Leela Davies; Alkes L. Price; Paul I. W. de Bakker; Julian Maller; Itsik Pe'er; Noël P. Burtt; Brendan Blumenstiel; Matt DeFelice; Melissa Parkin; Rachel Barry; Wendy Winslow; Claire Healy; Robert R. Graham; Benjamin M. Neale; Elena Izmailova; Ronenn Roubenoff; Alex Parker; Roberta Glass; Elizabeth W. Karlson; Nancy E. Maher; David A. Hafler; David M. Lee; Michael F. Seldin; Elaine F. Remmers; Annette Lee; Leonid Padyukov; Lars Alfredsson; Jonathan S. Coblyn

To identify susceptibility alleles associated with rheumatoid arthritis, we genotyped 397 individuals with rheumatoid arthritis for 116,204 SNPs and carried out an association analysis in comparison to publicly available genotype data for 1,211 related individuals from the Framingham Heart Study. After evaluating and adjusting for technical and population biases, we identified a SNP at 6q23 (rs10499194, ∼150 kb from TNFAIP3 and OLIG3) that was reproducibly associated with rheumatoid arthritis both in the genome-wide association (GWA) scan and in 5,541 additional case-control samples (P = 10^-3, GWA scan; P < 10^-6, replication; P = 10^-9, combined). In a concurrent study, the Wellcome Trust Case Control Consortium (WTCCC) has reported strong association of rheumatoid arthritis susceptibility to a different SNP located 3.8 kb from rs10499194 (rs6920220; P = 5 × 10^-6 in WTCCC). We show that these two SNP associations are statistically independent, are each reproducible in the comparison of our data and WTCCC data, and define risk and protective haplotypes for rheumatoid arthritis at 6q23.

Nature Genetics | 2015

An atlas of genetic correlations across human diseases and traits

Brendan Bulik-Sullivan; Hilary Finucane; Verneri Anttila; Alexander Gusev; Felix R. Day; Po-Ru Loh; Laramie Duncan; John Perry; Nick Patterson; Elise B. Robinson; Mark J. Daly; Alkes L. Price; Benjamin M. Neale

Identifying genetic correlations between complex traits and diseases can provide useful etiological insights and help prioritize likely causal relationships. The major challenges preventing estimation of genetic correlation from genome-wide association study (GWAS) data with current methods are the lack of availability of individual-level genotype data and widespread sample overlap among meta-analyses. We circumvent these difficulties by introducing a technique—cross-trait LD Score regression—for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap. We use this method to estimate 276 genetic correlations among 24 traits. The results include genetic correlations between anorexia nervosa and schizophrenia, anorexia and obesity, and educational attainment and several diseases. These results highlight the power of genome-wide analyses, as there currently are no significantly associated SNPs for anorexia nervosa and only three for educational attainment.

PLOS ONE | 2008

Concept, design and implementation of a cardiovascular gene-centric 50 k SNP array for large-scale genomic association studies.

Brendan J. Keating; Sam E. Tischfield; Sarah S. Murray; Tushar Bhangale; Thomas S. Price; Joseph T. Glessner; Luana Galver; Jeffrey C. Barrett; Struan F. A. Grant; Deborah N. Farlow; Hareesh R. Chandrupatla; Mark Hansen; Saad Ajmal; George J. Papanicolaou; Yiran Guo; Mingyao Li; Paul I. W. de Bakker; Swneke D. Bailey; Alexandre Montpetit; Andrew C. Edmondson; Kent D. Taylor; Xiaowu Gai; Susanna S. Wang; Myriam Fornage; Tamim H. Shaikh; Leif Groop; Michael Boehnke; Alistair S. Hall; Andrew T. Hattersley; Edward C. Frackelton

A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has been accumulating over the last decade, in particular a large number of loci derived from recent genome wide association studies (GWAS). True complex disease-associated loci often exert modest effects, so their delineation currently requires integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a “cosmopolitan” tagging approach to capture the genetic diversity across ∼2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high priority loci with a greater density of markers than the existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage in high priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array with a significant portion of the generated data being released into the academic domain facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.