Eric S. Torstenson
Vanderbilt University
Publications
Featured research published by Eric S. Torstenson.
PLOS Genetics | 2013
Sarah A. Pendergrass; Kristin Brown-Gentry; Scott M. Dudek; Alex T. Frase; Eric S. Torstenson; Robert Goodloe; José Luis Ambite; Christy L. Avery; Steve Buyske; Petra Bůžková; Ewa Deelman; Megan D. Fesinmeyer; Christopher A. Haiman; Gerardo Heiss; Lucia A. Hindorff; Chun-Nan Hsu; Rebecca D. Jackson; Charles Kooperberg; Loic Le Marchand; Yi Lin; Tara C. Matise; Kristine R. Monroe; Larry W. Moreland; Sungshim Lani Park; Alex P. Reiner; Robert B. Wallace; Lynne R. Wilkens; Dana C. Crawford; Marylyn D. Ritchie
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups in PAGE (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders), each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results were significant (p<0.01) at two or more PAGE study sites, with a consistent direction of effect, for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype–phenotype associations, 26 represented phenotypes closely related to previously known genotype–phenotype associations, and 33 represented potentially novel genotype–phenotype associations with pleiotropic effects. Most of the potentially novel results were for single PheWAS phenotype-classes; for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA), a PheWAS association was identified for hemoglobin levels in AA.
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension-related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
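The replication criterion described above (p<0.01 at two or more study sites, same direction of effect, within the same racial/ethnic group, SNP, and phenotype-class) can be expressed as a simple filter. This is a minimal Python sketch, not the PAGE pipeline: the `SiteResult` record, its field names, the function name, and the toy values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteResult:
    """One association result from a single study site (hypothetical schema)."""
    site: str
    ancestry: str          # racial/ethnic stratum the test was run in
    snp: str
    phenotype_class: str
    beta: float            # effect estimate; its sign is the direction of effect
    p: float


def replicated_results(results, alpha=0.01, min_sites=2):
    """Return (ancestry, snp, phenotype_class) keys that are significant at
    min_sites or more distinct sites, with every significant result sharing one sign."""
    groups = defaultdict(list)
    for r in results:
        if r.p < alpha:
            groups[(r.ancestry, r.snp, r.phenotype_class)].append(r)
    hits = []
    for key, rs in groups.items():
        sites = {r.site for r in rs}
        directions = {r.beta > 0 for r in rs}  # {True}, {False}, or both
        if len(sites) >= min_sites and len(directions) == 1:
            hits.append(key)
    return sorted(hits)


# Toy input (invented values, for illustration only)
results = [
    SiteResult("site1", "AA", "snp1", "class1", 0.12, 0.004),
    SiteResult("site2", "AA", "snp1", "class1", 0.09, 0.008),
    SiteResult("site1", "EA", "snp2", "class2", 0.20, 0.002),
    SiteResult("site2", "EA", "snp2", "class2", -0.15, 0.006),  # opposite sign
    SiteResult("site3", "EA", "snp2", "class3", 0.05, 0.03),    # not p < 0.01
]
print(replicated_results(results))  # [('AA', 'snp1', 'class1')]
```

Requiring a single shared sign is a conservative reading of "consistent direction of effect": a group is dropped if any of its significant results disagrees.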
Genetic Epidemiology | 2011
Sarah A. Pendergrass; Kristin Brown-Gentry; Scott M. Dudek; Eric S. Torstenson; José Luis Ambite; Christy L. Avery; Steven Buyske; C. Cai; Megan D. Fesinmeyer; Christopher A. Haiman; Gerardo Heiss; Lucia A. Hindorff; Chun-Nan Hsu; Rebecca D. Jackson; Charles Kooperberg; Loic Le Marchand; Yi Lin; Tara C. Matise; Larry W. Moreland; Kristine R. Monroe; Alex P. Reiner; Robert B. Wallace; Lynne R. Wilkens; Dana C. Crawford; Marylyn D. Ritchie
The field of phenomics has been investigating network structure among large arrays of phenotypes, and genome‐wide association studies (GWAS) have been used to investigate the relationship between genetic variation and single diseases/outcomes. A novel approach has emerged combining both the exploration of phenotypic structure and genotypic variation, known as the phenome‐wide association study (PheWAS). The Population Architecture using Genomics and Epidemiology (PAGE) network is a National Human Genome Research Institute (NHGRI)‐supported collaboration of four groups accessing eight extensively characterized epidemiologic studies. The primary focus of PAGE is deep characterization of well‐replicated GWAS variants and their relationships to various phenotypes and traits in diverse epidemiologic studies that include European Americans, African Americans, Mexican Americans/Hispanics, Asians/Pacific Islanders, and Native Americans. The rich phenotypic resources of PAGE studies provide a unique opportunity for PheWAS as each genotyped variant can be tested for an association with the wide array of phenotypic measurements available within the studies of PAGE, including prevalent and incident status for multiple common clinical conditions and risk factors, as well as clinical parameters and intermediate biomarkers. The results of PheWAS can be used to discover novel relationships between SNPs, phenotypes, and networks of interrelated phenotypes; identify pleiotropy; provide novel mechanistic insights; and foster hypothesis generation. The PAGE network has developed infrastructure to support and perform PheWAS in a high‐throughput manner. As implementing the PheWAS approach has presented several challenges, the infrastructure and methodology, as well as insights gained in this project, are presented herein to benefit the larger scientific community.
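The core PheWAS loop, testing each genotyped variant against every available phenotype, can be sketched as below. This is an assumption-laden illustration rather than the PAGE infrastructure: it uses unadjusted simple linear regression on additively coded genotypes (0/1/2) with a large-sample normal approximation for the p-value, whereas a real PheWAS would also use logistic regression for binary outcomes and adjust for covariates. The function names and toy data are hypothetical.

```python
import math


def linear_wald_test(genotypes, phenotype):
    """Regress phenotype on additively coded genotype (0/1/2).

    Returns (beta, p), with a two-sided p-value from a normal approximation
    to the Wald statistic, or None when the model cannot be estimated."""
    n = len(genotypes)
    if n < 3:
        return None
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:  # monomorphic SNP in this sample
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(genotypes, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit; no usable standard error
        return None
    z = beta / se
    return beta, math.erfc(abs(z) / math.sqrt(2))


def phewas_scan(genotypes_by_snp, phenotypes):
    """Test every SNP against every phenotype; None marks a missing value.

    Returns a list of (snp, phenotype, beta, p) tuples."""
    results = []
    for snp, g in genotypes_by_snp.items():
        for name, y in phenotypes.items():
            # keep individuals measured for both the SNP and the phenotype
            pairs = [(x, v) for x, v in zip(g, y) if x is not None and v is not None]
            if not pairs:
                continue
            xs, ys = zip(*pairs)
            fit = linear_wald_test(xs, ys)
            if fit is not None:
                results.append((snp, name, *fit))
    return results


# Toy data (invented): trait_a tracks genotype, trait_b does not
genotypes_by_snp = {"snp1": [0, 1, 2, 0, 1, 2, None]}
phenotypes = {
    "trait_a": [1.0, 3.1, 4.9, 1.1, 2.9, 5.0, 2.0],
    "trait_b": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, None],
}
for row in phewas_scan(genotypes_by_snp, phenotypes):
    print(row)
```

The output of such a scan is a long table of SNP-phenotype results, which is what a cross-site filter like PAGE's replication criterion would then operate on.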
PLOS Genetics | 2013
David C. Samuels; Chun Li; Bingshan Li; Zhuo Song; Eric S. Torstenson; Hayley B. Clay; Antonis Rokas; Tricia A. Thornton-Wells; Jason H. Moore; Tia M. Hughes; Robert D. Hoffman; Jonathan L. Haines; Deborah G. Murdock; Douglas P. Mortlock; Scott M. Williams
Mitochondrial DNA (mtDNA) variation can affect phenotypic variation; therefore, knowing its distribution within and among individuals is important for understanding many human diseases. Intra-individual mtDNA variation (heteroplasmy) has been generally assumed to be random. We used massively parallel sequencing to assess heteroplasmy across ten tissues and demonstrate that in unrelated individuals there are tissue-specific, recurrent mutations. Certain tissues, notably kidney, liver and skeletal muscle, displayed identical recurrent mutations that were undetectable in other tissues in the same individuals. Using RFLP analyses we validated one of the tissue-specific mutations in the two sequenced individuals and replicated the patterns in two additional individuals. These recurrent mutations all occur within or in very close proximity to sites that regulate mtDNA replication, strongly implying that these variations alter the replication dynamics of the mutated mtDNA genome. These recurrent variants are all independent of each other and do not occur in the mtDNA coding regions. The most parsimonious explanation of the data is that these frequently repeated mutations experience tissue-specific positive selection, probably through replication advantage.
Journal of the American Medical Informatics Association | 2012
Carrie C. Buchanan; Eric S. Torstenson; William S. Bush; Marylyn D. Ritchie
Background: Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap, along with advances in genotyping technology, led to genome-wide association studies, which have identified common variants associated with many traits and diseases. In 2008 the 1000 Genomes Project aimed to sequence 2500 individuals and identify rare variants and 99% of variants with a MAF of <1%.
Methods: To determine whether the 1000 Genomes Project includes all the variants in HapMap, we examined the overlap between single nucleotide polymorphisms (SNPs) genotyped in the two resources using merged phase II/III HapMap data and low coverage pilot data from 1000 Genomes.
Results: Comparison of the two data sets showed that approximately 72% of HapMap SNPs were also found in 1000 Genomes Project pilot data. After filtering out HapMap variants with a MAF of <5% (separately for each population), 99% of HapMap SNPs were found in 1000 Genomes data.
Conclusions: Not all variants cataloged in HapMap are also cataloged in 1000 Genomes. This could affect decisions about which resource to use for SNP queries, rare variant validation, or imputation. Both the HapMap and 1000 Genomes Project databases are useful resources for human genetics, but it is important to understand the assumptions made and filtering strategies employed by these projects.
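The comparison described above reduces to a set overlap, computed before and after removing SNPs below a MAF cutoff within each population. A minimal sketch, assuming SNPs are matched by identifier (the study's actual matching rules may differ); the function names and toy data are hypothetical.

```python
def overlap_fraction(reference, other):
    """Fraction of SNP IDs in `reference` that also appear in `other`."""
    if not reference:
        return float("nan")
    return len(reference & other) / len(reference)


def overlap_by_population(hapmap_maf, kg_snps, maf_cutoff=0.05):
    """hapmap_maf: {population: {snp_id: minor allele frequency}}
    kg_snps:    {population: set of snp_ids seen in 1000 Genomes}
    Returns {population: (overlap of all SNPs, overlap of SNPs with MAF >= cutoff)}."""
    report = {}
    for pop, mafs in hapmap_maf.items():
        kg = kg_snps.get(pop, set())
        all_snps = set(mafs)
        # MAF filter applied separately within each population
        common = {snp for snp, maf in mafs.items() if maf >= maf_cutoff}
        report[pop] = (overlap_fraction(all_snps, kg), overlap_fraction(common, kg))
    return report


# Toy data (invented)
hapmap_maf = {"CEU": {"snp1": 0.30, "snp2": 0.02, "snp3": 0.10, "snp4": 0.01}}
kg_snps = {"CEU": {"snp1", "snp3"}}
print(overlap_by_population(hapmap_maf, kg_snps))  # {'CEU': (0.5, 1.0)}
```

In this toy case the two SNPs missing from the second resource are exactly the rare ones, so filtering on MAF raises the overlap, mirroring the pattern the abstract reports.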
Human Genetics | 2011
Brian L. Yaspan; William S. Bush; Eric S. Torstenson; Deqiong Ma; Margaret A. Pericak-Vance; Marylyn D. Ritchie; James S. Sutcliffe; Jonathan L. Haines
Genome-wide association studies (GWAS) are a standard approach for large-scale characterization of common variation and for identification of single loci predisposing to disease. However, because of moderate sample sizes and, in particular, multiple testing correction, many variants of smaller effect size are not detected within a single-allele analysis framework. Thus, small main effects and potential epistatic effects are not consistently observed in GWAS using standard analytical approaches that consider only single SNP alleles. Here, we propose a unique methodology that aggregates variants of interest (for example, genes in a biological pathway) using GWAS results. Multiple testing and type I error concerns are minimized by using empirical genomic randomization to estimate significance. Randomization corrects for common pathway-based analysis biases, such as SNP coverage and density, linkage disequilibrium, gene size, and pathway size. Pathway Analysis by Randomization Incorporating Structure (PARIS) applies this randomization and in doing so directly accounts for linkage disequilibrium effects. PARIS is independent of the association analysis method and is thus applicable to GWAS datasets of all study designs. Using the KEGG database as an example, we apply PARIS to the publicly available Autism Genetic Resource Exchange GWAS dataset, revealing pathways with a significant enrichment of positive association results.
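The idea of estimating pathway significance by genomic randomization can be illustrated with a simplified permutation test. This sketch is not PARIS itself: it assumes SNPs have already been grouped into features (for example, LD blocks), calls a feature significant if any of its SNPs has p < alpha, and compares the pathway's count of significant features against random feature sets drawn to match the pathway's feature sizes (number of SNPs per feature). The data layout, function name, and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def pathway_empirical_p(pathway, genome, n_perm=1000, alpha=0.05, seed=0):
    """pathway: list of distinct feature IDs in the pathway (each present in genome).
    genome:  {feature_id: list of SNP p-values in that feature}.
    Returns (observed significant-feature count, empirical p-value)."""
    rng = random.Random(seed)
    significant = {f: min(ps) < alpha for f, ps in genome.items()}
    observed = sum(significant[f] for f in pathway)

    # Pool genomic features by size so random sets mirror the pathway's structure
    features_by_size = defaultdict(list)
    for f, ps in genome.items():
        features_by_size[len(ps)].append(f)
    needed = Counter(len(genome[f]) for f in pathway)

    at_least_as_extreme = 0
    for _ in range(n_perm):
        count = 0
        for size, k in needed.items():
            draw = rng.sample(features_by_size[size], k)  # without replacement
            count += sum(significant[f] for f in draw)
        if count >= observed:
            at_least_as_extreme += 1
    # add-one correction keeps the estimate away from exactly zero
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy genome (invented): 100 single-SNP features, the first five strongly associated
genome = {f"f{i}": [0.5] for i in range(100)}
for i in range(5):
    genome[f"f{i}"] = [0.001]
print(pathway_empirical_p(["f0", "f1", "f2", "f3", "f4"], genome))
```

Because significance comes from comparison with random sets drawn from the same genome, no separate multiple-testing correction across the SNPs inside a pathway is needed; matching on feature size is one simple way to respect SNP density and block structure.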
bioRxiv | 2016
Alvaro N. Barbeira; Scott P. Dickinson; Jason Torres; Eric S. Torstenson; Jiamao Zheng; Heather E. Wheeler; Kaanan P. Shah; Todd L. Edwards; Dan L. Nicolae; Nancy J. Cox; Hae Kyung Im
Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations were tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.
To gain biological insight into the discoveries made by GWAS and meta-analysis studies, effective integration of functional data generated by large-scale efforts such as the GTEx Project is needed. PrediXcan is a gene-level approach that addresses this need by estimating the genetically determined component of gene expression. These predicted expression traits can then be tested for association with phenotype in order to test for a mediating role of gene expression levels. Furthermore, due to the polygenic nature of many complex traits, efforts to aggregate multiple GWAS studies and conduct meta-analyses have successfully increased our ability to identify variants of small effect sizes. To take advantage of the results generated by these efforts and to avoid the problems associated with accessing and handling individual-level data (e.g. consent limitations, large computational/storage costs), we have developed an extension of PrediXcan. The new method, MetaXcan, infers the results of PrediXcan using only summary statistics from large-scale GWAS or meta-analyses. Here we show that the concordance between PrediXcan and MetaXcan is excellent when the right reference population is used (R² > 0.95) and robust to population mismatches (R² > 0.85). We provide open source local and web-based software for easy implementation at https://github.com/hakyimlab/MetaXcan.
To understand the mechanistic underpinnings of type 2 diabetes (T2D) loci mapped through GWAS, we performed a tissue-specific gene association study in a cohort of over 100K individuals (≈26K cases, ≈84K controls) across 44 human tissues using MetaXcan, a summary statistics extension of PrediXcan. We found 90 genes significantly (FDR < 0.05) associated with T2D, of which 24 are previously reported T2D genes, 29 are novel genes in established T2D loci, and 37 are novel genes in novel loci. Of these, 13 reported genes, 15 novel genes in known loci, and 6 genes in novel loci replicated (replication FDR < 0.05) in an independent study (≈10K cases, ≈62K controls). We also found enrichment of significant associations in expected tissues such as liver, pancreas, adipose, and muscle, but also in tibial nerve, fibroblasts, and breast. Finally, we found that monogenic diabetes genes are enriched among the T2D genes from our analysis, suggesting that moderate alterations in monogenic (severe) diabetes genes may promote milder and later-onset type 2 diabetes.
To understand the biological mechanisms underlying thousands of genetic variants robustly associated with complex traits, scalable methods that integrate GWAS and functional data generated by large-scale efforts are needed. Here we propose a method termed MetaXcan that addresses this need by inferring the downstream consequences of genetically regulated components of molecular traits on complex phenotypes using summary data only. MetaXcan allows multiple causal variants and flexible multivariate models, extending the capabilities of existing methods and enabling the testing of more complex processes. As an example application, we trained prediction models of gene expression levels in 44 human tissues and inferred the consequences of their regulation in 40 complex phenotypes. Our examination of this broad set of human tissues revealed many novel genes and re-identified known ones with patterns of regulation in expected as well as unexpected tissues.
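The summary-statistics expression mentioned above can be sketched as follows. As commonly stated for S-PrediXcan, the gene-level z-score is approximated by Z_g ≈ Σ_l w_lg (σ_l / σ_g) Z_l, where w_lg is the prediction weight of SNP l for gene g, Z_l is the GWAS z-score of SNP l, σ_l² is that SNP's genotype variance, and σ_g² = Σ_l Σ_k w_lg w_kg Γ_lk is the variance of predicted expression under a reference-panel covariance Γ. Treat this as an illustrative sketch, not the MetaXcan software: the dictionary layout and function name are hypothetical, and missing covariance entries are assumed to be zero.

```python
import math


def s_predixcan_z(weights, snp_z, ref_cov):
    """Approximate the gene-level PrediXcan z-score from GWAS summary statistics.

    weights: {snp: prediction weight w_lg for this gene}
    snp_z:   {snp: GWAS z-score (beta / se)}
    ref_cov: {(snp_a, snp_b): genotype covariance from a reference panel};
             either key order is accepted, and missing pairs count as 0 (assumption)."""
    snps = [s for s in weights if s in snp_z]  # drop SNPs absent from the GWAS

    def cov(a, b):
        return ref_cov.get((a, b), ref_cov.get((b, a), 0.0))

    # variance of predicted expression: w^T Gamma w
    var_g = sum(weights[a] * weights[b] * cov(a, b) for a in snps for b in snps)
    if var_g <= 0:
        raise ValueError("predicted expression has zero variance")
    sigma_g = math.sqrt(var_g)
    return sum(weights[s] * math.sqrt(cov(s, s)) / sigma_g * snp_z[s] for s in snps)


# Two independent, unit-variance SNPs with equal weights and z-scores (invented)
print(s_predixcan_z(
    {"snp1": 1.0, "snp2": 1.0},
    {"snp1": 2.0, "snp2": 2.0},
    {("snp1", "snp1"): 1.0, ("snp2", "snp2"): 1.0},
))
```

In this toy case the result is about 2.83 (that is, 2·√2), the same as combining two independent z-scores of 2. For a single-SNP model the expression reduces to the SNP's own z-score, with the sign of its weight, which is a useful sanity check.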
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2008
Todd L. Edwards; William S. Bush; Stephen D. Turner; Scott M. Dudek; Eric S. Torstenson; Mike Schmidt; Eden R. Martin; Marylyn D. Ritchie
Whole-genome association (WGA) studies are becoming a common tool for the exploration of the genetic components of common disease. The analysis of such large-scale data presents unique analytical challenges, including problems of multiple testing, correlated independent variables, and large multivariate model spaces. These issues have prompted the development of novel computational approaches. Thorough, extensive simulation studies are a necessity for methods development work to evaluate the power and validity of novel approaches. Many data simulation packages exist; however, the resulting data are often overly simplistic and do not match the complexity of real data, especially with respect to linkage disequilibrium (LD). To overcome this limitation, we have developed genomeSIMLA. GenomeSIMLA is a forward-time population simulation method that can simulate realistic patterns of LD in both family-based and case-control datasets. In this manuscript, we demonstrate how LD patterns of the simulated data change under different population growth curve parameter initialization settings. These results provide guidelines to simulate WGA datasets whose properties resemble the HapMap.
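LD patterns like those discussed above are usually summarized with the r² statistic, and the way forward-time simulation reshapes them can be seen even in a toy model. The sketch below is not genomeSIMLA: it assumes a haploid, two-locus Wright-Fisher population of constant size in which each offspring haplotype copies a random parent and, with probability `rec_rate`, takes its second allele from another random parent. All names and parameter values are hypothetical.

```python
import random


def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased (locus1, locus2) alleles coded 0/1."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def next_generation(population, rec_rate, rng):
    """One generation of a haploid two-locus Wright-Fisher model with recombination."""
    offspring = []
    for _ in range(len(population)):
        first = rng.choice(population)
        if rng.random() < rec_rate:
            second = rng.choice(population)
            offspring.append((first[0], second[1]))  # recombinant haplotype
        else:
            offspring.append(first)
    return offspring


def simulate_r2(n_haplotypes=2000, generations=20, rec_rate=0.1, seed=1):
    """Start in complete LD and return r^2 after each simulated generation."""
    rng = random.Random(seed)
    population = [(0, 0)] * (n_haplotypes // 2) + [(1, 1)] * (n_haplotypes // 2)
    history = []
    for _ in range(generations):
        population = next_generation(population, rec_rate, rng)
        history.append(ld_r2(population))
    return history


print([round(r2, 3) for r2 in simulate_r2()[::5]])
```

In expectation D shrinks by a factor of (1 - rec_rate) per generation, so r² decays roughly geometrically, with random drift adding noise of order 1/N. A realistic simulator adds many loci, diploid individuals, and changing population size, which this toy omits.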
Pacific Symposium on Biocomputing | 2012
William S. Bush; Jonathan Boston; Sarah A. Pendergrass; Logan Dumitrescu; Robert Goodloe; Kristin Brown-Gentry; Sarah Wilson; Bob McClellan; Eric S. Torstenson; Melissa A. Basford; Kylee L. Spencer; Marylyn D. Ritchie; Dana C. Crawford
Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping, coupled with large collections of samples linked to clinical outcomes and quantitative traits, now makes it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of each respective source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline for many of the successful approaches that can be used in the era of high-throughput genotype-phenotype associations for moving biomedical discovery forward to new frontiers of data generation and analysis.
BMC Proceedings | 2007
Marylyn D. Ritchie; Jacquelaine Bartlett; William S. Bush; Todd L. Edwards; Alison A. Motsinger; Eric S. Torstenson
The identification of susceptibility genes for common, chronic disease presents great challenges. The development of novel statistical and computational methodologies to help identify these genes is an area of great necessity. Much research is ongoing and the Genetic Analysis Workshop (GAW) is a venue for the dissemination and comparison of many of these methods. GAW15 included real data sets to look for disease susceptibility genes for rheumatoid arthritis (RA). RA is a complex, chronic inflammatory disease with several replicated disease genes, but much of the genetic variation in the phenotype remains unexplained. We applied two computational methods, namely multifactor dimensionality reduction (MDR) and grammatical evolution neural networks (GENN), to three data sets from GAW15. While these analytic methods were applied with the intention of detecting multilocus models of association, both methods identified a strong single-locus effect of a single-nucleotide polymorphism (SNP) in PTPN22 that is significantly associated with RA. This SNP has previously been associated with RA in several other published studies. These results demonstrate that both MDR and GENN are capable of identifying a single-locus main effect, in addition to multilocus models of association. This is the first published comparison of the two methods. Because GENN employs an evolutionary computation search strategy, in contrast to the exhaustive search strategy of MDR, it is encouraging that the two methods produced similar results. This comparison should be extended in future studies with both simulated and real data.
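The core of MDR, pooling multilocus genotype cells into high- and low-risk groups and scoring the resulting classifier under an exhaustive search, can be sketched as below. This is a simplified illustration under stated assumptions: it omits the cross-validation and permutation testing used in full MDR analyses, labels a cell high-risk when its case:control ratio is at least the overall ratio, scores models by balanced accuracy, and uses hypothetical function names and toy data. GENN is not shown.

```python
from collections import Counter
from itertools import combinations


def mdr_balanced_accuracy(genotypes, status, snps):
    """Score one SNP combination with the MDR high/low-risk rule.

    genotypes: list of {snp_name: 0/1/2}; status: list of 1 (case) or 0 (control)."""
    cases, controls = Counter(), Counter()
    for g, y in zip(genotypes, status):
        cell = tuple(g[s] for s in snps)  # multilocus genotype cell
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    threshold = n_cases / n_controls  # overall case:control ratio
    high_risk = {
        cell for cell in cases
        if controls[cell] == 0 or cases[cell] / controls[cell] >= threshold
    }
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(n for c, n in controls.items() if c not in high_risk)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def mdr_best_model(genotypes, status, snp_names, order):
    """Exhaustively score every combination of `order` SNPs and return the best one."""
    return max(
        combinations(snp_names, order),
        key=lambda combo: mdr_balanced_accuracy(genotypes, status, combo),
    )


# Toy data (invented): SNP "A" alone separates cases from controls
genotypes = [{"A": 2, "B": 0}, {"A": 2, "B": 1}, {"A": 0, "B": 0}, {"A": 0, "B": 1}]
status = [1, 1, 0, 0]
print(mdr_best_model(genotypes, status, ["A", "B"], 1))  # ('A',)
```

Because a one-SNP combination is scored with the same rule as a multi-SNP one, an exhaustive search like this can return a single-locus main effect, consistent with the PTPN22 finding described above.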
BioData Mining | 2011
Benjamin J. Grady; Eric S. Torstenson; Marylyn D. Ritchie
Background: In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. When trying to assess sensitivity from such analytical methods, a confounding factor up to this point has been the presence of linkage disequilibrium (LD). In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package.
Results: Four relative amounts of LD were simulated in multiple one- and two-locus scenarios in which the position of the functional SNP(s) within LD blocks varied. Simulated data were analyzed with MDR to determine the sensitivity of the method in different contexts, where sensitivity was gauged as the number of times out of 100 that the method identified the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops, but the sensitivity to detect the disease signal and find an indirect association increases.
Conclusions: Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; LD does not, however, affect the ability to detect an indirect association. Careful examination of the solution models generated by MDR reveals that MDR can identify loci in the correct LD block, though not always the functional SNP. As such, the results of MDR analysis in datasets with LD should be examined carefully in light of the dataset's underlying LD structure.
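The distinction above between detecting the functional SNP itself (a direct association) and detecting a SNP in the correct LD block (an indirect association) can be made concrete when tallying simulation replicates. A minimal sketch, assuming each replicate reports its best model as a tuple of SNP names and that a map from SNP to LD block is available; the names and toy values are hypothetical. Here a direct hit also counts as an indirect hit, since the functional SNP lies in its own block.

```python
def replicate_sensitivity(best_models, true_model, block_of):
    """Percent of replicates whose best model is the true model (direct) or
    occupies the same LD block(s) as the true model (indirect, includes direct).

    best_models: list of tuples of SNP names, one per simulated replicate
    true_model:  tuple of functional SNP names
    block_of:    {snp_name: LD block identifier}"""
    true_blocks = sorted(block_of[s] for s in true_model)
    direct = indirect = 0
    for model in best_models:
        if set(model) == set(true_model):
            direct += 1
        if sorted(block_of[s] for s in model) == true_blocks:
            indirect += 1
    n = len(best_models)
    return 100 * direct / n, 100 * indirect / n


# Toy tally (invented): s1 and s2 share an LD block, s2 is functional
blocks = {"s1": "b1", "s2": "b1", "s3": "b2"}
best = [("s2",), ("s1",), ("s3",), ("s2",)]
print(replicate_sensitivity(best, ("s2",), blocks))  # (50.0, 75.0)
```

With 100 replicates, as in the study design described above, these percentages equal the raw counts of successful replicates.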