Publication


Featured research published by Emily Rose Holzinger.


American Journal of Human Genetics | 2016

REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants

Nilah M. Ioannidis; Joseph H. Rothstein; Vikas Pejaver; Sumit Middha; Shannon K. McDonnell; Saurabh Baheti; Anthony M. Musolf; Qing Li; Emily Rose Holzinger; Danielle M. Karyadi; Lisa A. Cannon-Albright; Craig Teerlink; Janet L. Stanford; William B. Isaacs; Jianfeng F. Xu; Kathleen A. Cooney; Ethan M. Lange; Johanna Schleutker; John D. Carpten; Isaac J. Powell; Olivier Cussenot; Geraldine Cancel-Tassin; Graham G. Giles; Robert J. MacInnis; Christiane Maier; Chih-Lin Hsieh; Fredrik Wiklund; William J. Catalona; William D. Foulkes; Diptasri Mandal

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10⁻¹²) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
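For readers unfamiliar with the AUC figures quoted in this abstract, the following is a minimal sketch of how an AUC can be computed from two sets of scores. It relies on the standard interpretation of AUC as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen neutral one, with ties counted as one half. The function name `auc` and the example scores are hypothetical; this is not REVEL's evaluation code, and the numbers are not REVEL scores.

```python
from typing import Sequence


def auc(pathogenic_scores: Sequence[float], neutral_scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (pathogenic, neutral) pairs ranked correctly.

    A tie between the two scores in a pair counts as half a correct ranking.
    """
    if not pathogenic_scores or not neutral_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pathogenic_scores:
        for n in neutral_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(neutral_scores))


# Made-up scores for illustration only (higher = predicted more pathogenic).
example_pathogenic = [0.9, 0.8, 0.4]
example_neutral = [0.5, 0.3, 0.1, 0.2]
example_auc = auc(example_pathogenic, example_neutral)  # 11 of 12 pairs ranked correctly
```

The pairwise loop is quadratic in the number of variants, so for test sets of the size described above a rank-based implementation from a statistics library would be the practical choice; the sketch is only meant to make the quantity concrete.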


Pharmacogenetics and Genomics | 2012

Genome-wide association study of plasma efavirenz pharmacokinetics in AIDS Clinical Trials Group protocols implicates several CYP2B6 variants.

Emily Rose Holzinger; Benjamin J. Grady; Marylyn D. Ritchie; Heather J. Ribaudo; Edward P. Acosta; Gene D. Morse; Roy M. Gulick; Gregory K. Robbins; David B. Clifford; Eric S. Daar; Paul J. McLaren; David W. Haas

Objectives: Prior candidate gene studies have associated CYP2B6 516G→T [rs3745274] and 983T→C [rs28399499] with increased plasma efavirenz exposure. We sought to identify novel variants associated with efavirenz pharmacokinetics. Materials and methods: Antiretroviral therapy-naive AIDS Clinical Trials Group studies A5202, A5095, and ACTG 384 included plasma sampling for efavirenz pharmacokinetics. Log-transformed trough efavirenz concentrations (Cmin) were previously estimated by population pharmacokinetic modeling. Stored DNA was genotyped with Illumina HumanHap 650Y or 1MDuo platforms, complemented by additional targeted genotyping of CYP2B6 and CYP2A6 with MassARRAY iPLEX Gold. Associations were identified by linear regression, which included principal component vectors to adjust for genetic ancestry. Results: Among 856 individuals, CYP2B6 516G→T was associated with efavirenz estimated Cmin (P=8.5×10⁻⁴¹). After adjusting for CYP2B6 516G→T, CYP2B6 983T→C was associated (P=9.9×10⁻¹¹). After adjusting for both CYP2B6 516G→T and 983T→C, a CYP2B6 variant (rs4803419) in intron 3 was associated (P=4.4×10⁻¹⁵). After adjusting for all the three variants, non-CYP2B6 polymorphisms were associated at P-value less than 5×10⁻⁸. In a separate cohort of 240 individuals, only the three CYP2B6 polymorphisms replicated. These three polymorphisms explained 34% of interindividual variability in efavirenz estimated Cmin. The extensive metabolizer phenotype was best defined by the absence of all three polymorphisms. Conclusion: Three CYP2B6 polymorphisms were independently associated with efavirenz estimated Cmin at genome-wide significance, and explained one-third of interindividual variability. These data will inform continued efforts to translate pharmacogenomic knowledge into optimal efavirenz utilization.


BMC Genetics | 2016

Machine learning and data mining in complex genomic data—a review on the lessons learned in Genetic Analysis Workshop 19

Inke R. König; Jonathan Auerbach; Damian Gola; Elizabeth Held; Emily Rose Holzinger; Marc Andre Legault; Rui Sun; Nathan L. Tintle; Hsin-Chou Yang

In the analysis of current genomic data, application of machine learning and data mining techniques has become more attractive given the rising complexity of the projects. As part of the Genetic Analysis Workshop 19, approaches from this domain were explored, mostly motivated from two starting points. First, assuming an underlying structure in the genomic data, data mining might identify this and thus improve downstream association analyses. Second, computational methods for machine learning need to be developed further to efficiently deal with the current wealth of data. In the course of discussing results and experiences from the machine learning and data mining approaches, six common messages were extracted. These depict the current state of these approaches in the application to complex genomic data. Although some challenges remain for future studies, important forward steps were taken in the integration of different data types and the evaluation of the evidence. Mining the data for underlying genetic or phenotypic structure and using this information in subsequent analyses proved to be extremely helpful and is likely to become of even greater use with more complex data sets.


Bioinformatics | 2014

ATHENA: the analysis tool for heritable and environmental network associations

Emily Rose Holzinger; Scott M. Dudek; Alex T. Frase; Sarah A. Pendergrass; Marylyn D. Ritchie

Motivation: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability. Results: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (i.e. single nucleotide polymorphisms) and quantitative (i.e. gene expression levels) predictor variables to generate multivariable models that predict either a categorical (i.e. disease status) or quantitative (i.e. cholesterol levels) outcome. The goal of this article is to demonstrate the utility of ATHENA using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables to identify complex prediction models. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (i.e. RNA-seq data and biomarker measurements). Availability: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.


Pacific Symposium on Biocomputing | 2012

ATHENA: a tool for meta-dimensional analysis applied to genotypes and gene expression data to predict HDL cholesterol levels.

Emily Rose Holzinger; Scott M. Dudek; Alex T. Frase; Ronald M. Krauss; Marisa W. Medina; Marylyn D. Ritchie

Technology is driving the field of human genetics research with advances in techniques to generate high-throughput data that interrogate various levels of biological regulation. With this massive amount of data comes the important task of using powerful bioinformatics techniques to sift through the noise to find true signals that predict various human traits. A popular analytical method thus far has been the genome-wide association study (GWAS), which assesses the association of single nucleotide polymorphisms (SNPs) with the trait of interest. Unfortunately, GWAS has not been able to explain a substantial proportion of the estimated heritability for most complex traits. Due to the inherently complex nature of biology, this phenomenon could be a factor of the simplistic study design. A more powerful analysis may be a systems biology approach that integrates different types of data, or a meta-dimensional analysis. For this study we used the Analysis Tool for Heritable and Environmental Network Associations (ATHENA) to integrate high-throughput SNPs and gene expression variables (EVs) to predict high-density lipoprotein cholesterol (HDL-C) levels. We generated multivariable models that consisted of SNPs only, EVs only, and SNPs + EVs with testing r-squared values of 0.16, 0.11, and 0.18, respectively. Additionally, using just the SNPs and EVs from the best models, we generated a model with a testing r-squared of 0.32. A linear regression model with the same variables resulted in an adjusted r-squared of 0.23. With this systems biology approach, we were able to integrate different types of high-throughput data to generate meta-dimensional models that are predictive for the HDL-C in our data set. Additionally, our modeling method was able to capture more of the HDL-C variation than a linear regression model that included the same variables.


BioData Mining | 2016

r2VIM: A new variable selection method for random forests in genome-wide association studies.

Silke Szymczak; Emily Rose Holzinger; Abhijit Dasgupta; James D. Malley; Anne M. Molloy; James L. Mills; Lawrence C. Brody; Dwight Stambolian; Joan E. Bailey-Wilson

Background: Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. Results: We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. Conclusions: The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results.
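The selection rule described in this abstract can be sketched in a few lines. The sketch below assumes that the importance scores from each random forest run are already available as a mapping from SNP name to score, that the "observed minimal importance score" is negative and its absolute value serves as the baseline, and that a fixed threshold (here the hypothetical parameter `factor`) defines a "large" relative importance. The function name, the threshold value, and the example scores are illustrative assumptions, not taken from the published r2VIM software.

```python
from typing import Dict, List, Optional, Set


def r2vim_select(runs: List[Dict[str, float]], factor: float = 3.0) -> List[str]:
    """Select variables whose relative importance is >= factor in every run.

    For each run, importance is divided by the absolute value of that run's
    minimal (most negative) importance score; a variable is kept only if it
    passes the threshold in all runs.
    """
    selected: Optional[Set[str]] = None
    for run in runs:
        min_importance = min(run.values())
        if min_importance >= 0:
            # Assumption of this sketch: a negative minimum is needed as a noise baseline.
            raise ValueError("run has no negative importance to use as baseline")
        baseline = abs(min_importance)
        passing = {name for name, imp in run.items() if imp / baseline >= factor}
        selected = passing if selected is None else selected & passing
    return sorted(selected) if selected else []


# Made-up importance scores from two hypothetical random forest runs.
example_runs = [
    {"rs1": 0.9, "rs2": 0.05, "rs3": -0.1, "rs4": 0.4},
    {"rs1": 1.2, "rs2": 0.5, "rs3": 0.02, "rs4": -0.2},
]
example_selected = r2vim_select(example_runs, factor=3.0)
```

In this toy input, `rs4` passes the threshold in the first run but not in the second, so only `rs1` survives the intersection. Requiring a variable to pass in all runs is what makes the selection "recurrent" in the sense the abstract describes.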


Pacific Symposium on Biocomputing | 2012

Next-generation analysis of cataracts: determining knowledge driven gene-gene interactions using Biofilter, and gene-environment interactions using the PhenX Toolkit.

Sarah A. Pendergrass; Shefali S. Verma; Emily Rose Holzinger; Carrie B. Moore; John R. Wallace; Scott M. Dudek; Wayne Huggins; Terrie Kitchner; Carol Waudby; Richard L. Berg; Catherine A. McCarty; Marylyn D. Ritchie

Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using the Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 5 statistically significant models with an interaction term with p-value < 0.05, as well as an overall model with p-value < 0.05 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 288 models that exhibit an interaction term with a p-value ≤ 1×10⁻⁴ associated with cataract status. Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.


PLOS ONE | 2014

Genetic variation in iron metabolism is associated with neuropathic pain and pain severity in HIV-infected patients on antiretroviral therapy

Asha R. Kallianpur; Peilin Jia; Ronald J. Ellis; Zhongming Zhao; Cinnamon S. Bloss; Wanqing Wen; Christina M. Marra; Todd Hulgan; David M. Simpson; Susan Morgello; Justin C. McArthur; David B. Clifford; Ann C. Collier; Benjamin B. Gelman; J. Allen McCutchan; Donald R. Franklin; David C. Samuels; Debralee Rosario; Emily Rose Holzinger; Deborah G. Murdock; Scott Letendre; Igor Grant

HIV sensory neuropathy and distal neuropathic pain (DNP) are common, disabling complications associated with combination antiretroviral therapy (cART). We previously associated iron-regulatory genetic polymorphisms with a reduced risk of HIV sensory neuropathy during more neurotoxic types of cART. We here evaluated the impact of polymorphisms in 19 iron-regulatory genes on DNP in 560 HIV-infected subjects from a prospective, observational study, who underwent neurological examinations to ascertain peripheral neuropathy and structured interviews to ascertain DNP. Genotype-DNP associations were explored by logistic regression and permutation-based analytical methods. Among 559 evaluable subjects, 331 (59%) developed HIV-SN, and 168 (30%) reported DNP. Fifteen polymorphisms in 8 genes (p<0.05) and 5 variants in 4 genes (p<0.01) were nominally associated with DNP: polymorphisms in TF, TFRC, BMP6, ACO1, SLC11A2, and FXN conferred reduced risk (adjusted odds ratios [ORs] ranging from 0.2 to 0.7, all p<0.05); other variants in TF, CP, ACO1, BMP6, and B2M conferred increased risk (ORs ranging from 1.3 to 3.1, all p<0.05). Risks associated with some variants were statistically significant either in black or white subgroups but were consistent in direction. ACO1 rs2026739 remained significantly associated with DNP in whites (permutation p<0.0001) after correction for multiple tests. Several of the same iron-regulatory-gene polymorphisms, including ACO1 rs2026739, were also associated with severity of DNP (all p<0.05). Common polymorphisms in iron-management genes are associated with DNP and with DNP severity in HIV-infected persons receiving cART. Consistent risk estimates across population subgroups and persistence of the ACO1 rs2026739 association after adjustment for multiple testing suggest that genetic variation in iron-regulation and transport modulates susceptibility to DNP.


Pacific Symposium on Biocomputing | 2014

Variable selection method for the identification of epistatic models

Emily Rose Holzinger; Silke Szymczak; Abhijit Dasgupta; James D. Malley; Qing Li; Joan E. Bailey-Wilson

Standard analysis methods for genome wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power. To this end, we have developed a novel variable selection method called relative recurrency variable importance metric (r2VIM). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e. no marginal effects). Our results show that with appropriate parameter settings, r2VIM can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All Supplementary Data can be found here: http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/r2VIM_epi/).


Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2011

ATHENA optimization: the effect of initial parameter settings across different genetic models

Emily Rose Holzinger; Scott M. Dudek; Eric S. Torstenson; Marylyn D. Ritchie

Rapidly advancing technology has allowed for the generation of massive amounts of data assessing variation across the human genome. One analysis method for this type of data is the genome-wide association study (GWAS) where each variation is assessed individually for association to disease. While these studies have elucidated novel etiology, much of the variation due to genetics remains unexplained. One hypothesis is that some of the variation lies in gene-gene interactions. An impediment to testing for interactions is the infeasibility of exhaustively searching all multi-locus models. Novel methods are being developed that perform a non-exhaustive search. Because these methods are new to genetic studies, rigorous parameter optimization is necessary. Here, we assess genotype encodings, function sets, and cross-over in two algorithms which use grammatical evolution to optimize neural networks or symbolic regression formulas in the ATHENA software package. Our results show that the effect of these parameters is highly dependent on the underlying disease model.

Collaboration


Dive into Emily Rose Holzinger's collaborations.

Top Co-Authors

Scott M. Dudek
Pennsylvania State University

Shefali S. Verma
Pennsylvania State University

Carrie B. Moore
Pennsylvania State University

Alex T. Frase
Pennsylvania State University

Joan E. Bailey-Wilson
National Institutes of Health

Molly A. Hall
Pennsylvania State University

Qing Li
National Institutes of Health

Sarah A. Pendergrass
Pennsylvania State University