Barbara E. Engelhardt

Publication

Featured research published by Barbara E. Engelhardt.

Nature | 2010

Understanding mechanisms underlying human gene expression variation with RNA sequencing

Joseph K. Pickrell; John C. Marioni; Athma A. Pai; Jacob F. Degner; Barbara E. Engelhardt; Everlyne Nkadori; Jean-Baptiste Veyrieras; Matthew Stephens; Yoav Gilad; Jonathan K. Pritchard

Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
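Allele-specific expression is commonly assessed by counting RNA-seq reads that carry each allele at a heterozygous site and testing whether the counts depart from a 1:1 ratio. The sketch below is a generic illustration of that idea using an exact two-sided binomial test on hypothetical read counts; the function names are hypothetical, and this is not presented as the analysis pipeline used in the paper.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # The small relative tolerance treats floating-point near-ties as ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allelic_imbalance(ref_reads: int, alt_reads: int) -> tuple[float, float]:
    """Return (reference-allele fraction, p-value against a 1:1 expectation)
    for reads overlapping one heterozygous site."""
    n = ref_reads + alt_reads
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


# Hypothetical counts: 70 reference vs 30 alternative reads.
ratio, pval = allelic_imbalance(70, 30)
print(f"ref fraction = {ratio:.2f}, p = {pval:.2e}")
```

An exact test is used here because per-site read counts are often small, where normal approximations are unreliable.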

Nature | 2017

Genetic effects on gene expression across human tissues

Lead analysts: Alexis Battle; Christopher D. Brown; Barbara E. Engelhardt; Stephen B. Montgomery

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
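A local (cis) eQTL test is often framed as a regression of a gene's expression level on genotype dosage (0, 1 or 2 copies of an allele) at a nearby variant. The following is a minimal sketch with hypothetical data and a hypothetical function name; production eQTL pipelines typically add covariates and multiple-testing control, none of which is shown here.

```python
from math import sqrt


def eqtl_regression(dosage: list[float], expression: list[float]) -> tuple[float, float]:
    """Ordinary least squares of expression on genotype dosage (0, 1, 2).

    Returns (slope, t-statistic of the slope). Assumes at least three samples,
    more than one distinct dosage, and a non-perfect fit (nonzero residuals).
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical data: six individuals, expression rises with allele dosage.
dosage = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, t_stat = eqtl_regression(dosage, expression)
print(f"effect per allele = {slope:.2f}, t = {t_stat:.1f}")
```

The slope is the estimated change in expression per copy of the allele, which is the additive model most eQTL scans assume.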

PLOS Computational Biology | 2005

Protein Molecular Function Prediction by Bayesian Phylogenomics

Barbara E. Engelhardt; Michael I. Jordan; Kathryn E. Muratore; Steven E. Brenner

We present a statistical graphical model to infer specific molecular function for unannotated protein sequences using homology. Based on phylogenomic principles, SIFTER (Statistical Inference of Function Through Evolutionary Relationships) accurately predicts molecular function for members of a protein family given a reconciled phylogeny and available function annotations, even when the data are sparse or noisy. Our method produced specific and consistent molecular function predictions across 100 Pfam families in comparison to the Gene Ontology annotation database, BLAST, GOtcha, and Orthostrapper. We performed a more detailed exploration of functional predictions on the adenosine-5′-monophosphate/adenosine deaminase family and the lactate/malate dehydrogenase family, in the former case comparing the predictions against a gold-standard set of published functional characterizations. Given function annotations for 3% of the proteins in the deaminase family, SIFTER achieves 96% accuracy in predicting molecular function for experimentally characterized proteins as reported in the literature. The accuracy of SIFTER on this dataset is a significant improvement over other currently available methods such as BLAST (75%), GeneQuiz (64%), GOtcha (89%), and Orthostrapper (11%). We also experimentally characterized the adenosine deaminase from Plasmodium falciparum, confirming SIFTER's prediction. The results illustrate the predictive power of exploiting a statistical model of function evolution in phylogenomic problems. A software implementation of SIFTER is available from the authors.

Nature | 2013

A statin-dependent QTL for GATM expression is associated with statin-induced myopathy

Lara M. Mangravite; Barbara E. Engelhardt; Marisa W. Medina; Joshua D. Smith; Christopher D. Brown; Daniel I. Chasman; Brigham Mecham; Bryan Howie; Heejung Shim; Devesh Naidoo; QiPing Feng; Mark J. Rieder; Yii-Der Ida Chen; Jerome I. Rotter; Paul M. Ridker; Jemma C. Hopewell; Sarah Parish; Jane Armitage; Rory Collins; Russell A. Wilke; Deborah A. Nickerson; Matthew Stephens; Ronald M. Krauss

Statins are prescribed widely to lower plasma low-density lipoprotein (LDL) concentrations and cardiovascular disease risk and have been shown to have beneficial effects in a broad range of patients. However, statins are associated with an increased risk, albeit small, of clinical myopathy and type 2 diabetes. Despite evidence for substantial genetic influence on LDL concentrations, pharmacogenomic trials have failed to identify genetic variations with large effects on either statin efficacy or toxicity, and have produced little information regarding mechanisms that modulate statin response. Here we identify a downstream target of statin treatment by screening for the effects of in vitro statin exposure on genetic associations with gene expression levels in lymphoblastoid cell lines derived from 480 participants of a clinical trial of simvastatin treatment. This analysis identified six expression quantitative trait loci (eQTLs) that interacted with simvastatin exposure, including rs9806699, a cis-eQTL for the gene glycine amidinotransferase (GATM) that encodes the rate-limiting enzyme in creatine synthesis. We found this locus to be associated with incidence of statin-induced myotoxicity in two separate populations (meta-analysis odds ratio = 0.60). Furthermore, we found that GATM knockdown in hepatocyte-derived cell lines attenuated transcriptional response to sterol depletion, demonstrating that GATM may act as a functional link between statin-mediated lowering of cholesterol and susceptibility to statin-induced myopathy.
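A pooled odds ratio across two populations comes from a meta-analysis. One standard way to pool is an inverse-variance fixed-effect average of per-study log odds ratios; the abstract does not state which meta-analysis model the authors used, so the sketch below is a generic illustration with hypothetical allele-by-outcome counts and hypothetical function names.

```python
from math import exp, log, sqrt


def log_odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log odds ratio and Woolf standard error for a 2x2 table:
    a = cases with allele, b = controls with allele,
    c = cases without allele, d = controls without allele (all counts > 0)."""
    return log((a * d) / (b * c)), sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def fixed_effect_meta(tables):
    """Inverse-variance weighted average of per-study log odds ratios.
    Returns (pooled OR, (lower, upper) 95% confidence interval)."""
    total_weight = 0.0
    weighted_sum = 0.0
    for table in tables:
        lor, se = log_odds_ratio(*table)
        weight = 1 / se**2
        total_weight += weight
        weighted_sum += weight * lor
    pooled = weighted_sum / total_weight
    pooled_se = sqrt(1 / total_weight)
    return exp(pooled), (exp(pooled - 1.96 * pooled_se), exp(pooled + 1.96 * pooled_se))


# Hypothetical counts for two cohorts (not the study's data).
pooled_or, ci = fixed_effect_meta([(12, 40, 30, 60), (20, 70, 45, 90)])
print(f"pooled OR = {pooled_or:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

An odds ratio below 1, as reported in the abstract, indicates that the allele is associated with lower odds of the outcome.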

PLOS Genetics | 2010

Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

Barbara E. Engelhardt; Matthew Stephens

We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.
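The matrix-factorization view can be made concrete for the PCA case: a column-centered genotype matrix X (individuals × SNPs) is approximated by the product of an n × k matrix of individual loadings and a k × p matrix of SNP factors. The sketch below covers only k = 1 and uses plain power iteration to find the leading factor; the function names and genotypes are hypothetical, and admixture-based models or SFA would instead place constraints or sparsity priors on the two matrices, which this sketch does not implement.

```python
import random


def center_columns(G):
    """Subtract each SNP's (column's) mean genotype across individuals."""
    n, p = len(G), len(G[0])
    means = [sum(G[i][j] for i in range(n)) / n for j in range(p)]
    return [[G[i][j] - means[j] for j in range(p)] for i in range(n)]


def rank1_factorization(X, iters=200, seed=0):
    """Approximate an n x p matrix X by the outer product of l (length n)
    and f (length p, unit norm), using power iteration on X^T X.
    For the leading right singular vector f, l = X f gives the best rank-1 fit."""
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    f = [rng.random() + 0.1 for _ in range(p)]  # positive random start vector
    for _ in range(iters):
        l = [sum(X[i][j] * f[j] for j in range(p)) for i in range(n)]  # l = X f
        f = [sum(X[i][j] * l[i] for i in range(n)) for j in range(p)]  # f = X^T l
        norm = sum(v * v for v in f) ** 0.5
        if norm == 0:
            raise ValueError("X is zero or the start vector lies in its null space")
        f = [v / norm for v in f]
    l = [sum(X[i][j] * f[j] for j in range(p)) for i in range(n)]
    return l, f


# Hypothetical genotypes (0/1/2 copies of an allele): 4 individuals x 3 SNPs.
G = [[0, 0, 1], [0, 1, 0], [2, 2, 1], [2, 1, 2]]
X = center_columns(G)
l, f = rank1_factorization(X)
approx = [[li * fj for fj in f] for li in l]  # rank-1 reconstruction of X
```

Swapping the unconstrained factors for nonnegative loadings that sum to one per individual would move this toward the admixture-model reading of the same factorization.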

PLOS Genetics | 2013

Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs

Christopher D. Brown; Lara M. Mangravite; Barbara E. Engelhardt

Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.

Genome Biology | 2015

Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements

Weiwei Zhang; Tim D. Spector; Panos Deloukas; Jordana T. Bell; Barbara E. Engelhardt

Background: Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is critical to enable genome-wide analyses, but current approaches tackle average methylation within a locus and are often limited to specific genomic regions. Results: We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict methylation levels at CpG site resolution using features including neighboring CpG site methylation levels and genomic distance, co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project. Our approach achieves 92% prediction accuracy of genome-wide methylation levels at single-CpG-site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs and is robust across platform and cell-type heterogeneity. Our classifier outperforms other types of classifiers and identifies features that contribute to prediction accuracy: neighboring CpG site methylation, CGIs, co-localized DNase I hypersensitive sites, transcription factor binding sites, and histone modifications were found to be most predictive of methylation levels. Conclusions: Our observations of DNA methylation patterns led us to develop a classifier to predict DNA methylation levels at CpG site resolution with high accuracy. Furthermore, our method identified genomic features that interact with DNA methylation, suggesting mechanisms involved in DNA methylation modification and regulation, and linking diverse epigenetic processes.

PLOS ONE | 2012

Genome-Wide Association Study of d-Amphetamine Response in Healthy Volunteers Identifies Putative Associations, Including Cadherin 13 (CDH13)

Amy B. Hart; Barbara E. Engelhardt; Margaret C. Wardle; Greta Sokoloff; Matthew Stephens; Harriet de Wit; Abraham A. Palmer

Both the subjective response to d-amphetamine and the risk for amphetamine addiction are known to be heritable traits. Because subjective responses to drugs may predict drug addiction, identifying alleles that influence acute response may also provide insight into the genetic risk factors for drug abuse. We performed a Genome-Wide Association Study (GWAS) for the subjective responses to amphetamine in 381 non-drug-abusing healthy volunteers. Responses to amphetamine were measured using a double-blind, placebo-controlled, within-subjects design. We used sparse factor analysis to reduce the dimensionality of the data to ten factors. We identified several putative associations; the strongest was between a positive subjective drug-response factor and a SNP (rs3784943) in the 8th intron of cadherin 13 (CDH13; P = 4.58×10−8), a gene previously associated with a number of psychiatric traits including methamphetamine dependence. Additionally, we observed a putative association between a factor representing the degree of positive affect at baseline and a SNP (rs472402) in the 1st intron of steroid-5-alpha-reductase-α-polypeptide-1 (SRD5A1; P = 2.53×10−7), a gene whose protein product catalyzes the rate-limiting step in synthesis of the neurosteroid allopregnanolone. This SNP belongs to an LD-block that has been previously associated with the expression of SRD5A1 and differences in SRD5A1 enzymatic activity. The purpose of this study was to begin to explore the genetic basis of subjective responses to stimulant drugs using a GWAS approach in a modestly sized sample. Our approach provides a case study for analysis of high-dimensional intermediate pharmacogenomic phenotypes, which may be more tractable than clinical diagnoses.

Genome Research | 2011

Genome-scale phylogenetic function annotation of large and diverse protein families

Barbara E. Engelhardt; Michael I. Jordan; John R. Srouji; Steven E. Brenner

The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu.

Bioinformatics | 2013

Stability selection for regression-based models of transcription factor–DNA binding specificity

Fantine Mordelet; John Horton; Alexander J. Hartemink; Barbara E. Engelhardt; Raluca Gordân

Motivation: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. Results: We propose novel regression-based models of TF–DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF–DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF–DNA binding specificity. Availability: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.
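The independence assumption behind position weight matrices is easy to see in code: a site's score is a plain sum of per-position terms. The sketch below assumes log-odds scores against a uniform background and uses a hypothetical dictionary of dinucleotide weights to show how adjacent-base terms extend the additive model; all names and counts are hypothetical, and the paper's regression models, fit to PBM data with feature selection, are not reproduced here.

```python
from math import log

BASES = "ACGT"


def log_odds_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts [A, C, G, T] into log-odds scores
    against a uniform background, with a pseudocount for unseen bases."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append([log((c + pseudocount) / total / background) for c in row])
    return matrix


def pwm_score(site, log_odds):
    """Additive PWM model: each position contributes independently."""
    return sum(log_odds[i][BASES.index(base)] for i, base in enumerate(site))


def pwm_plus_dinucleotide_score(site, log_odds, dinuc_weights):
    """PWM score plus extra terms for adjacent base pairs, keyed by
    (position, dinucleotide); pairs not in the dictionary add nothing."""
    score = pwm_score(site, log_odds)
    for i in range(len(site) - 1):
        score += dinuc_weights.get((i, site[i:i + 2]), 0.0)
    return score


# Hypothetical counts for a 3-bp motif with consensus "ACG".
counts = [[8, 0, 0, 0], [0, 8, 0, 0], [0, 0, 8, 0]]
matrix = log_odds_from_counts(counts)
dinuc_weights = {(0, "AC"): 0.5}  # hypothetical non-additive bonus
print(pwm_score("ACG", matrix), pwm_plus_dinucleotide_score("ACG", matrix, dinuc_weights))
```

Each added dinucleotide or trinucleotide term is one more parameter, which is why models of this kind grow quickly and benefit from selecting only a few informative features.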
