Publication


Featured research published by Huwenbo Shi.


Nature Genetics | 2016

Integrative approaches for large-scale transcriptome-wide association studies

Alexander Gusev; Arthur Ko; Huwenbo Shi; Gaurav Bhatia; Wonil Chung; Brenda W.J.H. Penninx; Rick Jansen; Eco J. C. de Geus; Dorret I. Boomsma; Fred A. Wright; Patrick F. Sullivan; Elina Nikkola; Marcus Alvarez; Mete Civelek; Aldons J. Lusis; Terho Lehtimäki; Emma Raitoharju; Mika Kähönen; Ilkka Seppälä; Olli T. Raitakari; Johanna Kuusisto; Markku Laakso; Alkes L. Price; Päivi Pajukanta; Bogdan Pasaniuc

Many genetic variants influence complex traits by modulating gene expression, thus altering the abundance of one or multiple proteins. Here we introduce a powerful strategy that integrates gene expression measurements with summary association statistics from large-scale genome-wide association studies (GWAS) to identify genes whose cis-regulated expression is associated with complex traits. We leverage expression imputation from genetic data to perform a transcriptome-wide association study (TWAS) to identify significant expression-trait associations. We applied our approaches to expression data from blood and adipose tissue measured in ∼3,000 individuals overall. We imputed gene expression into GWAS data from over 900,000 phenotype measurements to identify 69 new genes significantly associated with obesity-related traits (BMI, lipids and height). Many of these genes are associated with relevant phenotypes in the Hybrid Mouse Diversity Panel. Our results showcase the power of integrating genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits.
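As a rough illustration of how expression weights can be combined with GWAS summary statistics, the sketch below computes a TWAS-style z-score as a weighted sum of SNP z-scores, standardized by the LD matrix. It assumes precomputed cis-expression weights, GWAS z-scores, and an LD correlation matrix for the same SNPs. The function name `twas_z` and all numbers are hypothetical and are not taken from the paper's software.

```python
import math


def twas_z(weights, z_scores, ld):
    """Weighted sum of GWAS z-scores divided by its null standard deviation.

    z_twas = (w . z) / sqrt(w' * LD * w)
    """
    n = len(weights)
    if len(z_scores) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, z_scores and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w' * LD * w must be positive")
    return numerator / math.sqrt(variance)


# Toy inputs (made up for illustration): two SNPs in moderate LD.
weights = [0.5, 0.5]           # hypothetical expression weights
z_scores = [3.0, 1.0]          # hypothetical GWAS z-scores
ld = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical LD correlation matrix
z = twas_z(weights, z_scores, ld)
print(round(z, 3))  # about 2.236
```

Dividing by the square root of w' LD w keeps the statistic on the z-score scale even when the weighted SNPs are correlated, which is why the LD matrix is needed alongside the summary statistics.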


Bioinformatics | 2014

Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

Bogdan Pasaniuc; Noah Zaitlen; Huwenbo Shi; Gaurav Bhatia; Alexander Gusev; Joseph Pickrell; Joel N. Hirschhorn; David P. Strachan; Nick Patterson; Alkes L. Price

Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ² association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: A software package is publicly available at http://bogdan.bioinformatics.ucla.edu/software/. Supplementary information: Supplementary materials are available at Bioinformatics online.
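The idea of Gaussian imputation from summary statistics can be sketched as a conditional expectation under a multivariate normal model: the z-score at an untyped SNP is predicted from typed z-scores using LD estimated from a reference panel. The code below is a minimal sketch under that assumption, not the released software. The function names, the toy LD values, and the ridge term (used here as one simple way to reflect a finite reference panel) are all illustrative assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(ld_typed, ld_untyped_typed, z_typed, ridge=0.1):
    """Conditional mean of an untyped z-score given typed z-scores.

    z_hat = LD_ut * (LD_tt + ridge * I)^-1 * z_t
    """
    n = len(z_typed)
    regularized = [
        [ld_typed[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # LD_tt is symmetric, so the imputation weights are (LD_tt + ridge*I)^-1 LD_tu.
    weights = solve(regularized, ld_untyped_typed)
    return sum(w * z for w, z in zip(weights, z_typed))


# Toy inputs (made up): two typed SNPs and one untyped SNP.
ld_typed = [[1.0, 0.5], [0.5, 1.0]]  # LD among typed SNPs
ld_untyped_typed = [0.8, 0.4]        # LD between the untyped SNP and each typed SNP
z_typed = [4.0, 2.0]                 # observed z-scores

z_plain = impute_z(ld_typed, ld_untyped_typed, z_typed, ridge=0.0)  # 3.2
z_ridge = impute_z(ld_typed, ld_untyped_typed, z_typed, ridge=0.1)  # about 2.917
```

In this toy case the ridge term shrinks the imputed z-score toward zero, which illustrates how regularizing a noisy LD estimate trades some power for protection against inflated statistics.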


American Journal of Human Genetics | 2016

Contrasting the Genetic Architecture of 30 Complex Traits from Summary Association Data

Huwenbo Shi; Gleb Kichaev; Bogdan Pasaniuc

Variance-component methods that estimate the aggregate contribution of large sets of variants to the heritability of complex traits have yielded important insights into the genetic architecture of common diseases. Here, we introduce methods that estimate the total trait variance explained by the typed variants at a single locus in the genome (local SNP heritability) from genome-wide association study (GWAS) summary data while accounting for linkage disequilibrium among variants. We applied our estimator to ultra-large-scale GWAS summary data of 30 common traits and diseases to gain insights into their local genetic architecture. First, we found that common SNPs have a high contribution to the heritability of all studied traits. Second, we identified traits for which the majority of the SNP heritability can be confined to a small percentage of the genome. Third, we identified GWAS risk loci where the entire locus explains significantly more variance in the trait than the GWAS reported variants. Finally, we identified loci that explain a significant amount of heritability across multiple traits.


American Journal of Human Genetics | 2017

Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits

Huwenbo Shi; Nicholas Mancuso; Sarah Spendlove; Bogdan Pasaniuc

Although genetic correlations between complex traits provide valuable insights into epidemiological and etiological studies, a precise quantification of which genomic regions disproportionately contribute to the genome-wide correlation is currently lacking. Here, we introduce ρ-HESS, a technique to quantify the correlation between pairs of traits due to genetic variation at a small region in the genome. Our approach requires GWAS summary data only and makes no distributional assumption on the causal variant effect sizes while accounting for linkage disequilibrium (LD) and overlapping GWAS samples. We analyzed large-scale GWAS summary data across 36 quantitative traits, and identified 25 genomic regions that contribute significantly to the genetic correlation among these traits. Notably, we find 6 genomic regions that contribute to the genetic correlation of 10 pairs of traits that show negligible genome-wide correlation, further showcasing the power of local genetic correlation analyses. Finally, we report the distribution of local genetic correlations across the genome for 55 pairs of traits that show putative causal relationships.


Bioinformatics | 2018

A Bayesian framework for multiple trait colocalization from summary association statistics

Claudia Giambartolomei; Jimmy Z. Liu; Wen Zhang; Mads Hauberg; Huwenbo Shi; James Boocock; Joe Pickrell; Andrew E. Jaffe; Bogdan Pasaniuc; Panos Roussos

Motivation: Most genetic variants implicated in complex diseases by genome‐wide association studies (GWAS) are non‐coding, making it challenging to understand the causative genes involved in disease. Integrating external information such as quantitative trait locus (QTL) mapping of molecular traits (e.g. expression, methylation) is a powerful approach to identify the subset of GWAS signals explained by regulatory effects. In particular, expression QTLs (eQTLs) help pinpoint the responsible gene among the GWAS regions that harbor many genes, while methylation QTLs (mQTLs) help identify the epigenetic mechanisms that impact gene expression which in turn affect disease risk. In this work, we propose multiple‐trait‐coloc (moloc), a Bayesian statistical framework that integrates GWAS summary data with multiple molecular QTL data to identify regulatory effects at GWAS risk loci. Results: We applied moloc to schizophrenia (SCZ) and eQTL/mQTL data derived from human brain tissue and identified 52 candidate genes that influence SCZ through methylation. Our method can be applied to any GWAS and relevant functional data to help prioritize disease associated genes. Availability and implementation: moloc is available for download as an R package (https://github.com/clagiamba/moloc). We also developed a web site to visualize the biological findings (icahn.mssm.edu/moloc). The browser allows searches by gene, methylation probe and scenario of interest. Supplementary information: Supplementary data are available at Bioinformatics online.


bioRxiv | 2016

Integrating gene expression with summary association statistics to identify susceptibility genes for 30 complex traits

Nicholas Mancuso; Huwenbo Shi; Pagé Goddard; Gleb Kichaev; Alexander Gusev; Bogdan Pasaniuc

Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. We leverage recently introduced methods to integrate gene expression measurements from 45 expression panels with summary GWAS data to perform 30 transcriptome-wide association studies (TWASs). First, we identify 1,196 susceptibility genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant, thus providing new risk loci. Second, we find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, 8 are not found through genetic correlation at the SNP level. Third, we use bi-directional regression to find evidence for BMI causally influencing triglyceride levels, and triglyceride levels causally influencing LDL. Taken together, our results provide insights into the role of expression in susceptibility to complex traits and diseases.


Intelligent Systems in Molecular Biology | 2018

A unifying framework for joint trait analysis under a non-infinitesimal model

Ruth Johnson; Huwenbo Shi; Bogdan Pasaniuc; Sriram Sankararaman

Motivation: A large proportion of risk regions identified by genome-wide association studies (GWAS) are shared across multiple diseases and traits. Understanding whether this clustering is due to sharing of causal variants or chance colocalization can provide insights into shared etiology of complex traits and diseases. Results: In this work, we propose a flexible, unifying framework to quantify the overlap between a pair of traits called UNITY (Unifying Non-Infinitesimal Trait analYsis). We formulate a Bayesian generative model that relates the overlap between pairs of traits to GWAS summary statistic data under a non-infinitesimal genetic architecture underlying each trait. We propose a Metropolis-Hastings sampler to compute the posterior density of the genetic overlap parameters in this model. We validate our method through comprehensive simulations and analyze summary statistics from height and body mass index GWAS to show that it produces estimates consistent with the known genetic makeup of both traits. Availability and implementation: The UNITY software is made freely available to the research community at: https://github.com/bogdanlab/UNITY.


bioRxiv | 2018

Probabilistic fine-mapping of transcriptome-wide association studies

Nicholas Mancuso; Gleb Kichaev; Huwenbo Shi; Malika Freund; Alexander Gusev; Bogdan Pasaniuc

Transcriptome-wide association studies (TWAS) using predicted expression have identified thousands of genes whose locally regulated expression is associated with complex traits and diseases. In this work, we show that linkage disequilibrium (LD) among SNPs induces significant gene-trait associations at non-causal genes as a function of the overlap between eQTL weights used in expression prediction. We introduce a probabilistic framework that models the induced correlation among TWAS signals to assign a probability for every gene in the risk region to explain the observed association signal. Our approach yields credible sets of genes containing the causal gene at a nominal confidence level (e.g., 90%) that can be used to prioritize and select genes for functional assays. Importantly, our approach remains accurate when expression data for causal genes are not available in the causal tissue by leveraging expression prediction from other tissues. We illustrate our approach using an integrative analysis of lipid traits where we correctly identify known causal genes.


American Journal of Human Genetics | 2018

Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits

Malika Freund; Kathryn S. Burch; Huwenbo Shi; Nicholas Mancuso; Gleb Kichaev; Kristina M. Garske; David Z. Pan; Zong Miao; Karen L. Mohlke; Markku Laakso; Päivi Pajukanta; Bogdan Pasaniuc; Valerie A. Arboleda

Although recent studies provide evidence for a common genetic basis between complex traits and Mendelian disorders, a thorough quantification of their overlap in a phenotype-specific manner remains elusive. Here, we have quantified the overlap of genes identified through large-scale genome-wide association studies (GWASs) for 62 complex traits and diseases with genes containing mutations known to cause 20 broad categories of Mendelian disorders. We identified a significant enrichment of genes linked to phenotypically matched Mendelian disorders in GWAS gene sets; of the total 1,240 comparisons, a higher proportion of phenotypically matched or related pairs (n = 50 of 92 [54%]) than phenotypically unmatched pairs (n = 27 of 1,148 [2%]) demonstrated significant overlap, confirming a phenotype-specific enrichment pattern. Further, we observed elevated GWAS effect sizes near genes linked to phenotypically matched Mendelian disorders. Finally, we report examples of GWAS variants localized at the transcription start site or physically interacting with the promoters of genes linked to phenotypically matched Mendelian disorders. Our results are consistent with the hypothesis that genes that are disrupted in Mendelian disorders are dysregulated by non-coding variants in complex traits and demonstrate how leveraging findings from related Mendelian disorders and functional genomic datasets can prioritize genes that are putatively dysregulated by local and distal non-coding GWAS variants.


Bioinformatics | 2015

A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data

Huwenbo Shi; Bogdan Pasaniuc; Kenneth Lange

Motivation: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status, an epigenetic mark describing chromatin accessibility, from population-scale haplotype data. Results: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data. Availability and implementation: Software implementing the method described can be downloaded at http://bogdan.bioinformatics.ucla.edu/software/.

Collaboration


Top Co-Authors

Gleb Kichaev

University of California

Malika Freund

University of California

Pagé Goddard

University of California

Markku Laakso

University of Eastern Finland