Publication


Featured research published by Andrey A. Shabalin.


Nature | 2015

An integrated map of structural variation in 2,504 human genomes

Peter H. Sudmant; Tobias Rausch; Eugene J. Gardner; Robert E. Handsaker; Alexej Abyzov; John Huddleston; Y. Zhang; Kai Ye; Goo Jun; Markus Hsi-Yang Fritz; Miriam K. Konkel; Ankit Malhotra; Adrian M. Stütz; Xinghua Shi; Francesco Paolo Casale; Jieming Chen; Fereydoun Hormozdiari; Gargi Dayama; Ken Chen; Maika Malig; Mark Chaisson; Klaudia Walter; Sascha Meiers; Seva Kashin; Erik Garrison; Adam Auton; Hugo Y. K. Lam; Xinmeng Jasmine Mu; Can Alkan; Danny Antaki

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Nature Genetics | 2014

Heritability and genomics of gene expression in peripheral blood

Fred A. Wright; Patrick F. Sullivan; Andrew I. Brooks; Fei Zou; Wei Sun; Kai Xia; Vered Madar; Rick Jansen; Wonil Chung; Yi Hui Zhou; Abdel Abdellaoui; Sandra Batista; Casey Butler; Guanhua Chen; Ting-huei Chen; David B. D'Ambrosio; Paul J. Gallins; Min Jin Ha; Jouke-Jan Hottenga; Shunping Huang; Mathijs Kattenberg; Jaspreet Kochar; Christel M. Middeldorp; Ani Qu; Andrey A. Shabalin; Jay A. Tischfield; Laura Todd; Jung-Ying Tzeng; Gerard van Grootheest; Jacqueline M. Vink

We assessed gene expression profiles in 2,752 twins, using a classic twin design to quantify expression heritability and quantitative trait loci (eQTLs) in peripheral blood. The most highly heritable genes (∼777) were grouped into distinct expression clusters, enriched in gene-poor regions, associated with specific gene function or ontology classes, and strongly associated with disease designation. The design enabled a comparison of twin-based heritability to estimates based on dizygotic identity-by-descent sharing and distant genetic relatedness. Consideration of sampling variation suggests that previous heritability estimates have been upwardly biased. Genotyping of 2,494 twins enabled powerful identification of eQTLs, which we further examined in a replication set of 1,895 unrelated subjects. A large number of non-redundant local eQTLs (6,756) met replication criteria, whereas a relatively small number of distant eQTLs (165) met quality control and replication standards. Our results provide a new resource toward understanding the genetic control of transcription.


The Annals of Applied Statistics | 2009

Finding large average submatrices in high dimensional data

Andrey A. Shabalin; Victor J. Weigman; Charles M. Perou; Andrew B. Nobel

The search for sample-variable associations is an important problem in the exploratory analysis of high dimensional data. Biclustering methods search for sample-variable associations in the form of distinguished submatrices of the data matrix. (The rows and columns of a submatrix need not be contiguous.) In this paper we propose and evaluate a statistically motivated biclustering procedure (LAS) that finds large average submatrices within a given real-valued data matrix. The procedure operates in an iterative-residual fashion, and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value. We examine the performance and potential utility of LAS, and compare it with a number of existing methods, through an extensive three-part validation study using two gene expression datasets. The validation study examines quantitative properties of biclusters, biological and clinical assessments using auxiliary information, and classification of disease subtypes using bicluster membership. In addition, we carry out a simulation study to assess the effectiveness and noise sensitivity of the LAS search procedure. These results suggest that LAS is an effective exploratory tool for the discovery of biologically relevant structures in high dimensional data. Software is available at https://genome.unc.edu/las/.
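The trade-off between submatrix size and average value described in the abstract above can be illustrated with a small sketch. It assumes the data matrix entries are i.i.d. standard normal under the null, and it scores a k × l submatrix with average `avg` in an m × n matrix by the negative log of a Bonferroni-corrected Gaussian tail probability. The function names and the exact form of the score are illustrative assumptions, not the LAS software's API.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_normal_upper_tail(x):
    """Natural log of P(Z >= x) for a standard normal Z."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    if tail > 0.0:
        return math.log(tail)
    # erfc underflows for very large x; fall back to the leading-order
    # asymptotic approximation of the Gaussian tail.
    return -0.5 * x * x - math.log(x) - 0.5 * math.log(2.0 * math.pi)


def las_style_score(m, n, k, l, avg):
    """Bonferroni-style score for a k x l submatrix with average `avg`.

    Assumption (hypothetical null model): entries of the m x n matrix are
    i.i.d. N(0, 1), so the average of k*l entries is N(0, 1/(k*l)).
    The score is -log[(number of k x l submatrices) * P(average >= avg)].
    """
    log_count = log_binom(m, k) + log_binom(n, l)
    log_tail = log_normal_upper_tail(avg * math.sqrt(k * l))
    return -(log_count + log_tail)


if __name__ == "__main__":
    # Same size, different averages: a higher average gives a higher score.
    print(las_style_score(1000, 100, 10, 10, 2.0))
    print(las_style_score(1000, 100, 10, 10, 1.0))
```

Under these assumptions, a positive score means the submatrix average is surprising even after correcting for the number of submatrices of that size; a negative score means it is not.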


Nucleic Acids Research | 2013

Resolving the polymorphism-in-probe problem is critical for correct interpretation of expression QTL studies

Adaikalavan Ramasamy; Daniah Trabzuni; J. Raphael Gibbs; Allissa Dillman; Dena Hernandez; Sampath Arepalli; Robert Walker; Colin Smith; Gigaloluwa Peter Ilori; Andrey A. Shabalin; Yun Li; Andrew Singleton; Mark R. Cookson; for NABEC; John Hardy; for UKBEC; Mina Ryten; Michael E. Weale

Polymorphisms in the target mRNA sequence can greatly affect the binding affinity of microarray probe sequences, leading to false-positive and false-negative expression quantitative trait locus (QTL) signals with any other polymorphisms in linkage disequilibrium. We provide the most complete solution to this problem, by using the latest genome and exome sequence reference data to identify almost all common polymorphisms (frequency >1% in Europeans) in probe sequences for two commonly used microarray panels (the gene-based Illumina Human HT12 array, which uses 50-mer probes, and exon-based Affymetrix Human Exon 1.0 ST array, which uses 25-mer probes). We demonstrate the impact of this problem using cerebellum and frontal cortex tissues from 438 neuropathologically normal individuals. We find that although only a small proportion of the probes contain polymorphisms, they account for a large proportion of apparent expression QTL signals, and therefore result in many false signals being declared as real. We find that the polymorphism-in-probe problem is insufficiently controlled by previous protocols, and illustrate this using some notable false-positive and false-negative examples in MAPT and PRICKLE1 that can be found in many eQTL databases. We recommend that both new and existing eQTL data sets should be carefully checked in order to adequately address this issue.


Biostatistics | 2018

An empirical Bayes approach for multiple tissue eQTL analysis

Gen Li; Andrey A. Shabalin; Ivan Rusyn; Fred A. Wright; Andrew B. Nobel

Expression quantitative trait locus (eQTL) analyses identify genetic markers associated with the expression of a gene. Most up-to-date eQTL studies consider the connection between genetic variation and expression in a single tissue. Multi-tissue analyses have the potential to improve findings in a single tissue, and elucidate the genotypic basis of differences between tissues. In this article, we develop a hierarchical Bayesian model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL explicitly captures patterns of variation in the presence or absence of eQTL, as well as the heterogeneity of effect sizes across tissues. We devise an efficient Expectation-Maximization (EM) algorithm for model fitting. Inferences concerning eQTL detection and the configuration of eQTL across tissues are derived from the adaptive thresholding of local false discovery rates, and maximum a posteriori estimation, respectively. We also provide theoretical justification of the adaptive procedure. We investigate the MT-eQTL model through an extensive analysis of a 9-tissue data set from the GTEx initiative.
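As a rough illustration of what adaptive thresholding of local false discovery rates can look like, the sketch below uses one common variant: sort the local FDR values in ascending order and reject the largest set of hypotheses whose average local FDR stays at or below a target level `alpha`. This is an assumption for illustration only; the abstract does not spell out the procedure, and the function name and default `alpha` are hypothetical.

```python
def adaptive_lfdr_threshold(lfdr, alpha=0.05):
    """Return sorted indices of hypotheses rejected by adaptive thresholding.

    lfdr  : list of local false discovery rates, one per hypothesis
    alpha : target level for the average local FDR of the rejected set

    Assumed rule (one common variant, not necessarily the paper's): reject
    the k hypotheses with the smallest local FDR, where k is the largest
    value for which the mean of those k values is <= alpha.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        # Values are visited in ascending order, so the running mean never
        # decreases; once it exceeds alpha it stays above alpha.
        if running_sum / rank <= alpha:
            n_reject = rank
        else:
            break
    return sorted(order[:n_reject])


if __name__ == "__main__":
    example = [0.01, 0.9, 0.02, 0.2, 0.03]
    print(adaptive_lfdr_threshold(example, alpha=0.05))
```

The cutoff adapts to the data: the more small local FDR values there are, the more hypotheses can be rejected before the running average reaches `alpha`.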


bioRxiv | 2016

Local genetic effects on gene expression across 44 human tissues

François Aguet; Andrew Anand Brown; Stephane E. Castel; Joe R. Davis; Pejman Mohammadi; Ayellet V. Segrè; Zachary Zappala; Nathan S. Abell; Laure Frésard; Eric R. Gamazon; Ellen T. Gelfand; Michael J. Gloudemans; Yuan He; Farhad Hormozdiari; Xiao Li; Xin Li; Boxiang Liu; Diego Garrido-Martín; Halit Ongen; John Palowitch; YoSon Park; Christine B. Peterson; Gerald Quon; Stephan Ripke; Andrey A. Shabalin; Tyler C. Shimko; Benjamin J. Strober; Timothy J. Sullivan; Nicole A. Teran; Emily K. Tsang

Expression quantitative trait locus (eQTL) mapping provides a powerful means to identify functional variants influencing gene expression and disease pathogenesis. We report the identification of cis-eQTLs from 7,051 post-mortem samples representing 44 tissues and 449 individuals as part of the Genotype-Tissue Expression (GTEx) project. We find a cis-eQTL for 88% of all annotated protein-coding genes, with one-third having multiple independent effects. We identify numerous tissue-specific cis-eQTLs, highlighting the unique functional impact of regulatory variation in diverse tissues. By integrating large-scale functional genomics data and state-of-the-art fine-mapping algorithms, we identify multiple features predictive of tissue-specific and shared regulatory effects. We improve estimates of cis-eQTL sharing and effect sizes using allele specific expression across tissues. Finally, we demonstrate the utility of this large compendium of cis-eQTLs for understanding the tissue-specific etiology of complex traits, including coronary artery disease. The GTEx project provides an exceptional resource that has improved our understanding of gene regulation across tissues and the role of regulatory variation in human genetic diseases.


Schizophrenia Bulletin | 2016

A Whole Methylome CpG-SNP Association Study of Psychosis in Blood and Brain Tissue

Edwin J. C. G. van den Oord; Shaunna L. Clark; Lin Ying Xie; Andrey A. Shabalin; Mikhail G. Dozmorov; Gaurav Kumar; Vladimir I. Vladimirov; Patrik K. E. Magnusson; Karolina A. Aberg

Mutated CpG sites (CpG-SNPs) are potential hotspots for human diseases because in addition to the sequence variation they may show individual differences in DNA methylation. We performed methylome-wide association studies (MWAS) to test whether methylation differences at those sites were associated with schizophrenia. We assayed all common CpG-SNPs with methyl-CpG binding domain protein-enriched genome sequencing (MBD-seq) using DNA extracted from 1408 blood samples and 66 postmortem brain samples (BA10) of schizophrenia cases and controls. Seven CpG-SNPs passed our FDR threshold of 0.1 in the blood MWAS. Of the CpG-SNPs methylated in brain, 94% were also methylated in blood. This significantly exceeded the 46.2% overlap expected by chance (P-value < 1.0×10⁻⁸) and justified replicating findings from blood in brain tissue. CpG-SNP rs3796293 in IL1RAP replicated (P-value = 0.003) with the same direction of effects. This site was further validated through targeted bisulfite pyrosequencing in 736 independent case-control blood samples (P-value < 9.5×10⁻⁴). Our top result in the brain MWAS (P-value = 8.8×10⁻⁷) was CpG-SNP rs16872141 located in the potential promoter of ENC1. Overall, our results suggested that CpG-SNP methylation may reflect effects of environmental insults and can provide biomarkers in blood that could potentially improve disease management.


Genome Biology | 2017

Correcting for cell-type effects in DNA methylation studies: reference-based method outperforms latent variable approaches in empirical studies

Mohammad W. Hattab; Andrey A. Shabalin; Shaunna L. Clark; Min Zhao; Gaurav Kumar; Robin F. Chan; Lin Ying Xie; Rick Jansen; Laura K. M. Han; Patrik K. E. Magnusson; Gerard van Grootheest; Christina M. Hultman; Brenda W.J.H. Penninx; Karolina A. Aberg; Edwin J. C. G. van den Oord

Based on an extensive simulation study, McGregor and colleagues recently recommended the use of surrogate variable analysis (SVA) to control for the confounding effects of cell-type heterogeneity in DNA methylation association studies in scenarios where no cell-type proportions are available. As their recommendation was mainly based on simulated data, we sought to replicate findings in two large-scale empirical studies. In our empirical data, SVA did not fully correct for cell-type effects, its performance was somewhat unstable, and it carried a risk of missing true signals caused by removing variation that might be linked to actual disease processes. By contrast, a reference-based correction method performed well and did not show these limitations. A disadvantage of this approach is that if reference methylomes are not (publicly) available, they will need to be generated once for a small set of samples. However, given the notable risk we observed for cell-type confounding, we argue that, to avoid introducing false-positive findings into the literature, it could be well worth making this investment. Please see related Correspondence article: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1149-7 and related Research article: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0935-y


Nicotine & Tobacco Research | 2016

Deep Sequencing of Three Loci Implicated in Large-Scale Genome-Wide Association Study Smoking Meta-Analyses

Shaunna L. Clark; Joseph L. McClay; Daniel E. Adkins; Karolina A. Aberg; Gaurav Kumar; Srilaxmi Nerella; Linying Xie; Ann L. Collins; James J. Crowley; C. R. Quakenbush; C. E. Hillard; Guimin Gao; Andrey A. Shabalin; Roseann E. Peterson; William E. Copeland; Judy L. Silberg; Hermine H. Maes; Patrick F. Sullivan; Elizabeth J. Costello; Edwin J. C. G. van den Oord

Introduction: Genome-wide association study meta-analyses have robustly implicated three loci that affect susceptibility for smoking: CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6 and EGLN2\CYP2A6. Functional follow-up studies of these loci are needed to provide insight into biological mechanisms. However, these efforts have been hampered by a lack of knowledge about the specific causal variant(s) involved. In this study, we prioritized variants in terms of the likelihood they account for the reported associations. Methods: We employed targeted capture of the CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6, and EGLN2\CYP2A6 loci and flanking regions followed by next-generation deep sequencing (mean coverage 78×) to capture genomic variation in 363 individuals. We performed single locus tests to determine if any single variant accounts for the association, and examined if sets of (rare) variants that overlapped with biologically meaningful annotations account for the associations. Results: In total, we investigated 963 variants, of which 71.1% were rare (minor allele frequency < 0.01), 6.02% were insertion/deletions, and 51.7% were catalogued in dbSNP141. The single variant results showed that no variant fully accounts for the association in any region. In the variant set results, CHRNB4 accounts for most of the signal with significant sets consisting of directly damaging variants. CHRNA6 explains most of the signal in the CHRNB3\CHRNA6 locus with significant sets indicating a regulatory role for CHRNA6. Significant sets in CYP2A6 involved directly damaging variants while the significant variant sets suggested a regulatory role for EGLN2. Conclusions: We found that multiple variants implicating multiple processes explain the signal. Some variants can be prioritized for functional follow-up.


Nucleic Acids Research | 2017

Enrichment methods provide a feasible approach to comprehensive and adequately powered investigations of the brain methylome

Robin F. Chan; Andrey A. Shabalin; Lin Y. Xie; Daniel E. Adkins; Min Zhao; Gustavo Turecki; Shaunna L. Clark; Karolina A. Aberg; Edwin J. C. G. van den Oord

Methylome-wide association studies are typically performed using microarray technologies that only assay a very small fraction of the CG methylome and entirely miss two forms of methylation that are common in brain and likely of particular relevance for neuroscience and psychiatric disorders. The alternative is to use whole genome bisulfite (WGB) sequencing but this approach is not yet practically feasible with sample sizes required for adequate statistical power. We argue for revisiting methylation enrichment methods that, provided optimal protocols are used, enable comprehensive, adequately powered and cost-effective genome-wide investigations of the brain methylome. To support our claim we use data showing that enrichment methods approximate the sensitivity obtained with WGB methods and with slightly better specificity. However, this performance is achieved at <5% of the reagent costs. Furthermore, because many more samples can be sequenced simultaneously, projects can be completed about 15 times faster. Currently the only viable option available for comprehensive brain methylome studies, enrichment methods may be critical for moving the field forward.

Collaboration


Dive into Andrey A. Shabalin's collaborations.

Top Co-Authors

Karolina A. Aberg

Virginia Commonwealth University


Gaurav Kumar

Virginia Commonwealth University


Robin F. Chan

Virginia Commonwealth University


Min Zhao

Virginia Commonwealth University


Patrick F. Sullivan

University of North Carolina at Chapel Hill


Linying Xie

Virginia Commonwealth University


Joseph L. McClay

Virginia Commonwealth University