Kwangbom Choi
University of Rochester
Publications
Featured research published by Kwangbom Choi.
Nature | 2016
Joel M. Chick; Steven C. Munger; Petr Simecek; Edward L. Huttlin; Kwangbom Choi; Daniel M. Gatti; Narayanan Raghupathy; Karen L. Svenson; Gary A. Churchill; Steven P. Gygi
Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in livers from 192 Diversity Outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the observed pQTL effects. Using a sensitive approach to mediation analysis, we often identified a second protein or transcript as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein–protein interactions. Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort of Collaborative Cross mice.
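Mediation analysis of a distant pQTL can be sketched as a LOD-drop scan: map the target protein at the pQTL, then re-map it after adding each candidate mediator as a covariate. A candidate that absorbs most of the QTL effect is a plausible mediator. The Python below is a minimal sketch under simplifying assumptions (a single additive genotype dosage coded 0/1/2 instead of eight-founder haplotype probabilities, no kinship correction, simulated data, and hypothetical helper names `lod` and `mediation_scan`); it is not the authors' pipeline.

```python
import numpy as np


def lod(y, X0, X1):
    """LOD score for nested linear models: X0 (null) vs X1 (alternative)."""
    n = len(y)
    beta0 = np.linalg.lstsq(X0, y, rcond=None)[0]
    beta1 = np.linalg.lstsq(X1, y, rcond=None)[0]
    rss0 = np.sum((y - X0 @ beta0) ** 2)
    rss1 = np.sum((y - X1 @ beta1) ** 2)
    return (n / 2) * np.log10(rss0 / rss1)


def mediation_scan(target, dosage, candidates):
    """Return the QTL LOD for `target` and the LOD after conditioning on
    each candidate mediator (dict mapping name -> abundance vector)."""
    n = len(target)
    intercept = np.ones((n, 1))
    g = dosage.reshape(n, 1)
    base = lod(target, intercept, np.hstack([intercept, g]))
    conditioned = {}
    for name, m in candidates.items():
        X0 = np.hstack([intercept, m.reshape(n, 1)])  # mediator-only model
        X1 = np.hstack([X0, g])                       # mediator + genotype
        conditioned[name] = lod(target, X0, X1)
    return base, conditioned


# Simulated chain: genotype -> mediator transcript -> target protein.
rng = np.random.default_rng(0)
n = 200
dosage = rng.integers(0, 3, size=n).astype(float)
mediator = dosage + rng.normal(size=n)
target = mediator + rng.normal(size=n)
decoy = rng.normal(size=n)  # transcript unrelated to the target

base, conditioned = mediation_scan(target, dosage, {"mediator": mediator, "decoy": decoy})
print(f"pQTL LOD {base:.1f}; conditioned: {conditioned}")
```

A large drop in LOD when conditioning on `mediator`, and essentially none for `decoy`, is the pattern that flags a causal intermediate; a genome-wide version would repeat the conditioning for every measured transcript and protein.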
PLOS Genetics | 2014
John P. Kemp; Carolina Medina-Gomez; Karol Estrada; Beate St Pourcain; Denise H. M. Heppe; Nicole M. Warrington; Ling Oei; Susan M. Ring; Claudia J. Kruithof; Nicholas J. Timpson; Lisa E. Wolber; Sjur Reppe; Kaare M. Gautvik; Elin Grundberg; Bing Ge; Bram C. J. van der Eerden; Jeroen van de Peppel; Matthew A. Hibbs; Cheryl L. Ackert-Bicknell; Kwangbom Choi; Daniel L. Koller; Michael J. Econs; Frances M. K. Williams; Tatiana Foroud; M. Carola Zillikens; Claes Ohlsson; Albert Hofman; André G. Uitterlinden; George Davey Smith; Vincent W. V. Jaddoe
Heritability of bone mineral density (BMD) varies across skeletal sites, reflecting different relative contributions of genetic and environmental influences. To quantify the degree to which common genetic variants tag and environmental factors influence BMD at different sites, we estimated the genetic (rg) and residual (re) correlations between BMD measured at the upper limbs (UL-BMD), lower limbs (LL-BMD) and skull (SK-BMD), using total-body DXA scans of ∼4,890 participants recruited by the Avon Longitudinal Study of Parents and their Children (ALSPAC). Point estimates of rg indicated that appendicular sites share a greater proportion of genetic architecture with each other (LL-/UL-BMD rg = 0.78) than with the skull (UL-/SK-BMD rg = 0.58 and LL-/SK-BMD rg = 0.43). Likewise, the residual correlation between BMD at appendicular sites (re = 0.55) was higher than the residual correlation between SK-BMD and BMD at appendicular sites (re = 0.20–0.24). To explore the basis for the observed differences in rg and re, genome-wide association meta-analyses were performed (n ∼ 9,395), combining data from ALSPAC and the Generation R Study and identifying 15 independent signals from 13 loci associated at the genome-wide significance level across different skeletal regions. Results suggested that previously identified BMD-associated variants may exert site-specific effects (i.e. differ in the strength of their association and magnitude of effect across different skeletal sites). In particular, variants at CPED1 exerted a larger influence on SK-BMD and UL-BMD when compared to LL-BMD (P = 2.01 × 10⁻³⁷), whilst variants at WNT16 influenced UL-BMD to a greater degree when compared to SK- and LL-BMD (P = 2.31 × 10⁻¹⁴). In addition, we report a novel association between RIN3 (previously associated with Paget's disease) and LL-BMD (rs754388: β = 0.13, SE = 0.02, P = 1.4 × 10⁻¹⁰).
Our results suggest that BMD at different skeletal sites is under a mixture of shared and specific genetic and environmental influences. Allowing for these differences by performing genome-wide association at different skeletal sites may help uncover new genetic influences on BMD.
PLOS Genetics | 2015
Christopher L. Baker; Shimpei Kajita; Michael D. Walker; Ruth L. Saxl; Narayanan Raghupathy; Kwangbom Choi; Petko M. Petkov; Kenneth Paigen
Meiotic recombination generates new genetic variation and assures the proper segregation of chromosomes in gametes. PRDM9, a zinc finger protein with histone methyltransferase activity, initiates meiotic recombination by binding DNA at recombination hotspots and directing the position of DNA double-strand breaks (DSB). The DSB repair mechanism suggests that hotspots should eventually self-destruct, yet genome-wide recombination levels remain constant, a conundrum known as the hotspot paradox. To test whether PRDM9 drives this evolutionary erosion, we measured activity of the Prdm9Cst allele in two Mus musculus subspecies: M.m. castaneus, in which Prdm9Cst arose, and M.m. domesticus, into which Prdm9Cst was introduced experimentally. Comparing these two strains, we find that haplotype differences at hotspots lead to qualitative and quantitative changes in PRDM9 binding and activity. Using Mus spretus as an outgroup, we found that most variants affecting PRDM9Cst binding arose and were fixed in M.m. castaneus, suppressing hotspot activity. Furthermore, M.m. castaneus × M.m. domesticus F1 hybrids exhibit novel hotspots, with large haplotype biases in both PRDM9 binding and chromatin modification. These novel hotspots represent sites of historic evolutionary erosion that become activated in hybrids due to crosstalk between one parent's Prdm9 allele and the opposite parent's chromosome. Together these data support a model where haplotype-specific PRDM9 binding directs biased gene conversion at hotspots, ultimately leading to hotspot erosion.
Genetics | 2014
Steven C. Munger; Narayanan Raghupathy; Kwangbom Choi; Allen K. Simons; Daniel M. Gatti; Douglas Hinerfeld; Karen L. Svenson; Mark P. Keller; Alan D. Attie; Matthew A. Hibbs; Joel H. Graber; Elissa J. Chesler; Gary A. Churchill
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.
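Constructing an individualized diploid sequence amounts to applying an individual's phased variants to the reference, once per haplotype. The sketch below assumes non-overlapping SNPs and small indels given as 0-based (position, REF, ALT) tuples and uses hypothetical function names; it is not Seqnature's interface. A real pipeline must also shift transcript annotation coordinates past every indel before extracting the individualized transcriptome.

```python
from typing import List, Tuple

# (0-based position, REF allele, ALT allele); hypothetical record layout
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Apply non-overlapping variants to `reference`, right to left, so that
    indels do not shift the coordinates of variants still to be applied."""
    seq = reference
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF mismatch at {pos}: expected {ref!r}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


def individualize(reference: str,
                  phased: List[Tuple[Variant, Tuple[int, int]]]) -> Tuple[str, str]:
    """Return both haplotype sequences of a diploid individual.

    Each entry pairs a variant with its phased genotype (hap1, hap2),
    where 0 = REF and 1 = ALT on that haplotype.
    """
    hap1 = [v for v, (a, _) in phased if a == 1]
    hap2 = [v for v, (_, b) in phased if b == 1]
    return apply_variants(reference, hap1), apply_variants(reference, hap2)


reference = "ACGTACGTAC"
phased = [
    ((1, "C", "T"), (1, 0)),    # heterozygous SNP
    ((4, "A", "AGG"), (0, 1)),  # heterozygous insertion
    ((7, "TA", "T"), (1, 1)),   # homozygous deletion
]
h1, h2 = individualize(reference, phased)
print(h1, h2)  # ATGTACGTC ACGTAGGCGTC
```

Reads from this individual would then be aligned to both haplotype sequences, which is what makes allele-specific counting possible.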
Epigenetics & Chromatin | 2015
Michael D. Walker; Timothy Billings; Christopher L. Baker; Natalie Powers; Hui Tian; Ruth L. Saxl; Kwangbom Choi; Matthew A. Hibbs; Gregory W. Carter; Mary Ann Handel; Kenneth Paigen; Petko M. Petkov
Background: Genetic recombination plays an important role in evolution, facilitating the creation of new, favorable combinations of alleles and the removal of deleterious mutations by unlinking them from surrounding sequences. In most mammals, the placement of genetic crossovers is determined by the binding of PRDM9, a highly polymorphic protein with a long zinc finger array, to its cognate binding sites. It is one of over 800 genes encoding proteins with zinc finger domains in the human genome.
Results: We report a novel technique, Affinity-seq, that for the first time identifies both the genome-wide binding sites of DNA-binding proteins and quantifies their relative affinities. We have applied this in vitro technique to PRDM9, the zinc finger protein that activates genetic recombination, obtaining new information on the regulation of hotspots, whose locations and activities determine the recombination landscape. We identified 31,770 binding sites in the mouse genome for the PRDM9Dom2 variant. Comparing these results with hotspot usage in vivo, we find that fewer than half of potential PRDM9 binding sites are utilized in vivo. We show that hotspot usage is increased in actively transcribed genes and decreased in genomic regions containing H3K9me2/3 histone marks or bound to the nuclear lamina.
Conclusions: These results show that a major factor determining whether a binding site will become an active hotspot, and what its activity will be, is the set of constraints imposed by prior chromatin modifications on the ability of PRDM9 to bind to DNA in vivo. These constraints lead to the presence of long genomic regions depleted of recombination.
Bioinformatics | 2018
Narayanan Raghupathy; Kwangbom Choi; Matthew J Vincent; Glen L. Beane; Keith Sheppard; Steven C. Munger; Ron Korstanje; Fernando Pardo-Manuel de Villena; Gary A. Churchill
Motivation: Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts, but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation.
Results: Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects.
Availability and implementation: EMASE software is available at https://github.com/churchill-lab/emase.
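Weighted allocation of multi-reads rests on an expectation-maximization loop that splits each read across the targets it aligns to, in proportion to the current abundance estimates. The sketch below is a flat (non-hierarchical) EM over a read-by-target compatibility matrix, a simpler technique than EMASE's gene, then isoform, then allele hierarchy; the function name `em_allocate` and the toy counts are illustrative assumptions, not EMASE's API.

```python
import numpy as np


def em_allocate(compat, n_iter=500, tol=1e-10):
    """Estimate target proportions from multi-mapping reads by EM.

    compat: (n_reads, n_targets) 0/1 matrix; compat[r, t] = 1 if read r
    aligns equally well to target t (e.g. one allele of one gene).
    Every read must be compatible with at least one target.
    Returns (proportions, expected_read_counts).
    """
    compat = np.asarray(compat, dtype=float)
    n_reads, n_targets = compat.shape
    theta = np.full(n_targets, 1.0 / n_targets)
    for _ in range(n_iter):
        # E-step: split each read across its targets in proportion to theta
        weights = compat * theta
        weights /= weights.sum(axis=1, keepdims=True)
        # M-step: new proportions are each target's expected share of reads
        new_theta = weights.sum(axis=0) / n_reads
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta, theta * n_reads


# Toy example: two alleles (A, B) of one gene.
# 6 reads unique to A, 2 unique to B, 8 multi-reads compatible with both.
compat = np.array([[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 8)
theta, counts = em_allocate(compat)
print(np.round(counts, 3))  # expected counts close to 12 and 4
```

Discarding the eight multi-reads would leave only 8 of 16 reads; EM keeps all 16 and, in this toy case, splits them 12 to 4 between the two alleles. A hierarchical scheme would instead resolve gene-level ambiguity first and only then distribute reads among isoforms and alleles.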
bioRxiv | 2018
Kwangbom Choi; Narayanan Raghupathy; Gary A. Churchill
Single-cell RNA sequencing (scRNA-Seq) can reveal features of cellular gene expression that cannot be observed in whole-tissue analysis. Allele-specific expression in single cells can provide an even richer picture of the stochastic and dynamic features of gene expression. The trend in single-cell technologies is moving toward sequencing larger numbers of cells with low depth of coverage per cell. Low coverage results in increased sampling variability and frequent occurrence of zero counts for genes that are expressed at low levels or that are dynamically expressed in short bursts. The problems associated with low coverage are exacerbated in allele-specific analysis by the almost universal practice of discarding reads that cannot be unambiguously aligned to one allele of one gene (multi-reads). We demonstrate that discarding multi-reads leads to higher variability in estimates of allelic proportions, an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. We propose a weighted-allocation method of counting reads that substantially improves estimation of allelic proportions and reduces spurious zeros in the allele-specific read counts. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to track changes in the allele-specific expression patterns of cells sampled over a developmental time course. We implemented these methods in extensible open-source software scBASE, which is available at https://github.com/churchill-lab/scBASE
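The benefit of combining information across cells can be illustrated with a much simpler model than the hierarchical mixture described above: a single Beta prior, centred on the pooled allelic proportion, shared by all cells. The sketch below swaps in that empirical-Bayes shrinkage; the function name and the concentration parameter `kappa` are hypothetical and are not scBASE options.

```python
import numpy as np


def shrink_allelic_proportions(alt_counts, total_counts, kappa=10.0):
    """Posterior-mean allelic proportion per cell under a shared Beta prior.

    The prior is centred on the pooled proportion across cells, with
    concentration `kappa` (a hypothetical tuning parameter). Low-coverage
    cells are pulled toward the pooled value; high-coverage cells stay
    close to their own ratio.
    """
    x = np.asarray(alt_counts, dtype=float)
    n = np.asarray(total_counts, dtype=float)
    pooled = x.sum() / n.sum()
    a, b = kappa * pooled, kappa * (1.0 - pooled)
    return (x + a) / (n + a + b)


alt = [0, 2, 80, 20]     # reads from one allele in each of four cells
total = [0, 2, 100, 60]  # allele-informative reads per cell
est = shrink_allelic_proportions(alt, total)
print(np.round(est, 3))
```

Here the cell with 2 of 2 reads, which would look monoallelic on raw counts, is estimated at about 0.69, and the zero-coverage cell receives the pooled value instead of an undefined ratio. A single shared prior also flattens genuine cell-to-cell differences, which is the reason a mixture model over several allelic states is preferable when heterogeneity matters.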
bioRxiv | 2018
Christopher L. Baker; Michael D. Walker; Seda Arat; Guruprasad Ananda; Pavlina Petkova; Natalie Powers; Hui Tian; Catrina Spruce; Bo Ji; Dylan Rausch; Kwangbom Choi; Petko M. Petkov; Gregory W. Carter; Kenneth Paigen
Although a variety of writers, readers, and erasers of epigenetic modifications are known, we have little information about the underlying regulatory systems controlling the establishment and maintenance of the epigenetic landscape, which varies greatly among cell types. Here, we have explored how natural genetic variation impacts the epigenome in mice. Studying levels of H3K4me3, a histone modification at sites such as promoters, enhancers, and recombination hotspots, we found tissue-specific trans-regulation of H3K4me3 levels in four highly diverse cell types: male germ cells, embryonic stem (ES) cells, hepatocytes and cardiomyocytes. To identify the genetic loci involved, we measured H3K4me3 levels in male germ cells in a mapping population of 60 BXD recombinant inbred lines, identifying extensive trans-regulation primarily controlled by six major histone quantitative trait loci (hQTL). These chromatin regulatory loci act dominantly to suppress H3K4me3, which at hotspots reduces the likelihood of subsequent DNA double-strand breaks. QTL locations do not correspond with enzymes known to metabolize chromatin features. Instead, their locations match clusters of zinc finger genes, making these genes possible candidates to explain the dominant suppression of H3K4me3. Collectively, these data describe an extensive, tissue-specific set of chromatin regulatory loci that control functionally related chromatin sites.
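Locating a trans-acting hQTL means scanning markers and asking, at each one, how much of the variation in a chromatin trait the line genotypes explain. The sketch below is a plain single-marker regression scan on simulated data, assuming two-state BXD genotypes coded 0 (C57BL/6J allele) or 1 (DBA/2J allele) and independent markers; real scans typically use genotype probabilities, permutation-based thresholds and many traits at once, and nothing here reproduces the authors' analysis.

```python
import numpy as np


def marker_scan(phenotype, genotypes):
    """Single-marker LOD scan for recombinant inbred lines.

    phenotype: (n_lines,) trait values, e.g. H3K4me3 level at one site.
    genotypes: (n_lines, n_markers), 0 = B6 allele, 1 = DBA/2 allele
    (inbred lines are homozygous, so one indicator per marker suffices).
    """
    y = np.asarray(phenotype, dtype=float)
    G = np.asarray(genotypes, dtype=float)
    n = len(y)
    rss0 = np.sum((y - y.mean()) ** 2)  # intercept-only model
    lods = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        X = np.column_stack([np.ones(n), G[:, j]])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        rss1 = np.sum((y - X @ beta) ** 2)
        lods[j] = (n / 2) * np.log10(rss0 / rss1)
    return lods


# Simulated data: 60 lines, 50 unlinked markers, a QTL at marker 20.
rng = np.random.default_rng(1)
G = rng.integers(0, 2, size=(60, 50))
y = 2.0 * G[:, 20] + rng.normal(size=60)
lods = marker_scan(y, G)
print(int(np.argmax(lods)), round(float(lods.max()), 1))
```

Running the same scan for every chromatin site and counting how many sites map to each marker is one simple way to see trans-regulatory hotspots such as the six major hQTL emerge.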
bioRxiv | 2017
Daniel M. Gatti; Petr Simecek; Lisa Somes; Clifton T Jeffery; Matthew J Vincent; Kwangbom Choi; Xingyao Chen; Gary A. Churchill; Karen L. Svenson
Inter-individual variation in metabolic health and adiposity is driven by many factors. Diet composition, genetic background, and the interactions between them affect adiposity and related traits such as circulating cholesterol levels. In this study, we fed 850 Diversity Outbred mice, half females and half males, either a standard chow diet or a high-fat, high-sucrose diet beginning at weaning and aged them to 26 weeks. We measured clinical chemistry and body composition at early and late time points during the study, and liver transcription at euthanasia. Males weighed more than females, and mice on the high-fat diet generally weighed more than those on chow. Many traits showed sex- or diet-specific changes as well as more complex sex-by-diet interactions. We mapped both the physiological and molecular traits and found that the genetic architecture of the physiological traits is complex, with many single-locus associations potentially driven by more than one polymorphism. For liver transcription, we find that local polymorphisms affect constitutive and sex-specific transcription, but that the response to diet is not affected by local polymorphisms. We identified two loci for circulating cholesterol levels. We performed mediation analysis by mapping the physiological traits conditioned on liver transcript abundance and propose several genes that may be modifiers of the physiological traits. By including both physiological and molecular traits in our analyses, we have created deeper phenotypic profiles to identify additional significant contributors to complex metabolic outcomes such as polygenic obesity. We make the phenotype, liver transcript and genotype data publicly available as a resource for the research community.
bioRxiv | 2017
Narayanan Raghupathy; Kwangbom Choi; Matthew J Vincent; Glen L. Beane; Keith Sheppard; Steven C. Munger; Ron Korstanje; Fernando Pardo-Manuel de Villena; Gary A. Churchill
Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. Direct RNA sequencing (RNA-Seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. However, estimating ASE is challenging due to ambiguities in read alignment. Current approaches do not account for the hierarchy of multiple read alignments to genes, isoforms, and alleles. We have developed EMASE (Expectation-Maximization for Allele Specific Expression), an integrated approach to estimate total gene expression, ASE, and isoform usage based on hierarchical allocation of multi-mapping reads. In simulations, EMASE outperforms standard ASE estimation methods. We apply EMASE to RNA-Seq data from F1 hybrid mice where we observe widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects at known imprinted genes. The EMASE software is freely available under GNU license at https://github.com/churchill-lab/emase and it can be adapted to other sequencing applications.