Publication


Featured research published by Jae Hoon Sul.


Nature Genetics | 2010

Variance component model to account for sample structure in genome-wide association studies

Hyun Min Kang; Jae Hoon Sul; Noah Zaitlen; Sit Yee Kong; Nelson B. Freimer; Chiara Sabatti; Eleazar Eskin

Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; however, such approaches are computationally impractical for large data sets. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.
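The speed-up described above comes from estimating the variance components once under the null model and reusing them for every marker, so each per-SNP test reduces to a cheap generalized least squares (GLS) fit. The Python/NumPy sketch below illustrates that idea only. The function names, the grid search over the variance ratio delta, and the use of maximum likelihood rather than REML are assumptions for illustration; this is not the EMMAX software's implementation.

```python
import numpy as np


def _gls_fit(Xr, yr, w):
    """Weighted least squares on rotated data; w holds inverse variances (up to scale)."""
    XtW = Xr.T * w                       # X^T W for a diagonal W
    XtWX = XtW @ Xr
    beta = np.linalg.solve(XtWX, XtW @ yr)
    resid = yr - Xr @ beta
    sigma_g2 = float(resid @ (w * resid)) / len(yr)   # ML estimate of the variance scale
    return beta, XtWX, sigma_g2


def estimate_delta(y, X0, K, log_deltas=np.linspace(-5, 5, 101)):
    """Grid-search ML estimate of delta = sigma_e^2 / sigma_g^2 under the null (no SNP effect)."""
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)            # guard against tiny negative eigenvalues
    yr, Xr = U.T @ y, U.T @ X0           # rotation: Var(yr) = sigma_g^2 * diag(s + delta)
    n = len(y)
    best_loglik, best_delta = -np.inf, None
    for d in np.exp(log_deltas):
        w = 1.0 / (s + d)
        _, _, sg2 = _gls_fit(Xr, yr, w)
        loglik = -0.5 * (n * np.log(2 * np.pi * sg2) + np.sum(np.log(s + d)) + n)
        if loglik > best_loglik:
            best_loglik, best_delta = loglik, d
    return best_delta, s, U


def emmax_like_scan(y, X0, G, K):
    """Estimate delta once, then test each column of G by GLS with that fixed delta."""
    delta, s, U = estimate_delta(y, X0, K)
    w = 1.0 / (s + delta)
    yr, X0r, Gr = U.T @ y, U.T @ X0, U.T @ G
    t_stats = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        Xr = np.column_stack([X0r, Gr[:, j]])
        beta, XtWX, sg2 = _gls_fit(Xr, yr, w)
        se = np.sqrt(sg2 * np.linalg.inv(XtWX)[-1, -1])
        t_stats[j] = beta[-1] / se       # GLS t-statistic for the SNP coefficient
    return t_stats
```

Because delta is fixed after the single null fit, each additional marker costs only one small weighted regression on pre-rotated data, rather than a fresh variance-component optimization.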


NeuroImage | 2010

Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer’s Disease

Jason L. Stein; Xue Hua; Jonathan H. Morra; Suh Lee; Derrek P. Hibar; April J. Ho; Alex D. Leow; Arthur W. Toga; Jae Hoon Sul; Hyun Min Kang; Eleazar Eskin; Andrew J. Saykin; Li Shen; Tatiana Foroud; Nathan Pankratz; Matthew J. Huentelman; David Craig; Jill D. Gerber; April N. Allen; Jason J. Corneveaux; Dietrich A. Stephan; Jennifer A. Webster; Bryan M. DeChairo; Steven G. Potkin; Clifford R. Jack; Michael W. Weiner; Paul M. Thompson

In a genome-wide association study of structural brain degeneration, we mapped the 3D profile of temporal lobe volume differences in 742 brain MRI scans of Alzheimer's disease (AD) patients, subjects with mild cognitive impairment (MCI), and healthy elderly subjects. After searching 546,314 genomic markers, we found 2 single nucleotide polymorphisms (SNPs) associated with bilateral temporal lobe volume (P < 5 × 10^-7). One SNP, rs10845840, is located in the GRIN2B gene, which encodes the N-methyl-D-aspartate (NMDA) glutamate receptor NR2B subunit. This protein, which is involved in learning and memory and in excitotoxic cell death, has age-dependent prevalence in the synapse and is already a therapeutic target in Alzheimer's disease. Risk alleles for lower temporal lobe volume at this SNP were significantly over-represented in AD and MCI subjects vs. controls (odds ratio = 1.273; P = 0.039) and were associated with Mini-Mental State Exam scores (MMSE; t = -2.114; P = 0.035), demonstrating a negative effect on global cognitive function. Voxelwise maps of genetic association of this SNP with regional brain volumes revealed intense temporal lobe effects (FDR correction at q = 0.05; critical P = 0.0257). This study uses large-scale brain mapping for gene discovery, with implications for Alzheimer's disease.
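The critical P reported above is the threshold below which voxels are declared significant under false discovery rate (FDR) control at q = 0.05. Assuming the standard Benjamini-Hochberg step-up procedure (the abstract does not name the procedure), the critical P is the largest sorted P-value p_(k) that satisfies p_(k) ≤ (k/m)·q for m tests. A minimal Python sketch with a hypothetical function name and made-up P-values:

```python
def bh_critical_p(pvalues, q=0.05):
    """Return the Benjamini-Hochberg critical P-value, or None if no test passes.

    Tests with P <= the returned threshold are declared significant at FDR level q.
    """
    m = len(pvalues)
    critical = None
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= k * q / m:
            critical = p        # step-up: keep the largest p_(k) that meets its bound
    return critical


# Hypothetical P-values for five tests
print(bh_critical_p([0.005, 0.015, 0.025, 0.035, 0.5], q=0.05))  # prints 0.035
```

The step-up rule keeps scanning past an individual failure, so a larger P-value can still set the threshold if it meets its own, looser bound.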


Twin Research and Human Genetics | 2012

The Minnesota Center for Twin and Family Research Genome-Wide Association Study

Michael B. Miller; Saonli Basu; Julie M. Cunningham; Eleazar Eskin; Steven M. Malone; William S. Oetting; Nicholas J. Schork; Jae Hoon Sul; William G. Iacono; Matt McGue

As part of the Genes, Environment and Development Initiative, the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study, which we describe here. A total of 8,405 research participants, clustered in four-member families, have been successfully genotyped on 527,829 single nucleotide polymorphism (SNP) markers using Illumina's Human660W-Quad array. Quality control screening of samples and markers, as well as SNP imputation procedures, are described. We also describe methods for ancestry control and how the familial clustering of the MCTFR sample can be accounted for in the analysis using a Rapid Feasible Generalized Least Squares algorithm. The rich longitudinal MCTFR assessments provide numerous opportunities for collaboration.
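To illustrate the general feasible generalized least squares (FGLS) idea for four-member families, the sketch below fits ordinary least squares, estimates one residual correlation per relationship type from the OLS residuals, builds the shared within-family correlation block, and refits by GLS after whitening each family. The member ordering (mother, father, offspring 1, offspring 2), the three relationship types, and all names are assumptions; this is not the RFGLS algorithm's actual code.

```python
import numpy as np

# Assumed member order within each family: mother, father, offspring 1, offspring 2.
# Within-family position pairs grouped by (hypothetical) relationship type.
RELATIONSHIP_PAIRS = {
    "spouse": [(0, 1)],
    "parent_offspring": [(0, 2), (0, 3), (1, 2), (1, 3)],
    "sibling": [(2, 3)],
}


def family_fgls(y, X):
    """Feasible GLS for rows stacked as consecutive 4-member families."""
    n_fam = len(y) // 4
    # Step 1: ordinary least squares, ignoring family structure.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    R = (y - X @ beta_ols).reshape(n_fam, 4)    # residuals, one row per family
    var = np.mean(R ** 2)
    # Step 2: one residual correlation per relationship type, shared by all families.
    corr = np.eye(4)
    for pairs in RELATIONSHIP_PAIRS.values():
        rho = np.mean([np.mean(R[:, a] * R[:, b]) for a, b in pairs]) / var
        for a, b in pairs:
            corr[a, b] = corr[b, a] = rho
    # Step 3: whiten each family with the inverse Cholesky factor, then refit by OLS.
    L_inv = np.linalg.inv(np.linalg.cholesky(corr))
    yw = (y.reshape(n_fam, 4) @ L_inv.T).ravel()
    Xw = np.einsum("ij,fjp->fip", L_inv, X.reshape(n_fam, 4, -1)).reshape(len(y), -1)
    beta_gls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta_gls, corr
```

Estimating the small correlation block once and reusing it for every family keeps the GLS step cheap; with real data, the estimated block would need checking for positive definiteness before the Cholesky step.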


American Journal of Human Genetics | 2016

Colocalization of GWAS and eQTL Signals Detects Target Genes

Farhad Hormozdiari; Martijn van de Bunt; Ayellet V. Segrè; Xiao Li; Jong Wha J. Joo; Michael Bilow; Jae Hoon Sul; Sriram Sankararaman; Bogdan Pasaniuc; Eleazar Eskin

The vast majority of genome-wide association study (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter an individual's disease risk through their effect on gene expression in different tissues. To understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue could play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether the same variant is causal in both GWASs and expression quantitative trait locus (eQTL) studies is challenging because of the uncertainty induced by linkage disequilibrium and the fact that some loci harbor multiple causal variants. Current methods that address this problem, however, assume that each locus contains a single causal variant. In this paper, we present eCAVIAR, a probabilistic method that has several key advantages over existing methods. First, our method can account for more than one causal variant in any given locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Using publicly available eQTL data on 45 different tissues, we demonstrate that eCAVIAR can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci.
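eCAVIAR scores colocalization with a colocalization posterior probability (CLPP): for each variant, the probability that it is causal in both the GWAS and the eQTL study, computed as the product of the two per-study causal posteriors under the assumption that the studies are independent. In eCAVIAR those per-study posteriors come from a fine-mapping model that allows multiple causal variants; the sketch below simply takes them as inputs, and the numbers are made up.

```python
import numpy as np


def clpp(post_gwas, post_eqtl):
    """Per-variant colocalization posterior probability.

    post_gwas[i], post_eqtl[i]: marginal posterior probability that variant i is
    causal in the GWAS and in the eQTL study (studies assumed independent).
    """
    post_gwas = np.asarray(post_gwas, dtype=float)
    post_eqtl = np.asarray(post_eqtl, dtype=float)
    return post_gwas * post_eqtl


# Made-up posteriors for four variants in one locus.
gwas = [0.70, 0.20, 0.05, 0.05]
eqtl = [0.60, 0.10, 0.25, 0.05]
scores = clpp(gwas, eqtl)
best = int(np.argmax(scores))   # variant 0, with CLPP of about 0.42
```

Variant 2 has a sizeable eQTL posterior but a small GWAS posterior, so its CLPP stays low; a high score requires support from both studies at the same variant.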


PLOS Genetics | 2013

Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches

Jae Hoon Sul; Buhm Han; Chun Ye; Ted Choi; Eleazar Eskin

Gene expression data, in conjunction with information on genetic variants, have enabled studies to identify expression quantitative trait loci (eQTLs), or polymorphic locations in the genome that are associated with expression levels. Moreover, recent technological developments and cost decreases have further enabled studies to collect expression data in multiple tissues. One advantage of multiple-tissue datasets is that studies can combine results from different tissues to identify eQTLs more accurately than by examining each tissue separately. The idea of aggregating results across multiple tissues is closely related to meta-analysis, which aggregates results of multiple genome-wide association studies to improve the power to detect associations. In principle, meta-analysis methods can be used to combine results from multiple tissues. However, eQTLs may have effects in only a single tissue, in all tissues, or in a subset of tissues, with possibly different effect sizes. This heterogeneity of effects across tissues presents a key challenge for detecting eQTLs. In this paper, we develop a framework that leverages two popular meta-analysis methods that address effect size heterogeneity to detect eQTLs across multiple tissues. We show by using simulations and multiple-tissue data from mouse that our approach detects many eQTLs undetected by traditional eQTL methods. Additionally, our method provides an interpretation framework that accurately predicts whether an eQTL has an effect in a particular tissue.
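As background for the meta-analytic view, the sketch below combines per-tissue eQTL effect estimates for one SNP-gene pair with the textbook inverse-variance-weighted fixed-effects model and reports Cochran's Q as a measure of effect-size heterogeneity. It treats the tissues as independent studies, which need not hold when tissues are sampled from the same individuals, and it does not implement the paper's heterogeneity-aware method or its per-tissue interpretation. The function name and inputs are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis plus Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta_fe = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se_fe = math.sqrt(1.0 / w_sum)
    z = beta_fe / se_fe
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))   # 2 * (1 - Phi(|z|))
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    return beta_fe, se_fe, z, p_two_sided, q


# Made-up per-tissue effect estimates and standard errors for one SNP-gene pair.
beta, se, z, p, q = fixed_effects_meta([0.30, 0.25, 0.02], [0.08, 0.10, 0.09])
```

A large Q relative to its degrees of freedom (number of tissues minus one) signals that a single shared effect fits poorly, which is the situation the heterogeneity-aware methods in the abstract are meant to handle.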


Genetics | 2011

An Optimal Weighted Aggregated Association Test for Identification of Rare Variants Involved in Common Diseases

Jae Hoon Sul; Buhm Han; Dan He; Eleazar Eskin

The advent of next-generation sequencing technologies allows one to discover nearly all rare variants in a genomic region of interest. This technological development increases the need for an effective statistical method for testing the aggregated effect of rare variants in a gene on disease susceptibility. The idea behind this approach is that if a certain gene is involved in a disease, many rare variants within the gene will disrupt the function of the gene and be associated with the disease. In this article, we present the rare variant weighted aggregate statistic (RWAS), a method that groups rare variants and computes a weighted sum of differences between case and control mutation counts. We show that our method outperforms the groupwise association test of Madsen and Browning in the disease-risk model that assumes each variant makes an equally small contribution to disease risk. In addition, our method can incorporate prior information about which variants are likely causal. By using simulated data and real mutation screening data of the susceptibility gene for ataxia telangiectasia, we demonstrate that prior information has a substantial influence on the statistical power of association studies. Our method is publicly available at http://genetics.cs.ucla.edu/rarevariants.
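A statistic of the general form described above, a weighted sum over variants of case minus control mutation counts, can be sketched in Python as follows, with significance assessed by permuting case/control labels. The weights are supplied by the caller and the permutation test is an assumed calibration; RWAS derives its own weighting (the "optimal weighted" test of the title), which is not reproduced here. Function names are hypothetical.

```python
import numpy as np


def weighted_count_difference(G, is_case, weights):
    """Sum over variants of weight * (case mutation count - control mutation count).

    G: (individuals x variants) matrix of minor-allele counts.
    is_case: boolean vector marking cases.
    """
    case_counts = G[is_case].sum(axis=0)
    control_counts = G[~is_case].sum(axis=0)
    return float(np.dot(weights, case_counts - control_counts))


def permutation_pvalue(G, is_case, weights, n_perm=1000, seed=0):
    """One-sided permutation P-value for an excess of weighted mutations in cases.

    Permuting labels keeps the number of cases fixed, so unequal group sizes
    are reflected in the null distribution.
    """
    rng = np.random.default_rng(seed)
    observed = weighted_count_difference(G, is_case, weights)
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(is_case)
        if weighted_count_difference(G, permuted, weights) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Prior knowledge about which variants are likely causal would enter through the weights, for example by up-weighting variants predicted to be damaging; how to set them is left to the caller here.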


Molecular Psychiatry | 2014

Genome-wide association study of monoamine metabolite levels in human cerebrospinal fluid

Jurjen J. Luykx; Steven C. Bakker; Eef Lentjes; M Neeleman; Eric Strengman; L Mentink; Joseph DeYoung; S. de Jong; Jae Hoon Sul; Eleazar Eskin; K.R. van Eijk; J van Setten; Jacobine E. Buizer-Voskamp; Rita M. Cantor; Ake Tzu-Hui Lu; M van Amerongen; E P A van Dongen; Peter Keijzers; Teus H. Kappen; P Borgdorff; Peter Bruins; Eske M. Derks; R.S. Kahn; Roel A. Ophoff

Studying genetic determinants of intermediate phenotypes is a powerful tool to increase our understanding of genotype–phenotype correlations. Metabolic traits pertinent to the central nervous system (CNS) constitute a potentially informative target for genetic studies of intermediate phenotypes as their genetic underpinnings may elucidate etiological mechanisms. We therefore conducted a genome-wide association study (GWAS) of monoamine metabolite (MM) levels in cerebrospinal fluid (CSF) of 414 human subjects from the general population. In a linear model correcting for covariates, we identified one locus associated with MMs at a genome-wide significant level (standardized β = 0.32, P = 4.92 × 10^-8), located 20 kb from SSTR1, a gene involved with brain signal transduction and glutamate receptor signaling. By subsequent whole-genome expression quantitative trait locus (eQTL) analysis, we provide evidence that this variant controls expression of PDE9A (β = 0.21; P_unadjusted = 5.6 × 10^-7; P_corrected = 0.014), a gene previously implicated in monoaminergic transmission, major depressive disorder and antidepressant response. A post hoc analysis of loci significantly associated with psychiatric disorders suggested that genetic variation at CSMD1, a schizophrenia susceptibility locus, plays a role in the ratio between dopamine and serotonin metabolites in CSF. The presented DNA and mRNA analyses yielded genome-wide and suggestive associations in biologically plausible genes, two of which encode proteins involved with glutamate receptor functionality. These findings will hopefully contribute to an exploration of the functional impact of the highlighted genes on monoaminergic transmission and neuropsychiatric phenotypes.
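A per-SNP linear model that corrects for covariates, as mentioned above, can be sketched by regressing the trait on an intercept, the covariates, and the SNP's allele count, then forming a t-statistic for the SNP coefficient. The normal approximation to the t distribution (reasonable for samples of several hundred) and the function name are assumptions, and the study's actual covariates are not specified here. Standardizing the trait and genotype before fitting would give a standardized β like the one reported.

```python
import math

import numpy as np


def snp_association(y, genotype, covariates):
    """OLS test of one SNP, adjusting for covariates (columns of `covariates`)."""
    n = len(y)
    X = np.column_stack([np.ones(n), covariates, genotype])
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (n - rank)          # unbiased residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    t = beta[-1] / math.sqrt(cov_beta[-1, -1])          # SNP is the last column
    p = math.erfc(abs(t) / math.sqrt(2.0))              # normal approximation to t
    return beta[-1], t, p
```

In a genome-wide scan this function would be called once per SNP with the same covariate matrix, and the resulting P-values compared against a genome-wide significance threshold.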


Genome Biology | 2014

Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies

Jong Wha J. Joo; Jae Hoon Sul; Buhm Han; Chun Ye; Eleazar Eskin

Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applying our method to simulated and real datasets, we show that it achieves greater sensitivity while maintaining low false discovery rates compared with previous methods.


Nature Reviews Genetics | 2013

Mixed models can correct for population structure for genomic regions under selection

Jae Hoon Sul; Eleazar Eskin

An article in this journal by Price et al. (New approaches to population stratification in genome-wide association studies. Nature Reviews Genetics 11, 459–463 (2010)) [1] showed by simulations that mixed models [2] may be susceptible to spurious associations on markers with unusual allele frequency differences between populations, such as markers in regions under selection. They attributed the spurious associations, or inflation of test statistics, to mixed models treating population structure as a random effect when it is a fixed effect. After investigating this problem further, we found that modelling population structure as a random effect is not the cause of the inflation; rather, it is the kinship matrix that determines the performance of mixed models. The kinship matrix defines pairwise genetic relatedness among individuals and is usually estimated from all genotyped markers. Because most markers in the simulations carried out by Price et al. [1] have small allele frequency differences between the two populations, a kinship matrix estimated from all markers does not effectively capture the population structure. However, when a kinship matrix is computed from the first principal component or from only the unusually differentiated markers (UDMs), we find that mixed models achieve almost the same performance as EIGENSTRAT [3], a method that incorporates population structure as a fixed effect (TABLE 1). A kinship matrix that captures the same information as the first principal component vector can be obtained by computing the outer product of the vector, w w^T, where w is the first principal component vector. Using this matrix in mixed models is closely related to including the first principal component as a covariate in EIGENSTRAT. As for the kinship matrix generated from UDMs, because UDMs are not known in advance, one may try to detect them using methods that identify markers under selection, one of which was developed by our group.
This method, called spatial ancestry analysis (SPA) [4], correctly identifies UDMs in the simulations, and mixed models with kinship estimated from the markers detected by SPA show almost the same inflation as mixed models with kinship from the true UDMs. Although kinship matrices from UDMs are effective at capturing broad differences among individuals, they may not capture narrow sample structure, such as family structure. One solution is to include an additional kinship matrix estimated from markers other than UDMs, so that the mixed model contains two kinship matrices: one computed from UDMs and the other computed from the remaining markers. This would effectively remove inflation caused by population structure and other sample structure. We applied this approach to the simulations and show that it removes inflation on UDMs (TABLE 1). We also investigated whether UDMs cause inflation of test statistics in real genome-wide association studies (the 1966 Northern Finland Birth Cohort (NFBC66) [5] and the Wellcome Trust Case Control Consortium (WTCCC) [6]). We observed inflation for a few phenotypes, but none was statistically significant (data not shown). In summary, mixed models are equivalent to methods that treat population structure as a fixed effect when the appropriate kinship matrix is used. Mixed models can easily be extended to correct for inflation caused by UDMs, although our results did not identify a case in which the phenomenon reported by Price et al. occurs in practice.
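The kinship constructions discussed above can be sketched in a few lines of NumPy: a standard kinship matrix averaged over all standardized markers, the rank-one matrix w w^T built from the first principal component w, and a pair of kinship matrices from UDMs and from the remaining markers. Which markers count as UDMs is passed in as a boolean mask (in the letter they come from the simulation truth or from SPA); the standardization, scaling, and function names are assumptions for illustration.

```python
import numpy as np


def standardize(G):
    """Center and scale each marker (column) of an individuals x markers matrix."""
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0                 # monomorphic markers become all zeros after centering
    return (G - G.mean(axis=0)) / sd


def kinship(G):
    """Kinship (genetic relationship) matrix averaged over the markers in G."""
    Z = standardize(G)
    return Z @ Z.T / G.shape[1]


def first_pc_kinship(G):
    """Rank-one kinship w w^T from the first principal component w of the samples."""
    Z = standardize(G)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    w = U[:, 0]                       # first PC as a unit-length vector over individuals
    return np.outer(w, w)


def two_kinships(G, is_udm):
    """Kinship from UDMs and kinship from all other markers, for a two-component mixed model."""
    return kinship(G[:, is_udm]), kinship(G[:, ~is_udm])
```

In a two-component mixed model, the pair returned by `two_kinships` would enter the phenotype covariance as σ1² K_UDM + σ2² K_rest + σe² I, with the variance components estimated from the data (for example by REML).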


Proceedings of the National Academy of Sciences of the United States of America | 2004

Mapping subsets of scholarly information

Paul Ginsparg; Paul A. Houle; Jae Hoon Sul

We illustrate the use of machine learning techniques to analyze, structure, maintain, and evolve a large online corpus of academic literature. An emerging field of research can be identified as part of an existing corpus, permitting the implementation of a more coherent community structure for its practitioners.

Collaboration



Top Co-Authors


Eleazar Eskin

University of California


Dat Duong

University of California


Rita M. Cantor

University of California


Roel A. Ophoff

University of California


Eun Yong Kang

University of California
