Eun Yong Kang

University of California, Los Angeles

Publication

Featured research published by Eun Yong Kang.

Genome Research | 2015

Genetic and environmental control of host-gut microbiota interactions

Elin Org; Brian W. Parks; Jong Wha J. Joo; Benjamin Emert; William Schwartzman; Eun Yong Kang; Margarete Mehrabian; Calvin Pan; Rob Knight; Robert P. Gunsalus; Thomas A. Drake; Eleazar Eskin; Aldons J. Lusis

Genetics provides a potentially powerful approach to dissect host-gut microbiota interactions. Toward this end, we profiled gut microbiota using 16S rRNA gene sequencing in a panel of 110 diverse inbred strains of mice. This panel has previously been studied for a wide range of metabolic traits and can be used for high-resolution association mapping. Using a SNP-based approach with a linear mixed model, we estimated the heritability of microbiota composition. We conclude that, in a controlled environment, the genetic background accounts for a substantial fraction of abundance of most common microbiota. The mice were previously studied for response to a high-fat, high-sucrose diet, and we hypothesized that the dietary response was determined in part by gut microbiota composition. We tested this using a cross-fostering strategy in which a strain showing a modest response, SWR, was seeded with microbiota from a strain showing a strong response, A×B19. Consistent with a role of microbiota in dietary response, the cross-fostered SWR pups exhibited a significantly increased response in weight gain. To examine specific microbiota contributing to the response, we identified various genera whose abundance correlated with dietary response. Among these, we chose Akkermansia muciniphila, a common anaerobe previously associated with metabolic effects. When administered to strain A×B19 by gavage, the dietary response was significantly blunted for obesity, plasma lipids, and insulin resistance. In an effort to further understand host-microbiota interactions, we mapped loci controlling microbiota composition and prioritized candidate genes. Our publicly available data provide a resource for future studies.

Genetics | 2010

Fine Mapping in 94 Inbred Mouse Strains Using a High-Density Haplotype Resource

Andrew Kirby; Hyun Min Kang; Claire M. Wade; Chris Cotsapas; Emrah Kostem; Buhm Han; Nick Furlotte; Eun Yong Kang; Manuel A. Rivas; Molly A. Bogue; Kelly A. Frazer; Frank M. Johnson; Erica Beilharz; D. R. Cox; Eleazar Eskin; Mark J. Daly

The genetics of phenotypic variation in inbred mice has for nearly a century provided a primary weapon in the medical research arsenal. A catalog of the genetic variation among inbred mouse strains, however, is required to enable powerful positional cloning and association techniques. A recent whole-genome resequencing study of 15 inbred mouse strains captured a significant fraction of the genetic variation among a limited number of strains, yet the common use of hundreds of inbred strains in medical research motivates the need for a high-density variation map of a larger set of strains. Here we report a dense set of genotypes from 94 inbred mouse strains containing 10.77 million genotypes over 121,433 single nucleotide polymorphisms (SNPs), dispersed at 20-kb intervals on average across the genome, with an average concordance of 99.94% with previous SNP sets. Through pairwise comparisons of the strains, we identified an average of 4.70 distinct segments over 73 classical inbred strains in each region of the genome, suggesting limited genetic diversity between the strains. Combining these data with genotypes of 7570 gap-filling SNPs, we further imputed the untyped or missing genotypes of 94 strains over 8.27 million Perlegen SNPs. The imputation accuracy among classical inbred strains is estimated at 99.7% for the genotypes imputed with high confidence. We demonstrated the utility of these data in high-resolution linkage mapping through power simulations and statistical power analysis and provide guidelines for developing such studies. We also provide a resource of in silico association mapping between the complex traits deposited in the Mouse Phenome Database with our genotypes. We expect that these resources will facilitate effective designs of both human and mouse studies for dissecting the genetic basis of complex traits.

Genetics | 2014

Identifying Causal Variants at Loci with Multiple Signals of Association

Farhad Hormozdiari; Emrah Kostem; Eun Yong Kang; Bogdan Pasaniuc; Eleazar Eskin

Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20–50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/.

Bioinformatics | 2013

A powerful and efficient set test for genetic markers that handles confounders

Jennifer Listgarten; Christoph Lippert; Eun Yong Kang; Jing Xiang; Carl M. Kadie; David Heckerman

Motivation: Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power. Results: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15 000 individual Crohn’s disease case–control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis. Availability: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com. Supplementary information: Supplementary data are available at Bioinformatics online.

Scientific Reports | 2013

The benefits of selecting phenotype-specific variants for applications of mixed models in genomics

Christoph Lippert; Gerald Quon; Eun Yong Kang; Carl M. Kadie; Jennifer Listgarten; David Heckerman

Applications of linear mixed models (LMMs) to problems in genomics include phenotype prediction, correction for confounding in genome-wide association studies, estimation of narrow sense heritability, and testing sets of variants (e.g., rare variants) for association. In each of these applications, the LMM uses a genetic similarity matrix, which encodes the pairwise similarity between every two individuals in a cohort. Although ideally these similarities would be estimated using strictly variants relevant to the given phenotype, the identity of such variants is typically unknown. Consequently, relevant variants are excluded and irrelevant variants are included, both having deleterious effects. For each application of the LMM, we review known effects and describe new effects showing how variable selection can be used to mitigate them.
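The abstract above does not say how the genetic similarity matrix is computed. As a rough illustration only, the sketch below uses one common construction: standardize each SNP across individuals and average the products over SNPs. The function name `genetic_similarity`, the 0/1/2 allele-count coding, and the choice to skip monomorphic SNPs are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def genetic_similarity(genotypes):
    """Return an n-by-n similarity matrix for n individuals.

    genotypes: list of individuals, each a list of allele counts (0, 1, or 2),
    one entry per SNP. This coding is an assumption of the sketch.
    """
    n = len(genotypes)
    standardized = []
    # Walk over SNPs (columns) and standardize each one across individuals.
    for column in zip(*genotypes):
        sd = pstdev(column)
        if sd == 0:
            # A SNP with no variation carries no similarity information.
            continue
        mu = mean(column)
        standardized.append([(g - mu) / sd for g in column])
    m = len(standardized)
    if m == 0:
        raise ValueError("no polymorphic SNPs to build a similarity matrix from")
    # Entry (i, j) is the average product of standardized genotypes over SNPs.
    return [
        [sum(z[i] * z[j] for z in standardized) / m for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
    for row in genetic_similarity(toy):
        print([round(value, 3) for value in row])
```

Restricting the columns passed to such a function to a chosen subset of SNPs is the simplest way to picture the variable selection the abstract discusses; how that subset should be chosen is the subject of the paper and is not sketched here.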

PLOS Genetics | 2014

Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice

Eun Yong Kang; Buhm Han; Nicholas A. Furlotte; Jong Wha J. Joo; Diana Shih; Richard C. Davis; Aldons J. Lusis; Eleazar Eskin

Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study thus explaining the large number of loci discovered in the combined study.
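The idea that an interaction shows up as heterogeneity of effect sizes across studies can be illustrated with textbook meta-analysis quantities. The sketch below computes Cochran's Q and the DerSimonian-Laird between-study variance for one locus. These are standard statistics swapped in for illustration and are not claimed to be the test statistic used in the paper; the function names and the toy numbers are hypothetical.

```python
def cochran_q(effects, std_errors):
    """Return (pooled fixed-effect estimate, Cochran's Q) for one locus.

    effects: per-study effect size estimates for the same SNP.
    std_errors: the matching per-study standard errors.
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    # Q grows when per-study effects scatter more than their errors allow.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    return pooled, q


def dersimonian_laird_tau2(effects, std_errors):
    """Method-of-moments estimate of the between-study effect variance."""
    weights = [1.0 / se ** 2 for se in std_errors]
    _, q = cochran_q(effects, std_errors)
    k = len(effects)
    total = sum(weights)
    scale = total - sum(w ** 2 for w in weights) / total
    # Truncate at zero: a negative moment estimate means no excess spread.
    return max(0.0, (q - (k - 1)) / scale)


if __name__ == "__main__":
    # Toy values: one SNP measured in three studies run under different diets.
    betas = [0.1, 0.5, -0.3]
    errors = [0.1, 0.1, 0.1]
    pooled, q = cochran_q(betas, errors)
    print("pooled effect:", pooled)
    print("Cochran's Q:", q)
    print("between-study variance:", dersimonian_laird_tau2(betas, errors))
```

When every study reports the same effect, Q and the between-study variance are both zero. A large Q relative to the number of studies suggests the effect differs by study, which in this setting means by environment.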

Proceedings of the National Academy of Sciences of the United States of America | 2017

Identification of individuals by trait prediction using whole-genome sequencing data

Christoph Lippert; Riccardo Sabatini; M. Cyrus Maher; Eun Yong Kang; Seunghak Lee; Okan Arikan; Alena Harley; Axel Bernal; Peter Garst; Victor Lavrenko; Ken Yocum; Theodore Wong; Mingfu Zhu; Wen-Yun Yang; Chris Chang; Tim Lu; Charlie W. H. Lee; Barry Hicks; Smriti Ramakrishnan; Haibao Tang; Chao Xie; Jason Piper; Suzanne Brewerton; Yaron Turpaz; Amalio Telenti; Rhonda K. Roby; Franz J. Och; J. Craig Venter

Significance: By associating deidentified genomic data with phenotypic measurements of the contributor, this work challenges current conceptions of genomic privacy. It has significant ethical and legal implications on personal privacy, the adequacy of informed consent, the viability and value of deidentification of data, the potential for police profiling, and more. We invite commentary and deliberation on the implications of these findings for research in genomics, investigatory practices, and the broader legal and ethical implications for society. Although some scholars and commentators have addressed the implications of DNA phenotyping, this work suggests that a deeper analysis is warranted. Abstract: Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.

PLOS Genetics | 2016

The Genetic Basis of Host Preference and Resting Behavior in the Major African Malaria Vector, Anopheles arabiensis

Bradley J. Main; Yoosook Lee; Heather M. Ferguson; Katharina Kreppel; Anicet Kihonda; Nicodem J. Govella; Travis C. Collier; Anthony J. Cornel; Eleazar Eskin; Eun Yong Kang; Catelyn C. Nieman; Allison M. Weakley; Gregory C. Lanzaro

Malaria transmission is dependent on the propensity of Anopheles mosquitoes to bite humans (anthropophily) instead of other dead end hosts. Recent increases in the usage of Long Lasting Insecticide Treated Nets (LLINs) in Africa have been associated with reductions in highly anthropophilic and endophilic vectors such as Anopheles gambiae s.s., leaving species with a broader host range, such as Anopheles arabiensis, as the most prominent remaining source of transmission in many settings. An. arabiensis appears to be more of a generalist in terms of its host choice and resting behavior, which may be due to phenotypic plasticity and/or segregating allelic variation. To investigate the genetic basis of host choice and resting behavior in An. arabiensis we sequenced the genomes of 23 human-fed and 25 cattle-fed mosquitoes collected both in-doors and out-doors in the Kilombero Valley, Tanzania. We identified a total of 4,820,851 SNPs, which were used to conduct the first genome-wide estimates of “SNP heritability” for host choice and resting behavior in this species. A genetic component was detected for host choice (human vs cow fed; permuted P = 0.002), but there was no evidence of a genetic component for resting behavior (indoors versus outside; permuted P = 0.465). A principal component analysis (PCA) segregated individuals based on genomic variation into three groups which were characterized by differences at the 2Rb and/or 3Ra paracentromeric chromosome inversions. There was a non-random distribution of cattle-fed mosquitoes between the PCA clusters, suggesting that alleles linked to the 2Rb and/or 3Ra inversions may influence host choice. Using a novel inversion genotyping assay, we detected a significant enrichment of the standard arrangement (non-inverted) of 3Ra among cattle-fed mosquitoes (N = 129) versus all non-cattle-fed individuals (N = 234; χ2, p = 0.007). Thus, tracking the frequency of the 3Ra in An. arabiensis populations may be of use to infer selection on host choice behavior within these vector populations; possibly in response to vector control. Controlled host-choice assays are needed to discern whether the observed genetic component has a direct relationship with innate host preference. A better understanding of the genetic basis for host feeding behavior in An. arabiensis may also open avenues for novel vector control strategies based on driving genes for zoophily into wild mosquito populations.

Genetics | 2012

Increasing Association Mapping Power and Resolution in Mouse Genetic Studies Through the Use of Meta-Analysis for Structured Populations

Nicholas A. Furlotte; Eun Yong Kang; Atila van Nas; Charles R. Farber; Aldons J. Lusis; Eleazar Eskin

Genetic studies in mouse models have played an integral role in the discovery of the mechanisms underlying many human diseases. The primary mode of discovery has been the application of linkage analysis to mouse crosses. This approach results in high power to identify regions that affect traits, but in low resolution, making it difficult to identify the precise genomic location harboring the causal variant. Recently, a panel of mice referred to as the hybrid mouse diversity panel (HMDP) has been developed to overcome this problem. However, power in this panel is limited by the availability of inbred strains. Previous studies have suggested combining results across multiple panels as a means to increase power, but the methods employed may not be well suited to structured populations, such as the HMDP. In this article, we introduce a meta-analysis-based method that may be used to combine HMDP studies with F2 cross studies to gain power, while increasing resolution. Due to the drastically different genetic structure of F2s and the HMDP, the best way to combine two studies for a given SNP depends on the strain distribution pattern in each study. We show that combining results, while accounting for these patterns, leads to increased power and resolution. Using our method to map bone mineral density, we find that two previously implicated loci are replicated with increased significance and that the size of the associated region is decreased. We also map HDL cholesterol and show a dramatic increase in the significance of a previously identified result.

American Journal of Human Genetics | 2016

Imputing Phenotypes for Genome-wide Association Studies

Farhad Hormozdiari; Eun Yong Kang; Michael Bilow; Eyal Ben-David; Chris D. Vulpe; Stela McLachlan; Aldons J. Lusis; Buhm Han; Eleazar Eskin

Genome-wide association studies (GWASs) have been successful in detecting variants correlated with phenotypes of clinical interest. However, the power to detect these variants depends on the number of individuals whose phenotypes are collected, and for phenotypes that are difficult to collect, the sample size might be insufficient to achieve the desired statistical power. The phenotype of interest is often difficult to collect, whereas surrogate phenotypes or related phenotypes are easier to collect and have already been collected in very large samples. This paper demonstrates how we take advantage of these additional related phenotypes to impute the phenotype of interest or target phenotype and then perform association analysis. Our approach leverages the correlation structure between phenotypes to perform the imputation. The correlation structure can be estimated from a smaller complete dataset for which both the target and related phenotypes have been collected. Under some assumptions, the statistical power can be computed analytically given the correlation structure of the phenotypes used in imputation. In addition, our method can impute the summary statistic of the target phenotype as a weighted linear combination of the summary statistics of related phenotypes. Thus, our method is applicable to datasets for which we have access only to summary statistics and not to the raw genotypes. We illustrate our approach by analyzing associated loci to triglycerides (TGs), body mass index (BMI), and systolic blood pressure (SBP) in the Northern Finland Birth Cohort dataset.
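The abstract describes imputing the target phenotype's summary statistic as a weighted linear combination of the related phenotypes' statistics, with weights derived from the phenotype correlation structure. A minimal sketch of that idea follows, assuming the weights are the usual conditional-mean weights under a multivariate normal model (the solution of the related-phenotype correlation matrix times w equals the target-to-related correlation vector). That assumption, the function names, and the toy correlations are illustrative and not taken from the paper.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def imputation_weights(corr_related, corr_target_related):
    """Weights w solving corr_related * w = corr_target_related.

    corr_related: correlation matrix among the related phenotypes.
    corr_target_related: correlations of the target with each related phenotype.
    Both would be estimated from a smaller complete dataset.
    """
    return solve_linear_system(corr_related, corr_target_related)


def impute_statistic(weights, related_stats):
    """Weighted linear combination of the related phenotypes' statistics."""
    return sum(w * s for w, s in zip(weights, related_stats))


if __name__ == "__main__":
    # Toy values: two related phenotypes, correlated 0.5 with each other
    # and 0.5 with the target phenotype.
    weights = imputation_weights([[1.0, 0.5], [0.5, 1.0]], [0.5, 0.5])
    print("weights:", weights)
    print("imputed statistic:", impute_statistic(weights, [2.0, 1.0]))
```

The combination computed here is not rescaled to unit variance, so it should not be read directly as a z-score. Any calibration step the published method applies is outside the scope of this sketch.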