Minsun Song
National Institutes of Health
Publication
Featured research published by Minsun Song.
Genetic Epidemiology | 2009
Omar De la Cruz; Xiaoquan Wen; Baoguan Ke; Minsun Song; Dan L. Nicolae
In the setting of genome‐wide association studies, we propose a method for assigning a measure of significance to pre‐defined sets of markers in the genome. The sets can be genes, conserved regions, or groups of genes such as pathways. Using the proposed methods and algorithms, evidence for association between a particular functional unit and a disease status can be obtained not just by the presence of a strong signal from a SNP within it, but also by the combination of several simultaneous weaker signals that are not strongly correlated. This approach has several advantages. First, moderately strong signals from different SNPs are combined to obtain a much stronger signal for the set, therefore increasing power. Second, in combination with methods that provide information on untyped markers, it leads to results that can be readily combined across studies and platforms that might use different SNPs. Third, the results are easy to interpret, since they refer to functional sets of markers that are likely to behave as a unit in their phenotypic effect. Finally, the availability of gene‐level P‐values for association is the first step in developing methods that integrate information from pathways and networks with genome‐wide association data, and these can lead to a better understanding of the genetic architecture of complex traits. The power of the approach is investigated in simulated and real datasets. Novel Crohn's disease associations are found using the WTCCC data. Genet. Epidemiol. 34: 222–231, 2010.
Genetic Epidemiology | 2009
Minsun Song; Dan L. Nicolae
There is a growing recognition that interactions (gene‐gene and gene‐environment) can play an important role in common disease etiology. The development of cost‐effective genotyping technologies has made genome‐wide association studies the preferred tool for searching for loci affecting disease risk. These studies are characterized by a large number of investigated SNPs, and efficient statistical methods are even more important than in classical association studies that are done with a small number of markers. In this article we propose a novel gene‐gene interaction test that is more powerful than classical methods. The increase in power is due to the fact that the proposed method incorporates reasonable constraints in the parameter space. The test for both association and interaction is based on a likelihood ratio statistic that has a χ̄² (chi‐bar‐squared) distribution asymptotically. We also discuss the definitions used for “no interaction” and argue that tests for pure interaction are useful in genome‐wide studies, especially when using two‐stage strategies where the analyses in the second stage are done on pairs of loci for which at least one is associated with the trait. Genet. Epidemiol. 33:386–393, 2009.
Bioinformatics | 2016
Wei Hao; Minsun Song; John D. Storey
Motivation: Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral populations for each individual. Here, we instead focus on modeling variation of the genotypes without requiring a higher-level admixture interpretation. Results: We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis can be utilized to estimate a general model that includes the well-known Pritchard–Stephens–Donnelly admixture model as a special case. Noting some drawbacks of this approach, we introduce a new ‘logistic factor analysis’ framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions. Availability and Implementation: A Bioconductor R package called lfa is available at http://www.bioconductor.org/packages/release/bioc/html/lfa.html. Supplementary information: Supplementary data are available at Bioinformatics online.
Biostatistics | 2015
Minsun Song; Peter Kraft; Amit Joshi; Myrto Barrdahl; Nilanjan Chatterjee
Risk-prediction models need careful calibration to ensure they produce unbiased estimates of risk for subjects in the underlying population given their risk-factor profiles. As subjects with extremely high or low risk may be the most affected by knowledge of their risk estimates, checking the adequacy of risk models at the extremes of risk is very important for clinical applications. We propose a new approach to test model calibration targeted toward the extremes of the disease risk distribution, where standard goodness-of-fit tests may lack power due to sparseness of data. We construct a test statistic based on model residuals summed over only those individuals who pass high and/or low risk thresholds and then maximize the test statistic over different risk thresholds. We derive an asymptotic distribution for the max-test statistic based on analytic derivation of the variance-covariance function of the underlying Gaussian process. The method is applied to a large case-control study of breast cancer to examine joint effects of common single nucleotide polymorphisms (SNPs) discovered through recent genome-wide association studies. The analysis clearly indicates a non-additive effect of the SNPs on the scale of absolute risk, but an excellent fit for the linear-logistic model even at the extremes of risk.
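The construction described in the abstract above (residuals summed over individuals beyond a risk threshold, then maximized over thresholds) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it covers only the upper tail, treats the predicted risks as fixed and known, and replaces the paper's asymptotic Gaussian-process distribution with a simple simulation under the fitted model. The function names `tail_max_statistic` and `simulated_p_value` are hypothetical.

```python
import math
import random

def tail_max_statistic(y, p, thresholds):
    """Largest standardized upper-tail residual sum over the given thresholds.

    y: observed 0/1 outcomes; p: predicted risks (assumed fixed and known).
    """
    best = 0.0
    for c in thresholds:
        # Keep only individuals whose predicted risk passes the threshold.
        tail = [(yi, pi) for yi, pi in zip(y, p) if pi >= c]
        var = sum(pi * (1.0 - pi) for _, pi in tail)
        if var > 0:
            resid = sum(yi - pi for yi, pi in tail)
            best = max(best, abs(resid) / math.sqrt(var))
    return best

def simulated_p_value(y, p, thresholds, n_sim=2000, seed=0):
    """Assumption: the null distribution is approximated by redrawing outcomes
    from the predicted risks, not by the asymptotic result in the paper."""
    rng = random.Random(seed)
    observed = tail_max_statistic(y, p, thresholds)
    exceed = 0
    for _ in range(n_sim):
        y_null = [1 if rng.random() < pi else 0 for pi in p]
        if tail_max_statistic(y_null, p, thresholds) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (n_sim + 1)
```

A lower-tail version would select `pi <= c` instead; combining both tails and accounting for estimated model parameters are left out of this sketch.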
International Journal of Epidemiology | 2018
Anja Rudolph; Minsun Song; Mark N. Brook; Roger L. Milne; Nasim Mavaddat; Kyriaki Michailidou; Manjeet K. Bolla; Qin Wang; Joe Dennis; Amber N Wilcox; John L. Hopper; Melissa C. Southey; Renske Keeman; Peter A. Fasching; Matthias W. Beckmann; Manuela Gago-Dominguez; Jose Esteban Castelao; Pascal Guénel; Thérèse Truong; Stig E. Bojesen; Henrik Flyger; Hermann Brenner; Volker Arndt; Hiltrud Brauch; Thomas Brüning; Arto Mannermaa; Veli-Matti Kosma; Diether Lambrechts; Machteld Keupers; Fergus J. Couch
Background: Polygenic risk scores (PRS) for breast cancer can be used to stratify the population into groups at substantially different levels of risk. Combining PRS and environmental risk factors will improve risk prediction; however, integrating PRS into risk prediction models requires evaluation of their joint association with known environmental risk factors. Methods: Analyses were based on data from 20 studies; datasets analysed ranged from 3453 to 23 104 invasive breast cancer cases and similar numbers of controls, depending on the analysed environmental risk factor. We evaluated joint associations of a 77-single nucleotide polymorphism (SNP) PRS with reproductive history, alcohol consumption, menopausal hormone therapy (MHT), height and body mass index (BMI). We tested the null hypothesis of multiplicative joint associations for PRS and each of the environmental factors, and performed global and tail-based goodness-of-fit tests in logistic regression models. The outcomes were breast cancer overall and by estrogen receptor (ER) status. Results: The strongest evidence for non-multiplicative joint associations with the 77-SNP PRS was for alcohol consumption (P-interaction = 0.009), adult height (P-interaction = 0.025) and current use of combined MHT (P-interaction = 0.038) in ER-positive disease. Risk associations for these factors by percentiles of PRS did not follow a clear dose-response. In addition, global and tail-based goodness-of-fit tests showed little evidence for departures from a multiplicative risk model, with alcohol consumption showing the strongest evidence for ER-positive disease (P = 0.013 for global and 0.18 for tail-based tests). Conclusions: The combined effects of the 77-SNP PRS and environmental risk factors for breast cancer are generally well described by a multiplicative model. Larger studies are required to confirm possible departures from the multiplicative model for individual risk factors, and assess models specific for ER-negative disease.
Genetic Epidemiology | 2018
Minsun Song; William Wheeler; Neil E. Caporaso; Maria Teresa Landi; Nilanjan Chatterjee
Genome‐wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) based on various powerful statistical algorithms for imputation trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how to best handle imputed SNPs in various modern complex tests for genetic associations incorporating gene–environment interactions. We focus on case‐control association studies where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene–environment independence in the underlying population. As increasingly large‐scale GWAS are being performed through consortium efforts where it is preferable to share only summary‐level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta‐analysis of “one‐step” maximum‐likelihood estimates across studies. Applications of the methods in simulation studies and a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain type‐I error rates for the underlying testing procedures. For analysis of imputed SNPs, similar to typed SNPs, the retrospective methods can lead to considerable efficiency gain for modeling of gene–environment interactions under the assumption of gene–environment independence. Methods are made available for public use through the CGEN R software package.
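To make the "dosage variable" in the abstract above concrete, here is a minimal sketch of the predicted allele count for an imputed SNP and of a marginal score test that uses it. It assumes a logistic model with an intercept only (no covariates and no gene–environment terms), so it illustrates the basic dosage score test rather than the interaction methods of the paper or the CGEN package's API. The function names `dosage` and `score_statistic` are hypothetical.

```python
def dosage(genotype_probs):
    """Expected allele count from imputation probabilities (P(g=0), P(g=1), P(g=2))."""
    p0, p1, p2 = genotype_probs
    total = p0 + p1 + p2
    if total <= 0:
        raise ValueError("genotype probabilities must have a positive sum")
    # Normalize in case the probabilities do not sum exactly to one.
    return (p1 + 2.0 * p2) / total

def score_statistic(y, d):
    """Score test of no association between outcome y (0/1) and dosage d.

    Assumption: logistic model with intercept only. The result is referred
    to a chi-squared distribution with 1 degree of freedom.
    """
    n = len(y)
    y_bar = sum(y) / n
    d_bar = sum(d) / n
    score = sum(di * (yi - y_bar) for yi, di in zip(y, d))
    info = y_bar * (1.0 - y_bar) * sum((di - d_bar) ** 2 for di in d)
    if info <= 0:
        raise ValueError("no variation in outcome or dosage")
    return score ** 2 / info
```

For example, `dosage((0.25, 0.5, 0.25))` gives an expected allele count of 1.0, the same value a typed heterozygote would contribute.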
BMC Genetics | 2015
Minsun Song
Background: Testing gene-gene interaction in genome-wide association studies generally yields lower power than testing marginal association. Meta-analysis that combines different genotyping platforms is one method used to increase power when assessing gene-gene interactions, which requires a test for interaction on untyped SNPs. However, to date, formal statistical tests for gene-gene interaction on untyped SNPs have not been thoroughly addressed. The key concern for gene-gene interaction testing on untyped SNPs located on different chromosomes is that the pair of genes might not be independent and the current generation of imputation methods provides imputed genotypes at the marginal accuracy. Results: In this study we address this challenge and describe a novel method for testing gene-gene interaction on marginally imputed values of untyped SNPs. We show that our novel Wald-type test statistics for interactions with and without constraints in the interaction parameters follow asymptotic distributions that are the same as those of the corresponding tests for typed SNPs. Through simulations, we show that the proposed tests properly control type I error and are more powerful than the extension of the classical dosage method to interaction tests. The increase in power results from a proper correction for the uncertainty in imputation through the variance estimator using the jackknife, a resampling technique. We apply the method to detect interactions between SNPs on chromosomes 5 and 15 in lung cancer data. The inclusion of the results at the untyped SNPs provides much more detailed information at the regions of interest. Conclusions: As demonstrated by the simulation studies and real data analysis, our approaches outperform the application of the traditional dosage method to detection of gene-gene interaction in terms of power while providing control of the type I error.
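The abstract above names the jackknife as the variance estimator but does not say how it is applied to imputed genotypes. As a generic illustration only, the delete-one jackknife variance of an arbitrary estimator can be sketched as below; the function name `jackknife_variance` and the choice of leaving out one observation at a time are assumptions, not details taken from the paper.

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate for estimator(data).

    data: a list of observations; estimator: a function mapping a list to a float.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the estimate with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)

def sample_mean(values):
    return sum(values) / len(values)
```

For the sample mean, this reproduces the usual variance of the mean (sample variance divided by n), which is a convenient sanity check before applying the same recipe to a more complex statistic such as a Wald-type interaction estimate.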
Nature Genetics | 2015
Minsun Song; Wei Hao; John D. Storey
Human Genetics | 2015
H. Dean Hosgood; Minsun Song; Chao A. Hsiung; Zhihua Yin; Xiao-Ou Shu; Zhaoming Wang; Nilanjan Chatterjee; Wei Zheng; Neil E. Caporaso; Laurie Burdette; Meredith Yeager; Sonja I. Berndt; Maria Teresa Landi; Chien-Jen Chen; Gee Chen Chang; Chin Fu Hsiao; Ying-Huang Tsai; Li Hsin Chien; Kuan-Yu Chen; Ming Shyan Huang; Wu-Chou Su; Yuh-Min Chen; Chung Hsing Chen; Tsung Ying Yang; Chih Liang Wang; Jen Yu Hung; Chien-Chung Lin; Reury Perng Perng; Chih Yi Chen; Kun Chieh Chen
international symposium on bioinformatics research and applications | 2008
Dan L. Nicolae; Omar De la Cruz; William Wen; Baoguan Ke; Minsun Song