Jinbo Chen
University of Pennsylvania
Publication
Featured research published by Jinbo Chen.
American Journal of Human Genetics | 2012
Hugues Aschard; Jinbo Chen; Marilyn C. Cornelis; Lori B. Chibnik; Elizabeth W. Karlson; Peter Kraft
Genome-wide association studies have identified hundreds of common genetic variants associated with the risk of multifactorial diseases. However, their impact on discrimination and risk prediction is limited. It has been suggested that the identification of gene-gene (G-G) and gene-environment (G-E) interactions would improve disease prediction and facilitate prevention. We conducted a simulation study to explore the potential improvement in discrimination if G-G and G-E interactions exist and are known. We used three diseases (breast cancer, type 2 diabetes, and rheumatoid arthritis) as motivating examples. We show that the inclusion of G-G and G-E interaction effects in risk-prediction models is unlikely to dramatically improve the discrimination ability of these models.
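As a rough illustration of the question posed in this abstract, the sketch below computes the exact AUC of two risk scores in a toy model with one SNP (0, 1, or 2 risk alleles) and one binary exposure. It is not the authors' simulation design: the allele frequency, exposure prevalence, and effect sizes are hypothetical values chosen for illustration, and the main-effects score simply reuses the true main-effect coefficients rather than refitting a model.

```python
from math import exp, log

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def build_cells(p_allele, p_exposed, b0, bg, be, bge):
    """Enumerate (population probability, disease risk, (g, e)) for each cell.

    Assumes Hardy-Weinberg proportions for G and independence of G and E.
    """
    q = 1.0 - p_allele
    geno_freq = [q * q, 2 * p_allele * q, p_allele * p_allele]
    cells = []
    for g in range(3):
        for e in range(2):
            prob = geno_freq[g] * (p_exposed if e == 1 else 1.0 - p_exposed)
            risk = expit(b0 + bg * g + be * e + bge * g * e)
            cells.append((prob, risk, (g, e)))
    return cells

def exact_auc(cells, score):
    """AUC = P(score of case > score of control) + 0.5 * P(tie)."""
    prevalence = sum(p * r for p, r, _ in cells)
    auc = 0.0
    for p1, r1, x1 in cells:
        w_case = p1 * r1 / prevalence                  # P(cell | case)
        for p0, r0, x0 in cells:
            w_ctrl = p0 * (1 - r0) / (1 - prevalence)  # P(cell | control)
            s1, s0 = score(x1), score(x0)
            if s1 > s0:
                auc += w_case * w_ctrl
            elif s1 == s0:
                auc += 0.5 * w_case * w_ctrl
    return auc

# Hypothetical log odds ratios for G, E, and the G-E interaction.
BG, BE, BGE = log(1.5), log(1.2), log(1.5)
cells = build_cells(p_allele=0.3, p_exposed=0.4, b0=-3.0, bg=BG, be=BE, bge=BGE)

auc_full = exact_auc(cells, lambda x: BG * x[0] + BE * x[1] + BGE * x[0] * x[1])
auc_main = exact_auc(cells, lambda x: BG * x[0] + BE * x[1])
print(f"AUC with interaction term: {auc_full:.4f}")
print(f"AUC with main effects only: {auc_main:.4f}")
```

Because the full score is a monotone function of the true risk, its AUC cannot be lower than that of the main-effects score; the size of the gap depends entirely on the assumed parameters.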
Statistics in Medicine | 2012
Tyler J. VanderWeele; Bhramar Mukherjee; Jinbo Chen
We develop a sensitivity analysis technique to assess the sensitivity of interaction analyses to unmeasured confounding. We give bias formulas for sensitivity analysis for interaction under unmeasured confounding on both additive and multiplicative scales. We provide simplified formulas in the case in which either one of the two factors does not interact with the unmeasured confounder in its effects on the outcome. An interesting consequence of the results is that if the two exposures of interest are independent (e.g., gene-environment independence), even under unmeasured confounding, if the estimate of the interaction is nonzero, then either there is a true interaction between the two factors or there is an interaction between one of the factors and the unmeasured confounder; an interaction must be present in either scenario. We apply the results to two examples drawn from the literature.
Human Heredity | 2007
Jinbo Chen; Nilanjan Chatterjee
In case-control studies, the assessment of the association between a binary disease outcome and a single nucleotide polymorphism (SNP) is often based on comparing the observed genotype distribution for the cases against that for the controls. In this article, we investigate an alternative analytic strategy in which the observed genotype frequencies of cases are compared against the expected genotype frequencies of controls assuming Hardy-Weinberg Equilibrium (HWE). Assuming HWE for controls, we derive closed-form expressions for maximum likelihood estimates of the genotype-specific disease odds ratio (OR) parameters and related variance-covariances. Based on these estimates and their variance-covariance structure, we then propose a two-degree-of-freedom test for disease-SNP association. We show that the proposed test can have substantially higher power than a variety of existing methods, especially when the true effect of the SNP is recessive. We also obtain analytic expressions for the bias of the OR estimates when the underlying HWE assumption is violated. We conclude that the novel test would be particularly useful for analyzing data from the initial ‘screening’ stages of contemporary multi-stage association studies.
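The strategy of comparing observed case genotype frequencies with HWE-expected control frequencies can be sketched as a simple plug-in calculation. This is an illustrative approximation, not necessarily the closed-form estimator derived in the paper: the function name and the example counts are hypothetical, and the control allele frequency is estimated by allele counting.

```python
def hwe_odds_ratios(case_counts, control_counts):
    """Plug-in genotype odds ratios using HWE-expected control frequencies.

    Each argument is a tuple (n0, n1, n2): the number of subjects carrying
    0, 1, or 2 copies of the risk allele.
    """
    r0, r1, r2 = case_counts
    s0, s1, s2 = control_counts
    n_controls = s0 + s1 + s2

    # Control allele frequency by allele counting.
    p = (s1 + 2 * s2) / (2 * n_controls)
    q = 1.0 - p

    # Expected control genotype frequencies under HWE.
    exp0, exp1, exp2 = q * q, 2 * p * q, p * p

    or_het = (r1 / r0) / (exp1 / exp0)   # heterozygote vs. non-carrier
    or_hom = (r2 / r0) / (exp2 / exp0)   # homozygote vs. non-carrier
    return p, or_het, or_hom

def standard_odds_ratios(case_counts, control_counts):
    """Conventional odds ratios using observed control genotype counts."""
    r0, r1, r2 = case_counts
    s0, s1, s2 = control_counts
    return (r1 / r0) / (s1 / s0), (r2 / r0) / (s2 / s0)

# Hypothetical counts; these controls are exactly in HWE with p = 0.2.
cases = (500, 400, 100)
controls = (640, 320, 40)
p_hat, or_het, or_hom = hwe_odds_ratios(cases, controls)
std_het, std_hom = standard_odds_ratios(cases, controls)
print(p_hat, or_het, or_hom, std_het, std_hom)
```

When the controls happen to be in exact HWE, as in this example, the two sets of odds ratios coincide; they diverge when the observed control genotype counts depart from HWE proportions.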
Statistics in Medicine | 2012
Jinbo Chen; Guolian Kang; Tyler J. VanderWeele; Cuilin Zhang; Bhramar Mukherjee
It is important to investigate whether genetic susceptibility variants exercise the same effects in populations that are differentially exposed to environmental risk factors. Here, we assess the power of four two-stage case-control design strategies for assessing multiplicative gene-environment (G-E) interactions or for assessing genetic or environmental effects in the presence of G-E interactions. We considered a di-allelic single nucleotide polymorphism G and a binary environmental variable E under the constraints of G-E independence and Hardy-Weinberg equilibrium and used the Wald statistic for all tests. We concluded that (i) for testing G-E interactions or genetic effects in the presence of G-E interactions when data for E are fully available, it is preferable to ascertain data for G in a subsample of cases with similar numbers of exposed and unexposed and a random subsample of controls; and (ii) for testing G-E interactions or environmental effects in the presence of G-E interactions when data for G are fully available, it is preferable to ascertain data for E in a subsample of cases that has similar numbers for each genotype and a random subsample of controls. In addition, supplementing external control data to an existing case-control sample leads to improved power for assessing effects of G or E in the presence of G-E interactions.
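For a binary G and binary E with complete data, the Wald statistic for a multiplicative G-E interaction has a simple closed form in the saturated logistic model, sketched below. This covers only the basic full-data test, not the two-stage subsampling designs compared in the paper; the function name and cell counts are hypothetical.

```python
from math import erfc, log, sqrt

def wald_ge_interaction(cases, controls):
    """Wald test for multiplicative G-E interaction from a 2 x 2 x 2 table.

    cases[g][e] and controls[g][e] are cell counts for G = g, E = e.
    The interaction odds ratio is the G-E odds ratio among cases divided by
    the G-E odds ratio among controls; the variance of its log is the sum of
    reciprocal cell counts (saturated logistic model).
    """
    def ge_odds_ratio(t):
        return (t[1][1] * t[0][0]) / (t[1][0] * t[0][1])

    log_or = log(ge_odds_ratio(cases) / ge_odds_ratio(controls))
    variance = sum(1.0 / n for table in (cases, controls) for row in table for n in row)
    z = log_or / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2.0))   # two-sided normal p-value
    return log_or, z, p_value

# Hypothetical counts, indexed as table[g][e].
controls = [[200, 100], [100, 50]]      # G and E independent in controls
cases_null = [[100, 50], [50, 25]]      # no interaction
cases_alt = [[100, 50], [50, 75]]       # excess of doubly exposed cases

null_result = wald_ge_interaction(cases_null, controls)
alt_result = wald_ge_interaction(cases_alt, controls)
print(null_result)
print(alt_result)
```

Under G-E independence in the source population and a rare disease, the control G-E odds ratio is close to 1, which is what makes designs that economize on control genotyping or exposure measurement attractive.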
Biometrics | 2011
Tianxi Cai; Thomas A. Gerds; Yingye Zheng; Jinbo Chen
Recently meta-analysis has been widely utilized to combine information across multiple studies to evaluate a common effect. Integrating data from similar studies is particularly useful in genomic studies where the individual study sample sizes are not large relative to the number of parameters of interest. In this article, we are interested in developing robust prognostic rules for the prediction of t-year survival based on multiple studies. We propose to construct a composite score for prediction by fitting a stratified semiparametric transformation model that allows the studies to have related but not identical outcomes. To evaluate the accuracy of the resulting score, we provide point and interval estimators for the commonly used accuracy measures including the time-specific receiver operating characteristic curves, and positive and negative predictive values. We apply the proposed procedures to develop prognostic rules for the 5-year survival of breast cancer patients based on five breast cancer genomic studies.
Genetic Epidemiology | 2009
Jinbo Chen; Haitao Zheng; Melissa L. Wilson
It is well recognized that both maternal and fetal genes could contribute to susceptibility for obstetric complications. Logistic regression models are usually adopted to model the separate or joint action of maternal and fetal loci with case‐control data. The standard likelihood ratio tests (LRTs) can be used to test the significance of appropriate odds ratio parameters. This method, although simple to implement, fails to exploit a unique feature of genetic epidemiology studies of obstetric complications. Specifically, it does not take into consideration the correlation between the maternal and offspring genomes. We propose novel LRTs that take advantage of this information by incorporating the fact that half of a child's genome is inherited from the mother. Our methods have substantially improved power for detecting marginal, main, and interactive maternal and fetal genotype effects, as evidenced by results from extensive simulation studies. We demonstrate our new methods by applying them to the analysis of data from a pilot study of preeclampsia. Genet. Epidemiol. 33:526–538, 2009.
Biometrics | 2010
Kai Yu; William Wheeler; Qizhai Li; Andrew W. Bergen; Neil E. Caporaso; Nilanjan Chatterjee; Jinbo Chen
In the genetic study of complex traits, especially behavior related ones, such as smoking and alcoholism, usually several phenotypic measurements are obtained for the description of the complex trait, but no single measurement can quantify fully the complicated characteristics of the symptom because of our lack of understanding of the underlying etiology. If those phenotypes share a common genetic mechanism, rather than studying each individual phenotype separately, it is more advantageous to analyze them jointly as a multivariate trait to enhance the power to identify associated genes. We propose a multilocus association test for the study of multivariate traits. The test is derived from a partially linear tree-based regression model for multiple outcomes. This novel tree-based model provides a formal statistical testing framework for the evaluation of the association between a multivariate outcome and a set of candidate predictors, such as markers within a gene or pathway, while accommodating adjustment for other covariates. Through simulation studies we show that the proposed method has an acceptable type I error rate and improved power over the univariate outcome analysis, which studies each component of the complex trait separately with multiple-comparison adjustment. A candidate gene association study of multiple smoking-related phenotypes is used to demonstrate the application and advantages of this new method. The proposed method is general enough to be used for the assessment of the joint effect of a set of multiple risk factors on a multivariate outcome in other biomedical research settings.
Biometrics | 2012
Jinbo Chen; Dongyu Lin; Hagit Hochner
Case-control mother-child pair design represents a unique advantage for dissecting genetic susceptibility of complex traits because it allows the assessment of both maternal and offspring genetic compositions. This design has been widely adopted in studies of obstetric complications and neonatal outcomes. In this work, we developed an efficient statistical method for evaluating joint genetic and environmental effects on a binary phenotype. Using a logistic regression model to describe the relationship between the phenotype and maternal and offspring genetic and environmental risk factors, we developed a semiparametric maximum likelihood method for the estimation of odds ratio association parameters. Our method is novel because it exploits two unique features of the study data for the parameter estimation. First, the correlation between maternal and offspring SNP genotypes can be specified under the assumptions of random mating, Hardy-Weinberg equilibrium, and Mendelian inheritance. Second, environmental exposures are often not affected by offspring genes conditional on maternal genes. Our method yields more efficient estimates compared with the standard prospective method for fitting logistic regression models to case-control data. We demonstrated the performance of our method through extensive simulation studies and the analysis of data from the Jerusalem Perinatal Study.
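The first feature, the mother-child genotype correlation implied by random mating, Hardy-Weinberg equilibrium, and Mendelian inheritance, can be written down explicitly for a diallelic SNP. The sketch below builds that joint distribution; it illustrates the stated assumptions only and is not the semiparametric likelihood itself. The function names and the allele frequency are hypothetical.

```python
def child_given_mother(p):
    """P(child genotype | maternal genotype) for a diallelic SNP.

    p is the risk-allele frequency. The mother transmits the risk allele
    with probability gm / 2 (Mendelian inheritance); the paternal allele is
    a random draw from the population (random mating and HWE).
    Rows index the maternal genotype 0, 1, 2; columns the child genotype.
    """
    table = []
    for gm in range(3):
        t = gm / 2.0
        table.append([
            (1 - t) * (1 - p),           # child carries 0 risk alleles
            t * (1 - p) + (1 - t) * p,   # child carries 1 risk allele
            t * p,                       # child carries 2 risk alleles
        ])
    return table

def mother_child_joint(p):
    """Joint distribution P(maternal genotype, child genotype)."""
    q = 1.0 - p
    hwe = [q * q, 2 * p * q, p * p]
    cond = child_given_mother(p)
    return [[hwe[gm] * cond[gm][gc] for gc in range(3)] for gm in range(3)]

p = 0.3  # hypothetical risk-allele frequency
joint = mother_child_joint(p)
child_marginal = [sum(joint[gm][gc] for gm in range(3)) for gc in range(3)]
for row in joint:
    print([round(v, 4) for v in row])
```

Two cells are structurally zero: a mother with no risk alleles cannot have a child with two, and vice versa. The child's marginal distribution is again in Hardy-Weinberg proportions.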
Genetic Epidemiology | 2013
Dongyu Lin; Clarice R. Weinberg; Rui Feng; Hagit Hochner; Jinbo Chen
Parent‐of‐origin effects have been proposed as one plausible source of the heritability that was unexplained by genome‐wide association studies. Here, we consider a case‐control mother‐child pair design for studying parent‐of‐origin effects of offspring genes on neonatal/early‐life disorders or pregnancy‐related conditions. In contrast to the standard case‐control design, the case‐control mother‐child pair design contains valuable parental information and therefore permits powerful assessment of parent‐of‐origin effects. Suppose the region under study is in Hardy‐Weinberg equilibrium, inheritance is Mendelian at the diallelic locus under study, there is random mating in the source population, and the SNP under study is not related to risk for the phenotype under study because of linkage disequilibrium (LD) with other SNPs. Using a maximum likelihood method that simultaneously assesses likely parental sources and estimates effect sizes of the two offspring genotypes, we investigate the extent of power increase for testing parent‐of‐origin effects through the incorporation of genotype data for adjacent markers that are in LD with the test locus. Our method does not need to assume the outcome is rare because it exploits supplementary information on phenotype prevalence. Analysis with simulated SNP data indicates that incorporating genotype data for adjacent markers greatly helps recover the parent‐of‐origin information. This recovery can sometimes substantially improve statistical power for detecting parent‐of‐origin effects. We demonstrate our method by examining parent‐of‐origin effects of the gene PPARGC1A on low birth weight using data from 636 mother‐child pairs in the Jerusalem Perinatal Study.
Genetic Epidemiology | 2009
Sheng Luo; Bhramar Mukherjee; Jinbo Chen; Nilanjan Chatterjee
Population‐based case‐control design has become one of the most popular approaches for conducting genome‐wide association scans for rare diseases like cancer. In this article, we propose a novel method for improving the power of the widely used single‐SNP (single‐nucleotide polymorphism) two‐degrees‐of‐freedom (2 d.f.) association test for case‐control studies by exploiting the common assumption of Hardy‐Weinberg Equilibrium (HWE) for the underlying population. A key feature of the method is that it can relax the assumed model constraints via a completely data‐adaptive shrinkage estimation approach so that the number of false‐positive results due to departure from HWE is controlled. The method is computationally simple and is easily scalable to association tests involving hundreds of thousands or millions of genetic markers. Simulation studies as well as an application involving data from a real genome‐wide association study illustrate that the proposed method is very robust for large‐scale association studies and can improve the power for detecting susceptibility SNPs with recessive effects, when compared to existing methods. Implications of the general estimation strategy beyond the simple 2 d.f. association test are discussed. Genet. Epidemiol. 33:740–750, 2009. Published 2009 Wiley‐Liss, Inc.