Joyee Ghosh
University of Iowa
Publication
Featured research published by Joyee Ghosh.
Journal of Computational and Graphical Statistics | 2011
Merlise A. Clyde; Joyee Ghosh; Michael L. Littman
For the problem of model choice in linear regression, we introduce a Bayesian adaptive sampling algorithm (BAS), that samples models without replacement from the space of models. For problems that permit enumeration of all models, BAS is guaranteed to enumerate the model space in 2^p iterations where p is the number of potential variables under consideration. For larger problems where sampling is required, we provide conditions under which BAS provides perfect samples without replacement. When the sampling probabilities in the algorithm are the marginal variable inclusion probabilities, BAS may be viewed as sampling models “near” the median probability model of Barbieri and Berger. As marginal inclusion probabilities are not known in advance, we discuss several strategies to estimate adaptively the marginal inclusion probabilities within BAS. We illustrate the performance of the algorithm using simulated and real data and show that BAS can outperform Markov chain Monte Carlo methods. The algorithm is implemented in the R package BAS available at CRAN. This article has supplementary material online.
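As a rough illustration of two ideas mentioned in this abstract (it is not the BAS algorithm itself), the sketch below enumerates all 2^p inclusion vectors for a small p, computes marginal inclusion probabilities from a set of model probabilities, and reads off the median probability model, meaning the model that contains exactly the variables whose marginal inclusion probability is at least 1/2. The function names and the posterior weights are hypothetical and chosen only for the example.

```python
from itertools import product


def enumerate_models(p):
    """Return all 2**p inclusion vectors (tuples of 0/1) for p candidate variables."""
    return list(product((0, 1), repeat=p))


def marginal_inclusion_probs(models, probs):
    """For each variable, sum P(model) over the models that include it."""
    p = len(models[0])
    return [sum(pr for m, pr in zip(models, probs) if m[j]) for j in range(p)]


def median_probability_model(incl_probs):
    """Include exactly the variables with marginal inclusion probability >= 1/2."""
    return tuple(int(q >= 0.5) for q in incl_probs)


p = 3
models = enumerate_models(p)  # 2**3 = 8 models

# Hypothetical unnormalized posterior weights, one per model, in the order
# produced by enumerate_models. These are invented for illustration only.
weights = [2, 4, 1, 2, 4, 8, 2, 4]
total = sum(weights)
probs = [w / total for w in weights]

incl = marginal_inclusion_probs(models, probs)
mpm = median_probability_model(incl)
```

Full enumeration like this is only feasible for small p, which is the regime where the abstract says the whole model space can be visited in 2^p iterations.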
Journal of Computational and Graphical Statistics | 2009
Joyee Ghosh; David B. Dunson
Factor analytic models are widely used in social sciences. These models have also proven useful for sparse modeling of the covariance structure in multidimensional data. Normal prior distributions for factor loadings and inverse gamma prior distributions for residual variances are a popular choice because of their conditionally conjugate form. However, such prior distributions require elicitation of many hyperparameters and tend to result in poorly behaved Gibbs samplers. In addition, one must choose an informative specification, as high variance prior distributions face problems due to impropriety of the posterior distribution. This article proposes a default, heavy-tailed prior distribution specification, which is induced through parameter expansion while facilitating efficient posterior computation. We also develop an approach to allow uncertainty in the number of factors. The methods are illustrated through simulated examples and epidemiology and toxicology applications. Data sets and computer code used in this article are available online.
American Journal of Obstetrics and Gynecology | 2010
Alison M. Stuebe; Helen Lyon; Amy H. Herring; Joyee Ghosh; Alison Wise; Kari E. North; Anna Maria Siega-Riz
OBJECTIVE: We sought to determine whether genetic variants associated with diabetes and obesity predict gestational weight gain. STUDY DESIGN: A total of 960 participants in the Pregnancy, Infection, and Nutrition cohorts were genotyped for 27 single-nucleotide polymorphisms (SNPs) associated with diabetes and obesity. RESULTS: Among Caucasian and African American women (n = 960), KCNQ1 risk allele carriage was directly associated with weight gain (P < .01). In Bayesian hierarchical models among Caucasian women (n = 628), we found posterior odds ratios >3 for inclusion of TCF2 and THADA SNPs in our models. Among African American women (n = 332), we found associations between risk allele carriage and weight gain for the THADA and INSIG2 SNPs. In Bayesian variable selection models, we found an interaction between the TSPAN8 risk allele and pregravid obesity, with lower weight gain among obese risk allele carriers. CONCLUSION: We found evidence that diabetes and obesity risk alleles interact with maternal pregravid body mass index to predict gestational weight gain.
Journal of the American Statistical Association | 2011
Joyee Ghosh; Merlise A. Clyde
Choosing the subset of covariates to use in regression or generalized linear models is a ubiquitous problem. The Bayesian paradigm addresses the problem of model uncertainty by considering models corresponding to all possible subsets of the covariates, where the posterior distribution over models is used to select models or combine them via Bayesian model averaging (BMA). Although conceptually straightforward, BMA is often difficult to implement in practice, since either the number of covariates is too large for enumeration of all subsets, calculations cannot be done analytically, or both. For orthogonal designs with the appropriate choice of prior, the posterior probability of any model can be calculated without having to enumerate the entire model space and scales linearly with the number of predictors, p. In this article we extend this idea to a much broader class of nonorthogonal design matrices. We propose a novel method which augments the observed nonorthogonal design by at most p new rows to obtain a design matrix with orthogonal columns and generate the “missing” response variables in a data augmentation algorithm. We show that our data augmentation approach keeps the original posterior distribution of interest unaltered, and develop methods to construct Rao–Blackwellized estimates of several quantities of interest, including posterior model probabilities of any model, which may not be available from an ordinary Gibbs sampler. Our method can be used for BMA in linear regression and binary regression with nonorthogonal design matrices in conjunction with independent “spike and slab” priors with a continuous prior component that is a Cauchy or other heavy tailed distribution that may be represented as a scale mixture of normals. We provide simulated and real examples to illustrate the methodology. Supplemental materials for the manuscript are available online.
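The remark about orthogonal designs can be made concrete with a small sketch. It assumes the simplest setting, in which the posterior over models factorizes into independent per-variable inclusion probabilities (as happens for orthogonal columns with independent priors and known error variance). Under that assumption the probability of any single model is a product of p terms, so it costs O(p) rather than a sum over 2^p models. The values in `rho` are hypothetical, and this is not the data augmentation algorithm proposed in the article.

```python
from itertools import product
from math import prod


def model_probability(gamma, rho):
    """Probability of the model with inclusion vector gamma.

    Assumes independent per-variable inclusion probabilities rho[j];
    the cost is linear in the number of variables.
    """
    return prod(r if g else 1.0 - r for g, r in zip(gamma, rho))


# Hypothetical per-variable posterior inclusion probabilities.
rho = [0.9, 0.2, 0.6, 0.05]

# Probability of the model containing variables 1 and 3 only.
p_model = model_probability((1, 0, 1, 0), rho)

# Brute-force sanity check: the probabilities of all 2**p models sum to 1.
total = sum(model_probability(g, rho) for g in product((0, 1), repeat=len(rho)))
```

The brute-force sum is included only as a check; the point of the factorization is that it is never needed to evaluate one model.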
The American Statistician | 2015
Joyee Ghosh; Andrew Ghattas
In this article, we highlight some interesting facts about Bayesian variable selection methods for linear regression models in settings where the design matrix exhibits strong collinearity. We first demonstrate via real data analysis and simulation studies that summaries of the posterior distribution based on marginal and joint distributions may give conflicting results for assessing the importance of strongly correlated covariates. The natural question is which one should be used in practice. The simulation studies suggest that posterior inclusion probabilities and Bayes factors that evaluate the importance of correlated covariates jointly are more appropriate, and some priors may be more adversely affected in such a setting. To obtain a better understanding behind the phenomenon, we study some toy examples with Zellner’s g-prior. The results show that strong collinearity may lead to a multimodal posterior distribution over models, in which joint summaries are more appropriate than marginal summaries. Thus, we recommend a routine examination of the correlation matrix and calculation of the joint inclusion probabilities for correlated covariates, in addition to marginal inclusion probabilities, for assessing the importance of covariates in Bayesian variable selection.
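A toy calculation shows how marginal and joint summaries can disagree. The posterior below is invented for illustration: two strongly correlated covariates, `x1` and `x2`, where the posterior mass is split between the model with `x1` alone and the model with `x2` alone. The sketch takes the "joint inclusion probability" of a group to be the probability that at least one member of the group is in the model, which is an assumption about the definition rather than something stated in the abstract.

```python
# Hypothetical posterior over models for two highly correlated covariates.
posterior = {
    frozenset(): 0.15,
    frozenset({"x1"}): 0.40,
    frozenset({"x2"}): 0.40,
    frozenset({"x1", "x2"}): 0.05,
}


def marginal_inclusion(posterior, var):
    """Posterior probability that a single variable is in the model."""
    return sum(pr for model, pr in posterior.items() if var in model)


def joint_inclusion(posterior, group):
    """Posterior probability that at least one variable in `group` is in the model."""
    group = set(group)
    return sum(pr for model, pr in posterior.items() if model & group)


m1 = marginal_inclusion(posterior, "x1")               # about 0.45
m2 = marginal_inclusion(posterior, "x2")               # about 0.45
joint = joint_inclusion(posterior, ["x1", "x2"])       # about 0.85
```

Each marginal probability falls below 1/2, so neither covariate looks important on its own, while the joint summary indicates that the pair almost certainly matters. This is the kind of bimodal posterior over models that the abstract attributes to strong collinearity.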
Archive | 2008
Joyee Ghosh; David B. Dunson
Factor analytic models are widely used in social science applications to study latent traits, such as intelligence, creativity, stress, and depression, that cannot be accurately measured with a single variable. In recent years, there has been a rise in the popularity of factor models due to their flexibility in characterizing multivariate data. For example, latent factor regression models have been used as a dimensionality reduction tool for modeling of sparse covariance structures in genomic applications (West, 2003; Carvalho et al., 2008). In addition, structural equation models and other generalizations of factor analysis are widely useful in epidemiologic studies involving complex health outcomes and exposures (Sanchez et al., 2005). Improvements in Bayesian computation permit the routine implementation of latent factor models via Markov chain Monte Carlo (MCMC) algorithms, and a very broad class of models can be fitted easily using the freely available software package WinBUGS. The literature on methods for fitting and inferences in latent factor models is vast (for recent books, see Loehlin, 2004; Thompson, 2004).
Biometrics | 2011
Joyee Ghosh; Amy H. Herring; Anna Maria Siega-Riz
In this article, we develop a latent class model with class probabilities that depend on subject-specific covariates. One of our major goals is to identify important predictors of latent classes. We consider methodology that allows estimation of latent classes while allowing for variable selection uncertainty. We propose a Bayesian variable selection approach and implement a stochastic search Gibbs sampler for posterior computation to obtain model-averaged estimates of quantities of interest such as marginal inclusion probabilities of predictors. Our methods are illustrated through simulation studies and application to data on weight gain during pregnancy, where it is of interest to identify important predictors of latent weight gain classes.
Journal of The American Water Resources Association | 2015
Gabriele Villarini; Enrico Scoccimarro; Kathleen D. White; Jeffrey R. Arnold; Keith E. Schilling; Joyee Ghosh
Our improved capability to adapt to the future changes in discharge is linked to our capability to predict the magnitude or at least the direction of these changes. For the agricultural United States Midwest, too much or too little water has severe socioeconomic impacts. Here, we focus on the Raccoon River at Van Meter, Iowa, and use a statistical approach to examine projected changes in discharge. We build on statistical models using rainfall and harvested corn and soybean acreage to explain the observed discharge variability. We then use projections of these two predictors to examine the projected discharge response. Results are based on seven global climate models that are part of the Coupled Model Intercomparison Project Phase 5 and on two representative concentration pathways (RCPs 4.5 and 8.5). There is not a strong signal of change in the discharge projections under RCP 4.5. However, the results for RCP 8.5 point to a stronger changing signal related to larger projected increases in rainfall, resulting in increased trends, in particular, in the upper part of the discharge distribution (i.e., 60th percentile and above). Examination of two hypothetical agricultural scenarios indicates that these increasing trends could be alleviated by decreasing the extent of the agricultural production. We also discuss how the methodology presented in this study represents a viable approach to move forward with the concept of return period for engineering design and management in a nonstationary world.
Statistics and Computing | 2013
Joyee Ghosh
When multiple data owners possess records on different subjects with the same set of attributes—known as horizontally partitioned data—the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible because of confidentiality concerns. In such settings, the data owners can use secure computation techniques to obtain the results of certain analyses on the integrated database without sharing individual records. We present secure computation protocols for Bayesian model averaging and model selection for both linear regression and probit regression. Using simulations based on genuine data, we illustrate the approach for probit regression, and show that it can provide reasonable model selection outputs.
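The sketch below illustrates why horizontally partitioned data suits this kind of analysis in the linear regression case: the sufficient statistics X'X and X'y are sums over rows, so each owner can compute them locally and only the totals need to be combined. The `secure_sum` function simulates a generic masked-summation round (a random mask is added first and removed at the end, so intermediate running totals reveal nothing about individual contributions). It is a textbook building block used here as an assumption, not the protocol from the article, and the data values and function names are hypothetical.

```python
import random


def xtx_xty(X, y):
    """Local sufficient statistics X'X (p x p) and X'y (length p) for one owner's rows."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def secure_sum(values, modulus=2**61 - 1, rng=None):
    """Simulate masked summation of non-negative integers held by different owners.

    The first owner adds a random mask, every owner adds its value modulo
    `modulus`, and the first owner removes the mask at the end.
    Requires 0 <= true sum < modulus.
    """
    rng = rng or random.Random()
    mask = rng.randrange(modulus)
    running = mask
    for v in values:  # each owner only ever sees a masked running total
        running = (running + v) % modulus
    return (running - mask) % modulus


# Hypothetical integer-valued records held by three owners (same two attributes).
Xa, ya = [[1, 2], [1, 3]], [5, 7]
Xb, yb = [[1, 4]], [9]
Xc, yc = [[1, 0]], [1]

parts = [xtx_xty(Xa, ya), xtx_xty(Xb, yb), xtx_xty(Xc, yc)]
p = 2
pooled_xtx = [[secure_sum([part[0][i][j] for part in parts]) for j in range(p)]
              for i in range(p)]
pooled_xty = [secure_sum([part[1][i] for part in parts]) for i in range(p)]

# Reference values from the concatenated database, which the owners would not form.
ref_xtx, ref_xty = xtx_xty(Xa + Xb + Xc, ya + yb + yc)
```

Real-valued or negative statistics would need to be scaled and offset into non-negative integers before masking, and the masking is only meaningful with three or more owners, since with two the total itself reveals the other party's contribution.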
Climate Dynamics | 2016
Gabriele Villarini; Beda Luitel; Gabriel A. Vecchi; Joyee Ghosh
North Atlantic tropical cyclones (TCs) and hurricanes are responsible for a large number of fatalities and economic damage. Skillful seasonal predictions of the North Atlantic TC activity can provide basic information critical to our improved preparedness. This study focuses on the development of statistical–dynamical seasonal forecasting systems for different quantities related to the frequency and intensity of North Atlantic TCs. These models use only tropical Atlantic and tropical mean sea surface temperatures (SSTs) to describe the variability exhibited by the observational records because they reflect the importance of both local and non-local effects on the genesis and development of TCs in the North Atlantic basin. A set of retrospective forecasts of SSTs by six experimental seasonal-to-interannual prediction systems from the North American Multi-Model Ensemble are used as covariates. The retrospective forecasts are performed over the period 1982–2015. The skill of these statistical–dynamical models is quantified for different quantities (basin-wide number of tropical storms and hurricanes, power dissipation index and accumulated cyclone energy) for forecasts initialized as early as November of the year prior to the season to forecast. The results of this work show that it is possible to obtain skillful retrospective forecasts of North Atlantic TC activity with a long lead time. Moreover, probabilistic forecasts of North Atlantic TC activity for the 2016 season are provided.