Featured Research

Applications

Bayesian Matrix Completion for Hypothesis Testing

High-throughput screening (HTS) is a well-established technology that rapidly and efficiently screens thousands of chemicals for potential toxicity. Massive testing using HTS primarily aims to differentiate active from inactive chemicals for different types of biological endpoints. However, even with high-throughput technology, it is not feasible to test all possible combinations of chemicals and assay endpoints, so the majority of combinations are missing. Our goal is to derive posterior probabilities of activity for each chemical-by-assay-endpoint combination while addressing the sparsity of HTS data. We propose a Bayesian hierarchical framework that borrows information across different chemicals and assay endpoints in a low-dimensional latent space. This framework facilitates out-of-sample prediction of bioactivity potential for new chemicals not yet tested. Furthermore, this paper makes a novel attempt in toxicology to simultaneously model heteroscedastic errors and a nonparametric mean function, leading to a broader definition of activity whose need has been pointed out by toxicologists. Simulation studies demonstrate that our approach yields superior performance and more realistic inferences on activity than current standard methods. Application to an HTS data set identifies chemicals that are most likely active for two disease outcomes: neurodevelopmental disorders and obesity. Code is available on GitHub.
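
As a rough illustration of the shared latent-space idea, the sketch below (Python/NumPy, with made-up dimensions and a plain Gaussian-prior, MAP-only version of low-rank completion) fills in untested chemical-by-assay cells of a partially observed matrix. It omits the paper's heteroscedastic errors and nonparametric mean function and is not the authors' code.

import numpy as np

rng = np.random.default_rng(0)
n_chem, n_assay, k = 60, 40, 3               # chemicals, assay endpoints, latent rank
U_true = rng.normal(size=(n_chem, k))
V_true = rng.normal(size=(n_assay, k))
Y = U_true @ V_true.T + rng.normal(scale=0.3, size=(n_chem, n_assay))
mask = rng.random((n_chem, n_assay)) < 0.3   # only ~30% of combinations are tested

U = rng.normal(scale=0.1, size=(n_chem, k))
V = rng.normal(scale=0.1, size=(n_assay, k))
lam = 1.0                                    # Gaussian-prior precision (ridge penalty)

for _ in range(50):
    for i in range(n_chem):                  # update chemical factors given assay factors
        obs = mask[i]
        Vi = V[obs]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(k), Vi.T @ Y[i, obs])
    for j in range(n_assay):                 # update assay-endpoint factors given chemical factors
        obs = mask[:, j]
        Uj = U[obs]
        V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(k), Uj.T @ Y[obs, j])

Y_hat = U @ V.T                              # predicted activity for untested combinations
err = (Y_hat - U_true @ V_true.T)[~mask]
print("RMSE vs. true mean on untested cells:", round(float(np.sqrt(np.mean(err ** 2))), 3))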

Applications

Bayesian Non-Parametric Detection Heterogeneity in Ecological Models

Detection heterogeneity is inherent to ecological data, arising from factors such as varied terrain or weather conditions, inconsistent sampling effort, or heterogeneity among the individuals themselves. Incorporating additional covariates into a statistical model is one approach for addressing heterogeneity, but there is no guarantee that any set of measurable covariates will adequately capture it, and the presence of unmodelled heterogeneity has been shown to bias the resulting inferences. Other approaches for addressing heterogeneity include random effects or finite mixtures of homogeneous subgroups. Here, we present a non-parametric approach for modelling detection heterogeneity within a Bayesian hierarchical framework. We employ a Dirichlet process mixture, which allows a flexible number of population subgroups without the need to pre-specify that number as in a finite mixture. We describe this non-parametric approach and then consider its use for modelling detection heterogeneity in two common ecological motifs: capture-recapture and occupancy modelling. For each, we consider a homogeneous model, finite mixture models, and the non-parametric approach. We compare these approaches in two simulation studies and find the non-parametric approach to be the most reliable method across varying degrees of heterogeneity. We also present two real-data examples and compare the inferences resulting from each modelling approach. Analyses are carried out using the nimble package for R, which provides facilities for Bayesian non-parametric models.
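
The following sketch simulates the core construction: cluster-specific detection probabilities drawn from a truncated stick-breaking Dirichlet process, then capture histories generated from them. It is written in Python/NumPy with illustrative parameter values and is not the nimble/R model code used in the paper.

import numpy as np

rng = np.random.default_rng(1)
alpha, K_max = 1.0, 30                   # DP concentration and truncation level
N, n_occasions = 200, 6                  # individuals and sampling occasions

# Stick-breaking weights: w_k = v_k * prod_{l<k} (1 - v_l)
v = rng.beta(1.0, alpha, size=K_max)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
w /= w.sum()                             # renormalise after truncation

# Cluster-specific detection probabilities (the mixture "atoms")
p_atoms = rng.beta(2.0, 2.0, size=K_max)

# Assign each individual to a cluster and simulate detections
z = rng.choice(K_max, size=N, p=w)
p_ind = p_atoms[z]
capture_hist = rng.binomial(1, p_ind[:, None], size=(N, n_occasions))

print("number of occupied clusters:", len(np.unique(z)))
print("mean detection probability:", round(float(p_ind.mean()), 3))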

Applications

Bayesian Nonparametric Multivariate Spatial Mixture Mixed Effects Models with Application to American Community Survey Special Tabulations

Leveraging multivariate spatial dependence to improve the precision of estimates based on American Community Survey data and other sample survey data has been a topic of recent interest among data users and federal statistical agencies. One strategy is to use a multivariate spatial mixed effects model with a Gaussian observation model and a latent Gaussian process model. In practice, this works well for a wide range of tabulations. Nevertheless, in situations that exhibit heterogeneity among geographies and/or sparsity in the data, the Gaussian assumptions may be problematic and lead to underperformance. To remedy these situations, we propose a multivariate hierarchical Bayesian nonparametric mixed effects spatial mixture model to increase model flexibility. The number of clusters is chosen automatically in a data-driven manner. The effectiveness of our approach is demonstrated through a simulation study and a motivating application to special tabulations of American Community Survey data.
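
A minimal forward simulation of the modelling idea is sketched below: area-level estimates for two survey variables share latent effects drawn from a mixture of multivariate normals, so heterogeneous groups of geographies receive their own effect distribution. The cluster count, weights, and noise level are invented for illustration, and the sketch omits the spatial structure and the data-driven choice of the number of clusters in the paper's model.

import numpy as np

rng = np.random.default_rng(2)
n_areas, n_vars = 150, 2                     # geographies and survey variables
weights = np.array([0.6, 0.3, 0.1])          # mixture weights over latent clusters
means = np.array([[0.0, 0.0], [2.0, -1.0], [-2.0, 2.0]])
cov = 0.3 * np.eye(n_vars)

z = rng.choice(len(weights), size=n_areas, p=weights)        # latent cluster labels
eta = np.array([rng.multivariate_normal(means[k], cov) for k in z])
direct_estimates = eta + rng.normal(scale=0.5, size=(n_areas, n_vars))  # Gaussian observation model

print("areas per latent cluster:", np.bincount(z))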

Applications

Bayesian Sparse Mediation Analysis with Targeted Penalization of Natural Indirect Effects

Causal mediation analysis aims to characterize an exposure's effect on an outcome and to quantify the indirect effect that acts through a given mediator or group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, such as the epigenome or the microbiome, new statistical methods are needed that simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active-mediator identification. Here, we develop two novel prior models for identifying active mediators in high-dimensional mediation analysis by penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and the mediator-outcome effect, using either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modeling the two parameters that contribute to the NIE, the proposed methods penalize their product in a targeted way, and the resulting inference can account for the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared with competing methods. We apply our methods to an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The active mediators identified in both studies reveal important biological pathways for understanding disease mechanisms.
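
The sketch below simulates the four-component structure that underlies the NIE: each mediator's exposure-mediator effect and mediator-outcome effect are independently either zero or drawn from a Gaussian slab, and only the "both nonzero" configuration yields an active mediator. It is a prior simulation with assumed mixture probabilities, not the authors' posterior sampler.

import numpy as np

rng = np.random.default_rng(3)
p = 1000                                     # number of candidate mediators
probs = np.array([0.05, 0.10, 0.10, 0.75])   # (both, alpha only, beta only, neither)
slab_sd = 1.0

comp = rng.choice(4, size=p, p=probs)
alpha = np.where(np.isin(comp, [0, 1]), rng.normal(0, slab_sd, p), 0.0)   # exposure-mediator effects
beta = np.where(np.isin(comp, [0, 2]), rng.normal(0, slab_sd, p), 0.0)    # mediator-outcome effects
nie = alpha * beta                           # natural indirect effects

print("active mediators (nonzero NIE):", int((nie != 0).sum()))
print("share in the 'both nonzero' component:", float((comp == 0).mean()))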

Applications

Bayesian Variable Selection for Cox Regression Model with Spatially Varying Coefficients with Applications to Louisiana Respiratory Cancer Data

The Cox regression model is widely used in survival analysis. In public health studies, clinical data are often collected from medical service providers at different locations, and there can be large geographical variation in the covariate effects on survival from particular diseases. In this paper, we focus on variable selection for the Cox regression model with spatially varying coefficients. We propose a Bayesian hierarchical model that incorporates a horseshoe prior for sparsity and a point-mass mixture prior to determine whether a regression coefficient is spatially varying. An efficient two-stage computational method is used for posterior inference and variable selection: it first maximizes the partial likelihood of the Cox model independently at each site, and then applies an MCMC algorithm for variable selection based on the first-stage results. Extensive simulation studies examine the empirical performance of the proposed method. Finally, we apply the proposed methodology to a real data set on respiratory cancer in Louisiana from the SEER program.
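
To make the prior structure concrete, the sketch below draws site-specific Cox coefficients from a horseshoe-shrunk global effect plus deviations that a point-mass indicator switches on or off. Values such as the inclusion probability and deviation scale are assumptions for illustration; the paper's actual computation is the two-stage partial-likelihood/MCMC scheme described above.

import numpy as np

rng = np.random.default_rng(4)
n_sites, n_covariates = 64, 10
tau = abs(rng.standard_cauchy())                     # global shrinkage (half-Cauchy)
lam = abs(rng.standard_cauchy(n_covariates))         # local shrinkage (half-Cauchy)
beta_global = rng.normal(0.0, tau * lam)             # horseshoe-shrunk global effects

pi_vary = 0.2                                        # prior prob. a coefficient varies spatially
varies = rng.random(n_covariates) < pi_vary          # point-mass mixture indicator
deviation_sd = 0.5
deviations = rng.normal(0.0, deviation_sd, size=(n_sites, n_covariates)) * varies

beta_site = beta_global + deviations                 # site-specific Cox coefficients
print("covariates flagged as spatially varying:", np.flatnonzero(varies))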

Applications

Bayesian analysis of population health data

The analysis of population-wide datasets can provide insight into the health status of large populations so that public health officials can make data-driven decisions. Such analyses often require highly parameterized models with different types of fixed and random effects to account for risk factors, spatial and temporal variation, multilevel effects, and other sources of uncertainty. To illustrate the potential of Bayesian hierarchical models, a dataset of about 500 000 inhabitants released by the Polish National Health Fund, containing information on ischemic stroke incidence over a two-year period, is analyzed using several types of models. Spatial logistic regression and survival models are considered for analyzing individual probabilities of stroke and times to the occurrence of an ischemic stroke event. Demographic and socioeconomic variables, as well as drug prescription information, are available at the individual level. Spatial variation is accounted for by means of region-level random effects.
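
A toy forward simulation of the spatial logistic regression component is sketched below: individual stroke probabilities depend on fixed effects (age, sex, a prescription indicator) and a region-level random effect. All coefficients and sizes are invented and bear no relation to estimates from the Polish National Health Fund data.

import numpy as np

rng = np.random.default_rng(5)
n_people, n_regions = 50_000, 16
region = rng.integers(n_regions, size=n_people)
u = rng.normal(0.0, 0.4, size=n_regions)             # region-level random effects

age = rng.normal(50, 15, size=n_people)
male = rng.integers(2, size=n_people)
on_drug = rng.integers(2, size=n_people)

logit = -7.0 + 0.06 * (age - 50) + 0.3 * male + 0.4 * on_drug + u[region]
p_stroke = 1.0 / (1.0 + np.exp(-logit))
stroke = rng.binomial(1, p_stroke)

print("overall incidence:", round(float(stroke.mean()), 4))
print("incidence in first 5 regions:",
      np.round(np.bincount(region, weights=stroke) / np.bincount(region), 4)[:5])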

Applications

Bayesian beta nonlinear models with constrained parameters to describe ruminal degradation kinetics

This paper proposes a beta nonlinear model to describe the kinetics of ruminal degradation. The model generalizes the widely applied model of Orskov and McDonald (1979) in that the proportion of degraded food is modelled with the beta distribution and inference follows the Bayesian perspective. A default method for obtaining a prior distribution is proposed for this model, since applying standard methodologies, such as the Jeffreys prior (Jeffreys, 1961) or reference priors (Bernardo, 1979), involves serious difficulties. The methodology is then generalized to a larger class of models, and an implementation of the method in OpenBUGS is shown.
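
The observation model can be sketched as follows: the Orskov-McDonald curve p(t) = a + b(1 - exp(-c t)) gives the mean proportion degraded at incubation time t, and the observed proportion is beta-distributed around that mean. The parameter values and the mean-precision parametrisation below are illustrative assumptions, not the paper's prior or data; keeping 0 < a and a + b < 1 constrains the mean to (0, 1).

import numpy as np

rng = np.random.default_rng(6)
a, b, c = 0.15, 0.70, 0.08               # soluble fraction, degradable fraction, rate
phi = 80.0                                # beta precision parameter
t = np.array([2, 4, 8, 16, 24, 48, 72], dtype=float)   # incubation times (hours)

mu = a + b * (1.0 - np.exp(-c * t))       # mean degradation curve, stays in (0, 1)
y = rng.beta(mu * phi, (1.0 - mu) * phi)  # simulated degraded proportions

for ti, mi, yi in zip(t, mu, y):
    print(f"t = {ti:4.0f} h   mean = {mi:.3f}   simulated = {yi:.3f}")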

Applications

Bayesian hierarchical modeling and analysis for physical activity trajectories using actigraph data

Rapid developments in streaming data technologies continue to generate interest in monitoring human activity. Wearable devices, such as wrist-worn sensors that monitor gross motor activity (actigraphy), have become prevalent. An actigraph unit continually records the activity level of an individual, producing a very large amount of high-resolution data that can be immediately downloaded and analyzed. While this kind of big data includes both spatial and temporal information, the variation in such data is more appropriately modeled by considering stochastic evolution through time while accounting for spatial information separately. We propose a comprehensive Bayesian hierarchical modeling and inferential framework for actigraphy data that reckons with the massive size of such databases while still offering full inference. Building upon recent developments in this field, we construct Nearest Neighbour Gaussian Processes (NNGPs) for actigraphy data to enable computation at large temporal scales. More specifically, we construct a temporal NNGP and focus on an optimized implementation of the collapsed algorithm in this specific context. This approach permits improved model scaling while also offering full inference. We test and validate our methods on simulated data and subsequently apply them, and verify their predictive ability, on an original dataset from a health study conducted by the Fielding School of Public Health at the University of California, Los Angeles.
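
The sketch below illustrates the temporal NNGP idea in plain Python/NumPy: each observation conditions only on its m most recent neighbours under an exponential covariance, so evaluating the log-density costs O(n m^3) rather than O(n^3). The function name, covariance choice, and zero-mean/no-nugget simplifications are assumptions of this sketch; it is not the paper's collapsed-algorithm implementation.

import numpy as np

def exp_cov(t1, t2, sigma2, phi):
    """Exponential covariance between sets of time points."""
    return sigma2 * np.exp(-phi * np.abs(np.subtract.outer(t1, t2)))

def nngp_logpdf(y, times, m, sigma2, phi):
    """Log-density of a zero-mean temporal Gaussian process under the NNGP
    approximation: each observation conditions only on its m most recent
    neighbours, giving a sequence of small conditional Gaussians."""
    order = np.argsort(times)
    y, times = y[order], times[order]
    logdens = 0.0
    for i in range(len(y)):
        nb = np.arange(max(0, i - m), i)           # indices of the preceding neighbours
        mean_i, var_i = 0.0, sigma2
        if nb.size:
            C_nn = exp_cov(times[nb], times[nb], sigma2, phi)
            c_in = exp_cov(times[i], times[nb], sigma2, phi)
            w = np.linalg.solve(C_nn, c_in)
            mean_i = w @ y[nb]
            var_i = sigma2 - c_in @ w
        logdens += -0.5 * (np.log(2.0 * np.pi * var_i) + (y[i] - mean_i) ** 2 / var_i)
    return logdens

times = np.arange(0.0, 100.0, 0.2)                 # 500 regularly spaced time points
rng = np.random.default_rng(7)
y = rng.normal(size=times.size)                    # placeholder mean-centred activity series
print("NNGP log-density (m = 10):", round(float(nngp_logpdf(y, times, 10, 1.0, 0.1)), 2))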

Applications

Bayesian hierarchical models for the prediction of the driver flow and passenger waiting times in a stochastic carpooling service

Carpooling is an integral component of smart carbon-neutral cities, in particular for facilitating home-work commuting. We study an innovative carpooling service developed by the start-up Ecov, which specialises in home-work commutes in peri-urban and rural regions. When a passenger makes a carpooling request, a designated driver is not assigned as in a traditional carpooling service; rather, the passenger waits for the first driver, from a population of non-professional drivers who are already en route, to arrive. We propose a two-stage Bayesian hierarchical model to overcome the considerable difficulties, due to the sparsely observed driver and passenger data from an embryonic stochastic carpooling service, in delivering high-quality predictions of driver flow and passenger waiting times. The first stage focuses on the driver flow, whose predictions are aggregated at the daily level to compensate for the data sparsity. The second stage processes this single daily driver flow into sub-daily (e.g. hourly) predictions of the passenger waiting times. We demonstrate that our model mostly outperforms frequentist and non-hierarchical Bayesian methods on observed data from the operational carpooling service in Lyon, France, and we also validate our model on simulated data.
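
A stylised version of the two-stage pipeline is sketched below: stage 1 produces a daily driver flow (with gamma-Poisson day-to-day variation), and stage 2 spreads it over hours with a commuting profile and converts the hourly rate into an expected wait, assuming Poisson driver arrivals. The rates and the hourly profile are invented; this is not the model fitted to the Lyon data.

import numpy as np

rng = np.random.default_rng(8)

# Stage 1: daily driver flow, with day-to-day overdispersion (gamma-Poisson)
daily_rate = rng.gamma(shape=20.0, scale=3.0)          # expected drivers per day
daily_flow = rng.poisson(daily_rate)

# Stage 2: split the daily flow into hourly rates with a commuting profile
hours = np.arange(24)
profile = np.exp(-0.5 * ((hours - 8) / 1.5) ** 2) + np.exp(-0.5 * ((hours - 18) / 1.5) ** 2)
profile /= profile.sum()
hourly_rate = daily_flow * profile                     # drivers per hour

# Under Poisson arrivals, the expected wait for the next driver is 1 / rate
expected_wait_min = 60.0 / np.maximum(hourly_rate, 1e-9)
for h in (8, 12, 18):
    print(f"{h:02d}:00  ~{hourly_rate[h]:.1f} drivers/h, "
          f"expected wait ~{expected_wait_min[h]:.0f} min")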

Applications

Bayesian nonparametric analysis for the detection of spikes in noisy calcium imaging data

Recent advancements in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake, behaving animals through the analysis of intracellular calcium signals. An ongoing challenge is deconvolving these signals to extract spike trains from the noisy calcium time series. In this manuscript, we propose a nested Bayesian finite mixture specification that allows for the estimation of spiking activity while simultaneously reconstructing the distributions of calcium transient spike amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spike intensity values are clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activity.
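
For intuition about the data structure, the sketch below forward-simulates a fluorescence trace as an AR(1) response to spikes whose amplitudes are drawn from a small set of shared "response levels". All constants are illustrative, and the code is a generative toy, not the paper's nested mixture sampler.

import numpy as np

rng = np.random.default_rng(9)
T, decay, noise_sd = 2000, 0.95, 0.1
spike_prob = 0.01
amp_levels = np.array([0.5, 1.0, 2.0])      # shared amplitude "atoms"
amp_weights = np.array([0.5, 0.3, 0.2])

spikes = rng.random(T) < spike_prob
amps = np.where(spikes, amp_levels[rng.choice(3, size=T, p=amp_weights)], 0.0)

calcium = np.zeros(T)
for t in range(1, T):
    calcium[t] = decay * calcium[t - 1] + amps[t]     # AR(1) calcium dynamics
fluorescence = calcium + rng.normal(scale=noise_sd, size=T)

print("number of spikes:", int(spikes.sum()))
print("amplitude values used:", np.unique(amps[spikes]))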
