Featured Research

Methodology

Bayesian Bandwidths in Semiparametric Modelling for Nonnegative Orthant Data with Diagnostics

Multivariate nonnegative orthant data are real vectors bounded below by the null vector, and they can be continuous, discrete or mixed. We first review the recent relative variability indexes for multivariate nonnegative continuous and count distributions. As a prelude, two comparable distributions with the same mean vector are classified through under-, equi- and over-variability with respect to a reference distribution. Multivariate associated kernel estimators are then reviewed, together with new proposals that can accommodate any nonnegative orthant dataset. We focus on bandwidth matrix selection by adaptive and local Bayesian methods for semicontinuous and counting supports, respectively. We finally introduce a flexible semiparametric approach for estimating all of these distributions on nonnegative supports. The corresponding estimator is driven by a given parametric part and a nonparametric part, the latter being a weight function estimated through multivariate associated kernels. A diagnostic model is also discussed to guide the choice between the parametric, semiparametric and nonparametric approaches; retaining the purely nonparametric fit indicates that the chosen parametric part is inadequate for the modelling. Multivariate real-data examples in the semicontinuous setting, such as reliability data, are considered step by step to illustrate the proposed approach. Concluding remarks discuss extensions to other multiple functions.
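
As a point of reference for the associated-kernel machinery mentioned above, the sketch below computes a univariate discrete associated-kernel estimate for count data using a binomial kernel, a common choice in this literature. The fixed bandwidth, the simulated data and the function name are illustrative assumptions and do not reproduce the paper's adaptive or Bayesian bandwidth selection; in the semiparametric approach, a nonparametric estimate of this kind plays the role of the weight function multiplying the given parametric part.

    import numpy as np
    from scipy.stats import binom

    def binomial_kernel_estimate(x, sample, h):
        """Discrete associated-kernel estimate of the pmf at a count x.

        Uses the binomial kernel Binomial(x + 1, (x + h) / (x + 1)),
        a standard choice for count supports, with bandwidth h in (0, 1].
        """
        p = (x + h) / (x + 1.0)
        # Average the kernel, targeted at x, evaluated at each observation.
        return binom.pmf(sample, n=x + 1, p=p).mean()

    # Illustrative count data and a fixed (non-Bayesian) bandwidth.
    rng = np.random.default_rng(0)
    sample = rng.poisson(lam=3.0, size=200)
    estimates = [binomial_kernel_estimate(x, sample, h=0.3) for x in range(10)]
    print(np.round(estimates, 3))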

Methodology

Bayesian Cumulative Probability Models for Continuous and Mixed Outcomes

Ordinal cumulative probability models (CPMs) -- also known as cumulative link models -- such as the proportional odds regression model are typically used for discrete ordered outcomes, but can accommodate both continuous and mixed discrete/continuous outcomes since these are also ordered. Recent papers by Liu et al. and Tian et al. describe ordinal CPMs in this setting using non-parametric maximum likelihood estimation. We formulate a Bayesian CPM for continuous or mixed outcome data. Bayesian CPMs inherit many of the benefits of frequentist CPMs and have advantages with regard to interpretation, flexibility, and exact inference (within simulation error) for parameters and functions of parameters. We explore characteristics of the Bayesian CPM through simulations and a case study using HIV biomarker data. In addition, we provide the package 'bayesCPM' which implements Bayesian CPMs using the R interface to the Stan probabilistic programming language. The Bayesian CPM for continuous outcomes can be implemented with only minor modifications to the prior specification and -- despite several limitations -- has generally good statistical performance with moderate or large sample sizes.
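
For readers unfamiliar with CPMs, the display below records the generic cumulative probability model form being referred to; with a continuous outcome, each distinct observed value is treated as its own ordered category. The notation is generic background rather than the specific parameterisation used by the 'bayesCPM' package.

    P(Y \le y_{(j)} \mid X = x) \;=\; G\big(\gamma_j - x^\top \beta\big), \qquad j = 1, \dots, J,

where y_{(1)} < \dots < y_{(J)} are the ordered outcome values, the cutpoints \gamma_1 \le \dots \le \gamma_J are nondecreasing, and G is a link distribution function such as the logistic (which yields the proportional odds model).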

Methodology

Bayesian Fusion: Scalable unification of distributed statistical analyses

There has recently been considerable interest in addressing the problem of unifying distributed statistical analyses into a single coherent inference. This problem naturally arises in a number of situations, including in big-data settings, when working under privacy constraints, and in Bayesian model choice. The majority of existing approaches have relied upon convenient approximations of the distributed analyses. Although typically computationally efficient and readily scalable in the number of analyses being unified, approximate approaches can have significant shortcomings -- the quality of the inference can degrade rapidly with the number of analyses being unified, and the inference can be substantially biased even when unifying a small number of analyses that do not concur. In contrast, the recent Fusion approach of Dai et al. (2019) is a rejection sampling scheme which is readily parallelisable and is exact (avoiding any form of approximation other than Monte Carlo error), albeit limited in applicability to unifying a small number of low-dimensional analyses. In this paper we introduce a practical Bayesian Fusion approach. We extend the theory underpinning the Fusion methodology and, by embedding it within a sequential Monte Carlo algorithm, we are able to recover the correct target distribution. By means of extensive guidance on the implementation of the approach, we demonstrate theoretically and empirically that Bayesian Fusion is robust to increasing numbers of analyses and coherently unifies analyses which do not concur. This is achieved while being computationally competitive with approximate schemes.
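
For context, the fusion problem is usually phrased as sampling from the product of the densities produced by the individual analyses; the display below is that generic target rather than the specific construction of Dai et al. (2019).

    f(x) \;\propto\; \prod_{c=1}^{C} f_c(x),

where f_1, \dots, f_C are the C distributed analyses (for example, sub-posteriors) to be unified and f is the single coherent inference being targeted.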

Methodology

Bayesian GARCH Modeling of Functional Sports Data

The use of statistical methods in sports analytics has attracted rapidly growing interest over the last decade and is nowadays common practice. In particular, the interest in understanding and predicting an athlete's performance throughout his/her career is motivated by the need to evaluate the efficacy of training programs, anticipate fatigue to prevent injuries and detect unexpected or disproportionate increases in performance that might be indicative of doping. Moreover, fast-evolving data-gathering technologies require up-to-date modelling techniques that adapt to the distinctive features of sports data. In this work, we propose a hierarchical Bayesian model for describing and predicting the evolution of performance over time for shot put athletes. To account for seasonality and heterogeneity in recorded results, we rely both on a smooth functional contribution and on a linear mixed effect model with heteroskedastic errors to represent the athlete-specific trajectories. The resulting model provides an accurate description of the performance trajectories and helps characterise both the intra- and inter-seasonal variability of measurements. Further, the model allows for the prediction of athletes' performance in future seasons. We apply our model to an extensive real-world data set of performance results of professional shot put athletes recorded at elite competitions.
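
As background for the heteroskedastic-error component mentioned above, the sketch below simulates a standard GARCH(1,1) error process. The parameter values and the function name are illustrative assumptions, and the paper's full hierarchical model (smooth seasonal functional term plus athlete-specific mixed effects) is not reproduced here.

    import numpy as np

    def simulate_garch11(n, omega=0.05, alpha=0.10, beta=0.85, seed=0):
        """Simulate GARCH(1,1) errors:
        sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2.
        Parameter values are illustrative only.
        """
        rng = np.random.default_rng(seed)
        eps = np.zeros(n)
        sigma2 = np.zeros(n)
        sigma2[0] = omega / (1.0 - alpha - beta)   # start at the stationary variance
        eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
        for t in range(1, n):
            sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
            eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
        return eps, sigma2

    errors, variances = simulate_garch11(n=300)
    print(errors[:5].round(3))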

Methodology

Bayesian Inference for Stationary Points in Gaussian Process Regression Models for Event-Related Potentials Analysis

Stationary points, which are embedded in a function's derivatives, are often critical for a model to be interpretable and may be considered key features of interest in many applications. We propose a semiparametric Bayesian model to efficiently infer the locations of stationary points of a nonparametric function, while treating the function itself as a nuisance parameter. We use Gaussian processes as a flexible prior for the underlying function and impose derivative constraints to control the function's shape via conditioning. We develop an inferential strategy that intentionally restricts estimation to the case of at least one stationary point, bypassing possible mis-specification of the number of stationary points and avoiding the varying-dimension problem that often introduces substantial computational complexity. We illustrate the proposed method using simulations and then apply it to the estimation of event-related potentials (ERPs) derived from electroencephalography (EEG) signals. We show how the proposed method automatically identifies characteristic components and their latencies at the individual level, avoiding the excessive averaging across subjects that is routinely done in the field to obtain smooth curves. By applying this approach to EEG data collected from younger and older adults during a speech perception task, we are able to demonstrate how the time course of speech perception processes changes with age.
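
A standard fact underlying constructions of this kind is that the derivative of a Gaussian process is again a Gaussian process, with covariances obtained by differentiating the kernel; the identities below are generic background, not the paper's specific prior or constraint scheme.

    \operatorname{Cov}\big(f(t), f'(t')\big) = \frac{\partial k(t, t')}{\partial t'}, \qquad
    \operatorname{Cov}\big(f'(t), f'(t')\big) = \frac{\partial^2 k(t, t')}{\partial t \, \partial t'},

so a derivative constraint such as f'(t_0) = 0 at a candidate stationary point t_0 can be imposed by conditioning within the same joint Gaussian model.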

Methodology

Bayesian Knockoff Filter Using Gibbs Sampler

In many fields, researchers are interested in discovering features with a substantial effect on the response from among a large number of features, while controlling the proportion of false discoveries. By incorporating the knockoff procedure into the Bayesian framework, we develop the Bayesian knockoff filter (BKF) for selecting features that have an important effect on the response. In contrast to the fixed knockoff variables of frequentist procedures, we allow the knockoff variables to be continuously updated within the Markov chain Monte Carlo sampler. Based on the posterior samples and carefully designed greedy selection procedures, our method can identify the truly important features while controlling the Bayesian false discovery rate at a desired level. Numerical experiments on both synthetic and real data demonstrate the advantages of our method over existing knockoff methods and Bayesian variable selection approaches: the BKF possesses higher power and yields a lower false discovery rate.
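
As a generic illustration of controlling a Bayesian false discovery rate from posterior output, the sketch below applies the standard thresholding rule from the Bayesian multiple-testing literature to posterior inclusion probabilities. It is not necessarily the exact selection rule of the BKF, and the probabilities and function name are made up for illustration.

    import numpy as np

    def bayesian_fdr_select(inclusion_probs, alpha=0.10):
        """Select features so that the estimated Bayesian FDR, i.e. the
        average of (1 - posterior inclusion probability) over the selected
        set, stays at or below alpha.
        """
        probs = np.asarray(inclusion_probs)
        order = np.argsort(-probs)                 # most probable features first
        fdr = np.cumsum(1.0 - probs[order]) / np.arange(1, probs.size + 1)
        k = np.max(np.nonzero(fdr <= alpha)[0]) + 1 if np.any(fdr <= alpha) else 0
        return np.sort(order[:k])

    # Illustrative posterior inclusion probabilities (e.g. averaged over MCMC draws).
    pip = np.array([0.99, 0.97, 0.95, 0.60, 0.40, 0.10, 0.05])
    print(bayesian_fdr_select(pip, alpha=0.10))    # indices of selected features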

Methodology

Bayesian Meta-analysis of Rare Events with Non-ignorable Missing Data

Meta-analysis is a powerful tool for drug safety assessment, synthesizing treatment-related toxicological findings from independent clinical trials. However, published clinical studies may fail to report adverse events (AEs) whose observed counts fall below a pre-specified, study-dependent cutoff. Consequently, if this censored information is ignored, the estimated incidence rate of AEs can be substantially biased. To address this non-ignorable missing data problem in meta-analysis, we propose a Bayesian multilevel regression model that accommodates the censored rare event data. The performance of the proposed model relative to other existing methods is demonstrated through simulation studies under various censoring scenarios. Finally, the proposed approach is illustrated using data from a recent meta-analysis of 125 clinical trials involving PD-1/PD-L1 inhibitors with respect to their toxicity profiles.
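
To make the censoring mechanism concrete, the sketch below writes down a single arm's likelihood contribution under a generic censored-count model: when a study reports AE counts only at or above a known cutoff, an unreported count contributes the probability of falling below that cutoff. The binomial sampling model and the function name are illustrative assumptions, not the paper's multilevel regression.

    from scipy.stats import binom

    def arm_log_likelihood(count, n_patients, rate, cutoff):
        """Log-likelihood of one trial arm for AE incidence `rate`.

        count : observed AE count, or None if the study did not report it
                because the count fell below `cutoff` (censoring).
        """
        if count is not None:
            return binom.logpmf(count, n_patients, rate)
        # Censored: all we know is that the count was below the cutoff.
        return binom.logcdf(cutoff - 1, n_patients, rate)

    # Illustrative: a reported arm and a censored arm with cutoff 3.
    print(arm_log_likelihood(5, 100, 0.04, cutoff=3))
    print(arm_log_likelihood(None, 100, 0.04, cutoff=3))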

Methodology

Bayesian Multiple Index Models for Environmental Mixtures

An important goal of environmental health research is to assess the risk posed by mixtures of environmental exposures. Two popular classes of models for mixture analyses are response-surface methods and exposure-index methods. Response-surface methods estimate high-dimensional surfaces and are thus highly flexible but difficult to interpret. In contrast, exposure-index methods decompose coefficients from a linear model into an overall mixture effect and individual index weights; these models yield easily interpretable effect estimates and efficient inferences when model assumptions hold, but, like most parsimonious models, incur bias when these assumptions do not hold. In this paper we propose a Bayesian multiple index model framework that combines the strengths of each, allowing for non-linear and non-additive relationships between exposure indices and a health outcome, while reducing the dimensionality of the exposure vector and estimating index weights with variable selection. This framework contains response-surface and exposure-index models as special cases, thereby unifying the two analysis strategies. This unification increases the range of models possible for analyzing environmental mixtures and health, allowing one to select an appropriate analysis from a spectrum of models varying in flexibility and interpretability. In an analysis of the association between telomere length and 18 organic pollutants in the National Health and Nutrition Examination Survey (NHANES), the proposed approach fits the data as well as more complex response-surface methods and yields more interpretable results.
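
For readers new to index models, the display below gives the generic multiple index regression form on which frameworks of this kind are built; the notation is generic, and details such as the constraints on the weights and the prior on the surface are specific to the paper and not spelled out here.

    y_i = h\big(x_i^\top w_1, \dots, x_i^\top w_K\big) + \varepsilon_i,

where each index x_i^\top w_k is a weighted combination of the exposures and h is an unknown, possibly non-linear and non-additive, surface; K = 1 recovers a single-index model, while letting every exposure form its own index recovers a response-surface model.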

Methodology

Bayesian Non-parametric Quantile Process Regression and Estimation of Marginal Quantile Effects

Flexible estimation of multiple conditional quantiles is of interest in numerous applications, such as studying the effect of pregnancy-related factors on very low or high birth weight. We propose a Bayesian non-parametric method to simultaneously estimate non-crossing, non-linear quantile curves. We expand the conditional distribution function of the response in I-spline basis functions, with the covariate-dependent coefficients modeled using neural networks. By leveraging the approximation power of splines and neural networks, our model can approximate any continuous quantile function. Compared to existing models, our model estimates all rather than a finite subset of quantiles, scales well to high dimensions, and accounts for estimation uncertainty. While the model is arbitrarily flexible, interpretable marginal quantile effects are estimated using accumulated local effects plots and variable importance measures. A simulation study shows that our model can better recover quantiles of the response distribution when the data are sparse, and illustrative applications providing new insights into analyses of birth weight and tropical cyclone intensity are presented.
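
Schematically, the construction described above expands the conditional distribution function in monotone I-spline basis functions with covariate-dependent, non-negative coefficients and reads quantile curves off by inversion; the display below shows only this generic form (the normalisation of the coefficients and the neural-network mapping that produces them follow the paper and are not detailed here).

    F(y \mid x) = \sum_{m=1}^{M} \theta_m(x)\, I_m(y), \qquad \theta_m(x) \ge 0,
    \qquad Q(\tau \mid x) = F^{-1}(\tau \mid x),

where the I_m are monotone I-spline basis functions, so that for each x the fitted F(\cdot \mid x) is non-decreasing and the implied quantile curves Q(\tau \mid x) cannot cross in \tau.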

Methodology

Bayesian Nonparametric Bivariate Survival Regression for Current Status Data

We consider nonparametric inference for event time distributions based on current status data. We show that in this scenario conventional mixture priors, including the popular Dirichlet process mixture prior, lead to biologically uninterpretable results as they unnaturally skew the probability mass for the event times toward the extremes of the observed data. Simple assumptions on dependent censoring can fix the problem. We then extend the discussion to bivariate current status data with partial ordering of the two outcomes. In addition to dependent censoring, we also exploit some minimal known structure relating the two event times. We design a Markov chain Monte Carlo algorithm for posterior simulation. Applied to a recurrent infection study, the method provides novel insights into how symptom-related hospital visits are affected by covariates.
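
For context, the display below is the standard likelihood for univariate current status data under independent censoring, in which each subject contributes only a monitoring time and an indicator of whether the event has already occurred; it is background for the setting rather than the paper's bivariate, dependently censored construction.

    L(F) = \prod_{i=1}^{n} F(c_i)^{\delta_i} \, \{1 - F(c_i)\}^{1 - \delta_i}, \qquad \delta_i = \mathbf{1}\{T_i \le c_i\},

where T_i is the unobserved event time, c_i the monitoring time and F the event time distribution of interest.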

