Featured Research

Computation

Bayesian Optimization of Hyperparameters when the Marginal Likelihood is Estimated by MCMC

Bayesian models often involve a small set of hyperparameters determined by maximizing the marginal likelihood. Bayesian optimization is a popular iterative method where a Gaussian process posterior of the underlying function is sequentially updated by new function evaluations. An acquisition strategy uses this posterior distribution to decide where to place the next function evaluation. We propose a novel Bayesian optimization framework for situations where the user controls the computational effort, and therefore the precision of the function evaluations. This is a common situation in econometrics, where the marginal likelihood is often computed by Markov Chain Monte Carlo (MCMC) methods, with the precision of the marginal likelihood estimate determined by the number of MCMC draws. The proposed acquisition strategy gives the optimizer the option to explore the function with cheap noisy evaluations, and therefore to find the optimum faster. Prior hyperparameter estimation in the steady-state Bayesian vector autoregressive (BVAR) model on US macroeconomic time series data is used for illustration. The proposed method is shown to find the optimum much more quickly than either traditional Bayesian optimization or a grid search.
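
To make the idea concrete, here is a minimal sketch (not the authors' code; the toy objective, the noise model, and the UCB acquisition are illustrative assumptions) of Bayesian optimization over noisy marginal-likelihood estimates, where the per-evaluation noise variance shrinks with the number of MCMC draws:

```python
import numpy as np

def rbf_kernel(a, b, length=1.0, amp=1.0):
    d = a[:, None] - b[None, :]
    return amp * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, noise_var, Xs):
    """GP posterior mean/variance with heteroscedastic observation noise."""
    K = rbf_kernel(X, X) + np.diag(noise_var)
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xs, Xs)) - np.sum(v ** 2, axis=0)
    return mu, np.maximum(var, 1e-12)

def estimate_log_ml(h, n_draws, rng):
    """Hypothetical noisy objective: an MCMC estimate of the log marginal
    likelihood whose standard error scales like 1/sqrt(n_draws)."""
    return -(h - 2.0) ** 2 + rng.normal(0.0, 1.0 / np.sqrt(n_draws))

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 4.0, 200)
X, y, nv = [], [], []
for t in range(30):
    n_draws = 50 if t < 20 else 500          # cheap noisy exploration first
    if t < 3:
        h = rng.uniform(0.0, 4.0)            # initial random evaluations
    else:
        mu, var = gp_posterior(np.array(X), np.array(y), np.array(nv), grid)
        h = grid[np.argmax(mu + 2.0 * np.sqrt(var))]   # UCB acquisition
    X.append(h); y.append(estimate_log_ml(h, n_draws, rng)); nv.append(1.0 / n_draws)

mu, _ = gp_posterior(np.array(X), np.array(y), np.array(nv), grid)
print("estimated optimal hyperparameter:", grid[np.argmax(mu)])
```

Allowing many cheap, high-noise evaluations early on is what lets the optimizer map the surface before spending MCMC effort refining near the optimum.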

Read more
Computation

Bayesian Reliability Analysis of the Power Law Process with Respect to the Higgins-Tsokos Loss Function for Modeling Software Failure Times

The Power Law Process, also known as the Non-Homogeneous Poisson Process, has been used in a variety of applications, one of which is software reliability assessment; specifically, its intensity function is used to compute the rate of change of software reliability as a time-varying function. The applicability of Bayesian analysis to the Power Law Process was justified using real data. The probability distribution that best characterizes the behavior of the key parameter of the intensity function was first identified, and the likelihood-based Bayesian reliability estimate of the Power Law Process under the Higgins-Tsokos loss function was then obtained. In a simulation study and on real data, the Bayesian estimate showed outstanding performance compared to the maximum likelihood estimate across different sample sizes. In addition, a sensitivity analysis was performed, showing that the Bayesian estimate is sensitive to the prior selection, whether parametric or non-parametric.
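
For reference, the intensity function and the classical maximum likelihood benchmark that the Bayesian estimate is compared against look as follows (a sketch of standard results with hypothetical failure times, not the paper's code):

```python
import numpy as np

def plp_mle(times):
    """MLEs of (beta, theta) for failure-truncated Power Law Process data:
    beta_hat = n / sum(log(t_n / t_i)), theta_hat = t_n / n**(1/beta_hat)."""
    t = np.sort(np.asarray(times, dtype=float))
    n, tn = len(t), t[-1]
    beta = n / np.sum(np.log(tn / t[:-1]))   # the i = n term is log(1) = 0
    theta = tn / n ** (1.0 / beta)
    return beta, theta

def plp_intensity(t, beta, theta):
    """Intensity lambda(t) = (beta/theta) * (t/theta)**(beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)

# Hypothetical software failure times (hours); beta < 1 would indicate
# reliability growth (failures arriving more slowly over time).
failures = [10.3, 25.1, 47.9, 89.2, 140.5, 210.8, 310.4]
b, th = plp_mle(failures)
print(f"beta = {b:.3f}, theta = {th:.3f}")
print(f"intensity at last failure: {plp_intensity(failures[-1], b, th):.4f}")
```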

Read more
Computation

Bayesian Survival Analysis Using the rstanarm R Package

Survival data is encountered in a range of disciplines, most notably health and medical research. Although Bayesian approaches to the analysis of survival data can provide a number of benefits, they are less widely used than classical (e.g. likelihood-based) approaches. This may be in part due to a relative absence of user-friendly implementations of Bayesian survival models. In this article we describe how the rstanarm R package can be used to fit a wide range of Bayesian survival models. The rstanarm package facilitates Bayesian regression modelling by providing a user-friendly interface (users specify their model using customary R formula syntax and data frames) and using the Stan software (a C++ library for Bayesian inference) for the back-end estimation. The suite of models that can be estimated using rstanarm is broad and includes generalised linear models (GLMs), generalised linear mixed models (GLMMs), generalised additive models (GAMs) and more. In this article we focus only on the survival modelling functionality. This includes standard parametric (exponential, Weibull, Gompertz) and flexible parametric (spline-based) hazard models, as well as standard parametric accelerated failure time (AFT) models. All types of censoring (left, right, interval) are allowed, as are delayed entry (left truncation), time-varying covariates, time-varying effects, and frailty effects. We demonstrate the functionality through worked examples. We anticipate these implementations will increase the uptake of Bayesian survival analysis in applied research.
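
The likelihood structure behind these parametric hazard models can be illustrated with a small sketch; the Weibull hazard and right-censored log-likelihood below are standard results, and rstanarm itself fits such models via Stan's MCMC rather than by direct optimization as done here:

```python
import numpy as np
from scipy.optimize import minimize

def weibull_neg_loglik(params, t, event):
    """Right-censored Weibull log-likelihood: events contribute
    log h(t) + log S(t); censored observations contribute log S(t) only."""
    k, lam = np.exp(params)                  # optimise on the log scale
    log_h = np.log(k / lam) + (k - 1.0) * np.log(t / lam)
    log_S = -((t / lam) ** k)
    return -np.sum(event * log_h + log_S)

# Hypothetical survival times with an event indicator (1 = event, 0 = censored).
t = np.array([2.1, 3.4, 1.2, 5.6, 4.4, 6.1, 0.9, 3.3])
event = np.array([1, 1, 0, 1, 0, 1, 1, 0])
fit = minimize(weibull_neg_loglik, x0=[0.0, 1.0], args=(t, event))
print("Weibull shape, scale:", np.exp(fit.x))
```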

Read more
Computation

Bayesian Update with Importance Sampling: Required Sample Size

Importance sampling is used to approximate Bayes' rule in many computational approaches to Bayesian inverse problems, data assimilation and machine learning. This paper reviews and further investigates the required sample size for importance sampling in terms of the $\chi^2$-divergence between target and proposal. We develop general abstract theory and illustrate through numerous examples the roles that dimension, noise-level and other model parameters play in approximating the Bayesian update with importance sampling. Our examples also facilitate a new direct comparison of standard and optimal proposals for particle filtering.
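
A small sketch of the central quantity (the Gaussian target and proposal are an illustrative assumption chosen so that the $\chi^2$-divergence has a closed form): the effective sample size fraction of importance sampling decays like $1/\rho$ with $\rho = 1 + \chi^2(\text{target}\,\|\,\text{proposal})$:

```python
import numpy as np

def ess(w):
    """Effective sample size (sum w)^2 / sum(w^2); ESS/N is roughly 1/rho."""
    return w.sum() ** 2 / (w ** 2).sum()

rng = np.random.default_rng(1)
N = 100_000
for s in [1.0, 1.5, 3.0]:                    # proposal standard deviation
    x = rng.normal(0.0, s, size=N)           # proposal N(0, s^2), target N(0, 1)
    logw = (-0.5 * x ** 2) - (-0.5 * (x / s) ** 2 - np.log(s))  # log pi - log q
    w = np.exp(logw - logw.max())            # ESS is invariant to the constant
    rho = s ** 2 / np.sqrt(2.0 * s ** 2 - 1.0)   # closed-form 1 + chi^2 here
    print(f"s = {s}: ESS/N = {ess(w) / N:.3f}, 1/rho = {1 / rho:.3f}")
```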

Read more
Computation

Bayesian design for minimising uncertainty in spatial processes

Model-based geostatistical design involves the selection of locations to collect data to minimise an expected loss function over a set of all possible locations. The loss function is specified to reflect the aim of data collection, which, for geostatistical studies, would typically be to minimise the uncertainty in a spatial process. In this paper, we propose a new approach to design such studies via a loss function derived through considering the entropy of model predictions, and we show that this simultaneously addresses the goal of precise parameter estimation. One drawback of this loss function is that it is computationally expensive to evaluate, so we provide an efficient approximation such that it can be used within realistically sized geostatistical studies. To demonstrate our approach, we apply it to design the collection of spatially dependent multiple responses, and compare this with designing for either estimation or prediction only. The results show that our designs remain highly efficient in achieving each experimental objective individually, and provide an ideal compromise between the two objectives. Accordingly, we advocate that our design approach should be used more generally in model-based geostatistical studies.
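
As a simplified illustration of entropy-driven spatial design (a sketch assuming a known Gaussian process covariance, not the paper's loss approximation): for a GP, the predictive entropy at a location is monotone in the predictive variance, so candidate designs can be scored greedily as follows:

```python
import numpy as np

def sq_exp(a, b, length=0.3):
    """Squared-exponential correlation between 1-D locations."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

grid = np.linspace(0.0, 1.0, 101)       # candidate monitoring locations
design = []
for _ in range(5):                      # pick 5 locations greedily
    best, best_H = None, np.inf
    for c in grid:
        pts = np.array(design + [c])
        K = sq_exp(pts, pts) + 1e-6 * np.eye(len(pts))
        Ks = sq_exp(pts, grid)
        var = 1.0 - np.sum(np.linalg.solve(K, Ks) * Ks, axis=0)
        # Gaussian predictive entropy: 0.5 * log(2*pi*e*sigma^2) at each site
        H = 0.5 * np.sum(np.log(2.0 * np.pi * np.e * np.maximum(var, 1e-12)))
        if H < best_H:
            best, best_H = c, H
    design.append(best)
print("greedy minimum-entropy design:", sorted(design))
```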

Read more
Computation

Bayesian experimental design without posterior calculations: an adversarial approach

Most computational approaches to Bayesian experimental design require making posterior calculations, such as evidence estimates, repeatedly for a large number of potential designs and/or simulated datasets. This can be expensive and prohibit scaling up these methods to models with many parameters, or designs with many unknowns to select. We introduce an efficient alternative approach without posterior calculations, based on optimising the expected trace of the Fisher information, as discussed by Walker (2016). We illustrate drawbacks of this approach, including lack of invariance to reparameterisation and encouraging designs which are informative about one parameter combination but not any others. We show these can be avoided by using an adversarial approach: the experimenter must select their design while an adversary attempts to select the least favourable parameterisation. We present theoretical properties of this approach and show it can be used with gradient based optimisation methods to find designs efficiently in practice.
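
The reparameterisation sensitivity is easy to demonstrate on a toy model (the sinusoidal regression below is an illustrative assumption, not the paper's example): rescaling one parameter, which leaves the model unchanged, changes which design maximises the trace of the Fisher information:

```python
import numpy as np

def trace_criterion(d, scale=1.0):
    """tr J(d) for y ~ N(theta1*sin(d) + theta2*cos(d), 1); `scale` rescales
    theta2, i.e. changes the parameterisation without changing the model."""
    g = np.array([np.sin(d), scale * np.cos(d)])   # gradient of the mean
    return g @ g                                   # trace of J = g g^T

designs = np.linspace(0.0, np.pi / 2.0, 91)
for scale in [0.1, 1.0, 10.0]:
    vals = [trace_criterion(d, scale) for d in designs]
    print(f"scale = {scale}: best d = {designs[int(np.argmax(vals))]:.2f} rad")
```

With `scale = 1` every design scores tr J = 1, while shrinking or inflating one parameter pushes the "optimal" design to opposite ends of the interval; the adversarial game is designed to remove exactly this arbitrariness.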

Read more
Computation

Bayesian inference of stochastic reaction networks using Multifidelity Sequential Tempered Markov Chain Monte Carlo

Stochastic reaction network models are often used to explain and predict the dynamics of gene regulation in single cells. These models usually involve several parameters, such as the kinetic rates of chemical reactions, that are not directly measurable and must be inferred from experimental data. Bayesian inference provides a rigorous probabilistic framework for identifying these parameters by finding a posterior parameter distribution that captures their uncertainty. Traditional computational methods for solving inference problems, such as Markov Chain Monte Carlo methods based on the classical Metropolis-Hastings algorithm, involve numerous serial evaluations of the likelihood function, which in turn requires expensive forward solutions of the chemical master equation (CME). We propose an alternative approach based on a multifidelity extension of the Sequential Tempered Markov Chain Monte Carlo (ST-MCMC) sampler. This algorithm is built upon Sequential Monte Carlo and solves the Bayesian inference problem by decomposing it into a sequence of efficiently solved subproblems that gradually increase model fidelity and the influence of the observed data. We reformulate the finite state projection (FSP) algorithm, a well-known method for solving the CME, to produce a hierarchy of surrogate master equations to be used in this multifidelity scheme. To determine the appropriate fidelity, we introduce a novel information-theoretic criterion that seeks to extract the most information about the ultimate Bayesian posterior from each model in the hierarchy without inducing significant bias. The resulting sampling scheme is tested with high performance computing resources using biologically relevant problems.
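
A stripped-down sketch of the tempering mechanism at the core of ST-MCMC (a toy one-dimensional posterior with a single model fidelity; the multifidelity FSP hierarchy the paper builds is omitted): particles are pushed through bridging distributions proportional to prior × likelihood^β with β increasing from 0 to 1:

```python
import numpy as np

rng = np.random.default_rng(2)
loglik = lambda th: -0.5 * ((1.7 - th) / 0.3) ** 2    # stand-in log-likelihood
logprior = lambda th: -0.5 * th ** 2 / 4.0            # prior N(0, 2^2)

particles = rng.normal(0.0, 2.0, size=2000)           # draws from the prior
betas = np.linspace(0.0, 1.0, 11)                     # tempering schedule

for b0, b1 in zip(betas[:-1], betas[1:]):
    # Reweight by the likelihood increment, then resample
    logw = (b1 - b0) * loglik(particles)
    w = np.exp(logw - logw.max()); w /= w.sum()
    particles = particles[rng.choice(len(particles), len(particles), p=w)]
    # One Metropolis-Hastings move targeting prior * likelihood**b1
    prop = particles + rng.normal(0.0, 0.2, size=len(particles))
    logacc = (b1 * (loglik(prop) - loglik(particles))
              + logprior(prop) - logprior(particles))
    accept = np.log(rng.uniform(size=len(particles))) < logacc
    particles = np.where(accept, prop, particles)

print(f"posterior mean ~ {particles.mean():.2f} (analytic value is about 1.66)")
```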

Read more
Computation

Bayesian inverse regression for dimension reduction with small datasets

We consider supervised dimension reduction problems, namely to identify a low dimensional projection of the predictors $\mathbf{x}$ which can retain the statistical relationship between $\mathbf{x}$ and the response variable $y$. We follow the idea of sliced inverse regression (SIR) and sliced average variance estimation (SAVE) type methods, which is to use the statistical information of the conditional distribution $\pi(\mathbf{x}|y)$ to identify the dimension reduction (DR) space. In particular we focus on the task of computing this conditional distribution without slicing the data. We propose a Bayesian framework to compute the conditional distribution where the likelihood function is obtained using the Gaussian process regression model. The conditional distribution $\pi(\mathbf{x}|y)$ can then be computed directly via Monte Carlo sampling. We can then perform DR by considering certain moment functions (e.g. the first or the second moment) of the samples of the posterior distribution. With numerical examples, we demonstrate that the proposed method is especially effective for small data problems.
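
For contrast, classical SIR, the slicing-based baseline that the proposed method avoids, can be written in a few lines (the toy response below is an illustrative assumption):

```python
import numpy as np

def sir_directions(X, y, n_slices=10):
    """Classical SIR: whiten X, slice on y, eigendecompose the covariance of
    the slice means; leading eigenvectors span the estimated DR space."""
    L = np.linalg.cholesky(np.cov(X.T))
    Z = (X - X.mean(0)) @ np.linalg.inv(L).T           # whitened predictors
    slices = np.array_split(np.argsort(y), n_slices)
    M = sum(len(s) * np.outer(Z[s].mean(0), Z[s].mean(0)) for s in slices) / len(y)
    vals, vecs = np.linalg.eigh(M)
    return vals[::-1], vecs[:, ::-1]

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
# Toy response with a 2-dimensional DR space (a classic SIR test function)
y = X[:, 0] / (0.5 + (X[:, 1] + 1.5) ** 2) + 0.1 * rng.normal(size=500)
vals, vecs = sir_directions(X, y)
print("eigenvalues:", np.round(vals, 3))   # two clearly dominant values
```

Slicing discretises $y$ and can be wasteful when few observations are available, which is the small-data regime the Bayesian inverse-regression approach targets.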

Read more
Computation

Bayesian model inversion using stochastic spectral embedding

In this paper we propose a new sampling-free approach to solve Bayesian model inversion problems that is an extension of the previously proposed spectral likelihood expansions (SLE) method. Our approach, called stochastic spectral likelihood embedding (SSLE), uses the recently presented stochastic spectral embedding (SSE) method for local spectral expansion refinement to approximate the likelihood function at the core of Bayesian inversion problems. We show that, similar to SLE, this approach results in analytical expressions for key statistics of the Bayesian posterior distribution, such as evidence, posterior moments and posterior marginals, by direct post-processing of the expansion coefficients. Because SSLE and SSE rely on the direct approximation of the likelihood function, they are in a way independent of the computational/mathematical complexity of the forward model. We further enhance the efficiency of SSLE by introducing a likelihood specific adaptive sample enrichment scheme. To showcase the performance of the proposed SSLE, we solve three problems that exhibit different kinds of complexity in the likelihood function: multimodality, high posterior concentration and high nominal dimensionality. We demonstrate how SSLE significantly improves on SLE, and present it as a promising alternative to existing inversion frameworks.
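
The post-processing property carries over from SLE, which a one-dimensional sketch makes concrete (the standard-normal prior and Hermite basis are illustrative assumptions): once the likelihood is expanded in a basis orthonormal with respect to the prior, the evidence is the zeroth coefficient and posterior moments are coefficient ratios:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

lik = lambda th: np.exp(-0.5 * ((th - 0.8) / 0.5) ** 2)   # toy likelihood

nodes, weights = hermegauss(40)             # Gauss-Hermite quadrature (He basis)
weights = weights / np.sqrt(2.0 * np.pi)    # normalise to the N(0, 1) prior
coeffs = []
for k in range(13):
    basis_k = [0.0] * k + [1.0]             # probabilists' Hermite He_k
    ck = np.sum(weights * lik(nodes) * hermeval(nodes, basis_k)) / math.factorial(k)
    coeffs.append(ck)                       # E[He_k^2] = k! under N(0, 1)

evidence = coeffs[0]                        # zeroth coefficient = E_prior[L]
post_mean = coeffs[1] / coeffs[0]           # since theta = He_1(theta)
sig2 = 0.5 ** 2 + 1.0                       # closed-form checks for this toy
print(evidence, 0.5 / np.sqrt(sig2) * np.exp(-0.5 * 0.8 ** 2 / sig2))
print(post_mean, (0.8 / 0.25) / (1.0 / 0.25 + 1.0))
```

SSE replaces the single global expansion used here with locally refined spectral expansions, which is what lets SSLE cope with multimodal or highly concentrated likelihoods.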

Read more
Computation

Bayesian model selection for unsupervised image deconvolution with structured Gaussian priors

This paper considers the objective comparison of stochastic models to solve inverse problems, more specifically image restoration. Most often, model comparison is addressed in a supervised manner, which can be time-consuming and partly arbitrary. Here we adopt an unsupervised Bayesian approach and objectively compare the models based on their posterior probabilities, directly from the data without ground truth available. The probabilities depend on the marginal likelihood or "evidence" of the models, and we resort to the Chib approach, which incorporates a Gibbs sampler. We focus on the family of Gaussian models with circulant covariances and unknown hyperparameters, and compare different types of covariance matrices for the image and noise.
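
The role of the circulant covariances can be sketched in one dimension (known hyperparameters and the specific prior spectra below are illustrative assumptions; the paper's Chib/Gibbs machinery is precisely what handles the unknown hyperparameters): the DFT diagonalises every circulant covariance, so the model evidence factorises over frequencies:

```python
import numpy as np

def log_evidence(y, h, sx, sn):
    """log p(y) for y = h * x + n (circular convolution) with x ~ N(0, Sx),
    n ~ N(0, Sn); sx, sn are the eigenvalues (power spectra) of Sx and Sn."""
    n = len(y)
    Y, H = np.fft.fft(y) / np.sqrt(n), np.fft.fft(h)
    s = np.abs(H) ** 2 * sx + sn                 # spectrum of cov(y)
    return -0.5 * np.sum(np.log(2.0 * np.pi * s) + np.abs(Y) ** 2 / s)

rng = np.random.default_rng(4)
n = 256
h = np.zeros(n); h[:3] = [0.5, 0.3, 0.2]         # hypothetical blur kernel
freq = np.fft.fftfreq(n)
spectra = {"smooth prior": 1.0 / (1e-2 + (2.0 * np.pi * freq) ** 2),
           "white prior": np.ones(n)}
# Simulate data from the smooth-prior model, then compare model evidences
x = np.fft.ifft(np.fft.fft(rng.normal(size=n)) * np.sqrt(spectra["smooth prior"])).real
y = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real + 0.05 * rng.normal(size=n)
for name, sx in spectra.items():
    print(name, round(log_evidence(y, h, sx, sn=0.05 ** 2 * np.ones(n)), 1))
```

The data-generating model attains the larger evidence, which is the comparison the paper carries out for competing image and noise covariance families.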

Read more
