Featured Research

Computation

A robust and efficient algorithm to find profile likelihood confidence intervals

Profile likelihood confidence intervals are a robust alternative to Wald's method when the asymptotic properties of the maximum likelihood estimator are not met. However, the constrained optimization problem defining profile likelihood confidence intervals can be difficult to solve in these situations, because the likelihood function may exhibit unfavorable properties. As a result, existing methods may be inefficient and yield misleading results. In this paper, we address this problem by computing profile likelihood confidence intervals via a trust-region approach, in which steps computed from local approximations are constrained to regions where these approximations are sufficiently accurate. As our algorithm also accounts for numerical issues arising when the likelihood function is strongly non-linear or parameters are not estimable, the method is applicable in many scenarios where earlier approaches prove unreliable. To demonstrate its potential in applications, we apply our algorithm to benchmark problems and compare it with six existing approaches for computing profile likelihood confidence intervals. Our algorithm consistently achieved higher success rates than any competitor while also being among the fastest methods. Because it can compute confidence intervals for both parameters and model predictions, our algorithm is useful in a wide range of scenarios.
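
For intuition, here is a minimal sketch (in Python, using scipy) of the defining computation: the endpoint of a 95% profile likelihood interval is the parameter value at which the profiled negative log-likelihood rises by half the 0.95 chi-squared quantile above its minimum. The toy normal model and the bisection search below are illustrative only; they are not the paper's trust-region algorithm.

    # Minimal sketch: one profile-likelihood CI endpoint by bisection.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import chi2, norm

    rng = np.random.default_rng(0)
    data = rng.normal(loc=1.0, scale=2.0, size=100)    # toy data set

    def negloglik(theta):                              # theta = (mu, log_sigma)
        mu, log_sigma = theta
        return -np.sum(norm.logpdf(data, mu, np.exp(log_sigma)))

    fit = minimize(negloglik, x0=[0.0, 0.0])           # full MLE
    target = fit.fun + chi2.ppf(0.95, df=1) / 2        # profile cutoff

    def profile_nll(mu):                               # profile out sigma
        return minimize(lambda s: negloglik([mu, s[0]]), x0=[0.0]).fun

    # bisection for the upper endpoint: profile_nll crosses `target` above the MLE
    lo, hi = fit.x[0], fit.x[0] + 5.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if profile_nll(mid) < target else (lo, mid)
    print("upper 95% endpoint for mu:", 0.5 * (lo + hi))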

Read more
Computation

A scalable optimal-transport based local particle filter

Filtering in spatially-extended dynamical systems is a challenging problem with significant practical applications such as numerical weather prediction. Particle filters allow asymptotically consistent inference but require infeasibly large ensemble sizes for accurate estimates in complex spatial models. Localisation approaches, which perform local state updates by exploiting the low dependence between variables at distant points, have been suggested as a potential resolution to this issue. Naively applying the resampling step of the particle filter locally, however, produces implausible, spatially discontinuous states. The ensemble transform particle filter replaces resampling with an optimal-transport map and can be localised by computing maps for every spatial mesh node. The resulting local ensemble transform particle filter is, however, computationally intensive for dense meshes. We propose a new optimal-transport based local particle filter that computes a fixed number of maps, independent of the mesh resolution, and interpolates these maps across space, reducing the required computation and ensuring that particles remain spatially smooth. We numerically illustrate that, at a reduced computational cost, we achieve the same accuracy as the local ensemble transform particle filter, and retain its improved robustness to non-Gaussianity and its ability to quantify uncertainty relative to local ensemble Kalman filters.
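
As a rough illustration of the transform step only (not the paper's localised, interpolated filter), the sketch below maps an importance-weighted ensemble to an equally weighted one via an optimal-transport coupling; entropy-regularised transport (Sinkhorn iterations) is used here as a stand-in for the exact map, and all parameter values are arbitrary.

    # Hedged sketch of an ensemble-transform update for a particle filter.
    import numpy as np

    def etpf_update(particles, log_weights, eps=0.1, n_iter=200):
        """Map an importance-weighted ensemble to an equally weighted one."""
        n = len(particles)
        w = np.exp(log_weights - log_weights.max())
        w /= w.sum()                                   # source marginal
        b = np.full(n, 1.0 / n)                        # target marginal (uniform)
        cost = (particles[:, None] - particles[None, :]) ** 2
        K = np.exp(-cost / eps)                        # entropy-regularised kernel
        u = np.ones(n)
        for _ in range(n_iter):                        # Sinkhorn iterations
            v = b / (K.T @ u)
            u = w / (K @ v)
        T = u[:, None] * K * v[None, :]                # coupling: rows ~ w, cols ~ 1/n
        return n * (T.T @ particles)                   # new equally weighted ensemble

    # toy usage: standard-normal prior ensemble, Gaussian likelihood at y = 1.0
    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    lw = -0.5 * (x - 1.0) ** 2
    print(etpf_update(x, lw).mean())                   # posterior-mean estimate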

Read more
Computation

A simple Markov chain for independent Bernoulli variables conditioned on their sum

We consider a vector of N independent binary variables, each with a different probability of success. The distribution of the vector conditional on its sum is known as the conditional Bernoulli distribution. Assuming that N goes to infinity and that the sum is proportional to N, exact sampling costs order N^2, while a simple Markov chain Monte Carlo algorithm using 'swaps' has constant cost per iteration. We provide conditions under which this Markov chain converges in order N log N iterations. Our proof relies on couplings and an auxiliary Markov chain defined on a partition of the space into favorable and unfavorable pairs.
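
A minimal sketch of such a swap chain, with illustrative values of N, the sum k, and the success probabilities: pick a coordinate equal to 1 and a coordinate equal to 0 uniformly at random, and accept the proposed swap with the usual Metropolis ratio of odds.

    # Minimal sketch of the 'swap' Metropolis chain for independent Bernoullis
    # conditioned on their sum (all parameters below are illustrative).
    import numpy as np

    rng = np.random.default_rng(2)
    N, k = 1000, 300
    p = rng.uniform(0.05, 0.95, size=N)              # success probabilities
    odds = p / (1 - p)

    x = np.zeros(N, dtype=bool)
    x[rng.choice(N, size=k, replace=False)] = True   # any state with sum k

    for _ in range(int(N * np.log(N))):              # order N log N iterations
        i = rng.choice(np.flatnonzero(x))            # a coordinate equal to 1
        j = rng.choice(np.flatnonzero(~x))           # a coordinate equal to 0
        # symmetric proposal: swap i and j; accept with the odds ratio
        if rng.random() < min(1.0, odds[j] / odds[i]):
            x[i], x[j] = False, True

    print(x.sum())                                   # the sum is invariant: still k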

Read more
Computation

A simple algorithm for global sensitivity analysis with Shapley effects

Global sensitivity analysis aims at measuring the relative importance of different variables or groups of variables for the variability of a quantity of interest. Among the many sensitivity indices, so-called Shapley effects have recently gained popularity, mainly because the Shapley effects of all the individual variables sum to the overall variance, which makes them easier to interpret than the classical sensitivity indices known as main effects and total effects. In this paper, assuming that all the input variables are independent, we introduce a simple Monte Carlo algorithm that estimates the Shapley effects for all the individual variables simultaneously and is drastically simpler than the existing algorithms proposed in the literature. We present a short Matlab implementation of our algorithm and show some numerical results. A possible extension to the case where the input variables are dependent is also discussed.
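
Since the paper's own algorithm is not reproduced here, the sketch below (in Python rather than Matlab) illustrates the underlying telescoping identity with a generic permutation estimator: closed Sobol' indices are estimated by a standard pick-freeze scheme, and averaging their increments over random permutations yields Shapley effects that sum to the total variance. The test function and all sample sizes are arbitrary.

    # Hedged sketch: a generic permutation / pick-freeze Shapley estimator
    # for independent inputs; NOT the paper's streamlined algorithm.
    import numpy as np

    rng = np.random.default_rng(3)
    d, M, n_perm = 3, 100_000, 50

    def f(x):                                    # arbitrary test function
        return x[:, 0] + 2 * x[:, 1] + x[:, 1] * x[:, 2]

    A = rng.normal(size=(M, d))                  # two independent input samples
    B = rng.normal(size=(M, d))
    fA = f(A)
    mu = fA.mean()

    def closed_index(subset):
        """Pick-freeze estimate of Var(E[f | X_subset])."""
        X = B.copy()
        X[:, sorted(subset)] = A[:, sorted(subset)]
        return np.mean(fA * f(X)) - mu ** 2

    shapley = np.zeros(d)
    for _ in range(n_perm):
        perm = rng.permutation(d)
        prev, prev_cost = set(), 0.0             # Var(E[f]) = 0 for the empty set
        for i in perm:
            cur_cost = closed_index(prev | {i})
            shapley[i] += cur_cost - prev_cost   # telescoping increments
            prev, prev_cost = prev | {i}, cur_cost
    shapley /= n_perm

    print(shapley)                               # per-variable Shapley effects
    print(shapley.sum(), fA.var())               # effects sum to ~ total variance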

Read more
Computation

A table of short-period Tausworthe generators for Markov chain quasi-Monte Carlo

We consider the problem of estimating expectations by using Markov chain Monte Carlo methods and improving the accuracy by replacing IID uniform random points with quasi-Monte Carlo (QMC) points. Recently, it has been shown that Markov chain QMC remains consistent when the driving sequences are completely uniformly distributed (CUD). However, the definition of CUD sequences is not constructive, so an implementation method using short-period Tausworthe generators (i.e., linear feedback shift register generators over the two-element field) that approximate CUD sequences has been proposed. In this paper, we conduct an exhaustive search of short-period Tausworthe generators for Markov chain QMC in terms of the t-value, a uniformity criterion widely used in the study of QMC methods. We provide a parameter table of Tausworthe generators and demonstrate their effectiveness in numerical examples using Gibbs sampling.
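
As a rough illustration of what a Tausworthe generator is (not one of the optimized generators from the paper's table), the following sketch produces a full period of points from a small linear feedback shift register; the trinomial x^10 + x^3 + 1 and the output parameters are chosen here for illustration only.

    # Hedged sketch of a Tausworthe (LFSR) generator over GF(2); in practice
    # the parameters would be taken from a table such as the paper provides.
    m, q, w, s = 10, 3, 10, 10          # degree, tap, output bits, decimation

    def tausworthe(n_points, seed_bits):
        bits = list(seed_bits)           # initial state: m bits, not all zero
        def next_bit():
            b = bits[-(m - q)] ^ bits[-m]   # recurrence from t^10 + t^3 + 1
            bits.append(b)
            return b
        points = []
        for _ in range(n_points):
            u = 0.0
            for j in range(1, w + 1):    # u_n = sum_j b_{ns+j} 2^{-j}
                u += next_bit() * 2.0 ** (-j)
            points.append(u)
        return points

    pts = tausworthe(2 ** m - 1, [1] * m)            # full period: 2^10 - 1 points
    print(min(pts), max(pts), sum(pts) / len(pts))   # roughly uniform on (0, 1)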

Read more
Computation

A two-level Kriging-based approach with active learning for solving time-variant risk optimization problems

Several methods have been proposed in the literature to solve reliability-based optimization problems, where failure probabilities are design constraints. However, few methods address the problem of life-cycle cost or risk optimization, where failure probabilities are part of the objective function. Moreover, few papers in the literature address time-variant reliability problems in life-cycle cost or risk optimization formulations, in particular because these formulations most often require computationally expensive Monte Carlo simulation. This paper proposes a numerical framework for solving general risk optimization problems involving time-variant reliability analysis. To alleviate the computational burden of Monte Carlo simulation, two adaptive coupled surrogate models are used: the first approximates the objective function, and the second approximates the quasi-static limit state function. An iterative procedure is implemented for choosing additional support points to increase the accuracy of the surrogate models. Three application problems are used to illustrate the proposed approach. Two examples involve random load and random resistance degradation processes. The third problem is related to load-path dependent failures, a subject that had not yet been addressed in the context of risk-based optimization. It is shown herein that accurate solutions are obtained with very few objective function and limit state function calls.
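
The sketch below illustrates only the generic active-learning Kriging ingredient (refining a limit-state surrogate where the sign of the limit state function is most ambiguous, in the spirit of a U-type learning function), using scikit-learn's Gaussian process regressor; the paper's two coupled surrogates and its time-variant analysis are not reproduced, and the toy limit state is arbitrary.

    # Hedged sketch of active-learning Kriging for a limit state function.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(4)

    def limit_state(x):                      # toy g(x); failure when g(x) < 0
        return x[:, 0] ** 2 + x[:, 1] - 3.0

    X = rng.uniform(-3, 3, size=(12, 2))     # small initial design
    y = limit_state(X)
    pool = rng.uniform(-3, 3, size=(5000, 2))   # Monte Carlo candidate pool

    for _ in range(30):                      # active-learning iterations
        gp = GaussianProcessRegressor(kernel=RBF(1.0), normalize_y=True).fit(X, y)
        mean, std = gp.predict(pool, return_std=True)
        u = np.abs(mean) / np.maximum(std, 1e-12)   # U-type learning function
        best = np.argmin(u)                  # point with most ambiguous sign
        X = np.vstack([X, pool[best]])
        y = np.append(y, limit_state(pool[best:best + 1]))

    gp = GaussianProcessRegressor(kernel=RBF(1.0), normalize_y=True).fit(X, y)
    mean, _ = gp.predict(pool, return_std=True)
    print("surrogate failure probability:", np.mean(mean < 0))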

Read more
Computation

A unified performance analysis of likelihood-informed subspace methods

The likelihood-informed subspace (LIS) method offers a viable route to reducing the dimensionality of high-dimensional probability distributions arising in Bayesian inference. LIS identifies an intrinsic low-dimensional linear subspace where the target distribution differs the most from some tractable reference distribution. Such a subspace can be identified using the leading eigenvectors of a Gram matrix of the gradient of the log-likelihood function. The original high-dimensional target distribution is then approximated through various forms of ridge approximations of the likelihood function, in which the approximated likelihood has support only on the intrinsic low-dimensional subspace. This approximation enables the design of inference algorithms that can scale sub-linearly with the apparent dimensionality of the problem. Intuitively, the accuracy of the approximation, and hence the performance of the inference algorithms, is influenced by three factors: the dimension truncation error in identifying the subspace, the Monte Carlo error in estimating the Gram matrices, and the Monte Carlo error in constructing ridge approximations. This work establishes a unified framework to analyze each of these three factors and their interplay. Under mild technical assumptions, we establish error bounds for a range of existing dimension reduction techniques based on the principle of LIS. Our error bounds also provide useful insights into the accuracy comparison of these methods. In addition, we analyze the integration of LIS with sampling methods such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC). We demonstrate our analyses on a linear inverse problem with a Gaussian prior, which shows that all the estimates can be dimension-independent if the prior covariance is a trace-class operator.
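
The subspace identification step admits a compact sketch: sample gradients of the log-likelihood, form their empirical Gram matrix, and keep the leading eigenvectors. The linear-Gaussian toy problem and the truncation rank below are illustrative assumptions.

    # Minimal sketch of identifying a likelihood-informed subspace.
    import numpy as np

    rng = np.random.default_rng(5)
    d, M = 50, 2000                           # dimension, Monte Carlo samples

    G = rng.normal(size=(3, d)) / np.sqrt(d)  # toy linear forward map, 3 data
    y = rng.normal(size=3)

    def grad_loglik(x):                       # gradient of -0.5*|Gx - y|^2
        return -G.T @ (G @ x - y)

    X = rng.normal(size=(M, d))               # samples from the (Gaussian) prior
    grads = np.array([grad_loglik(x) for x in X])
    H = grads.T @ grads / M                   # empirical Gram matrix

    eigval, eigvec = np.linalg.eigh(H)        # eigenvalues in ascending order
    r = 3                                     # truncation rank (illustrative)
    basis = eigvec[:, -r:]                    # leading eigenvectors span the LIS
    print(eigval[-5:])                        # spectrum decays beyond rank 3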

Read more
Computation

A web application for the design of multi-arm clinical trials

Multi-arm designs provide an effective means of evaluating several treatments within the same clinical trial. Given the large number of treatments now available for testing in many disease areas, it has been argued that their utilisation should increase. However, for any given clinical trial there are numerous possible multi-arm designs that could be used, and choosing between them can be a difficult task. This task is complicated further by a lack of available easy-to-use software for designing multi-arm trials. To aid the wider implementation of multi-arm clinical trial designs, we have developed a web application for sample size calculation when using a variety of popular multiple comparison corrections. Furthermore, the application supports sample size calculation to control several varieties of power, as well as the determination of optimised arm-wise allocation ratios. It is built using the Shiny package in the R programming language, is free to access on any device with an internet browser, and requires no programming knowledge to use. The application provides the core information required by statisticians and clinicians to review the operating characteristics of a chosen multi-arm clinical trial design. We hope that it will assist with the future utilisation of such designs in practice.
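
For a flavour of the underlying calculations (not the application's actual code), here is a sketch of the per-arm sample size for K treatment-versus-control comparisons under a Bonferroni correction, controlling marginal power with normal outcomes and known variance; the formula is a standard textbook quantity and the numbers are illustrative.

    # Hedged sketch: per-arm sample size with a Bonferroni correction.
    import math
    from scipy.stats import norm

    def n_per_arm(K, delta, sigma, alpha=0.05, power=0.8):
        """n per arm for K one-sided treatment-vs-control z-tests."""
        alpha_adj = alpha / K                # Bonferroni-adjusted level
        z_a, z_b = norm.ppf(1 - alpha_adj), norm.ppf(power)
        return math.ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)

    # e.g. three experimental arms, a targeted effect of 0.5 standard deviations
    print(n_per_arm(K=3, delta=0.5, sigma=1.0))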

Read more
Computation

A-ComVar: A Flexible Extension of Common Variance Design

We consider nonregular fractions of factorial experiments for a class of linear models. These models have a common general mean and main effects; however, they may have different two-factor interactions. Here we assume for simplicity that three-factor and higher-order interactions are negligible. In the absence of a priori knowledge about which interactions are important, it is reasonable to prefer a design that yields equal variance for the estimates of all interaction effects, to aid in model discrimination. Such designs are called common variance designs and can be quite challenging to identify without performing an exhaustive search of possible designs. In this work, we introduce an extension of common variance designs called approximate common variance, or A-ComVar, designs. We develop a numerical approach to finding A-ComVar designs that is much more efficient than an exhaustive search. We present the types of A-ComVar designs that can be found for different numbers of factors, runs, and interactions. We further demonstrate the competitive performance of both common variance and A-ComVar designs through several comparisons with other popular designs in the literature.
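
To make the criterion concrete, the sketch below scores a candidate two-level design by the spread of the interaction-estimate variances taken from the diagonal of (X'X)^{-1}, and runs a naive random search; this illustrates the common variance idea only and is not the paper's more efficient approach.

    # Hedged sketch: scoring designs by the common-variance criterion.
    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(6)
    k, n = 4, 12                                 # factors, runs (illustrative)

    def interaction_variances(D):
        """Relative variances of interaction estimates for a +/-1 design D."""
        cols = [np.ones(len(D))] + [D[:, i] for i in range(k)]
        cols += [D[:, i] * D[:, j] for i, j in combinations(range(k), 2)]
        X = np.column_stack(cols)
        XtX = X.T @ X
        if np.linalg.matrix_rank(XtX) < X.shape[1]:
            return None                          # model not estimable
        v = np.diag(np.linalg.inv(XtX))
        return v[1 + k:]                         # interaction entries only

    best, best_spread = None, np.inf
    for _ in range(10000):                       # naive random search
        D = rng.choice([-1.0, 1.0], size=(n, k))
        v = interaction_variances(D)
        if v is not None and v.max() - v.min() < best_spread:
            best, best_spread = D, v.max() - v.min()

    print("smallest variance spread found:", best_spread)   # 0 = common variance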

Read more
Computation

ABCMETAapp: R Shiny Application for Simulation-based Estimation of Mean and Standard Deviation for Meta-analysis via Approximate Bayesian Computation (ABC)

Background and Objective: In a meta-analysis based on a continuous outcome, estimated means and corresponding standard deviations from the selected studies are key inputs for obtaining a pooled estimate of the mean and its confidence interval. We often encounter the situation that these quantities are not directly reported in the literature. Instead, other summary statistics are reported, such as the median, minimum, maximum, quartiles, and study sample size. From the available summary statistics, we then need to obtain estimates of the mean and standard deviation for meta-analysis. Methods: We developed an R Shiny application based on approximate Bayesian computation (ABC), ABCMETA, to deal with this situation. Results: In this article, we present an interactive and user-friendly R Shiny application implementing the proposed method (named ABCMETAapp). In ABCMETAapp, users can choose an underlying outcome distribution other than the normal distribution when the distribution of the outcome variable is skewed or heavy-tailed. We show how to run ABCMETAapp with examples. Conclusions: ABCMETAapp provides an R Shiny implementation of ABCMETA. The method is more flexible than existing analytical methods, since estimation can be based on five different distributions (Normal, Lognormal, Exponential, Weibull, and Beta) for the outcome variable.
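
The ABC idea behind such a method can be sketched in a few lines (shown here in Python rather than R, with illustrative priors, tolerance, and reported statistics): simulate data sets from candidate (mu, sigma) pairs, compare the simulated median, minimum, and maximum with the reported values, and keep the closest draws.

    # Minimal ABC rejection sketch for recovering mean and SD from
    # reported summary statistics (all inputs below are illustrative).
    import numpy as np

    rng = np.random.default_rng(7)
    n = 50                                       # reported study sample size
    obs = np.array([10.0, 2.0, 25.0])            # reported median, min, max

    n_draws = 100_000
    mu = rng.uniform(0, 30, n_draws)             # flat priors (illustrative)
    sigma = rng.uniform(0.1, 15, n_draws)

    sims = rng.normal(mu[:, None], sigma[:, None], size=(n_draws, n))
    stats = np.column_stack([np.median(sims, axis=1),
                             sims.min(axis=1), sims.max(axis=1)])
    dist = np.linalg.norm((stats - obs) / obs, axis=1)   # scaled distance

    keep = dist <= np.quantile(dist, 0.001)      # accept the closest 0.1%
    print("mu ~", mu[keep].mean(), " sd ~", sigma[keep].mean())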

Read more
