Featured Research

Computation

A direct Hamiltonian MCMC approach for reliability estimation

Accurate and efficient estimation of rare event probabilities is of significant importance, since such events often have widespread impacts. This work focuses on precisely quantifying these probabilities, often encountered in the reliability analysis of complex engineering systems, by introducing a gradient-based Hamiltonian Markov Chain Monte Carlo (HMCMC) framework, termed Approximate Sampling Target with Post-processing Adjustment (ASTPA). The basic idea is to construct a relevant target distribution by weighting the high-dimensional random variable space through a one-dimensional likelihood model, using the limit-state function. To sample from this target distribution we utilize HMCMC algorithms that produce Markov chain samples based on Hamiltonian dynamics rather than random walks. We compare the performance of a typical HMCMC scheme with our newly developed Quasi-Newton-based mass-preconditioned HMCMC algorithm, which can sample very adeptly, particularly in difficult cases with high dimensionality and very small failure probabilities. To eventually compute the probability of interest, an original post-sampling step is devised, using an inverse importance sampling procedure based on the samples. The involved user-defined parameters of ASTPA are then discussed and general default values are suggested. Finally, the performance of the proposed methodology is examined in detail and compared against Subset Simulation in a series of static and dynamic low- and high-dimensional benchmark problems.
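
As a rough illustration of the general recipe (not the authors' ASTPA implementation), the sketch below builds an approximate sampling target by weighting a standard-normal nominal density with a one-dimensional Gaussian likelihood centred on the limit-state surface, and samples it with a plain HMC transition. The limit-state function, the likelihood choice, and all tuning values are hypothetical.

```python
import numpy as np

def limit_state(x):
    # Hypothetical linear limit state: "failure" when g(x) <= 0.
    return 4.0 - x.sum() / np.sqrt(x.size)

def log_target(x, sigma=1.0):
    # Standard-normal nominal density weighted by a one-dimensional
    # Gaussian likelihood centred on the surface g(x) = 0 -- one
    # plausible stand-in for an approximate sampling target.
    g = limit_state(x)
    return -0.5 * x @ x - 0.5 * (g / sigma) ** 2

def grad_log_target(x, sigma=1.0):
    g = limit_state(x)
    dg = -np.ones_like(x) / np.sqrt(x.size)   # gradient of the linear g
    return -x - (g / sigma**2) * dg

def hmc_step(x, rng, eps=0.1, n_leapfrog=20):
    # One vanilla HMC transition: identity mass matrix, leapfrog integrator.
    p = rng.standard_normal(x.size)
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * eps * grad_log_target(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += eps * p_new
        p_new += eps * grad_log_target(x_new)
    x_new += eps * p_new
    p_new += 0.5 * eps * grad_log_target(x_new)
    log_alpha = (log_target(x_new) - log_target(x)
                 - 0.5 * (p_new @ p_new - p @ p))
    return x_new if np.log(rng.random()) < log_alpha else x

rng = np.random.default_rng(0)
x = np.zeros(10)
samples = []
for _ in range(1000):
    x = hmc_step(x, rng)
    samples.append(limit_state(x))
print("mean g over chain:", np.mean(samples))
```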

Read more
Computation

A fast tunable blurring algorithm for scattered data

A blurring algorithm with linear time complexity can reduce the small-scale content of data observed at scattered locations in a spatially extended domain of arbitrary dimension. The method works by forming a Gaussian interpolant of the input data, and then convolving the interpolant with a multiresolution Gaussian approximation of the Green's function of a differential operator whose spectrum can be tuned for problem-specific considerations. The new algorithm generalizes conventional blurring algorithms to data measured at locations other than a uniform grid; like them, its applications include deblurring and separation of spatial scales. An example illustrates a possible application toward enabling importance sampling approaches to data assimilation of geophysical observations, which are often scattered over a spatial domain, since blurring observations can make particle filters more effective at state estimation of large scales. Another example, motivated by data analysis of dynamics like ocean eddies that have strong separation of spatial scales, uses the algorithm to decompose scattered oceanographic float measurements into large-scale and small-scale components.
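
A minimal single-scale sketch of the core idea: form a Gaussian radial-basis interpolant of the scattered data, then blur it in closed form using the fact that convolving a Gaussian of variance s1^2 with a Gaussian of variance s2^2 yields a Gaussian of variance s1^2 + s2^2. The dense linear solve here is O(N^3); the paper's multiresolution Green's-function construction is what achieves linear time. All bandwidths and test data are illustrative.

```python
import numpy as np

def gaussian(r2, s2):
    return np.exp(-0.5 * r2 / s2)

def blur_scattered(points, values, eval_points, s_interp=0.1, s_blur=0.3):
    # Gaussian RBF interpolant of the scattered data, then a closed-form
    # blur via the Gaussian-convolved-with-Gaussian identity.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    A = gaussian(d2, s_interp**2) + 1e-8 * np.eye(len(points))  # ridge for conditioning
    w = np.linalg.solve(A, values)                              # interpolation weights
    d2e = ((eval_points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    s2_out = s_interp**2 + s_blur**2
    norm = (s_interp**2 / s2_out) ** (points.shape[1] / 2)      # convolution normalisation
    return norm * gaussian(d2e, s2_out) @ w

rng = np.random.default_rng(0)
pts = rng.random((200, 2))                       # scattered 2-D locations
vals = np.sin(8 * pts[:, 0]) + 0.1 * rng.standard_normal(200)
smooth = blur_scattered(pts, vals, pts)          # large-scale component
print("residual small-scale energy:", np.var(vals - smooth))
```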

Read more
Computation

A fresh take on 'Barker dynamics' for MCMC

We study a recently introduced gradient-based Markov chain Monte Carlo method based on 'Barker dynamics'. We provide a full derivation of the method from first principles, placing it within a wider class of continuous-time Markov jump processes. We then evaluate the Barker approach numerically on a challenging ill-conditioned logistic regression example with imbalanced data, showing in particular that the algorithm is remarkably robust to irregularity (in this case a high degree of skew) in the target distribution.
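
For orientation, here is a sketch of one step of the Barker proposal as usually presented: draw a symmetric increment, choose its sign coordinate-wise with probability given by the logistic of increment times gradient, then apply a Metropolis-Hastings correction. The step size and toy target are placeholders, and this is not the authors' code.

```python
import numpy as np

def barker_step(x, log_pi, grad_log_pi, rng, sigma=0.5):
    # Coordinate-wise sign-flipped increment driven by the gradient.
    z = sigma * rng.standard_normal(x.size)
    p_plus = 1.0 / (1.0 + np.exp(-z * grad_log_pi(x)))
    b = np.where(rng.random(x.size) < p_plus, 1.0, -1.0)
    y = x + b * z

    def log_q(frm, to):
        # Log proposal density up to symmetric factors that cancel in the
        # ratio: each coordinate contributes -log(1 + exp(-(to-frm)*grad(frm))).
        return -np.logaddexp(0.0, -(to - frm) * grad_log_pi(frm)).sum()

    log_alpha = log_pi(y) - log_pi(x) + log_q(y, x) - log_q(x, y)
    return y if np.log(rng.random()) < log_alpha else x

# Usage on a toy anisotropic Gaussian target.
scales = np.array([1.0, 0.1])
log_pi = lambda x: -0.5 * np.sum((x / scales) ** 2)
grad_log_pi = lambda x: -x / scales**2
rng = np.random.default_rng(0)
x = np.ones(2)
chain = []
for _ in range(5000):
    x = barker_step(x, log_pi, grad_log_pi, rng)
    chain.append(x)
print("sample std per coordinate:", np.std(chain, axis=0))
```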

Read more
Computation

A general perspective on the Metropolis-Hastings kernel

Since its inception the Metropolis-Hastings kernel has been applied in sophisticated ways to address ever more challenging and diverse sampling problems. Its success stems from the flexibility brought by the fact that its verification and sampling implementation rest on a local "detailed balance" condition, as opposed to a global condition in the form of a typically intractable integral equation. While checking the local condition is routine in the simplest scenarios, it proves much more difficult for complicated applications involving auxiliary structures and variables. Our aim is to develop a framework that makes establishing the correctness of complex Markov chain Monte Carlo kernels a purely mechanical or algebraic exercise, while making the communication of ideas simpler and unambiguous by allowing a stronger focus on essential features -- a choice of embedding distribution, an involution and occasionally an acceptance function -- rather than the induced, boilerplate structure of the kernels, which often tends to obscure what is important. This framework can also be used to validate kernels that do not satisfy detailed balance, i.e. which are not reversible, but instead satisfy a modified version thereof.
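
The flavour of the framework can be conveyed with a toy sketch: augment the state with an auxiliary variable drawn from an embedding distribution, apply an involution, and accept with the ratio of augmented densities. Choosing the involution Phi(x, v) = (x + v, -v) with a Gaussian auxiliary is our illustrative pick, not an example taken from the paper.

```python
import numpy as np

def involutive_mh_step(x, log_pi, rng):
    # Sample an auxiliary v from the embedding distribution mu = N(0, I),
    # apply the involution Phi(x, v) = (x + v, -v), and accept with the
    # ratio of augmented densities.  Phi is its own inverse and is
    # volume-preserving, so no Jacobian term appears; because mu is
    # symmetric here, the rule reduces to random-walk Metropolis.
    v = rng.standard_normal(np.size(x))
    x_new, v_new = x + v, -v
    log_mu = lambda u: -0.5 * u @ u
    log_alpha = log_pi(x_new) + log_mu(v_new) - log_pi(x) - log_mu(v)
    return x_new if np.log(rng.random()) < log_alpha else x

# Usage on a standard-normal target.
rng = np.random.default_rng(0)
x = np.zeros(3)
for _ in range(100):
    x = involutive_mh_step(x, lambda z: -0.5 * z @ z, rng)
print(x)
```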

Read more
Computation

A method for deriving information from running R code

It is often useful to tap information from a running R script. Obvious use cases include monitoring the consumption of resources (time, memory) and logging. Perhaps less obvious cases include tracking changes in R objects or collecting the output of unit tests. In this paper we demonstrate an approach that abstracts the collection and processing of such secondary information from the running R script. Our approach is based on a combination of three elements. The first element is to build a customized way to evaluate code. The second is labeled 'local masking' and involves temporarily masking a user-facing function so an alternative version of it is called. The third element we label 'local side effect'. This refers to the fact that the masking function exports information to the secondary information flow without altering a global state. The result is a method for building systems in pure R that lets users create and control secondary flows of information with minimal impact on their workflow, and no global side effects.
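
The paper works in pure R, but the 'local masking' and 'local side effect' ideas can be mimicked in Python for illustration: temporarily replace a user-facing function in a namespace with a wrapper that exports call records to a secondary flow, then restore it on exit. All names below are hypothetical.

```python
import contextlib

@contextlib.contextmanager
def locally_mask(namespace, name, collector):
    # Python analogue of 'local masking': swap in a wrapper that records
    # call information (the 'local side effect'), then restore the
    # original function so no global state is altered.
    original = namespace[name]

    def masked(*args, **kwargs):
        result = original(*args, **kwargs)
        collector.append((name, args, kwargs, result))  # secondary flow
        return result

    namespace[name] = masked
    try:
        yield collector
    finally:
        namespace[name] = original

# Usage: trace calls to a function while a script-like block runs.
log = []
env = {"square": lambda x: x * x}
with locally_mask(env, "square", log):
    total = sum(env["square"](i) for i in range(4))
print(total, log)
```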

Read more
Computation

A parsimonious family of multivariate Poisson-lognormal distributions for clustering multivariate count data

Multivariate count data are commonly encountered in bioinformatics through high-throughput sequencing technologies, in text mining, and in sports analytics. Although the Poisson distribution seems a natural fit for these count data, its multivariate extension is computationally expensive. In most cases mutual independence among the variables is assumed; however, this fails to take into account the correlation among the variables usually observed in the data. Recently, mixtures of multivariate Poisson-lognormal (MPLN) models have been used to analyze such multivariate count measurements with a dependence structure. In the MPLN model, each count is modeled using an independent Poisson distribution conditional on a latent multivariate Gaussian variable. Due to this hierarchical structure, the MPLN model can account for over-dispersion, as opposed to the traditional Poisson distribution, and allows for correlation between the variables. Rather than relying on a Monte Carlo-based estimation framework, which is computationally inefficient, a fast variational-EM-based framework is used here for parameter estimation. Further, a parsimonious family of mixtures of Poisson-lognormal distributions is proposed by decomposing the covariance matrix and imposing constraints on these decompositions. The utility of such models is shown using simulated and benchmark datasets.
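
The hierarchical structure is easy to make concrete: counts are conditionally independent Poisson draws given a latent multivariate Gaussian, whose covariance induces both over-dispersion and cross-variable correlation. A generative sketch with made-up parameters:

```python
import numpy as np

def sample_mpln(n, mu, Sigma, rng=np.random.default_rng()):
    # Multivariate Poisson-lognormal draws: a latent Gaussian layer,
    # then conditionally independent Poisson counts with rate exp(theta).
    theta = rng.multivariate_normal(mu, Sigma, size=n)  # latent layer
    return rng.poisson(np.exp(theta))                   # observed counts

# Illustrative parameters (hypothetical, not from the paper).
mu = np.array([1.0, 0.5, 2.0])
Sigma = np.array([[0.5, 0.3, 0.0],
                  [0.3, 0.5, 0.2],
                  [0.0, 0.2, 0.5]])
counts = sample_mpln(500, mu, Sigma)
print(counts.mean(axis=0), np.corrcoef(counts.T)[0, 1])
```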

Read more
Computation

A positive-definiteness-assured block Gibbs sampler for Bayesian graphical models with shrinkage priors

Although the block Gibbs sampler for the Bayesian graphical LASSO proposed by Wang (2012) has been widely applied and extended to various shrinkage priors in recent years, it has a less noticeable but possibly severe disadvantage: the positive definiteness of the precision matrix in the Gaussian graphical model is not guaranteed in each cycle of the Gibbs sampler. Specifically, if the dimension of the precision matrix exceeds the sample size, the positive definiteness of the precision matrix will rarely be satisfied and the Gibbs sampler will almost surely fail. In this paper, we propose modifying the original block Gibbs sampler so that the precision matrix never fails to be positive definite, by sampling it exactly from the domain of positive definiteness. As we show in Monte Carlo experiments, this modification not only stabilizes the sampling procedure but also significantly improves the performance of parameter estimation and graphical structure learning. We also apply our proposed algorithm to a graphical model of monthly return data in which the number of stocks exceeds the sample period, demonstrating its stability and scalability.
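
The failure mode is cheap to detect in practice: a Cholesky factorization attempt succeeds exactly when the sampled precision matrix is positive definite. The sketch below shows only this check; the paper's contribution is a sampler that draws each block exactly from the positive-definite domain, so no check-and-reject step is needed.

```python
import numpy as np

def is_positive_definite(Omega):
    # Cheap O(p^3) positive-definiteness check via Cholesky; a naive
    # block Gibbs cycle can return a precision matrix that fails it.
    try:
        np.linalg.cholesky(Omega)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite(np.eye(3)))                    # True
print(is_positive_definite(np.array([[1., 2.], [2., 1.]])))  # False
```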

Read more
Computation

A pseudo-marginal sequential Monte Carlo online smoothing algorithm

We consider online computation of expectations of additive state functionals under general path probability measures proportional to products of unnormalised transition densities. These transition densities are assumed to be intractable but possible to estimate, with or without bias. Using pseudo-marginalisation techniques we are able to extend the particle-based, rapid incremental smoother (PaRIS) algorithm proposed in [J. Olsson and J. Westerborn. Efficient particle-based online smoothing in general hidden Markov models: The PaRIS algorithm. Bernoulli, 23(3):1951-1996, 2017] to this setting. The resulting algorithm, which has a linear complexity in the number of particles and constant memory requirements, applies to a wide range of challenging path-space Monte Carlo problems, including smoothing in partially observed diffusion processes and models with intractable likelihood. The algorithm is furnished with several theoretical results, including a central limit theorem, establishing its convergence and numerical stability. Moreover, under strong mixing assumptions we establish a novel O(nε) bound on the asymptotic bias of the algorithm, where n is the path length and ε controls the bias of the density estimators.
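
To convey the PaRIS mechanics (in a fully tractable setting rather than the pseudo-marginal one), the sketch below runs a bootstrap particle filter on a hypothetical scalar linear-Gaussian model and, at each step, updates an additive smoothing statistic by drawing a few backward indices per particle. For clarity it uses exact categorical backward draws, which cost O(N) per particle rather than the accept-reject scheme behind PaRIS's linear overall complexity; all parameters and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sx, sy, N, Ntilde = 0.9, 1.0, 1.0, 200, 2
ys = rng.standard_normal(50)                  # stand-in observations

def log_q(xp, x):                             # transition density N(phi*xp, sx^2)
    return -0.5 * ((x - phi * xp) / sx) ** 2

x = rng.standard_normal(N)                    # particles at time 0
logw = -0.5 * ((ys[0] - x) / sy) ** 2
w = np.exp(logw - logw.max()); w /= w.sum()
tau = np.zeros(N)                             # additive statistic: sum of past states

for y in ys[1:]:
    anc = rng.choice(N, size=N, p=w)          # multinomial resampling
    x_new = phi * x[anc] + sx * rng.standard_normal(N)
    tau_new = np.empty(N)
    for i in range(N):
        # PaRIS-style update: a few draws from the backward kernel,
        # with weights proportional to w_j * q(x_j, x_new_i).
        lp = np.log(w) + log_q(x, x_new[i])
        p = np.exp(lp - lp.max()); p /= p.sum()
        J = rng.choice(N, size=Ntilde, p=p)
        tau_new[i] = np.mean(tau[J] + x[J])   # h(x_prev, x_cur) = x_prev
    x, tau = x_new, tau_new
    logw = -0.5 * ((y - x) / sy) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()

print("online smoothed E[sum of past states]:", np.sum(w * tau))
```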

Read more
Computation

A review of Approximate Bayesian Computation methods via density estimation: inference for simulator-models

This paper provides a review of Approximate Bayesian Computation (ABC) methods for carrying out Bayesian posterior inference, through the lens of density estimation. We describe several recent algorithms and make connections with traditional approaches. We show the advantages and limitations of models based on parametric approaches, and then draw attention to developments in machine learning, which we believe have the potential to make ABC scalable to higher dimensions and may be the future direction for research in this area.
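
The simplest instance of the density-estimation view is rejection ABC followed by a kernel density estimate of the accepted parameter draws; the toy simulator, tolerance, and prior below are placeholders.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

def simulator(theta, n=50):
    # Hypothetical intractable-likelihood simulator: Gaussian data with
    # unknown mean theta, summarised by the sample mean.
    return rng.normal(theta, 1.0, size=n).mean()

obs_summary = 0.8                        # observed summary statistic
prior_draws = rng.uniform(-5, 5, size=20000)
sim_summaries = np.array([simulator(t) for t in prior_draws])

# Rejection ABC: keep draws whose simulated summary lands near the
# observation, then smooth the accepted sample with a kernel density
# estimate -- the simplest density-estimation view of the ABC posterior.
accepted = prior_draws[np.abs(sim_summaries - obs_summary) < 0.1]
posterior_kde = gaussian_kde(accepted)
print(len(accepted), posterior_kde(np.array([0.8]))[0])
```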

Read more
Computation

A review of available software for adaptive clinical trial design

Background/Aims: The increasing expense of the drug development process has seen interest in the use of adaptive designs (ADs) grow substantially in recent years. Accordingly, much research has been conducted to identify potential barriers to increasing the use of ADs in practice, and several articles have argued that the availability of user-friendly software will be an important step in making ADs easier to implement. Therefore, in this paper we present a review of the current state of software availability for ADs. Methods: We first review articles from 31 journals published in 2013-17 that relate to methodology for adaptive trials, in order to assess how often code and software for implementing novel ADs are made available at the time of publication. We contrast our findings against these journals' current policies on code distribution. Secondly, we conduct additional searches of popular code repositories, such as CRAN and GitHub, to identify further existing user-contributed software for ADs. From this, we are able to direct interested parties towards solutions for their problem of interest by classifying available code by type of adaptation. Results: Only 29% of included articles made their code available in some form. In many instances, articles published in journals with mandatory requirements on code provision still did not make code available. There are several areas in which available software is currently limited or saturated. In particular, many packages are available to address group sequential design, but comparatively little code is present in the public domain for determining biomarker-guided ADs. Conclusions: There is much room for improvement in the provision of software alongside AD publications. Additionally, whilst progress has been made, well-established software for various types of trial adaptation remains sparsely available.

Read more
