Featured Researches

Computation

Composite likelihood methods for histogram-valued random variables

Symbolic data analysis has been proposed as a technique for summarising large and complex datasets into a much smaller and more tractable number of distributions -- such as random rectangles or histograms -- each describing a portion of the larger dataset. Recent work has developed likelihood-based methods that permit fitting models for the underlying data while observing only the distributional summaries. However, while powerful, this approach rapidly becomes computationally intractable for random histograms as the dimension of the underlying data increases. We introduce a composite-likelihood variation of this likelihood-based approach for the analysis of random histograms in K dimensions, based on the construction of lower-dimensional marginal histograms. The performance of this approach is examined through simulated and real data analyses of max-stable models for spatial extremes, using millions of observed data points in more than K=100 dimensions. Large computational savings are achieved compared to existing model-fitting approaches.
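
To illustrate the marginal-histogram construction, the sketch below scores a candidate model against all pairwise 2-D marginal histograms of a K-dimensional sample, summing a multinomial log-likelihood of the observed bin counts against model bin probabilities obtained by differencing a bivariate CDF. The function names and the generic `model_cdf_2d` callable are illustrative assumptions, not the paper's implementation.

```python
import itertools
import numpy as np

def pairwise_composite_loglik(data, edges, model_cdf_2d):
    """Composite log-likelihood built from all 2-D marginal histograms.

    data:         (n, K) array of underlying observations (in practice the
                  histograms would be precomputed and the raw data discarded).
    edges:        list of K arrays of bin edges, one per dimension.
    model_cdf_2d: callable (i, j, xi, xj) -> model joint CDF of dims (i, j),
                  evaluated elementwise on arrays.
    """
    n, K = data.shape
    total = 0.0
    for i, j in itertools.combinations(range(K), 2):
        # Observed 2-D marginal histogram: the distributional summary.
        counts, _, _ = np.histogram2d(data[:, i], data[:, j],
                                      bins=[edges[i], edges[j]])
        # Model cell probabilities by double-differencing the CDF on the grid.
        gi, gj = np.meshgrid(edges[i], edges[j], indexing="ij")
        F = model_cdf_2d(i, j, gi, gj)
        probs = np.clip(np.diff(np.diff(F, axis=0), axis=1), 1e-300, None)
        # Multinomial log-likelihood of the bin counts (up to a constant).
        total += float(np.sum(counts * np.log(probs)))
    return total
```

Only the histograms enter the sum, so the raw data are needed only once, to build the summaries; the composite likelihood would then be maximized over the model parameters.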

Computation of projection regression depth and its induced median

Notions of depth in regression have been introduced and studied in the literature. The best-known example is regression depth (RD), which is a direct extension of location depth to regression. The projection regression depth (PRD) is the extension of another prevailing location depth, the projection depth, to regression. Whereas the computation of the RD has been discussed in the literature, the computation of the PRD has never been addressed before. This paper addresses the computation of the PRD and its induced median (the maximum depth estimator) in a regression setting. For a given $\boldsymbol{\beta}\in\mathbb{R}^p$, exact algorithms for the PRD with cost $O(n^2\log n)$ ($p=2$) and $O(N(n,p)(p^3+n\log n+np^{1.5}+npN_{Iter}))$ ($p>2$), and approximate algorithms for the PRD and its induced median with cost $O(N_{\mathbf{v}}np)$ and $O(RpN_{\boldsymbol{\beta}}(p^2+nN_{\mathbf{v}}N_{Iter}))$, respectively, are proposed. Here $N(n,p)$ is a number defined by the total number of $(p-1)$-dimensional hyperplanes formed by points induced from the sample points and $\boldsymbol{\beta}$; $N_{\mathbf{v}}$ is the total number of unit directions $\mathbf{v}$ utilized; $N_{\boldsymbol{\beta}}$ is the total number of candidate regression parameters $\boldsymbol{\beta}$ employed; $N_{Iter}$ is the total number of iterations carried out in an optimization algorithm; and $R$ is the total number of replications. Furthermore, as a second major contribution, three PRD-induced estimators are introduced that can be computed up to 30 times faster than the PRD-induced median while maintaining a similar level of accuracy. Examples and simulation studies reveal that the depth median induced from the PRD is favorable in terms of robustness and efficiency, compared to the maximum depth estimator induced from the RD, which is the current leading regression median.
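
The approximate cost $O(N_{\mathbf{v}}np)$ reflects a direction-sampling scheme: draw $N_{\mathbf{v}}$ random unit directions, evaluate a univariate unfitness along each, and take the worst case. The sketch below illustrates only that sampling structure; the `unfitness` functional shown (a normalized projected-residual average) is a simplified stand-in for the paper's definition, and all names are illustrative.

```python
import numpy as np

def unfitness(v, beta, X, y):
    """Illustrative stand-in for the per-direction unfitness functional."""
    r = y - X @ beta          # residuals under the candidate fit beta
    proj = X @ v              # data projected onto direction v
    return abs(np.mean(proj * np.sign(r))) / (np.mean(np.abs(proj)) + 1e-12)

def approx_prd(beta, X, y, n_dirs=1000, rng=None):
    """Monte Carlo approximation of a projection-type regression depth:
    depth(beta) = 1 / (1 + sup over unit directions of the unfitness)."""
    rng = np.random.default_rng(rng)
    p = X.shape[1]
    worst = 0.0
    for _ in range(n_dirs):
        v = rng.standard_normal(p)
        v /= np.linalg.norm(v)  # uniformly distributed on the unit sphere
        worst = max(worst, unfitness(v, beta, X, y))
    return 1.0 / (1.0 + worst)
```

Each direction costs $O(np)$ work, so the loop matches the stated $O(N_{\mathbf{v}}np)$ complexity for a single evaluation of the depth at $\boldsymbol{\beta}$.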

Computation of the Gradient and the Hessian of the Log-likelihood of the State-space Model by the Kalman Filter

The maximum likelihood estimates of an ARMA model can be obtained by the Kalman filter based on the state-space representation of the model. This paper presents an algorithm for computing the gradient of the log-likelihood by extending the Kalman filter, without resorting to numerical differentiation. Three examples of seasonal adjustment and ARMA models are presented to exemplify the specification of the structural matrices and initial matrices. An extension of the algorithm to compute the Hessian matrix is also shown.
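
To illustrate the filter-extension idea, the sketch below runs a scalar Kalman filter for an AR(1)-plus-noise model and propagates derivative recursions alongside the standard ones, yielding the exact score with respect to the autoregressive parameter phi. This is a minimal one-parameter sketch under assumed variances q and r, not the paper's general algorithm.

```python
import numpy as np

def loglik_and_grad(phi, y, q=1.0, r=1.0):
    """Kalman filter for x_t = phi*x_{t-1} + w_t, y_t = x_t + v_t,
    returning the log-likelihood and its exact derivative in phi,
    obtained by differentiating every filter recursion (no finite
    differences)."""
    # Stationary initialization and its derivative in phi.
    x, P = 0.0, q / max(1e-12, 1.0 - phi**2)
    dx, dP = 0.0, 2.0 * q * phi / max(1e-12, (1.0 - phi**2) ** 2)
    ll, dll = 0.0, 0.0
    for yt in y:
        # Prediction step and its derivative.
        xp, dxp = phi * x, x + phi * dx
        Pp, dPp = phi**2 * P + q, 2.0 * phi * P + phi**2 * dP
        # Innovation and prediction-error decomposition of the likelihood.
        e, S = yt - xp, Pp + r
        de, dS = -dxp, dPp
        ll += -0.5 * (np.log(2.0 * np.pi * S) + e**2 / S)
        dll += -0.5 * dS / S - e * de / S + 0.5 * e**2 * dS / S**2
        # Update step and its derivative.
        K = Pp / S
        dK = dPp * (1.0 - K) / S
        x, dx = xp + K * e, dxp + dK * e + K * de
        P, dP = (1.0 - K) * Pp, (1.0 - K) * dPp - dK * Pp
    return ll, dll
```

Running the same pattern once per parameter gives the full gradient; differentiating the recursions a second time yields the Hessian in the same spirit.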

Computational Causal Inference

We introduce computational causal inference as an interdisciplinary field across causal inference, algorithm design, and numerical computing. The field aims to develop software specializing in causal inference that can analyze massive datasets with a variety of causal effects, in a performant, general, and robust way. The focus on software improves research agility and enables causal inference to be easily integrated into large engineering systems. In particular, we use computational causal inference to deepen the relationship between causal inference, online experimentation, and algorithmic decision making. This paper describes the new field, the demand, the opportunities for scalability, and the open challenges, and begins the discussion of how the community can unite to solve the challenges of scaling causal inference and decision making.

Computing Bayes: Bayesian Computation from 1763 to the 21st Century

The Bayesian statistical paradigm uses the language of probability to express uncertainty about the phenomena that generate observed data. Probability distributions thus characterize Bayesian analysis, with the rules of probability used to transform prior probability distributions for all unknowns - parameters, latent variables, models - into posterior distributions, subsequent to the observation of data. Conducting Bayesian analysis requires the evaluation of integrals in which these probability distributions appear. Bayesian computation is all about evaluating such integrals in the typical case where no analytical solution exists. This paper takes the reader on a chronological tour of Bayesian computation over the past two and a half centuries. Beginning with the one-dimensional integral first confronted by Bayes in 1763, through to recent problems in which the unknowns number in the millions, we place all computational problems into a common framework, and describe all computational methods using a common notation. The aim is to help new researchers in particular - and more generally those interested in adopting a Bayesian approach to empirical work - make sense of the plethora of computational techniques that are now on offer; understand when and why different methods are useful; and see the links that do exist between them all.

Computing Estimators of Dantzig Selector type via Column and Constraint Generation

We consider a class of linear-programming-based estimators for reconstructing a sparse signal from linear measurements. Specific formulations of the reconstruction problem considered here include the Dantzig selector, basis pursuit (for the case in which the measurements contain no errors), and the fused Dantzig selector (for the case in which the underlying signal is piecewise constant). Despite these estimators being central to sparse signal processing and machine learning, solving the associated linear programs at large scale remains a challenging task, limiting their use in practice. We show that classic constraint- and column-generation techniques from large-scale linear programming, used in conjunction with a commercial implementation of the simplex method and initialized with the solution of a closely related Lasso formulation, yield solutions with high efficiency in many settings.
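
For reference, the Dantzig selector itself is the linear program min ||b||_1 subject to ||X'(y - Xb)||_inf <= lam. The sketch below solves this baseline LP directly with scipy's linprog via the standard b = u - v splitting; it is the problem that column and constraint generation accelerate, not the paper's generation scheme itself.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Solve min ||b||_1  s.t.  ||X'(y - Xb)||_inf <= lam  as an LP.

    Split b = u - v with u, v >= 0, so ||b||_1 becomes sum(u) + sum(v)
    and the sup-norm constraint becomes two blocks of linear inequalities."""
    n, p = X.shape
    A = X.T @ X
    Xty = X.T @ y
    # Objective: minimize sum(u) + sum(v).
    c = np.ones(2 * p)
    # |Xty - A(u - v)| <= lam elementwise, written as two inequality blocks.
    A_ub = np.block([[-A, A], [A, -A]])
    b_ub = np.concatenate([lam - Xty, lam + Xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2 * p)
    u, v = res.x[:p], res.x[p:]
    return u - v
```

The full LP has 2p variables and 2p constraints built from the dense p-by-p matrix X'X, which is exactly why generating constraints and columns on demand pays off at large scale.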

Computing Maximum Likelihood Estimates for Gaussian Graphical Models with Macaulay2

We introduce the package GraphicalModelsMLE for computing the maximum likelihood estimator (MLE) of a Gaussian graphical model in the computer algebra system Macaulay2. The package allows the computation of MLEs for the class of loopless mixed graphs. Additional functionality makes it possible to explore the underlying algebraic structure of the model, such as its ML degree and the ideal of score equations.

Computing Shapley Effects for Sensitivity Analysis

Shapley effects are attracting increasing attention as sensitivity measures. When the value function is the conditional variance, they account for the individual and higher order effects of a model input. They are also well defined under model input dependence. However, one of the issues associated with their use is computational cost. We present a new algorithm that offers major improvements for the computation of Shapley effects, reducing the computational burden by several orders of magnitude (from $k!\cdot k$ to $2^k$, where $k$ is the number of inputs) with respect to currently available implementations. The algorithm works in the presence of input dependencies. The algorithm also makes it possible to estimate all generalized (Shapley-Owen) effects for interactions.
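
The subset-based idea behind the $2^k$ complexity can be sketched as follows: evaluate the value function once per subset, cache the results, and assemble each input's Shapley effect from weighted marginal contributions read out of the cache. The function names are illustrative, and the value function is left generic.

```python
from itertools import combinations
from math import factorial

def shapley_effects(k, value):
    """Exact Shapley effects for k inputs from a set function `value`,
    using one value() call per subset (2^k total) thanks to caching."""
    # Cache value(S) for every subset S, encoded as a frozenset.
    cache = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            cache[frozenset(S)] = value(frozenset(S))
    # Shapley formula: weighted marginal contributions over subsets.
    phi = [0.0] * k
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for size in range(k):
            w = factorial(size) * factorial(k - size - 1) / factorial(k)
            for S in combinations(others, size):
                S = frozenset(S)
                phi[i] += w * (cache[S | {i}] - cache[S])
    return phi
```

With the conditional-variance value function of the abstract, `value(S)` would return an estimate of Var(E[Y | X_S]); as a quick check, the additive toy function `lambda S: len(S)` gives every input an effect of exactly 1.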

Conditional particle filters with diffuse initial distributions

Conditional particle filters (CPFs) are powerful smoothing algorithms for general nonlinear/non-Gaussian hidden Markov models. However, CPFs can be inefficient or difficult to apply with diffuse initial distributions, which are common in statistical applications. We propose a simple but generally applicable auxiliary variable method, which can be used together with the CPF in order to perform efficient inference with diffuse initial distributions. The method requires only simulatable Markov transitions that are reversible with respect to the initial distribution, which may be improper. We focus in particular on random-walk type transitions that are reversible with respect to a uniform initial distribution (on some domain), and on autoregressive kernels for Gaussian initial distributions. We propose to use online adaptations within the methods. In the random-walk case, our adaptations tune the proposal using the estimated covariance and the acceptance rate, and we detail their theoretical validity. We tested our methods with a linear-Gaussian random-walk model, a stochastic volatility model, and a stochastic epidemic compartment model with a time-varying transmission rate. The experimental findings demonstrate that our method works reliably with little user specification, and can mix substantially better than a direct particle Gibbs algorithm that treats the initial states as parameters.
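
The auxiliary-variable construction only needs Markov kernels reversible with respect to the initial distribution. Two textbook moves matching the cases mentioned above are sketched here: a random-walk move reversible with respect to a uniform distribution on a box, and an autoregressive (Crank-Nicolson type) move exactly reversible with respect to a Gaussian. Function names and parameters are illustrative, not the paper's API.

```python
import numpy as np

def rw_uniform_move(x, step, lo, hi, rng):
    """Random-walk move reversible w.r.t. the uniform distribution on
    [lo, hi]^d: propose x + step*eps and accept iff the proposal stays
    in the domain (Metropolis ratio is 1 inside, 0 outside)."""
    prop = x + step * rng.standard_normal(x.shape)
    return prop if np.all((prop >= lo) & (prop <= hi)) else x

def ar_gaussian_move(x, rho, mean, chol_cov, rng):
    """Autoregressive move exactly reversible w.r.t. N(mean, cov),
    with chol_cov the Cholesky factor of cov:
    x' = mean + rho*(x - mean) + sqrt(1 - rho^2) * L @ eps."""
    eps = rng.standard_normal(x.shape)
    return mean + rho * (x - mean) + np.sqrt(1.0 - rho**2) * chol_cov @ eps
```

Both moves leave their respective initial distributions invariant by construction, which is the only property the auxiliary variable method requires of them.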

Conditionally Gaussian Random Sequences for an Integrated Variance Estimator with Correlation between Noise and Returns

Correlation between microstructure noise and latent financial logarithmic returns is an empirically relevant phenomenon with sound theoretical justification. With a few notable exceptions, the integrated variance estimators proposed in the financial literature are not designed to handle such dependence explicitly, or handle it only in special settings. We provide an integrated variance estimator that is robust to correlated noise and returns. For this purpose, we propose a generalization of the Forward Filtering Backward Sampling (FFBS) algorithm that provides a sampling technique for a latent conditionally Gaussian random sequence. We apply our methodology to intra-day Microsoft prices, and compare it in a simulation study with established alternatives, showing an advantage in terms of root mean square error and dispersion.
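
For context, the standard FFBS step for a linear-Gaussian state-space model is sketched below: a forward Kalman pass stores the filtered moments, then the latent states are drawn backward from the conditional smoothing distributions. The paper's generalization to correlated noise and returns is not reproduced here; the scalar model and the names are assumptions for illustration.

```python
import numpy as np

def ffbs(y, phi, q, r, x0_mean=0.0, x0_var=1.0, rng=None):
    """Forward Filtering Backward Sampling for
    x_t = phi*x_{t-1} + w_t (w ~ N(0, q)),  y_t = x_t + v_t (v ~ N(0, r)).
    Returns one joint draw of the latent path x_{1:T} given y_{1:T}."""
    rng = np.random.default_rng(rng)
    T = len(y)
    m = np.empty(T)   # filtered means
    P = np.empty(T)   # filtered variances
    xm, Pm = x0_mean, x0_var
    for t in range(T):
        xp, Pp = phi * xm, phi**2 * Pm + q     # predict
        K = Pp / (Pp + r)                      # Kalman gain
        xm, Pm = xp + K * (y[t] - xp), (1.0 - K) * Pp
        m[t], P[t] = xm, Pm
    # Backward sampling: draw x_T, then x_t | x_{t+1}, y_{1:t}.
    x = np.empty(T)
    x[-1] = m[-1] + np.sqrt(P[-1]) * rng.standard_normal()
    for t in range(T - 2, -1, -1):
        Pp = phi**2 * P[t] + q
        g = phi * P[t] / Pp                    # backward gain
        mean = m[t] + g * (x[t + 1] - phi * m[t])
        var = P[t] - g * phi * P[t]
        x[t] = mean + np.sqrt(max(var, 0.0)) * rng.standard_normal()
    return x
```

Sampled latent paths of the log-price can then feed an integrated variance estimate; the paper's contribution is extending this machinery to the case where the observation noise and the returns are correlated.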
