Featured Research

Statistics Theory

Prediction in polynomial errors-in-variables models

A multivariate errors-in-variables (EIV) model with an intercept term and a polynomial EIV model are considered. The focus is on the structural homoskedastic case, where the vectors of covariates are i.i.d. and the measurement errors are i.i.d. as well. The covariates contaminated with errors are normally distributed, and the corresponding classical errors are also assumed normal. In both models, it is shown that the (inconsistent) ordinary least squares estimators of the regression parameters yield an a.s. approximation to the best prediction of the response given the values of the observable covariates. Thus, not only in the linear EIV model but in the polynomial EIV model as well, consistent estimators of the regression parameters are useless for the prediction problem, provided the size and covariance structure of the observation errors for the predicted subject do not differ from those in the data used to fit the model.
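
As a concrete illustration of the linear case, the following minimal simulation sketch compares the naive OLS fit on the error-contaminated covariate with the best prediction E[y | w] derived from joint normality; the two agree up to sampling error. All parameter values are illustrative choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b0, b1 = 1.0, 2.0                    # true regression parameters (illustrative)
mu, sx, su, se = 0.5, 1.0, 0.7, 0.3  # covariate, error, and noise scales

x = rng.normal(mu, sx, n)            # latent covariate
w = x + rng.normal(0, su, n)         # observed covariate with classical error
y = b0 + b1 * x + rng.normal(0, se, n)

# Naive (inconsistent) OLS of y on the contaminated covariate w.
W = np.column_stack([np.ones(n), w])
ols = np.linalg.lstsq(W, y, rcond=None)[0]

# Best prediction E[y | w] under joint normality:
# E[x | w] = mu + k (w - mu), with reliability k = sx^2 / (sx^2 + su^2).
k = sx**2 / (sx**2 + su**2)
best = b0 + b1 * (mu + k * (w - mu))

# The gap shrinks to zero as n grows: OLS recovers the best predictor.
print("max |OLS prediction - best prediction|:", np.max(np.abs(W @ ols - best)))
```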

Read more
Statistics Theory

Prediction-based estimation for diffusion models with high-frequency data

This paper obtains asymptotic results for parametric inference using prediction-based estimating functions when the data are high-frequency observations of a diffusion process with an infinite time horizon. Specifically, the data are observations of a diffusion process at the n equidistant time points iΔ_n, i = 1, …, n, and the asymptotic scenario is Δ_n → 0 and nΔ_n → ∞. For a useful and tractable class of prediction-based estimating functions, the existence of a consistent estimator is proved under standard weak regularity conditions on the diffusion process and the estimating function. Asymptotic normality of the estimator is established under the additional rate condition nΔ_n^3 → 0. The prediction-based estimating functions are approximate martingale estimating functions to a smaller order than what has previously been studied, and new non-standard asymptotic theory is needed. A Monte Carlo method for calculating the asymptotic variance of the estimators is proposed.
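
For intuition, here is a minimal sketch for an Ornstein-Uhlenbeck process, where the one-step predictor E[X_{(i+1)Δ_n} | X_{iΔ_n}] = e^{-θΔ_n} X_{iΔ_n} yields the simplest prediction-based estimating function (which in this linear case coincides with a martingale estimating function). The parameter values and sampling frequency are our own choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma = 2.0, 1.0   # OU model: dX_t = -theta X_t dt + sigma dW_t
n, dt = 200_000, 0.005    # high-frequency regime: dt -> 0, n * dt -> infinity

# Simulate using the exact OU transition X_{t+dt} = a X_t + noise.
a = np.exp(-theta * dt)
sd = sigma * np.sqrt((1 - a**2) / (2 * theta))
x = np.empty(n + 1)
x[0] = 0.0
for i in range(n):
    x[i + 1] = a * x[i] + sd * rng.standard_normal()

# Prediction-based estimating function built on the one-step predictor:
#   G_n(theta) = sum_i X_i (X_{i+1} - e^{-theta dt} X_i) = 0,
# solved in closed form for e^{-theta dt}.
a_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
theta_hat = -np.log(a_hat) / dt
print(f"theta_hat = {theta_hat:.3f}  (true value {theta})")
```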

Read more
Statistics Theory

Prepivoted permutation tests

We present a general approach to constructing permutation tests that are both exact for the null hypothesis of equality of distributions and asymptotically correct for testing equality of parameters of distributions while allowing the distributions themselves to differ. These robust permutation tests transform a given test statistic by a consistent estimator of its limiting distribution function before enumerating its permutation distribution. This transformation, known as prepivoting, aligns the unconditional limiting distribution for the test statistic with the probability limit of its permutation distribution. Through prepivoting, the tests permute one minus an asymptotically valid p-value for testing the null of equality of parameters. We describe two approaches for prepivoting within permutation tests, one directly using asymptotic normality and the other using the bootstrap. We further illustrate that permutation tests using bootstrap prepivoting can provide improvements to the order of the error in rejection probability relative to competing transformations when testing equality of parameters, while maintaining exactness under equality of distributions. Simulation studies highlight the versatility of the proposal, illustrating the restoration of asymptotic validity to a wide range of permutation tests conducted when only the parameters of distributions are equal.
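
A minimal sketch of the direct normal-approximation variant, for testing equality of means, is given below. The studentized statistic, unequal-variance design, and two-sided convention are illustrative choices of ours rather than the paper's construction verbatim.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def prepivoted_stat(x, y):
    """Prepivot: push the studentized mean difference through its estimated
    limiting distribution (standard normal), yielding a statistic in [0, 1]."""
    d = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return norm.cdf(d / se)

def prepivoted_permutation_test(x, y, B=2000):
    z = np.concatenate([x, y])
    t_obs = prepivoted_stat(x, y)
    t_perm = np.empty(B)
    for b in range(B):
        p = rng.permutation(z)
        t_perm[b] = prepivoted_stat(p[:len(x)], p[len(x):])
    # Two-sided p-value: under the null the prepivoted statistic centers at 1/2.
    return (1 + np.sum(np.abs(t_perm - 0.5) >= np.abs(t_obs - 0.5))) / (B + 1)

# Equal means but unequal variances and sample sizes: a setting in which a
# naive permutation test of the raw mean difference is asymptotically invalid.
x = rng.normal(0, 1, 50)
y = rng.normal(0, 3, 200)
print("p-value:", prepivoted_permutation_test(x, y))
```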

Read more
Statistics Theory

Principal Loading Analysis

This paper proposes Principal Loading Analysis (PLA), a tool for dimension reduction that reduces the dimension of the original space by discarding variables. The intuition is to drop those variables that distort the covariance matrix only slightly. We introduce the method, provide an algorithm for conducting PLA, and give bounds for the noise arising in the sample case.
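
The following toy sketch shows one way such a rule can look: eigenvectors of the sample covariance matrix with small eigenvalues that load almost entirely on a single variable flag that variable as discardable. The cutoffs and the simulated data are illustrative choices of ours, not the paper's algorithm verbatim.

```python
import numpy as np

def pla_discard(X, eval_cut=0.2, load_cut=0.9):
    """Flag variables to drop: small-eigenvalue eigenvectors of the sample
    covariance that are concentrated (loading near +-1) on one variable."""
    S = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    drop = set()
    for lam, v in zip(evals, evecs.T):
        j = int(np.argmax(np.abs(v)))
        if lam < eval_cut and abs(v[j]) > load_cut:
            drop.add(j)
    return sorted(drop)

rng = np.random.default_rng(3)
n = 2000
mix = np.array([[2., 1., 0.], [0., 2., 1.], [1., 0., 2.]])
core = rng.normal(size=(n, 3)) @ mix        # three informative variables
noise = 0.1 * rng.normal(size=(n, 2))       # two near-constant variables
X = np.hstack([core, noise])
print("variables flagged for discarding:", pla_discard(X))  # expect [3, 4]
```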

Read more
Statistics Theory

Principal Separable Component Analysis via the Partial Inner Product

The non-parametric estimation of covariance lies at the heart of functional data analysis, whether for curve or surface-valued data. The case of a two-dimensional domain poses both statistical and computational challenges, which are typically alleviated by assuming separability. However, separability is often questionable, sometimes even demonstrably inadequate. We propose a framework for the analysis of covariance operators of random surfaces that generalises separability, while retaining its major advantages. Our approach is based on the additive decomposition of the covariance into a series of separable components. The decomposition is valid for any covariance over a two-dimensional domain. Leveraging the key notion of the partial inner product, we generalise the power iteration method to general Hilbert spaces and show how the aforementioned decomposition can be efficiently constructed in practice. Truncation of the decomposition and retention of the principal separable components automatically induces a non-parametric estimator of the covariance, whose parsimony is dictated by the truncation level. The resulting estimator can be calculated, stored and manipulated with little computational overhead relative to separability. The framework and estimation method are genuinely non-parametric, since the considered decomposition holds for any covariance. Consistency and rates of convergence are derived under mild regularity assumptions, illustrating the trade-off between bias and variance regulated by the truncation level. The merits and practical performance of the proposed methodology are demonstrated in a comprehensive simulation study.
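
On a discrete grid, the partial-inner-product power iteration can be sketched as power iteration on a rearranged covariance array (a Van Loan-Pitsianis-style rearrangement): each matrix-vector product plays the role of a partial inner product with the current component estimate. The grid sizes and the two-component example below are our own illustrative choices.

```python
import numpy as np

def principal_separable_components(C, n_terms=2, iters=200):
    """C[s, t, s2, t2] = Cov(X(s, t), X(s2, t2)) on a p x q grid.
    Returns A_r (p x p) and B_r (q x q) with C ~ sum_r A_r (x) B_r."""
    p, q = C.shape[0], C.shape[1]
    # Rearrangement: rows indexed by (s, s2), columns by (t, t2).
    R = C.transpose(0, 2, 1, 3).reshape(p * p, q * q)
    A, B = [], []
    for _ in range(n_terms):
        b = np.ones(q * q)
        for _ in range(iters):          # power iteration on R; each product
            a = R @ b                   # acts as a partial inner product with
            a /= np.linalg.norm(a)      # the current component estimate
            b = R.T @ a
        s = np.linalg.norm(b)
        b /= s
        A.append(s * a.reshape(p, p))
        B.append(b.reshape(q, q))
        R = R - s * np.outer(a, b)      # deflate, then extract the next term
    return A, B

rng = np.random.default_rng(4)

def rand_psd(d):
    M = rng.normal(size=(d, d))
    return M @ M.T

p, q = 6, 5
C = (np.einsum('ij,kl->ikjl', rand_psd(p), rand_psd(q))
     + 0.1 * np.einsum('ij,kl->ikjl', rand_psd(p), rand_psd(q)))
A, B = principal_separable_components(C)
C_hat = sum(np.einsum('ij,kl->ikjl', a, b) for a, b in zip(A, B))
print("relative error:", np.linalg.norm(C - C_hat) / np.linalg.norm(C))
```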

Read more
Statistics Theory

Product-form estimators: exploiting independence to scale up Monte Carlo

We introduce a class of Monte Carlo estimators for product-form target distributions that aim to overcome the rapid growth of variance with dimension often observed for standard estimators. We identify them with a class of generalized U-statistics, and thus establish their unbiasedness, consistency, and asymptotic normality. Moreover, we show that they achieve lower variances than their conventional counterparts given the same number of samples drawn from the target, investigate the gap in variance via several examples, and identify the situations in which the difference is most, and least, pronounced. We further study the estimators' computational cost and delineate the settings in which they are most efficient. We illustrate their utility beyond the setting of product-form distributions by detailing two simple extensions (one to targets that are mixtures of product-form distributions and another to targets that are absolutely continuous with respect to product-form distributions) and conclude by discussing further possible uses.
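
In the simplest two-coordinate case the idea can be sketched directly: with N independent draws per coordinate of a product-form target, the product-form estimator averages the integrand over all N^2 coordinate combinations rather than over the N paired samples, at the same sampling cost. The target and integrand below are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(5)
N, reps = 100, 2000
f = lambda x, y: np.exp(-(x - y) ** 2)   # integrand (illustrative)

std_est, pf_est = np.empty(reps), np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 1.0, N)          # X ~ p1 and Y ~ p2, independent,
    y = rng.normal(0.5, 1.0, N)          # so the target is product-form
    std_est[r] = f(x, y).mean()                   # N paired samples (x_i, y_i)
    pf_est[r] = f(x[:, None], y[None, :]).mean()  # all N^2 pairs (x_i, y_j)

print("means:    ", std_est.mean(), pf_est.mean())   # both unbiased
print("variances:", std_est.var(), pf_est.var())     # product-form is smaller
```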

Read more
Statistics Theory

Qualitative Robust Bayesianism and the Likelihood Principle

We argue that the likelihood principle (LP) and weak law of likelihood (LL) generalize naturally to settings in which experimenters are justified only in making comparative, non-numerical judgments of the form "A given B is more likely than C given D." To do so, we first formulate qualitative analogs of those theses. Then, using a framework for qualitative conditional probability, we show that just as the numerical LP characterizes when all Bayesians (regardless of prior) agree that two pieces of evidence are equivalent, so a qualitative, non-numerical version of the LP provides sufficient conditions for agreement among experimenters whose degrees of belief satisfy only very weak "coherence" constraints. We prove a similar result for the LL. We conclude by discussing the relevance of these results to stopping rules.

Read more
Statistics Theory

Quantile Regression Neural Networks: A Bayesian Approach

This article introduces a Bayesian neural network estimation method for quantile regression assuming an asymmetric Laplace distribution (ALD) for the response variable. It is shown that the posterior distribution for feedforward neural network quantile regression is asymptotically consistent under a misspecified ALD model. The consistency proof embeds the problem in a density estimation framework and uses bounds on the bracketing entropy to derive posterior consistency over Hellinger neighborhoods. The consistency result holds in a setting where the number of hidden nodes grows with the sample size. The Bayesian implementation utilizes the normal-exponential mixture representation of the ALD density, and the algorithm uses Markov chain Monte Carlo (MCMC) simulation, namely Gibbs sampling coupled with a Metropolis-Hastings algorithm. We address the complexity of this MCMC implementation with respect to chain convergence, choice of starting values, and step sizes, and we illustrate the proposed method with simulation studies and real data examples.
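
The mixture representation driving the Gibbs sampler can be sketched directly: if z ~ Exp(1) and u ~ N(0, 1), then mu + sigma * (theta * z + tau * sqrt(z) * u), with theta = (1 - 2p) / (p(1 - p)) and tau^2 = 2 / (p(1 - p)), follows the ALD whose p-th quantile is mu. A minimal check, with parameter values of our choosing:

```python
import numpy as np

def ald_sample(mu, sigma, p, size, rng):
    """Draw from the asymmetric Laplace via its normal-exponential mixture:
    y = mu + sigma * (theta * z + tau * sqrt(z) * u), z ~ Exp(1), u ~ N(0, 1)."""
    theta = (1 - 2 * p) / (p * (1 - p))
    tau = np.sqrt(2 / (p * (1 - p)))
    z = rng.exponential(1.0, size)
    u = rng.standard_normal(size)
    return mu + sigma * (theta * z + tau * np.sqrt(z) * u)

# The p-th quantile of ALD(mu, sigma, p) is mu, which is what makes the
# ALD a working likelihood for quantile regression.
rng = np.random.default_rng(6)
y = ald_sample(mu=2.0, sigma=1.0, p=0.9, size=1_000_000, rng=rng)
print("empirical 0.9-quantile:", np.quantile(y, 0.9))   # close to mu = 2.0
```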

Read more
Statistics Theory

Quickest Detection of Moving Anomalies in Sensor Networks

The problem of sequentially detecting a moving anomaly which affects different parts of a sensor network with time is studied. Each network sensor is characterized by a non-anomalous and anomalous distribution, governing the generation of sensor data. Initially, the observations of each sensor are generated according to the corresponding non-anomalous distribution. After some unknown but deterministic time instant, a moving anomaly emerges, affecting different sets of sensors as time progresses. As a result, the observations of the affected sensors are generated according to the corresponding anomalous distribution. Our goal is to design a stopping procedure to detect the emergence of the anomaly as quickly as possible, subject to constraints on the frequency of false alarms. The problem is studied in a quickest change detection framework where it is assumed that the evolution of the anomaly is unknown but deterministic. To this end, we propose a modification of Lorden's worst average detection delay metric to account for the trajectory of the anomaly that maximizes the detection delay of a candidate detection procedure. We establish that a Cumulative Sum-type test solves the resulting sequential detection problem exactly when the sensors are homogeneous. For the case of heterogeneous sensors, the proposed detection scheme can be modified to provide a first-order asymptotically optimal algorithm. We conclude by presenting numerical simulations to validate our theoretical analysis.
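
The single-sensor CUSUM recursion at the core of such procedures is easy to sketch; the multi-sensor, moving-anomaly machinery of the paper is not reproduced here, and the distributions, change point, and threshold below are our own choices.

```python
import numpy as np

def cusum_stopping_time(x, llr, threshold):
    """CUSUM recursion W_t = max(0, W_{t-1} + llr(x_t)); declare a change
    at the first t with W_t >= threshold."""
    w = 0.0
    for t, xt in enumerate(x, start=1):
        w = max(0.0, w + llr(xt))
        if w >= threshold:
            return t
    return None   # no alarm raised within the observed data

rng = np.random.default_rng(7)
change = 500   # unknown (to the detector) but deterministic change point
x = np.concatenate([rng.normal(0, 1, change), rng.normal(1, 1, 1000)])
llr = lambda v: v - 0.5   # log-likelihood ratio of N(1, 1) versus N(0, 1)
print("alarm at t =", cusum_stopping_time(x, llr, threshold=8.0))
```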

Read more
Statistics Theory

Random Graph Asymptotics for Treatment Effect Estimation under Network Interference

The network interference model for causal inference places all experimental units at the vertices of an undirected exposure graph, such that treatment assigned to one unit may affect the outcome of another unit if and only if the two units are connected by an edge. This model has recently gained popularity as a means of incorporating interference effects into the Neyman-Rubin potential outcomes framework, and several authors have considered estimation of various causal targets, including the direct and indirect effects of treatment. In this paper, we consider large-sample asymptotics for treatment effect estimation under network interference in a setting where the exposure graph is a random draw from a graphon. When targeting the direct effect, we show that, in our setting, popular estimators are considerably more accurate than existing results suggest, and we provide a central limit theorem in terms of moments of the graphon. Meanwhile, when targeting the indirect effect, we leverage our generative assumptions to propose a consistent estimator in a setting where no other consistent estimators are currently available. We also show how our results can be used to conduct a practical assessment of the sensitivity of randomized study inference to potential interference effects. Overall, our results highlight the promise of random graph asymptotics in understanding the practicality and limits of causal inference under network interference.
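
A toy sketch of the generative setting follows: an exposure graph drawn from a graphon, Bernoulli randomization, and the unadjusted difference-in-means estimator of the direct effect. The graphon, outcome model, and effect sizes are our own illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(8)
n, pi = 2000, 0.5
graphon = lambda u, v: 0.5 * u * v + 0.1   # edge probability given latents

# Exposure graph: latent positions, then independent edges.
u = rng.uniform(size=n)
P = graphon(u[:, None], u[None, :])
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                # symmetric, no self-loops

T = rng.binomial(1, pi, n)                 # Bernoulli(pi) randomization
deg = A.sum(axis=1)
frac = np.divide(A @ T, deg, out=np.zeros(n), where=deg > 0)

# Outcomes: direct effect 2.0 plus an interference term in the fraction
# of treated neighbors (a stand-in exposure model, not the paper's).
Y = 1.0 + 2.0 * T + 1.5 * frac + rng.normal(0, 1, n)

# Unadjusted difference in means targets the direct effect here because
# the neighbor fraction is independent of a unit's own treatment.
print("direct effect estimate:", Y[T == 1].mean() - Y[T == 0].mean())
```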

Read more
