Featured Research

Statistics Theory

Adaptive nonparametric estimation of a component density in a two-class mixture model

A two-class mixture model, where the density of one of the components is known, is considered. We address the issue of nonparametric adaptive estimation of the unknown probability density of the second component. We propose a randomly weighted kernel estimator with a fully data-driven bandwidth selection method, in the spirit of the Goldenshluger-Lepski method. An oracle-type inequality for the pointwise quadratic risk is derived, as well as convergence rates over Hölder smoothness classes. The theoretical results are illustrated by numerical simulations.
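
As a rough companion to the abstract, the following minimal Python sketch illustrates a Goldenshluger-Lepski-type pointwise bandwidth selection rule for a plain (unweighted) Gaussian kernel density estimator. The penalty constant kappa, the penalty form, and the simplified comparison term are illustrative assumptions; the paper's randomly weighted estimator is not reproduced here.

```python
import numpy as np

def kde_at(x0, data, h):
    # Gaussian kernel density estimate at the point x0 with bandwidth h.
    u = (x0 - data) / h
    return np.mean(np.exp(-0.5 * u**2)) / (h * np.sqrt(2.0 * np.pi))

def gl_bandwidth(x0, data, bandwidths, kappa=1.2):
    # Goldenshluger-Lepski-type selection at x0: for each h, estimate the
    # bias by comparing estimates across bandwidth pairs (with the penalty
    # subtracted), add a variance-type penalty, and minimize the sum.
    n = len(data)
    pen = {h: kappa * np.sqrt(np.log(n) / (n * h)) for h in bandwidths}
    best_h, best_crit = None, np.inf
    for h in bandwidths:
        bias_proxy = max(
            max(abs(kde_at(x0, data, min(h, hp)) - kde_at(x0, data, hp)) - pen[hp], 0.0)
            for hp in bandwidths
        )
        crit = bias_proxy + pen[h]
        if crit < best_crit:
            best_h, best_crit = h, crit
    return best_h

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
h_star = gl_bandwidth(0.0, sample, np.geomspace(0.05, 1.0, 15))
```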

Statistics Theory

Admissible anytime-valid sequential inference must rely on nonnegative martingales

Wald's anytime-valid p-values and Robbins' confidence sequences enable sequential inference for composite and nonparametric classes of distributions at arbitrary stopping times, as do more recent proposals involving Vovk's "e-values" or Shafer's "betting scores". Examining the literature, one finds that at the heart of all these (quite different) approaches has been the identification of composite nonnegative (super)martingales. Thus, informally, nonnegative (super)martingales are known to be sufficient for valid sequential inference. Our central contribution is to show that martingales are also universal: all admissible constructions of (composite) anytime p-values, confidence sequences, or e-values must necessarily utilize nonnegative martingales (or so-called max-martingales in the case of p-values). Sufficient conditions for composite admissibility are also provided. Our proofs utilize a plethora of modern mathematical tools for composite testing and estimation problems: max-martingales, Snell envelopes, and new Doob-Lévy martingales make appearances in previously unencountered ways. Informally, if one wishes to perform anytime-valid sequential inference, then any existing approach can be recovered or dominated using martingales. We provide several sophisticated examples, with special focus on the nonparametric problem of testing if a distribution is symmetric, where our new constructions render past methods inadmissible.
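
To make the martingale connection concrete, here is a minimal Python sketch of a betting-style nonnegative martingale (an e-process) for the symmetry problem mentioned above. The fixed betting fraction lam is an illustrative assumption; the paper's constructions are richer, and adaptive or mixture bets are what give such tests their power.

```python
import numpy as np

def symmetry_wealth(xs, lam=0.25):
    # H0: the distribution is symmetric about 0. Under H0 the sign of each
    # observation is a fair coin flip (exact zeros are skipped), so the
    # wealth process below is a nonnegative martingale with initial value 1.
    # By Ville's inequality, P(sup_t wealth_t >= 1/alpha) <= alpha under H0,
    # so rejecting whenever wealth >= 1/alpha is anytime-valid.
    wealth, path = 1.0, []
    for x in xs:
        s = np.sign(x)
        if s != 0.0:
            wealth *= 1.0 + lam * s  # bet a fraction lam on a positive sign
        path.append(wealth)
    return np.array(path)

rng = np.random.default_rng(1)
path = symmetry_wealth(rng.normal(loc=0.4, size=200))  # asymmetric alternative
rejected = bool(np.any(path >= 20.0))  # threshold 1/alpha with alpha = 0.05
```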

Statistics Theory

Admissible ways of merging p-values under arbitrary dependence

Methods of merging several p-values into a single p-value are important in their own right and widely used in multiple hypothesis testing. This paper is the first to systematically study the admissibility (in Wald's sense) of p-merging functions and their domination structure, without any information on the dependence structure of the input p-values. As a technical tool we use the notion of e-values, which are alternatives to p-values recently promoted by several authors. We obtain several results on the representation of admissible p-merging functions via e-values and on (in)admissibility of existing p-merging functions. By introducing new admissible p-merging functions, we show that some classic merging methods can be strictly improved to enhance power without compromising validity under arbitrary dependence.
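
A minimal Python sketch of the e-value route, assuming the standard calibrator e = kappa * p^(kappa - 1) with kappa in (0, 1): calibrated e-values can be averaged under arbitrary dependence, and the reciprocal of an e-value is again a valid p-value by Markov's inequality. This illustrates the technical tool, not the paper's specific admissible p-merging functions.

```python
import numpy as np

def p_to_e(p, kappa=0.5):
    # Calibrator e = kappa * p**(kappa - 1): if p is a valid p-value, then
    # E[e] <= 1 (with equality for uniform p), so e is a valid e-value.
    return kappa * np.power(p, kappa - 1.0)

def merge_p_via_e(ps, kappa=0.5):
    # Arithmetic averaging of e-values is valid under arbitrary dependence;
    # 1/e converts the merged e-value back into a p-value.
    e_avg = float(np.mean(p_to_e(np.asarray(ps, dtype=float), kappa)))
    return min(1.0, 1.0 / e_avg)

merged = merge_p_via_e([0.01, 0.04, 0.20, 0.75])
```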

Statistics Theory

Adversarial robust weighted Huber regression

We propose a novel method to estimate the coefficients of linear regression when both outputs and inputs are contaminated by malicious outliers. Our method consists of two steps: (i) construct appropriate weights ŵ_1, …, ŵ_n such that the weighted sample mean of the regression covariates robustly estimates their population mean, and (ii) run Huber regression with the weights ŵ_1, …, ŵ_n. When (a-1) the regression covariates are i.i.d. random vectors drawn from a sub-Gaussian distribution satisfying L4-L2 norm equivalence, with unknown mean and known identity covariance, and (a-2) the absolute moment of the random noise is finite, our method attains a convergence rate that is information-theoretically optimal up to a constant factor in the noise term. When (b-1) the regression covariates are i.i.d. random vectors drawn from a heavy-tailed distribution satisfying L4-L2 norm equivalence, with unknown mean, and (b-2) the absolute moment of the random noise is finite, our method attains a convergence rate that is information-theoretically optimal up to a constant factor.
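
The two-step structure can be sketched in Python as follows, with stand-ins of our own: a median-based truncation rule replaces the paper's weight construction in step (i), and step (ii) is weighted Huber regression solved by iteratively reweighted least squares.

```python
import numpy as np

def covariate_weights(X, c=3.0):
    # Step (i), illustrative stand-in: downweight rows whose covariates lie
    # far from the coordinatewise median, so that the weighted sample mean
    # of X resists covariate outliers.
    norms = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    tau = c * np.median(norms)
    return np.minimum(1.0, tau / np.maximum(norms, 1e-12))

def weighted_huber_regression(X, y, w, c=1.345, n_iter=100):
    # Step (ii): Huber regression with sample weights w, via IRLS. The IRLS
    # reweighting psi(r)/r equals min(1, c/|r|) for the Huber score psi.
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ beta
        omega = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        ww = w * omega
        A = X.T @ (ww[:, None] * X) + 1e-8 * np.eye(d)
        beta = np.linalg.solve(A, X.T @ (ww * y))
    return beta
```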

Statistics Theory

An l1-oracle inequality for the Lasso in mixture-of-experts regression models

Mixture-of-experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, due to their flexibility and the abundance of statistical estimation and model choice tools. Such flexibility comes from allowing the mixture weights (or gating functions) in the MoE model to depend on the explanatory variables, along with the experts (or component densities). This permits modeling data arising from more complex generating processes than the classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates. The use of MoE models in a high-dimensional setting, where the number of explanatory variables can be much larger than the sample size (i.e., p ≫ n), is challenging from a computational and, in particular, a theoretical point of view; the literature still lacks results on the curse of dimensionality in both statistical estimation and feature selection. We consider the finite mixture-of-experts model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its l1-regularized estimation via the Lasso. We focus on the Lasso's estimation properties rather than its feature selection properties. We provide a lower bound on the Lasso regularization parameter that ensures an l1-oracle inequality, with respect to the Kullback-Leibler loss, for the Lasso estimator.
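
For orientation, the estimator under study can be written schematically as follows; the notation here is assumed for illustration rather than copied from the paper. The MoE conditional density combines soft-max gating weights with Gaussian experts, and the Lasso maximizes the l1-penalized log-likelihood:

```latex
% Schematic model and estimator; notation assumed for illustration.
s_{\psi}(y \mid x) = \sum_{k=1}^{K}
  \frac{\exp(w_k^{\top} x)}{\sum_{l=1}^{K} \exp(w_l^{\top} x)}\,
  \phi\bigl(y;\, \beta_k^{\top} x,\, \sigma_k^{2}\bigr),
\qquad
\widehat{\psi} \in \arg\max_{\psi}
  \Bigl\{ \frac{1}{n} \sum_{i=1}^{n} \log s_{\psi}(y_i \mid x_i)
          - \lambda \lVert \psi \rVert_{1} \Bigr\},
```

where psi collects the gating and expert parameters, phi(·; mu, sigma²) is the Gaussian density, and lambda is the regularization parameter whose lower bound drives the oracle inequality.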

Statistics Theory

An ℓ_p theory of PCA and spectral clustering

Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing studies of PCA focus on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of the individual principal component scores that yield low-dimensional embeddings of samples. This hinders the analysis of various spectral methods. In this paper, we first develop an ℓ_p perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel ℓ_p analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in ℓ_p norm, which includes ℓ_2 and ℓ_∞ as special cases. For sub-Gaussian mixture models, the choice of p giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the ℓ_p theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These results also provide optimal recovery guarantees for Gaussian mixture and stochastic block models as special cases.
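
The hollowing step is simple to illustrate. The minimal Python sketch below zeroes the diagonal of the Gram matrix, which removes the additive bias that heteroscedastic noise places on the diagonal, and then clusters the resulting score embedding with a hand-rolled k-means step; the ℓ_p guarantees come from the paper's analysis, not from this code.

```python
import numpy as np

def hollowed_pc_scores(X, k):
    # Hollowed PCA (illustrative): Gram matrix with its diagonal zeroed out,
    # followed by a rank-k spectral score embedding.
    G = X @ X.T
    np.fill_diagonal(G, 0.0)
    vals, vecs = np.linalg.eigh(G)
    top = np.argsort(np.abs(vals))[::-1][:k]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))  # n-by-k scores

def spectral_cluster(X, k, n_iter=50, seed=0):
    # Plain k-means on the hollowed scores, kept dependency-free.
    emb = hollowed_pc_scores(X, k)
    rng = np.random.default_rng(seed)
    centers = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((emb[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([
            emb[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels
```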

Statistics Theory

An Asymptotic Theory of Joint Sequential Changepoint Detection and Identification for General Stochastic Models

The paper addresses a joint sequential changepoint detection and identification/isolation problem for a general stochastic model, assuming that the observed data may be dependent and non-identically distributed, the prior distribution of the change point is arbitrary, and the post-change hypotheses are composite. The detection-identification theory developed here generalizes the changepoint detection theory of Tartakovsky (2019) to the case of multiple composite post-change hypotheses, when one has not only to detect a change as quickly as possible but also to identify (or isolate) the true post-change distribution. We propose a multi-hypothesis change detection-identification rule and show that it is nearly optimal, minimizing moments of the delay to detection as the probability of a false alarm and the probabilities of misidentification go to zero.
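
To convey the detect-then-identify structure in code, here is a minimal multi-hypothesis CUSUM sketch in Python. It is an illustrative stand-in, not the paper's nearly optimal rule for dependent data and composite hypotheses.

```python
import numpy as np

def detect_and_identify(xs, log_lrs, thresholds):
    # log_lrs[j](x) is the log-likelihood ratio of x under post-change
    # hypothesis j versus the pre-change model. Stop at the first time any
    # CUSUM statistic crosses its threshold; identify the crossing
    # hypothesis (ties broken by largest overshoot).
    thr = np.asarray(thresholds, dtype=float)
    W = np.zeros(len(log_lrs))
    for t, x in enumerate(xs, start=1):
        W = np.maximum(W + np.array([f(x) for f in log_lrs]), 0.0)
        over = np.flatnonzero(W >= thr)
        if over.size > 0:
            j = over[np.argmax(W[over] - thr[over])]
            return t, int(j)  # (stopping time, identified hypothesis)
    return None, None  # no change declared within the sample

# Example: pre-change N(0,1); the post-change mean is either +1 or -1.
log_lrs = [lambda x, m=m: m * x - m**2 / 2.0 for m in (1.0, -1.0)]
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(size=100), rng.normal(loc=1.0, size=100)])
stop_time, hypothesis = detect_and_identify(data, log_lrs, [4.6, 4.6])
```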

Statistics Theory

An Empirical Bayes Approach to Shrinkage Estimation on the Manifold of Symmetric Positive-Definite Matrices

The James-Stein estimator is an estimator of the multivariate normal mean and dominates the maximum likelihood estimator (MLE) under squared error loss. The original work inspired great interest in developing shrinkage estimators for a variety of problems. Nonetheless, research on shrinkage estimation for manifold-valued data is scarce. In this paper, we propose shrinkage estimators for the parameters of the Log-Normal distribution defined on the manifold of N×N symmetric positive-definite matrices. For this manifold, we choose the Log-Euclidean metric as its Riemannian metric since it is easy to compute and is widely used in applications. By using the Log-Euclidean distance in the loss function, we derive a shrinkage estimator in an analytic form and show that it is asymptotically optimal within a large class of estimators including the MLE, which is the sample Fréchet mean of the data. We demonstrate the performance of the proposed shrinkage estimator via several simulated data experiments. Furthermore, we apply the shrinkage estimator to perform statistical inference in diffusion magnetic resonance imaging problems.
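
A minimal Python sketch of the Log-Euclidean computations involved, under simplifications of our own: the sample Fréchet mean is the matrix exponential of the averaged matrix logarithms, and shrinkage is performed in the log-domain with a fixed weight rho standing in for the paper's data-driven, asymptotically optimal weight.

```python
import numpy as np

def spd_log(S):
    # Matrix logarithm of a symmetric positive-definite matrix via eigh.
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_exp(L):
    # Matrix exponential of a symmetric matrix via eigh.
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(spds):
    # Sample Frechet mean under the Log-Euclidean metric (also the MLE of
    # the Log-Normal location parameter): exponentiate the mean log-matrix.
    return spd_exp(np.mean([spd_log(S) for S in spds], axis=0))

def shrunk_frechet_mean(spds, rho=0.1, target=None):
    # Illustrative shrinkage in the log-domain, pulling the mean log-matrix
    # toward a target (the identity, i.e. the zero log-matrix, by default).
    # rho is a fixed constant here; the paper derives an optimal weight.
    L = np.mean([spd_log(S) for S in spds], axis=0)
    T = np.zeros_like(L) if target is None else spd_log(target)
    return spd_exp((1.0 - rho) * L + rho * T)
```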

Statistics Theory

An Intrinsic Treatment of Stochastic Linear Regression

Linear regression is perhaps one of the most popular statistical concepts, permeating almost every scientific field of study. Because of its technical simplicity and wide applicability, attention is almost always quickly directed to the algorithmic or computational side of linear regression. In particular, the underlying mathematics of stochastic linear regression itself usually receives either a peripheral treatment or a relatively in-depth but ad hoc treatment, depending on the problems at hand. In other words, compared to the extensive study of the mathematical properties of the "derivatives" of stochastic linear regression, such as the least squares estimator, the mathematics of stochastic linear regression itself does not yet seem to have received a due intrinsic treatment. Beyond the conceptual importance, one consequence of an insufficient or inaccurate understanding of stochastic linear regression is that its role in the important (and more sophisticated) context of structural equation modeling tends to be misperceived or taught in a misleading way. We believe this deficiency can be rectified once the fundamental concepts are correctly classified. Accompanied by illustrative, distinguishing examples and counterexamples, we lay out a mathematical framework for stochastic linear regression, in a rigorous but non-technical way, by giving new results and piecing together several fundamental known results that are, we believe, both enlightening and conceptually useful, and that have not yet been systematically documented in the related literature. As a minor contribution, our arrangement of these fundamental known results appears to be the first such attempt in the related literature.

Statistics Theory

An autocovariance-based learning framework for high-dimensional functional time series

Many scientific and economic applications involve the analysis of high-dimensional functional time series, which stand at the intersection of functional time series and high-dimensional statistics, combining the challenges of infinite dimensionality, serial dependence, and non-asymptotic analysis. In this paper, we model observed functional time series that are subject to errors, in the sense that each functional datum arises as the sum of two uncorrelated components, one dynamic and one white noise. Motivated by the simple fact that the autocovariance function of the observed functional time series automatically filters out the noise term, we propose an autocovariance-based three-step procedure: we first perform autocovariance-based dimension reduction, then formulate a novel autocovariance-based block regularized minimum distance (RMD) estimation framework to produce block-sparse estimates, from which we finally recover functional sparse estimates. We investigate non-asymptotic properties of the relevant estimated terms under this autocovariance-based dimension reduction framework. To provide theoretical guarantees for the second step, we also present a convergence analysis of the block RMD estimator. Finally, we illustrate the proposed autocovariance-based learning framework through applications to three sparse high-dimensional functional time series models, and use the derived theoretical results to study convergence properties of the associated estimators. We demonstrate via simulated and real datasets that our proposed estimators significantly outperform the competitors.
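
The noise-filtering observation behind the first step is easy to sketch in Python. In the fragment below (with curves assumed discretized onto a common grid, one row per time point), positive-lag autocovariances are noise-free in expectation, so a dynamic subspace can be recovered from their eigenstructure; the block RMD step is not reproduced here.

```python
import numpy as np

def lag_autocov(X, lag):
    # Empirical lag-h autocovariance. If each row is dynamic signal plus
    # white noise, the noise contributes only at lag 0, so positive lags
    # are noise-free in expectation.
    Xc = X - X.mean(axis=0)
    return Xc[:-lag].T @ Xc[lag:] / (X.shape[0] - lag)

def autocov_subspace(X, k, lags=(1, 2, 3)):
    # Recover a k-dimensional dynamic subspace from the leading
    # eigenvectors of sum_h C_h C_h^T, which accumulates signal across
    # lags while ignoring the lag-0 noise contribution.
    M = sum(C @ C.T for C in (lag_autocov(X, h) for h in lags))
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argsort(vals)[::-1][:k]]
```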

