Featured Research

Statistics Theory

Parameter estimation for Vasicek model driven by a general Gaussian noise

This paper studies an inference problem for the Vasicek model driven by a general Gaussian process. We construct a least squares estimator and a moment estimator for the drift parameters of the Vasicek model, and we prove their consistency and asymptotic normality. Our approach extends the result of Xiao and Yu (2018) for the case when the noise is a fractional Brownian motion with Hurst parameter $H \in [1/2, 1)$.
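
As a rough illustration of a least squares drift estimator, the sketch below assumes the common parametrization $dX_t = \kappa(\mu - X_t)\,dt + \sigma\,dG_t$ and takes $G$ to be standard Brownian motion (the $H = 1/2$ special case), simulated with an Euler scheme. It regresses discretized increments on $(1, X_t)$. This discrete-time regression is an assumption for illustration, not the continuous-time estimator constructed in the paper, and the names `simulate_vasicek` and `ls_drift_estimates` are hypothetical.

```python
import numpy as np

def simulate_vasicek(kappa, mu, sigma, x0, dt, n_steps, rng):
    """Euler scheme for dX = kappa*(mu - X) dt + sigma dB (Brownian case, H = 1/2)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    increments = rng.standard_normal(n_steps) * np.sqrt(dt)  # Brownian increments
    for i in range(n_steps):
        x[i + 1] = x[i] + kappa * (mu - x[i]) * dt + sigma * increments[i]
    return x

def ls_drift_estimates(x, dt):
    """Discretized least squares: regress (X_{i+1} - X_i)/dt on [1, X_i].

    The drift is kappa*mu - kappa*X, so intercept a = kappa*mu and slope b = -kappa.
    """
    y = np.diff(x) / dt
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    kappa_hat = -b
    mu_hat = a / kappa_hat
    return kappa_hat, mu_hat

rng = np.random.default_rng(0)
dt = 0.01
path = simulate_vasicek(kappa=2.0, mu=1.0, sigma=0.5, x0=0.0, dt=dt, n_steps=50_000, rng=rng)
kappa_hat, mu_hat = ls_drift_estimates(path, dt)
print(f"kappa_hat={kappa_hat:.3f}, mu_hat={mu_hat:.3f}")
```

With a horizon of T = 500 the estimates should land near the true values (2.0, 1.0). For the fractional case, the Brownian increments would have to be replaced by fractional Gaussian noise.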

Parameter estimation in branching processes with almost sure extinction

We consider population-size-dependent branching processes (PSDBPs) which eventually become extinct with probability one. For these processes, we derive maximum likelihood estimators for the mean number of offspring born to individuals when the current population size is $z \ge 1$. As is standard in branching process theory, an asymptotic analysis of the estimators requires us to condition on non-extinction up to a finite generation $n$ and let $n \to \infty$; however, because the processes become extinct with probability one, we show that our estimators do not satisfy the classical consistency property (C-consistency). This leads us to define the concept of Q-consistency, and we prove that our estimators are Q-consistent and asymptotically normal. To investigate the circumstances in which a C-consistent estimator is preferable to a Q-consistent estimator, we then provide two C-consistent estimators for subcritical Galton-Watson branching processes. Our results rely on a combination of linear operator theory, coupling arguments, and martingale methods.
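
To make the estimation target concrete, the following sketch simulates a PSDBP with Poisson offspring whose mean depends on the current population size, then computes, for each size $z$, the ratio of total offspring produced by generations of size $z$ to the number of individuals in those generations (a Harris-type ratio estimator). The offspring law, the mean function `mean_offspring`, and the pooling over many independent trajectories are hypothetical choices for illustration. This is not necessarily the exact maximum likelihood estimator derived in the paper, and pooling extinct trajectories sidesteps the conditioning on non-extinction discussed above.

```python
import numpy as np
from collections import defaultdict

def mean_offspring(z):
    """Hypothetical size-dependent mean: 1.2 at z = 1, 1.0 at z = 2, below 1 for z > 2."""
    return 1.5 / (1.0 + 0.25 * z)

def simulate_until_extinction(z0, rng, max_gen=10_000):
    """Generation sizes Z_0, Z_1, ... with Poisson(mean_offspring(Z_k)) offspring per individual."""
    sizes = [z0]
    z = z0
    while z > 0 and len(sizes) <= max_gen:
        # The sum of z i.i.d. Poisson(m(z)) offspring counts is Poisson(z * m(z)).
        z = int(rng.poisson(z * mean_offspring(z)))
        sizes.append(z)
    return sizes

def ratio_estimates(trajectories):
    """For each size z: total offspring of generations of size z / number of parents in them."""
    offspring, parents = defaultdict(int), defaultdict(int)
    for sizes in trajectories:
        for z, z_next in zip(sizes[:-1], sizes[1:]):
            offspring[z] += z_next
            parents[z] += z
    return {z: offspring[z] / parents[z] for z in parents}

rng = np.random.default_rng(2)
paths = [simulate_until_extinction(5, rng) for _ in range(5000)]
m_hat = ratio_estimates(paths)
for z in sorted(m_hat)[:4]:
    print(z, round(m_hat[z], 3), round(mean_offspring(z), 3))
```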

Partial Recovery for Top-k Ranking: Optimality of MLE and Sub-Optimality of Spectral Method

Given partially observed pairwise comparison data generated by the Bradley-Terry-Luce (BTL) model, we study the problem of top-k ranking, that is, optimally identifying the set of top-k players. We derive the minimax rate with respect to a normalized Hamming loss. This provides the first result in the literature that characterizes the partial recovery error in terms of the proportion of mistakes for top-k ranking. We also derive the optimal signal-to-noise ratio condition for the exact recovery of the top-k set. The maximum likelihood estimator (MLE) is shown to achieve both optimal partial recovery and optimal exact recovery. On the other hand, we show that another popular algorithm, the spectral method, is in general sub-optimal. Our results complement the recent work by Chen et al. (2019), which shows that both the MLE and the spectral method achieve the optimal sample complexity for exact recovery. It turns out that the leading constants of the sample complexity differ between the two algorithms. Another contribution that may be of independent interest is the analysis of the MLE without any penalty or regularization for the BTL model. This closes an important gap between theory and practice in the ranking literature.
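
The sketch below illustrates the unregularized MLE for the BTL model on a partially observed comparison graph: each pair of players is compared with probability `p_obs`, the MLE is computed with the classical minorize-maximize (MM) iteration for Bradley-Terry scores, and the top-k set is read off the estimated scores. The parameter values, the helper names, and the use of MM as the solver are assumptions for illustration. The loss computed at the end is the fraction of true top-k players that are missed, a simple normalized Hamming-type loss whose exact normalization may differ from the paper's.

```python
import numpy as np

def simulate_btl(theta, p_obs, n_comp, rng):
    """Each pair is observed with prob. p_obs; if observed, it is compared n_comp times."""
    n = len(theta)
    wins = np.zeros((n, n))  # wins[i, j] = number of times i beat j
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_obs:
                p_ij = 1.0 / (1.0 + np.exp(theta[j] - theta[i]))  # BTL: P(i beats j)
                w = rng.binomial(n_comp, p_ij)
                wins[i, j] += w
                wins[j, i] += n_comp - w
    return wins

def btl_mle(wins, n_iter=2000):
    """Unregularized BTL MLE via the MM (Zermelo/Hunter) fixed-point iteration."""
    games = wins + wins.T
    total_wins = wins.sum(axis=1)
    score = np.ones(len(wins))
    for _ in range(n_iter):
        denom = (games / (score[:, None] + score[None, :])).sum(axis=1)
        score = total_wins / denom
        score /= score.sum()  # scores are only identified up to scale
    return np.log(score)

def top_k(theta, k):
    return set(np.argsort(theta)[-k:].tolist())

rng = np.random.default_rng(0)
k = 5
theta = np.concatenate([np.linspace(0.0, 1.5, 15), np.linspace(3.0, 4.0, 5)])
wins = simulate_btl(theta, p_obs=0.5, n_comp=20, rng=rng)
theta_hat = btl_mle(wins)
est, truth = top_k(theta_hat, k), top_k(theta, k)
hamming = len(truth - est) / k  # fraction of true top-k players missed
print(sorted(est), hamming)
```

The gap of 1.5 between the fifth and sixth true scores is chosen so that exact recovery is easy; shrinking it makes partial-recovery errors visible.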

Penalized Langevin dynamics with vanishing penalty for smooth and log-concave targets

We study the problem of sampling from a probability distribution on $\mathbb{R}^p$ defined via a convex and smooth potential function. We consider a continuous-time diffusion-type process, termed Penalized Langevin dynamics (PLD), whose drift is the negative gradient of the potential plus a linear penalty that vanishes as time goes to infinity. We establish an upper bound on the Wasserstein-2 distance between the distribution of the PLD at time $t$ and the target. This upper bound highlights how the speed of decay of the penalty affects the accuracy of the approximation. As a consequence, by considering the low-temperature limit, we infer a new nonasymptotic guarantee of convergence of the penalized gradient flow for the optimization problem.
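
A discretized version of PLD can be sketched with the Euler-Maruyama scheme applied to $dL_t = -(\nabla f(L_t) + \alpha(t) L_t)\,dt + \sqrt{2}\,dW_t$. The sketch assumes a Gaussian target with potential $f(x) = \tfrac12 \lVert x - m \rVert^2$ and a hypothetical penalty schedule $\alpha(t) = \alpha_0/(1+t)^2$; neither choice comes from the paper. For a fixed $\alpha$, these dynamics have stationary law $\mathcal{N}(m/(1+\alpha), I/(1+\alpha))$, so the bias shrinks as the penalty vanishes.

```python
import numpy as np

def grad_potential(x, m):
    """Gradient of the assumed potential f(x) = 0.5 * ||x - m||^2 (target N(m, I))."""
    return x - m

def penalized_langevin(m, alpha0, T, h, n_particles, rng):
    """Euler-Maruyama for dL = -(grad f(L) + alpha(t) L) dt + sqrt(2) dW."""
    x = np.zeros((n_particles, len(m)))  # all particles start at the origin
    n_steps = int(round(T / h))
    for k in range(n_steps):
        t = k * h
        alpha_t = alpha0 / (1.0 + t) ** 2  # hypothetical vanishing penalty schedule
        drift = -(grad_potential(x, m) + alpha_t * x)
        x = x + h * drift + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(1)
m = np.array([1.0, -2.0])
samples = penalized_langevin(m, alpha0=1.0, T=20.0, h=0.01, n_particles=5000, rng=rng)
print(samples.mean(axis=0), samples.var(axis=0))
```

Running many independent particles lets the empirical mean and variance at time T be compared with the target's (m and 1 per coordinate).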

Permanental Graphs

The two components for infinite exchangeability of a sequence of distributions $(P_n)$ are (i) consistency and (ii) finite exchangeability for each $n$. A consequence of the Aldous-Hoover theorem is that any node-exchangeable, subselection-consistent sequence of distributions that describes a randomly evolving network yields a sequence of random graphs whose expected number of edges grows quadratically in the number of nodes. In this note, another notion of consistency is considered, namely delete-and-repair consistency; it is motivated by the sense in which infinitely exchangeable permutations defined by the Chinese restaurant process (CRP) are consistent. One goal is to exploit delete-and-repair consistency to obtain a nontrivial sequence of distributions on graphs $(P_n)$ that is sparse, exchangeable, and consistent with respect to delete-and-repair, a well-known delete-and-repair consistent example being the Ewens permutations \cite{tavare}. A generalization of the CRP($\alpha$) as a distribution on a directed graph using the $\alpha$-weighted permanent is presented, along with the corresponding normalization constant and degree distribution; it is dubbed the Permanental Graph Model (PGM). A negative result is obtained: no setting of the parameters in the PGM yields a consistent sequence $(P_n)$ in the sense of either subselection or delete-and-repair.
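
The $\alpha$-weighted permanent is usually defined as $\mathrm{per}_\alpha(A) = \sum_{\sigma \in S_n} \alpha^{\mathrm{cyc}(\sigma)} \prod_{i=1}^{n} A_{i,\sigma(i)}$, where $\mathrm{cyc}(\sigma)$ counts the cycles of $\sigma$. A brute-force sketch (feasible only for small $n$) shows its link to the CRP($\alpha$): for the all-ones matrix it reduces to the rising factorial $\alpha(\alpha+1)\cdots(\alpha+n-1)$, the normalizing constant of the Ewens distribution. How the paper builds the PGM on top of this quantity is not reproduced here, and the function names are hypothetical.

```python
from itertools import permutations
from math import isclose, prod

def count_cycles(perm):
    """Number of cycles of a permutation in one-line notation (perm[i] = sigma(i))."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:  # walk the cycle containing `start`
                seen[j] = True
                j = perm[j]
    return cycles

def alpha_permanent(A, alpha):
    """Brute-force alpha-permanent: sum over all n! permutations (small n only)."""
    n = len(A)
    return sum(
        alpha ** count_cycles(s) * prod(A[i][s[i]] for i in range(n))
        for s in permutations(range(n))
    )

def rising_factorial(alpha, n):
    """alpha (alpha + 1) ... (alpha + n - 1), the CRP(alpha) / Ewens normalizing constant."""
    return prod(alpha + k for k in range(n))

ones = [[1.0] * 4 for _ in range(4)]
print(alpha_permanent(ones, 0.7), rising_factorial(0.7, 4))
print(isclose(alpha_permanent(ones, 0.7), rising_factorial(0.7, 4)))
```

Setting $\alpha = 1$ recovers the ordinary permanent, which equals $n!$ for the all-ones matrix.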

Permutation Testing for Dependence in Time Series

Given observations from a stationary time series, permutation tests allow one to construct exactly level-α tests under the null hypothesis of an i.i.d. (or, more generally, exchangeable) distribution. On the other hand, when the null hypothesis of interest is that the underlying process is an uncorrelated sequence, permutation tests are not necessarily level α, nor are they approximately level α in large samples. In addition, permutation tests may have large Type 3, or directional, errors, in which a two-sided test rejects the null hypothesis and concludes that the first-order autocorrelation is larger than 0 when in fact it is less than 0. In this paper, under weak assumptions on the mixing coefficients and moments of the sequence, we provide a test procedure for which the permutation test is asymptotically valid, while retaining exact rejection probability α in finite samples when the observations are independent and identically distributed. We also perform a Monte Carlo simulation study comparing the permutation test to other tests of autocorrelation, along with an empirical application to financial data.
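
A minimal sketch of a permutation test for first-order autocorrelation, assuming a studentized statistic $\sqrt{n}\,\hat\rho(1)/\hat\tau$, where $\hat\tau^2$ is the average of $(X_i-\bar X)^2 (X_{i+1}-\bar X)^2$ divided by $\hat\sigma^4$. This simple variance estimate is an assumption chosen for illustration; the paper's studentization under general mixing conditions may differ. Because the p-value counts the observed ordering among the permutations, it remains exactly valid when the data are i.i.d.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample first-order autocorrelation."""
    xc = x - x.mean()
    return np.sum(xc[:-1] * xc[1:]) / np.sum(xc ** 2)

def studentized_stat(x):
    """sqrt(n) * rho_hat(1) / tau_hat with a simple (assumed) variance estimate."""
    n = len(x)
    xc = x - x.mean()
    sigma2 = np.mean(xc ** 2)
    tau2 = np.mean((xc[:-1] * xc[1:]) ** 2) / sigma2 ** 2
    return np.sqrt(n) * lag1_autocorr(x) / np.sqrt(tau2)

def permutation_pvalue(x, n_perm, rng):
    """Two-sided permutation p-value; the +1 terms count the observed ordering."""
    observed = abs(studentized_stat(x))
    exceed = sum(abs(studentized_stat(rng.permutation(x))) >= observed for _ in range(n_perm))
    return (1 + exceed) / (n_perm + 1)

rng = np.random.default_rng(7)
n = 200
iid = rng.standard_normal(n)
ar = np.empty(n)
ar[0] = rng.standard_normal()
for t in range(1, n):
    ar[t] = 0.5 * ar[t - 1] + rng.standard_normal()  # AR(1) with coefficient 0.5
p_iid = permutation_pvalue(iid, 999, rng)
p_ar = permutation_pvalue(ar, 999, rng)
print(p_iid, p_ar)
```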

Phase retrieval in high dimensions: Statistical and computational phase transitions

We consider the phase retrieval problem of reconstructing an $n$-dimensional real or complex signal $X^\star$ from $m$ (possibly noisy) observations $Y_\mu = \left| \sum_{i=1}^{n} \Phi_{\mu i} X^\star_i / \sqrt{n} \right|$, for a large class of correlated real and complex random sensing matrices $\Phi$, in a high-dimensional setting where $m, n \to \infty$ while $\alpha = m/n = \Theta(1)$. First, we derive sharp asymptotics for the lowest estimation error achievable statistically, and we unveil the existence of sharp phase transitions for the weak- and full-recovery thresholds as a function of the singular values of the matrix $\Phi$. This is achieved by providing a rigorous proof of a result first obtained by the replica method from statistical mechanics. In particular, the information-theoretic transition to perfect recovery for full-rank matrices appears at $\alpha = 1$ (real case) and $\alpha = 2$ (complex case). Second, we analyze the performance of the best-known polynomial-time algorithm for this problem, approximate message passing, establishing the existence of a statistical-to-algorithmic gap that depends, again, on the spectral properties of $\Phi$. Our work provides an extensive classification of the statistical and algorithmic thresholds in high-dimensional phase retrieval for a broad class of random matrices.
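
The observation model above can be simulated directly. The sketch below generates real-valued, noiseless data with an i.i.d. Gaussian sensing matrix (one member of the class considered, and an assumption here) and recovers $X^\star$ up to a global sign with a truncated spectral estimator: the top eigenvector of $m^{-1}\sum_\mu \min(Y_\mu^2, c)\,\Phi_\mu \Phi_\mu^\top$. This is a simpler spectral method swapped in for illustration, not the approximate message-passing algorithm analyzed in the paper; the truncation level `cap` is a hypothetical tuning choice.

```python
import numpy as np

def phase_retrieval_data(n, m, rng):
    """Real case: Y_mu = |sum_i Phi_{mu i} X*_i / sqrt(n)| with i.i.d. N(0, 1) entries (assumption)."""
    x_star = rng.standard_normal(n)
    x_star *= np.sqrt(n) / np.linalg.norm(x_star)  # normalize so ||X*||^2 = n
    phi = rng.standard_normal((m, n))
    y = np.abs(phi @ x_star / np.sqrt(n))
    return phi, y, x_star

def truncated_spectral_estimate(phi, y, cap=3.0):
    """Top eigenvector of (1/m) sum_mu min(Y_mu^2, cap) Phi_mu Phi_mu^T."""
    m, n = phi.shape
    weights = np.minimum(y ** 2, cap)
    D = (phi * weights[:, None]).T @ phi / m
    _, eigvecs = np.linalg.eigh(D)  # eigenvalues in ascending order
    return eigvecs[:, -1] * np.sqrt(n)

def overlap(x_hat, x_star):
    """|cosine| between estimate and truth; the global sign is not identifiable."""
    return abs(x_hat @ x_star) / (np.linalg.norm(x_hat) * np.linalg.norm(x_star))

rng = np.random.default_rng(3)
phi, y, x_star = phase_retrieval_data(n=50, m=3000, rng=rng)  # alpha = m/n = 60
x_hat = truncated_spectral_estimate(phi, y)
rho = overlap(x_hat, x_star)
print(f"overlap = {rho:.3f}")
```

The sampling ratio here is deliberately far above the thresholds discussed above; lowering `m` toward `n` shows the overlap degrade.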

Poisson QMLE for change-point detection in general integer-valued time series models

We consider both retrospective and sequential change-point detection in a general class of integer-valued time series. The conditional mean of the process depends on a parameter $\theta^*$ which may change over time. We propose procedures based on the Poisson quasi-maximum likelihood estimator (QMLE) of the parameter; in the sequential framework, the updated estimator is computed without the historical observations. For both retrospective and sequential detection, the test statistics converge under the null hypothesis of no change to distributions derived from standard Brownian motion, and diverge to infinity under the alternative; that is, these procedures are consistent. Simulation results and a real-data application are provided.
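
In the simplest special case, where the conditional mean is a constant $\theta^*$, the Poisson QMLE on a segment is the sample mean, and a retrospective statistic comparing the QMLEs before and after each candidate break reduces to a CUSUM statistic. Under no change it converges to the supremum of the absolute value of a Brownian bridge, whose 5% critical value is about 1.358. The sketch below is restricted to this special case (an assumption; the paper covers general integer-valued models and also the sequential setting), and its function names are hypothetical.

```python
import numpy as np

def poisson_qmle_constant_mean(y):
    """Poisson QMLE of a constant mean: maximizes sum(y*log(lam) - lam), i.e. the sample mean."""
    return y.mean()

def retrospective_stat(y):
    """max_k k(n-k)/n^{3/2} |theta_hat(1..k) - theta_hat(k+1..n)| / sigma_hat."""
    n = len(y)
    sigma_hat = y.std()  # marginal standard deviation over the whole sample (assumption)
    best = 0.0
    for k in range(1, n):
        diff = poisson_qmle_constant_mean(y[:k]) - poisson_qmle_constant_mean(y[k:])
        best = max(best, k * (n - k) / n ** 1.5 * abs(diff) / sigma_hat)
    return best

CRITICAL_5PCT = 1.358  # approx. 95% quantile of sup |Brownian bridge|

rng = np.random.default_rng(4)
y_null = rng.poisson(2.0, size=400)
y_change = np.concatenate([rng.poisson(2.0, size=200), rng.poisson(4.0, size=200)])
for name, series in [("no change", y_null), ("change at t=200", y_change)]:
    stat = retrospective_stat(series)
    print(name, round(stat, 3), "reject" if stat > CRITICAL_5PCT else "do not reject")
```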

Posterior Impropriety of some Sparse Bayesian Learning Models

Sparse Bayesian learning models are typically used for prediction in datasets with a significantly greater number of covariates than observations. Such models often take a reproducing kernel Hilbert space (RKHS) approach to prediction and can be implemented using either proper or improper priors. In this article, we show that a few sparse Bayesian learning models in the literature, when implemented using improper priors, lead to improper posteriors.
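
To see how an improper hyperprior can produce an improper posterior, consider a hypothetical relevance-vector-style model (an illustrative assumption, not necessarily one of the models examined in the article) with kernel matrix $K$ whose $j$-th column is $k_j$, known noise variance $\sigma^2$, and the improper prior $\pi(\alpha_i) \propto \alpha_i^{-1}$ on each weight precision. Integrating out $w$ gives a marginal likelihood that tends to a positive limit as $\alpha_i \to \infty$, so the posterior in $\alpha_i$ has a non-integrable tail:

```latex
\begin{aligned}
y \mid w &\sim \mathcal{N}(Kw, \sigma^2 I_n), \qquad
w_i \mid \alpha_i \sim \mathcal{N}(0, \alpha_i^{-1}), \qquad
\pi(\alpha_i) \propto \alpha_i^{-1},\\
p(y \mid \alpha) &= \mathcal{N}\!\Big(y;\, 0,\; \sigma^2 I_n + \textstyle\sum_{j} \alpha_j^{-1} k_j k_j^\top\Big)
\xrightarrow[\alpha_i \to \infty]{}
\mathcal{N}\!\Big(y;\, 0,\; \sigma^2 I_n + \textstyle\sum_{j \ne i} \alpha_j^{-1} k_j k_j^\top\Big) =: L_{-i} > 0,\\
\int_{A}^{\infty} p(y \mid \alpha)\, \alpha_i^{-1}\, d\alpha_i
&\ge \tfrac{1}{2} L_{-i} \int_{A}^{\infty} \alpha_i^{-1}\, d\alpha_i = \infty .
\end{aligned}
```

Here $A$ is chosen large enough that $p(y \mid \alpha) \ge L_{-i}/2$ for all $\alpha_i \ge A$, with the other precisions held fixed; since the inner integral diverges for every such value, the joint posterior is improper as well (by Tonelli's theorem).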

Precise Error Analysis of the LASSO under Correlated Designs

In this paper, we consider the problem of recovering a sparse signal from noisy linear measurements using the so-called LASSO formulation. We assume a correlated Gaussian design matrix with additive Gaussian noise. We precisely analyze the high-dimensional asymptotic performance of the LASSO under correlated design matrices using the Convex Gaussian Min-max Theorem (CGMT). We define appropriate performance measures such as the mean-square error (MSE), probability of support recovery, element error rate (EER), and cosine similarity. Numerical simulations are presented to validate the derived theoretical results.
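
A sketch of the setting: a correlated Gaussian design with a hypothetical Toeplitz covariance $\Sigma_{ij} = \rho^{|i-j|}$, a sparse signal, additive Gaussian noise, a LASSO fit by proximal gradient descent (ISTA), and the four performance measures named above. The EER here is taken to be the fraction of coordinates whose membership in the support is misclassified, and support recovery is a single-run indicator whose average over repeated runs estimates the probability; both are assumptions, and the paper's definitions may differ. All sizes and the regularization level `lam` are illustrative.

```python
import numpy as np

def correlated_design(m, n, rho, rng):
    """Rows i.i.d. N(0, Sigma/m) with Toeplitz Sigma_ij = rho^|i-j| (assumed covariance)."""
    idx = np.arange(n)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(sigma)
    return rng.standard_normal((m, n)) @ chol.T / np.sqrt(m)

def lasso_ista(A, y, lam, n_iter=3000):
    """Minimize 0.5*||y - A x||^2 + lam*||x||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return x

def performance(x_hat, x_true):
    support_true, support_hat = x_true != 0, x_hat != 0
    return {
        "mse": np.mean((x_hat - x_true) ** 2),
        "support_recovered": bool(np.array_equal(support_true, support_hat)),
        "eer": np.mean(support_true != support_hat),
        "cosine": x_hat @ x_true / (np.linalg.norm(x_hat) * np.linalg.norm(x_true)),
    }

rng = np.random.default_rng(5)
m, n, k = 100, 200, 10
A = correlated_design(m, n, rho=0.5, rng=rng)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
y = A @ x_true + 0.05 * rng.standard_normal(m)
x_hat = lasso_ista(A, y, lam=0.1)
res = performance(x_hat, x_true)
print(res)
```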
