Featured Research

Statistics Theory

Convolution of a symmetric log-concave distribution and a symmetric bimodal distribution can have any number of modes

In this note, we show that the convolution of a discrete symmetric log-concave distribution and a discrete symmetric bimodal distribution can have any strictly positive number of modes. A similar result is proved for smooth distributions.
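The statement can be explored numerically. The sketch below assumes pmfs supported on consecutive integers. It convolves a Binomial(40, 1/2) pmf, which is symmetric and log-concave, with a symmetric two-point pmf, which is bimodal, and counts the modes of the result. The helper names are hypothetical. This simple family only produces one or two modes; the note's construction for an arbitrary number of modes is not reproduced here.

```python
import numpy as np
from math import comb


def binomial_pmf(n):
    """Binomial(n, 1/2) pmf on {0, ..., n}: symmetric and log-concave."""
    return np.array([comb(n, k) for k in range(n + 1)], dtype=float) / 2.0 ** n


def two_point_pmf(gap):
    """Symmetric bimodal pmf with mass 1/2 at 0 and 1/2 at `gap` (zeros in between)."""
    b = np.zeros(gap + 1)
    b[0] = b[-1] = 0.5
    return b


def is_log_concave(p):
    """Discrete log-concavity on the interior, p[k]^2 >= p[k-1] * p[k+1],
    checked up to a small absolute tolerance."""
    p = np.asarray(p, dtype=float)
    return bool(np.all(p[1:-1] ** 2 >= p[:-2] * p[2:] - 1e-15))


def count_modes(p, tol=1e-12):
    """Number of strict local maxima of a pmf, counting a flat top once."""
    p = np.asarray(p, dtype=float)
    # Drop entries equal (within tol) to their left neighbour so plateaus collapse.
    keep = np.concatenate(([True], np.abs(np.diff(p)) > tol))
    q = np.concatenate(([-np.inf], p[keep], [-np.inf]))
    return int(np.sum((q[1:-1] > q[:-2]) & (q[1:-1] > q[2:])))


if __name__ == "__main__":
    lc = binomial_pmf(40)
    for gap in (2, 8, 30):
        conv = np.convolve(lc, two_point_pmf(gap))
        print(f"gap={gap}: {count_modes(conv)} mode(s)")
```

A small gap leaves the mixture unimodal, while a gap that is large relative to the binomial spread splits it into two modes.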

Statistics Theory

Correlations with tailored extremal properties

Chatterjee recently introduced a new coefficient of correlation with several natural properties. In particular, the coefficient attains its maximal value if and only if one variable is a measurable function of the other. In this paper, we seek to define correlations with a similar property, except that the measurable function must now belong to a pre-specified class, which amounts to a shape restriction on the function. We then look specifically at the correlation corresponding to the class of monotone nondecreasing functions, for which we prove various asymptotic results and perform local power calculations.
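For reference, here is a minimal sketch of Chatterjee's original coefficient in the case without ties: sort the pairs by $X$, let $r_i$ be the rank of the $i$-th reordered $Y$ value, and compute $\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$. The function name is hypothetical, and this is not the shape-restricted correlation proposed in the paper.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's xi_n(X, Y), assuming no ties:
    xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1),
    where r_i is the rank of the y value paired with the i-th smallest x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    y_sorted = y[np.argsort(x, kind="stable")]        # reorder pairs by increasing x
    ranks = np.argsort(np.argsort(y_sorted)) + 1      # ranks 1..n of the reordered y values
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n ** 2 - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=1000)
    print(chatterjee_xi(x, x ** 2))                   # near 1: y is a function of x
    print(chatterjee_xi(x, rng.uniform(size=1000)))   # near 0: independent samples
```

For noiseless monotone data such as $y = x$ with $n$ distinct points, the sum equals $n - 1$, so $\xi_n = 1 - 3/(n + 1)$, which tends to 1.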

Statistics Theory

Covariance estimation with nonnegative partial correlations

We study the problem of high-dimensional covariance estimation under the constraint that the partial correlations are nonnegative. The sign constraints dramatically simplify estimation: the Gaussian maximum likelihood estimator is well defined with only two observations, regardless of the number of variables. We analyze its performance in the setting where the dimension may be much larger than the sample size. We establish that the estimator is both high-dimensionally consistent and minimax optimal in the symmetrized Stein loss. We also prove a negative result showing that the sign constraints can introduce substantial bias when estimating the top eigenvalue of the covariance matrix.
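The constraint itself is easy to state in code. The partial correlation of variables $i$ and $j$ given the rest is $-K_{ij}/\sqrt{K_{ii}K_{jj}}$, where $K = \Sigma^{-1}$, so nonnegative partial correlations mean that the precision matrix has nonpositive off-diagonal entries. The sketch below, with hypothetical helper names, only checks this condition; it does not implement the constrained maximum likelihood estimator.

```python
import numpy as np


def partial_correlations(cov):
    """Partial correlations given all other variables: -K_ij / sqrt(K_ii K_jj), K = cov^{-1}."""
    K = np.linalg.inv(np.asarray(cov, dtype=float))
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P


def has_nonnegative_partial_correlations(cov, tol=1e-12):
    """True iff all off-diagonal partial correlations are >= 0 (up to tol), i.e. the
    precision matrix has nonpositive off-diagonal entries."""
    P = partial_correlations(cov)
    off_diag = ~np.eye(P.shape[0], dtype=bool)
    return bool(np.all(P[off_diag] >= -tol))


if __name__ == "__main__":
    p, rho = 3, 0.5
    S = (1 - rho) * np.eye(p) + rho * np.ones((p, p))   # equicorrelation matrix
    print(partial_correlations(S))
    print(has_nonnegative_partial_correlations(S))
```

For an equicorrelation matrix with common correlation $\rho$ in dimension $p$, every partial correlation equals $\rho/(1 + (p-2)\rho)$, which is nonnegative exactly when $\rho \ge 0$.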

Statistics Theory

Cube root weak convergence of empirical estimators of a density level set

Given $n$ independent random vectors with common density $f$ on $\mathbb{R}^d$, we study the weak convergence of three empirical-measure based estimators of the convex $\lambda$-level set $L_\lambda$ of $f$, namely the excess mass set, the minimum volume set and the maximum probability set, all selected from a class of convex sets $\mathcal{A}$ that contains $L_\lambda$. Since these set-valued estimators approach $L_\lambda$, even the formulation of their weak convergence is non-standard. We identify the joint limiting distribution of the symmetric difference of $L_\lambda$ and each of the three estimators, at rate $n^{-1/3}$. It turns out that the minimum volume set and the maximum probability set estimators are asymptotically indistinguishable, whereas the excess mass set estimator exhibits "richer" limit behavior. Arguments rely on the boundary local empirical process, its cylinder representation, dimension-free concentration around the boundary of $L_\lambda$, and the set-valued argmax of a drifted Wiener process.
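To make the first estimator concrete, here is a one-dimensional sketch, assuming the usual definition of the excess mass set as a maximizer of $P_n(C) - \lambda\,|C|$ over the class, where $P_n$ is the empirical measure and $|C|$ is Lebesgue measure. In one dimension the convex sets are intervals, and it suffices to search intervals whose endpoints are sample points, since shrinking an interval to its extreme enclosed points keeps its mass and reduces its length. The function name is hypothetical, and the $O(n^2)$ search is chosen for clarity rather than speed.

```python
import numpy as np


def excess_mass_interval(sample, lam):
    """1-D excess mass estimate of the level set {x : f(x) >= lam}.

    Returns the interval [X_(i), X_(j)] maximizing P_n([a, b]) - lam * (b - a),
    where P_n is the empirical measure (sample points assumed distinct).
    """
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    best, best_i, best_j = -np.inf, 0, 0
    for i in range(n):
        # Candidate right endpoints xs[i], ..., xs[n-1]; mass of [xs[i], xs[j]] is (j - i + 1) / n.
        mass = np.arange(1, n - i + 1) / n
        score = mass - lam * (xs[i:] - xs[i])
        k = int(np.argmax(score))
        if score[k] > best:
            best, best_i, best_j = score[k], i, i + k
    return xs[best_i], xs[best_j]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    lam = np.exp(-0.5) / np.sqrt(2.0 * np.pi)   # N(0, 1) density at +-1, so L_lam = [-1, 1]
    print(excess_mass_interval(x, lam))
```

With standard normal data and $\lambda = \varphi(1)$, the true level set is $[-1, 1]$, and the estimated endpoints fluctuate around $\pm 1$.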

Statistics Theory

Cumulative Residual Extropy of Minimum Ranked Set Sampling with Unequal Samples

Recently, Jahanshahi et al. (2019) proposed an alternative measure of uncertainty called cumulative residual extropy (CREX). In this paper, we consider uncertainty measures of the minimum ranked set sampling procedure with unequal samples (MinRSSU) in terms of CREX and its dynamic version, and we compare the uncertainty and information content of CREX under the MinRSSU and simple random sampling (SRS) designs. Using simulation, we also study new estimators of CREX for the MinRSSU and SRS designs in terms of bias and mean squared error. Finally, we provide a new discrimination measure of the disparity between the distribution of MinRSSU and that of the parent data under SRS.
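A short sketch, assuming the definition of CREX for a nonnegative random variable as $-\frac{1}{2}\int_0^\infty \bar F(x)^2\,dx$, where $\bar F$ is the survival function; for $X \sim \mathrm{Exp}(\lambda)$ this gives $-1/(4\lambda)$. The plug-in estimator below replaces $\bar F$ by the empirical survival function, and the MinRSSU sampler takes the $i$-th unit as the minimum of an independent sample of size $i$, which is assumed here as the design. All function names, and the plug-in form of the estimator, are illustrative assumptions rather than the estimators studied in the paper.

```python
import numpy as np


def crex_exponential(rate):
    """Closed form for X ~ Exp(rate): -(1/2) * int_0^inf exp(-2 * rate * x) dx = -1 / (4 * rate)."""
    return -1.0 / (4.0 * rate)


def empirical_crex(sample):
    """Plug-in CREX for nonnegative data using the empirical survival function,
    which equals 1 on [0, X_(1)) and 1 - i/n on [X_(i), X_(i+1))."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    gaps = np.diff(np.concatenate(([0.0], xs)))   # X_(1) - 0, X_(2) - X_(1), ...
    surv = 1.0 - np.arange(n) / n                  # survival value on each gap
    return -0.5 * np.sum(surv ** 2 * gaps)


def min_rssu_sample(draw, m, rng):
    """MinRSSU sketch: unit i (i = 1..m) is the minimum of an independent sample of size i.
    `draw(rng, size)` is a hypothetical sampler for the parent distribution."""
    return np.array([draw(rng, i).min() for i in range(1, m + 1)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(empirical_crex(rng.exponential(size=10_000)), crex_exponential(1.0))
    print(min_rssu_sample(lambda r, k: r.exponential(size=k), 5, rng))
```

Because the minimum of $i$ independent Exp(1) variables is Exp($i$), the $i$-th MinRSSU unit in the exponential case has CREX $-1/(4i)$.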

Statistics Theory

D-optimal designs for Poisson regression with synergetic interaction effect

We characterize D-optimal designs in the two-dimensional Poisson regression model with synergetic interaction and provide an explicit proof. The proof is based on reparameterizing the design region in terms of contours of constant intensity. This approach substantially reduces the complexity, because properties of the sensitivity function can be treated along and across the contours separately. Furthermore, some extensions of this result to higher dimensions are presented.
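The sensitivity function can be evaluated directly. The sketch assumes the model $\log \lambda(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$ with regression vector $f(x) = (1, x_1, x_2, x_1 x_2)^\top$, so that an approximate design with points $x_i$ and weights $w_i$ has information matrix $M = \sum_i w_i \lambda(x_i) f(x_i) f(x_i)^\top$ and sensitivity $d(x) = \lambda(x) f(x)^\top M^{-1} f(x)$. By the general equivalence theorem, a design is locally D-optimal if and only if $d(x) \le 4$ on the whole design region. The parameter values, design points and function names below are hypothetical and are not the optimal design characterized in the paper.

```python
import numpy as np


def regressors(x):
    """f(x) = (1, x1, x2, x1 * x2) for two factors with interaction."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])


def intensity(x, beta):
    """Poisson mean lambda(x) = exp(f(x)^T beta) under the log link."""
    return float(np.exp(regressors(x) @ beta))


def information(points, weights, beta):
    """Fisher information M = sum_i w_i * lambda(x_i) * f(x_i) f(x_i)^T of an approximate design."""
    M = np.zeros((4, 4))
    for x, w in zip(points, weights):
        f = regressors(x)
        M += w * intensity(x, beta) * np.outer(f, f)
    return M


def sensitivity(x, M, beta):
    """d(x) = lambda(x) * f(x)^T M^{-1} f(x); D-optimality requires d(x) <= 4 on the region."""
    f = regressors(x)
    return intensity(x, beta) * float(f @ np.linalg.solve(M, f))


if __name__ == "__main__":
    beta = np.array([0.0, -1.0, -1.0, 0.5])                  # hypothetical parameter values
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical design
    w = [0.25] * 4
    M = information(pts, w, beta)
    print("log det M:", np.linalg.slogdet(M)[1])
    grid = np.linspace(0.0, 3.0, 31)
    print("max d on grid:", max(sensitivity((a, c), M, beta) for a in grid for c in grid))
```

For any design, $\sum_i w_i d(x_i) = \operatorname{tr}(M^{-1}M) = 4$; for a saturated four-point design this forces $d(x_i) = 1/w_i$ at each support point, so only equal weights can satisfy the equivalence theorem there.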

Statistics Theory

Data-driven aggregation in circular deconvolution

In a circular deconvolution model, we consider fully data-driven density estimation of a circular random variable when the density of the additive independent measurement error is unknown. We have at hand two independent iid samples, one of the contaminated version of the variable of interest and the other of the additive noise. We show optimality, in an oracle and minimax sense, of a fully data-driven weighted sum of orthogonal series density estimators. Two types of random weights are considered, one motivated by a Bayesian approach and the other by a well-known model selection method. We derive non-asymptotic upper bounds for the quadratic risk and the maximal quadratic risk over Sobolev-like ellipsoids of the fully data-driven estimator. We compute the rates that can be obtained in different configurations for the smoothness of the density of interest and the error density. The rates (strictly) match the optimal oracle or minimax rates for a large variety of cases, and otherwise deteriorate by at most a logarithmic factor. We illustrate the performance of the fully data-driven weighted sum of orthogonal series estimators in a simulation study.
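The basic building block is a single orthogonal series estimator with a fixed dimension $K$. Writing $\varphi_Z(k) = E[e^{ikZ}]$ and using $\varphi_Y(k) = \varphi_X(k)\,\varphi_\varepsilon(k)$ for $Y = X + \varepsilon \pmod{2\pi}$, the sketch estimates both characteristic functions empirically from the two samples and inverts the Fourier series. The data-driven weighting over $K$ studied in the paper is not implemented, the function names are hypothetical, and the lower threshold on $|\hat\varphi_\varepsilon(k)|$ is an ad hoc assumption.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def ecf(angles, ks):
    """Empirical characteristic function phi_hat(k) = mean_j exp(i k Z_j) at integer frequencies."""
    return np.exp(1j * np.outer(ks, angles)).mean(axis=1)


def projection_estimator(y, eps, K, grid, floor=1e-3):
    """Fourier-series deconvolution estimate of the density of X on the circle, dimension K.

    y   : contaminated sample, Y = X + eps (mod 2 pi)
    eps : independent sample from the noise distribution
    f_hat(x) = (1 / 2pi) * sum_{|k| <= K} phi_hat_Y(k) / phi_hat_eps(k) * exp(-i k x)
    """
    ks = np.arange(-K, K + 1)
    num = ecf(y, ks)
    den = ecf(eps, ks)
    # Ad hoc guard: replace tiny noise coefficients (same modulus at k and -k, so symmetry is kept).
    den = np.where(np.abs(den) < floor, floor, den)
    coef = num / den
    vals = (coef[:, None] * np.exp(-1j * np.outer(ks, grid))).sum(axis=0) / TWO_PI
    return vals.real  # imaginary parts cancel because coef[-k] = conj(coef[k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.vonmises(np.pi, 2.0, size=n) % TWO_PI          # hypothetical signal
    noise = rng.normal(0.0, 0.3, size=n) % TWO_PI          # hypothetical noise law
    y = (x + rng.normal(0.0, 0.3, size=n)) % TWO_PI
    grid = np.linspace(0.0, TWO_PI, 256, endpoint=False)
    f_hat = projection_estimator(y, noise, 5, grid)
    print("estimated mode:", grid[np.argmax(f_hat)])
```

Since $\hat\varphi_Y(0) = \hat\varphi_\varepsilon(0) = 1$, the estimate always integrates to one over the circle; only the choice of $K$ (or of the weights over $K$) controls the bias-variance trade-off.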

Statistics Theory

Decision Theory and Large Deviations for Dynamical Hypotheses Test: Neyman-Pearson, Min-Max and Bayesian Tests

We analyze hypotheses tests via classical results on large deviations for the case of two different Hölder Gibbs probabilities. The main difference from classical hypotheses tests in decision theory is that here the two measures under consideration are mutually singular. We analyze the classical Neyman-Pearson test and show its optimality: as the sample size goes to infinity, this test becomes exponentially better than alternative tests. We also consider the Min-Max test and a certain type of Bayesian hypotheses test. We treat these tests in the log-likelihood framework using several tools of thermodynamic formalism. Versions of Stein's lemma and of Chernoff information are also presented.
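As a concrete special case, stationary finite-state Markov chains are Gibbs measures for locally constant potentials, and two distinct ergodic chains are mutually singular as measures on infinite sequences. For such chains a standard formula expresses the Chernoff information through a spectral radius: $C = -\min_{0\le s\le 1} \log \rho(s)$, where $\rho(s)$ is the spectral radius of the matrix with entries $P_{ij}^s Q_{ij}^{1-s}$. The sketch below assumes that formula and a common zero pattern for $P$ and $Q$, minimizes over a grid of $s$, and uses a hypothetical function name; it is an illustration, not the thermodynamic-formalism construction of the paper.

```python
import numpy as np


def chernoff_information_markov(P, Q, num=1001):
    """Chernoff information between Markov chains with transition matrices P and Q:
        C = -min_{0 <= s <= 1} log rho(s),  rho(s) = spectral radius of [P_ij^s * Q_ij^(1-s)].
    Assumes P and Q share the same zero pattern; the minimum is taken over a grid of s."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    best = np.inf
    for s in np.linspace(0.0, 1.0, num):
        M = P ** s * Q ** (1.0 - s)          # elementwise geometric interpolation
        rho = np.max(np.abs(np.linalg.eigvals(M)))
        best = min(best, np.log(rho))
    return -best


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.2, 0.8]]
    Q = [[0.5, 0.5], [0.5, 0.5]]
    print(chernoff_information_markov(P, Q))
```

When all rows of $P$ equal a vector $p$ and all rows of $Q$ equal $q$, the matrix has rank one and $\rho(s) = \sum_j p_j^s q_j^{1-s}$, which recovers the usual Chernoff information for iid observations.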

Statistics Theory

Deconvoluting Kernel Density Estimation and Regression for Locally Differentially Private Data

Local differential privacy has become the gold standard in the privacy literature for gathering or releasing sensitive individual data points in a privacy-preserving manner. However, locally differentially private data can distort the probability density of the data because of the additive noise used to ensure privacy. In fact, the density of privacy-preserving data (no matter how many samples we gather) is always flatter than the density function of the original data points, due to convolution with the privacy-preserving noise density function. The effect is especially pronounced when using slow-decaying privacy-preserving noise, such as Laplace noise. This can result in under- or overestimation of the heavy hitters. This is an important challenge facing social scientists due to the use of differential privacy in the 2020 United States Census. In this paper, we develop density estimation methods using smoothing kernels. We use the framework of deconvoluting kernel density estimators to remove the effect of privacy-preserving noise. This approach also allows us to adapt results from nonparametric regression with errors-in-variables to develop regression models based on locally differentially private data. We demonstrate the performance of the developed methods on financial and demographic datasets.
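For Laplace noise, the deconvoluting kernel has a closed form when the smoothing kernel is Gaussian. With $\varphi_K(t) = e^{-t^2/2}$ and Laplace$(0, b)$ noise, $\varphi_\varepsilon(t) = 1/(1 + b^2 t^2)$, so Fourier inversion of $\varphi_K(t)/\varphi_\varepsilon(t/h)$ gives $K^*(u) = \phi(u)\,[1 + (b^2/h^2)(1 - u^2)]$, and the estimator is $\hat f(x) = \frac{1}{nh}\sum_i K^*((x - Z_i)/h)$. The sketch below is a textbook instance of this framework under those assumptions; the function names, noise scale and bandwidth are hypothetical, not the paper's choices.

```python
import numpy as np


def privatize(x, b, rng):
    """Release each value with independent Laplace(0, b) noise.
    Under the Laplace mechanism, b = sensitivity / epsilon."""
    x = np.asarray(x, dtype=float)
    return x + rng.laplace(loc=0.0, scale=b, size=x.shape)


def deconv_kernel(u, b, h):
    """Deconvoluting kernel K*(u) = phi(u) * (1 + (b/h)^2 * (1 - u^2)) for a Gaussian
    kernel and Laplace(0, b) noise."""
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return phi * (1.0 + (b / h) ** 2 * (1.0 - u ** 2))


def deconv_kde(x_grid, z, b, h):
    """Deconvoluting KDE f_hat(x) = (1 / (n h)) * sum_i K*((x - Z_i) / h) from noisy data z."""
    x_grid = np.asarray(x_grid, dtype=float)
    z = np.asarray(z, dtype=float)
    u = (x_grid[:, None] - z[None, :]) / h
    return deconv_kernel(u, b, h).mean(axis=1) / h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    b, h = 0.5, 0.4                          # hypothetical noise scale and bandwidth
    z = privatize(rng.normal(size=2000), b, rng)
    grid = np.linspace(-3.0, 3.0, 7)
    print(np.round(deconv_kde(grid, z, b, h), 3))
```

Because $K^*$ takes negative values when $u^2 > 1 + h^2/b^2$, the estimate can dip slightly below zero in the tails; in expectation it equals the true density smoothed by the Gaussian kernel, with the privacy noise removed.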

Statistics Theory

Deconvolution with unknown noise distribution is possible for multivariate signals

This paper considers the deconvolution problem in the case where the target signal is multidimensional and nothing is known about the noise distribution. More precisely, no assumption is made on the noise distribution and no samples are available to estimate it: the deconvolution problem is solved based only on the corrupted signal observations. We establish the identifiability of the model up to translation when the signal has a Laplace transform with an exponential growth smaller than 2 and when it can be decomposed into two dependent components. Then, we propose an estimator of the probability density function of the signal without any assumption on the noise distribution. As this estimator depends on the lightness of the tail of the signal distribution, which is usually unknown, a model selection procedure is proposed to obtain an estimator that is adaptive in this parameter and achieves the same rate of convergence as the estimator with a known tail parameter. Finally, we establish a lower bound on the minimax rate of convergence that matches the upper bound.
