Featured Researches

Statistics Theory

Denoising modulo samples: k-NN regression and tightness of SDP relaxation

Many modern applications involve the acquisition of noisy modulo samples of a function f, with the goal being to recover estimates of the original samples of f. For a Lipschitz function f : [0,1]^d → R, suppose we are given the samples y_i = (f(x_i) + η_i) mod 1, i = 1, …, n, where η_i denotes noise. Assuming the η_i are zero-mean i.i.d. Gaussians and the x_i form a uniform grid, we derive a two-stage algorithm that recovers estimates of the samples f(x_i) with a uniform error rate O((log n / n)^{1/(d+2)}) holding with high probability. The first stage involves embedding the points on the unit complex circle and obtaining denoised estimates of f(x_i) mod 1 via a kNN (k nearest neighbors) estimator. The second stage involves a sequential unwrapping procedure which unwraps the denoised mod 1 estimates from the first stage. The estimates of the samples f(x_i) can subsequently be used to construct an estimate of the function f with the aforementioned uniform error rate. Recently, Cucuringu and Tyagi proposed an alternative way of denoising modulo 1 data which works with their representation on the unit complex circle. They formulated a smoothness-regularized least squares problem on the product manifold of unit circles, where the smoothness is measured with respect to the Laplacian of a proximity graph G involving the x_i. This is a nonconvex quadratically constrained quadratic program (QCQP), hence they proposed solving its semidefinite programming (SDP) relaxation. We derive sufficient conditions under which the SDP is a tight relaxation of the QCQP; under these conditions, the global solution of the QCQP can be obtained in polynomial time.
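
The two-stage procedure can be sketched in a few lines. The snippet below is a minimal one-dimensional illustration (uniform grid, Gaussian noise), not the paper's exact estimator; the window size k, grid size, and noise level are arbitrary choices, and recovery is only up to a global integer shift.

```python
import numpy as np

def denoise_mod1_knn(y, k=5):
    """Stage 1: embed mod-1 samples on the unit circle and smooth with a
    k-NN local average. y holds mod-1 samples on a sorted uniform 1-D grid."""
    z = np.exp(2j * np.pi * y)                  # circle embedding
    n = len(y)
    denoised = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        avg = z[lo:hi].mean()                   # local average on the circle
        denoised[i] = (np.angle(avg) / (2 * np.pi)) % 1.0
    return denoised

def unwrap_sequential(m):
    """Stage 2: sequentially unwrap denoised mod-1 values into real samples,
    assuming consecutive true differences lie in (-1/2, 1/2)."""
    f_hat = np.empty_like(m)
    f_hat[0] = m[0]
    for i in range(1, len(m)):
        d = m[i] - m[i - 1]
        d -= np.round(d)                        # wrap difference into [-1/2, 1/2]
        f_hat[i] = f_hat[i - 1] + d
    return f_hat

# Toy check: recover a Lipschitz f on [0,1] from noisy mod-1 samples.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
f = 2.0 * x                                     # true samples; range exceeds [0, 1)
y = (f + rng.normal(0, 0.05, x.size)) % 1.0
est = unwrap_sequential(denoise_mod1_knn(y, k=5))
print(np.max(np.abs((est - est[0]) - (f - f[0]))))  # small uniform error
```

The anchoring by est[0] and f[0] reflects that mod-1 data determine f only up to an integer offset.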

Statistics Theory

Density Deconvolution with Non-Standard Error Distributions: Rates of Convergence and Adaptive Estimation

It is a standard assumption in the density deconvolution problem that the characteristic function of the measurement error distribution is non-zero on the real line. While this condition is assumed in the majority of existing works on the topic, there are many problem instances of interest where it is violated. In this paper we focus on non-standard settings where the characteristic function of the measurement errors has zeros, and study how the multiplicity of the zeros affects the estimation accuracy. For a prototypical problem of this type, we demonstrate that the best achievable estimation accuracy is determined by the multiplicity of the zeros, the rate of decay of the error characteristic function, and the smoothness and tail behavior of the estimated density. We derive lower bounds on the minimax risk and develop minimax-optimal estimators. In addition, we consider the problem of adaptive estimation and propose a data-driven estimator that automatically adapts to the unknown smoothness and tail behavior of the density to be estimated.

Statistics Theory

Density power divergence for general integer-valued time series with multivariate exogenous covariate

In this article, we study a robust estimation method for a general class of integer-valued time series models. The conditional distribution of the process belongs to a broad class of distributions, and, unlike in the classical autoregressive framework, the conditional mean of the process also depends on a multivariate exogenous covariate. We derive a robust inference procedure based on the minimum density power divergence. Under certain regularity conditions, we establish that the proposed estimator is consistent and asymptotically normal. Simulation experiments are conducted to illustrate the empirical performance of the estimator. An application to the number of transactions per minute for the stock Ericsson B is also provided.
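
As a rough illustration of minimum density power divergence estimation (the divergence of Basu et al., here applied to a plain i.i.d. Poisson model rather than the covariate-driven time series models of the paper), one can minimize the empirical DPD objective numerically; the tuning constant alpha and the contamination scheme below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def dpd_objective(lam, y, alpha=0.5, support=200):
    """Empirical density power divergence objective for a Poisson(lam) model:
    sum_k p_lam(k)^(1+alpha) - (1 + 1/alpha) * mean_i p_lam(y_i)^alpha.
    alpha > 0 trades efficiency for robustness; alpha -> 0 recovers the MLE."""
    k = np.arange(support)
    p = poisson.pmf(k, lam)
    return np.sum(p ** (1 + alpha)) - (1 + 1 / alpha) * np.mean(poisson.pmf(y, lam) ** alpha)

rng = np.random.default_rng(1)
y = rng.poisson(3.0, 500)
y[:10] = 50                                   # inject gross outliers
mdpde = minimize_scalar(dpd_objective, bounds=(0.1, 20), args=(y,), method="bounded").x
print(mdpde, y.mean())                        # MDPDE stays near 3; the sample mean (MLE) is dragged up
```

The outliers barely influence the DPD objective because their downweighted model density contributions are negligible, which is the robustness property the abstract exploits.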

Statistics Theory

Design based incomplete U-statistics

U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback: the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(n^d). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than n^d, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
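
A minimal sketch of the idea behind incomplete U-statistics, using uniformly drawn random pairs (the simplest possible design, not the design-based construction proposed in the paper); the kernel below is chosen so the complete U-statistic equals the sample variance.

```python
import itertools
import numpy as np

def complete_U(x, h):
    """Complete order-2 U-statistic: average of h over all C(n,2) pairs."""
    pairs = itertools.combinations(range(len(x)), 2)
    return np.mean([h(x[i], x[j]) for i, j in pairs])

def incomplete_U(x, h, m, rng):
    """Incomplete U-statistic in the sense of Blom (1976): average h over
    m random pairs, here drawn uniformly with replacement."""
    n = len(x)
    idx = rng.integers(0, n, size=(m, 2))
    idx = idx[idx[:, 0] != idx[:, 1]]           # drop degenerate pairs
    return np.mean([h(x[i], x[j]) for i, j in idx])

h = lambda a, b: 0.5 * (a - b) ** 2             # kernel whose U-statistic is the sample variance
rng = np.random.default_rng(2)
x = rng.normal(0, 1, 300)
cu = complete_U(x, h)                           # uses all C(300,2) = 44850 pairs
iu = incomplete_U(x, h, m=2000, rng=rng)        # close, with far fewer kernel evaluations
print(cu, iu)
```

The complete version needs O(n^d) kernel evaluations; the incomplete version needs only m, which is the computational saving the abstract is concerned with.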

Statistics Theory

Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs

Reliability-oriented sensitivity analysis methods have been developed to understand the influence of model inputs relative to events which characterize the failure of a system (e.g., a threshold exceedance of the model output). In this field, target sensitivity analysis focuses primarily on capturing the influence of the inputs on the occurrence of such a critical event. This paper proposes new target sensitivity indices, based on Shapley values and called "target Shapley effects", allowing for interpretable sensitivity measures under dependent inputs. Two algorithms (one based on Monte Carlo sampling, and a given-data algorithm based on a nearest-neighbor procedure) are proposed for the estimation of these target Shapley effects based on the L2 norm. Additionally, the behavior of these target Shapley effects is studied theoretically and empirically through various toy cases. Finally, the application of these new indices in two real-world use cases (a river flood model and a COVID-19 epidemiological model) is discussed.

Statistics Theory

Differential equations, splines and Gaussian processes

We explore the connections between Green's functions for certain differential equations, covariance functions for Gaussian processes, and the smoothing splines problem. Conventionally, the smoothing spline problem is considered in a setting of reproducing kernel Hilbert spaces, but here we present a more direct approach. With this approach, some choices that are implicit in the reproducing kernel Hilbert space setting stand out, examples being the choice of boundary conditions and more elaborate shape restrictions. The paper first explores the Laplace operator and the Poisson equation and studies the corresponding Green's functions under various boundary conditions and constraints. Explicit functional forms are derived in a range of examples. These examples include several novel forms of the Green's function that, to the author's knowledge, have not previously been presented. Next, we present a smoothing spline problem where we penalize the integrated squared derivative of the function to be estimated. We then show how the solution can be explicitly computed using the Green's function for the Laplace operator. In the last part of the paper, we explore the connection between Gaussian processes and differential equations, and show how the Laplace operator is related to Brownian processes and how processes that arise due to boundary conditions and shape constraints can be viewed as conditional Gaussian processes. The presented connection between Green's functions for the Laplace operator and covariance functions for Brownian processes allows us to introduce several novel Brownian processes with specific behaviors. Finally, we consider the connection between Gaussian process priors and smoothing splines.
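
The Green's-function/covariance connection can be checked numerically. With a Dirichlet condition at 0 and a Neumann condition at 1 (one plausible choice of boundary conditions, assumed here for illustration), the discretized negative second-derivative operator inverts, up to the grid spacing, the Brownian-motion covariance min(s, t):

```python
import numpy as np

# Discretized -d^2/ds^2 on a uniform grid of [0,1] with u(0) = 0 (Dirichlet)
# and u'(1) = 0 (Neumann): a tridiagonal second-difference matrix over 1/h^2.
n = 50
h = 1.0 / n
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
A[-1, -1] = 1.0                                  # Neumann condition at the right end
A /= h ** 2

# Brownian-motion covariance K(s,t) = min(s,t) evaluated on the same grid.
s = np.arange(1, n + 1) * h
K = np.minimum.outer(s, s)

# K acts as (h times) the discrete Green's function of A: A @ K ≈ I / h.
print(np.max(np.abs(A @ K - np.eye(n) / h)))     # ~ 0 up to floating-point error
```

In other words, on this grid the covariance matrix of Brownian motion is (up to the mesh factor h) exactly the inverse of the discretized Laplace operator under these boundary conditions, which is the relationship the paper develops in the continuum.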

Statistics Theory

Differentially private depth functions and their associated medians

In this paper, we investigate the differentially private estimation of data depth functions and their associated medians. We introduce several methods for privatizing depth values at a fixed point, and show that for some depth functions, when the depth is computed at an out-of-sample point, privacy can be gained for free when n → ∞. We also present a method for privately estimating the vector of sample point depth values. Additionally, we introduce estimation methods for depth-based medians for both depth functions with low global sensitivity and depth functions with only highly probable, low local sensitivity. We provide a general result (Lemma 1) which can be used to prove consistency of an estimator produced by the exponential mechanism, provided the limiting cost function is sufficiently smooth at a unique minimizer. We also introduce a general algorithm to privately estimate a minimizer of a cost function which has, with high probability, low local sensitivity. This algorithm combines the propose-test-release algorithm with the exponential mechanism. An application of this algorithm to generate consistent estimates of the projection depth-based median is presented. Thus, for these private depth-based medians, we show that it is possible for privacy to be obtained for free when n → ∞.
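
For intuition about the exponential mechanism applied to a median-type cost, here is the textbook private median construction (not the paper's propose-test-release algorithm or its depth-based medians); the data bounds lo, hi and the candidate grid are assumptions of this sketch.

```python
import numpy as np

def private_median_expmech(x, epsilon, lo, hi, grid_size=1000, rng=None):
    """Differentially private median via the exponential mechanism.
    Utility of a candidate m is minus the rank error |#{x_i < m} - n/2|,
    which has sensitivity 1 under addition/removal of one data point."""
    rng = rng or np.random.default_rng()
    cand = np.linspace(lo, hi, grid_size)
    rank_err = np.abs((x[:, None] < cand[None, :]).sum(axis=0) - len(x) / 2)
    logits = -epsilon * rank_err / 2.0
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return rng.choice(cand, p=p)

rng = np.random.default_rng(3)
x = rng.normal(5.0, 1.0, 1000)
m = private_median_expmech(x, epsilon=1.0, lo=0.0, hi=10.0, rng=rng)
print(m, np.median(x))                           # private estimate lands near the true median
```

Because the rank-error cost grows quickly away from the median when n is large, the mechanism concentrates on near-median candidates, illustrating how privacy can come nearly "for free" as n → ∞.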

Statistics Theory

Diffusion Asymptotics for Sequential Experiments

We propose a new diffusion-asymptotic analysis for sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with n time steps, we let the mean reward gaps between actions scale on the order of 1/√n, so as to preserve the difficulty of the learning task as n grows. In this regime, we show that the behavior of a class of sequentially randomized Markov experiments converges to a diffusion limit, given as the solution of a stochastic differential equation. The diffusion limit thus enables us to derive refined, instance-specific characterizations of the stochastic dynamics of adaptive experiments. As an application of this framework, we use the diffusion limit to obtain several new insights on the regret and belief evolution of Thompson sampling. We show that a version of Thompson sampling with an asymptotically uninformative prior variance achieves nearly optimal instance-specific regret scaling when the reward gaps are relatively large. We also demonstrate that, in this regime, the posterior beliefs underlying Thompson sampling are highly unstable over time.
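
A minimal sketch of Gaussian Thompson sampling in the 1/√n gap regime described above; the prior variance, horizon, and gap constant are arbitrary illustrative choices, and a large prior variance stands in for the "asymptotically uninformative prior" of the abstract.

```python
import numpy as np

def thompson_gaussian(mu_true, T, sigma=1.0, prior_var=100.0, rng=None):
    """Two-or-more-armed Gaussian Thompson sampling with known reward noise
    sigma and an independent N(0, prior_var) prior on each arm's mean."""
    rng = rng or np.random.default_rng()
    K = len(mu_true)
    n = np.zeros(K)            # pull counts per arm
    s = np.zeros(K)            # reward sums per arm
    regret = 0.0
    for _ in range(T):
        # Conjugate posterior for each arm: N(post_mean, post_var).
        post_var = 1.0 / (1.0 / prior_var + n / sigma ** 2)
        post_mean = post_var * s / sigma ** 2
        a = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        r = rng.normal(mu_true[a], sigma)
        n[a] += 1
        s[a] += r
        regret += max(mu_true) - mu_true[a]
    return regret

rng = np.random.default_rng(4)
T = 2000
gap = 4.0 / np.sqrt(T)                           # reward gap on the 1/sqrt(n) scale
reg = thompson_gaussian([gap, 0.0], T, rng=rng)
print(reg)                                        # total regret, well below the worst case gap * T
```

Because the gap shrinks like 1/√T, the arms stay hard to distinguish throughout the horizon, which is exactly the regime the diffusion limit is built to capture.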

Statistics Theory

Discrepancy Bounds for a Class of Negatively Dependent Random Points Including Latin Hypercube Samples

We introduce a class of γ-negatively dependent random samples. We prove that this class includes, apart from Monte Carlo samples, in particular Latin hypercube samples and Latin hypercube samples padded by Monte Carlo. For a γ-negatively dependent N-point sample in dimension d we provide probabilistic upper bounds for its star discrepancy with explicitly stated dependence on N, d, and γ. These bounds generalize the probabilistic bounds for Monte Carlo samples from [Heinrich et al., Acta Arith. 96 (2001), 279–302] and [C. Aistleitner, J. Complexity 27 (2011), 531–540], and they are optimal for Monte Carlo and Latin hypercube samples. In the special case of Monte Carlo samples, the constants that appear in our bounds improve substantially on the constants presented in the latter paper and in [C. Aistleitner, M. T. Hofer, Math. Comp. 83 (2014), 1373–1381].
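
For readers unfamiliar with Latin hypercube samples, here is the standard construction (each one-dimensional projection is perfectly stratified into N equal strata); it is a generic generator, not code from the paper.

```python
import numpy as np

def latin_hypercube(N, d, rng=None):
    """Latin hypercube sample of N points in [0,1]^d: in every coordinate,
    the points occupy a random permutation of the N strata [i/N, (i+1)/N),
    jittered uniformly within each stratum."""
    rng = rng or np.random.default_rng()
    u = rng.random((N, d))
    perms = np.column_stack([rng.permutation(N) for _ in range(d)])
    return (perms + u) / N

P = latin_hypercube(16, 2, rng=np.random.default_rng(5))
# Each 1-D projection hits every stratum exactly once:
counts = np.floor(P * 16).astype(int)
print(sorted(counts[:, 0]), sorted(counts[:, 1]))
```

The stratification in every coordinate is what makes Latin hypercube points negatively dependent in the sense studied in the abstract: a point landing in one stratum excludes the others from it.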

Statistics Theory

Discrepancy of stratified samples from partitions of the unit cube

We extend the notion of jittered sampling to arbitrary partitions and study the discrepancy of the related point sets. Let Ω = (Ω_1, …, Ω_N) be a partition of [0,1]^d and let the i-th point in P be chosen uniformly in the i-th set of the partition (and stochastically independently of the other points), i = 1, …, N. For the study of such sets we introduce the concept of a uniformly distributed triangular array and compare this notion to related notions in the literature. We prove that the expected L_p-discrepancy, E[L_p(P_Ω)^p], of a point set P_Ω generated from any equivolume partition Ω is always strictly smaller than the expected L_p-discrepancy of a set of N uniform random samples for p > 1. For fixed N we consider classes of stratified samples based on equivolume partitions of the unit cube into convex sets or into sets with a uniform positive lower bound on their reach. It is shown that these classes contain at least one minimizer of the expected L_p-discrepancy. We illustrate our results with explicit constructions for small N. In addition, we present a family of partitions that seems to improve the expected discrepancy of Monte Carlo sampling by a factor of 2 for every N.
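
Classical jittered sampling, the special case over the equivolume partition of [0,1]^d into m^d congruent subcubes, can be sketched as follows; the integrand and sizes are arbitrary illustrative choices, and the variance comparison stands in for the discrepancy comparison of the abstract.

```python
import numpy as np

def jittered_sample(m, d, rng):
    """Jittered (stratified) sample: one uniform point in each of the m^d
    congruent subcubes of [0,1]^d -- the classical equivolume partition."""
    edges = (np.stack(np.meshgrid(*[np.arange(m)] * d, indexing="ij"), -1)
             .reshape(-1, d))
    return (edges + rng.random(edges.shape)) / m

def mc_sample(N, d, rng):
    """Plain Monte Carlo: N i.i.d. uniform points in [0,1]^d."""
    return rng.random((N, d))

# Stratification reduces the variance of a Monte Carlo integral estimate.
rng = np.random.default_rng(6)
f = lambda p: np.sum(p ** 2, axis=1)             # smooth test integrand on [0,1]^2
m, d = 8, 2
N = m ** d
jit = [f(jittered_sample(m, d, rng)).mean() for _ in range(200)]
mc = [f(mc_sample(N, d, rng)).mean() for _ in range(200)]
print(np.var(jit), np.var(mc))                   # jittered variance is much smaller
```

Each stratum contributes only its local fluctuation, so for smooth integrands the stratified estimator beats plain Monte Carlo at the same budget N, mirroring the strict improvement in expected L_p-discrepancy proved in the paper.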

