Featured Research

Statistics Theory

A Bayesian Multiple Testing Paradigm for Model Selection in Inverse Regression Problems

In this article, we propose a novel Bayesian multiple testing formulation for model and variable selection in inverse setups, judiciously embedding the idea of inverse reference distributions proposed by Bhattacharya (2013) in a mixture framework consisting of the competing models. We develop the theory and methods in a general context encompassing parametric and nonparametric competing models, dependent data, as well as misspecification. Our investigation shows that, asymptotically, the multiple testing procedure almost surely selects the best possible inverse model, the one that minimizes the minimum Kullback-Leibler divergence from the true model. We also show that the error rates, namely versions of the false discovery rate and the false non-discovery rate, converge to zero almost surely as the sample size goes to infinity. Asymptotic α-control of versions of the false discovery rate, and its impact on the convergence of the false non-discovery rate versions, is also investigated. Our simulation experiments involve small-sample selection among inverse Poisson log regression and inverse geometric logit and probit regression, where the regressions are either linear or based on Gaussian processes; variable selection is also considered. Our multiple testing results turn out to be very encouraging, in the sense of selecting the best models in all the non-misspecified and misspecified cases.
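
The posterior-thresholding idea behind this kind of Bayesian error-rate control can be illustrated with a minimal sketch (not the paper's procedure; the posterior probabilities and the 0.5 threshold below are hypothetical): models whose posterior probability of being the best exceeds a threshold are "discovered", and a posterior analogue of the false discovery rate is the average posterior probability that a selected model is in fact not the best.

```python
# Minimal sketch of posterior-threshold model selection with a posterior
# FDR estimate; `post_probs` are hypothetical posterior probabilities that
# each competing inverse model is the best one.
import numpy as np

def select_models(post_probs, threshold=0.5):
    post_probs = np.asarray(post_probs)
    selected = np.where(post_probs > threshold)[0]
    if selected.size == 0:
        return selected, 0.0
    # Estimated posterior FDR: expected share of false discoveries
    # among the selected models.
    posterior_fdr = float(np.mean(1.0 - post_probs[selected]))
    return selected, posterior_fdr

models, fdr = select_models([0.02, 0.91, 0.65, 0.10])
print(models, fdr)  # -> [1 2] and a posterior FDR of about 0.22
```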

Statistics Theory

A Bayesian Nonparametric Conditional Two-sample Test with an Application to Local Causal Discovery

For a continuous random variable Z, testing conditional independence X ⊥⊥ Y | Z is known to be a particularly hard problem. It constitutes a key ingredient of many constraint-based causal discovery algorithms. These algorithms are often applied to datasets containing binary (or discrete) variables, which indicate the 'context' of the observations, e.g., a control or treatment group within an experiment. In these settings, conditional independence testing with X or Y discrete (and the other continuous) is paramount to the performance of the causal discovery algorithm. To our knowledge, no such conditional independence test currently exists, and in practice tests which assume all variables to be continuous are used instead. In this paper we aim to fill this gap, as we combine elements of Holmes et al. (2015) and Teymur and Filippi (2020) to propose a novel Bayesian nonparametric conditional two-sample test. Applied to the Local Causal Discovery algorithm, we investigate its performance on both synthetic and real-world data, and compare with state-of-the-art continuous conditional independence tests.
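
With binary X, testing X ⊥⊥ Y | Z reduces to asking whether the law of Y given Z is the same in the two groups, which is exactly the conditional two-sample problem. A crude frequentist sketch of that reduction (not the Bayesian nonparametric test proposed in the paper) bins the continuous conditioner and combines per-bin two-sample tests:

```python
# Crude conditional two-sample test: bin the continuous Z, run a
# two-sample KS test of Y between the groups X=0 and X=1 within each
# bin, and combine the per-bin p-values with Fisher's method.
import numpy as np
from scipy.stats import ks_2samp, combine_pvalues

def conditional_two_sample_test(x, y, z, n_bins=5):
    bins = np.quantile(z, np.linspace(0, 1, n_bins + 1))
    pvals = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (z >= lo) & (z <= hi)
        y0, y1 = y[mask & (x == 0)], y[mask & (x == 1)]
        if len(y0) > 1 and len(y1) > 1:
            pvals.append(ks_2samp(y0, y1).pvalue)
    return combine_pvalues(pvals, method="fisher")[1]

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = rng.integers(0, 2, size=2000)
y = z + rng.normal(size=2000)                # Y depends on Z only
print(conditional_two_sample_test(x, y, z))  # X ⊥⊥ Y | Z: large p-value
```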

Statistics Theory

A Berry-Esseen theorem for sample quantiles under association

In this paper, the uniform asymptotic normality of sample quantiles of associated random variables is investigated under some conditions on the decay of the covariances. We obtain a rate of normal approximation of order O(n^{-1/2} log^2 n) if the covariances decrease exponentially to 0. The best rate is shown to be O(n^{-1/3}) under a polynomial decay of the covariances.
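
The bound has the usual Berry-Esseen shape; written out with hypothetical notation (ξ_p the p-th quantile, its sample counterpart, σ_p the asymptotic standard deviation, C an unspecified constant), the exponential-decay case reads:

```latex
% Generic shape of the stated bound under exponentially decaying
% covariances; \xi_p, \hat\xi_{p,n}, \sigma_p and C as described above.
\sup_{x \in \mathbb{R}}
  \left|\, P\!\left( \frac{\sqrt{n}\,(\hat{\xi}_{p,n} - \xi_p)}{\sigma_p}
     \le x \right) - \Phi(x) \,\right|
  \;\le\; C\, n^{-1/2} \log^2 n .
```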

Statistics Theory

A Consistent Extension of Discrete Optimal Transport Maps for Machine Learning Applications

Optimal transport maps define a one-to-one correspondence between probability distributions, and as such have grown popular for machine learning applications. However, these maps are generally defined on empirical observations and cannot be generalized to new samples while preserving asymptotic properties. We propose a novel method to learn a consistent estimator of a continuous optimal transport map from two empirical distributions. The consequences of this work are two-fold: first, it makes it possible to extend the transport plan to new observations without recomputing the discrete optimal transport map; second, it provides statistical guarantees for machine learning applications of optimal transport. We illustrate the strength of this approach by deriving a consistent framework for transport-based counterfactual explanations in fairness.
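
The problem being solved can be made concrete with a generic sketch (not the paper's estimator): a discrete optimal transport map is only defined on the n source samples, so mapping a new point requires some extension rule. Below, the barycentric projection of the optimal plan defines the map on the training samples, and a 1-nearest-neighbour rule extends it; this requires the POT package (`pip install pot`).

```python
# Discrete OT between two point clouds, then a naive nearest-neighbour
# extension of the barycentric map to out-of-sample points.
import numpy as np
import ot

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # source samples
Y = rng.normal(3.0, 0.5, size=(200, 2))   # target samples

M = ot.dist(X, Y)                                       # squared Euclidean costs
G = ot.emd(np.ones(200) / 200, np.ones(200) / 200, M)   # optimal plan
T_X = (G / G.sum(axis=1, keepdims=True)) @ Y            # barycentric map on X

def transport(x_new):
    """Extend the discrete map to a new sample via its nearest neighbour."""
    i = np.argmin(np.linalg.norm(X - x_new, axis=1))
    return T_X[i]

print(transport(np.array([0.1, -0.2])))  # lands near the target cloud
```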

Statistics Theory

A Dimension-free Computational Upper-bound for Smooth Optimal Transport Estimation

It is well known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality. Despite recent efforts to improve the rate of estimation by exploiting the smoothness of the problem, the computational complexities of these recently proposed methods still degrade exponentially with the dimension. In this paper, thanks to an infinite-dimensional sum-of-squares representation, we derive a statistical estimator of smooth optimal transport which achieves a precision ε from Õ(ε⁻²) independent and identically distributed samples from the distributions, for a computational cost of Õ(ε⁻⁴) when the smoothness increases, hence yielding dimension-free statistical and computational rates, with potentially exponentially dimension-dependent constants.

Statistics Theory

A Generalization of the Pearson Correlation to Riemannian Manifolds

The increasing application of deep learning is accompanied by a shift towards highly non-linear statistical models. In terms of their geometry, it is natural to identify these models with Riemannian manifolds. Further analysis of such statistical models therefore raises the need for a correlation measure that, in the cutting planes of the tangent spaces, equals the respective Pearson correlation, and that extends to a correlation measure normalized with respect to the underlying manifold. To this end, the article revisits elementary properties of the Pearson correlation to successively derive a linear generalization to multiple dimensions and, thereupon, a nonlinear generalization to principal manifolds, given by the Riemann-Pearson correlation.
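
The elementary property such a construction starts from is that the Pearson correlation is a normalized inner product: the cosine of the angle between the two centred samples. This is the tangent-space quantity any manifold generalization must reduce to; a minimal sketch:

```python
# Pearson correlation as the cosine of the angle between centred samples,
# checked against NumPy's built-in estimate.
import numpy as np

def pearson(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)
print(pearson(x, y), np.corrcoef(x, y)[0, 1])  # the two values agree
```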

Statistics Theory

A Kernel Two-Sample Test for Functional Data

We propose a nonparametric two-sample test procedure based on Maximum Mean Discrepancy (MMD) for testing the hypothesis that two samples of functions have the same underlying distribution, using kernels defined on function spaces. This construction is motivated by a scaling analysis of the efficiency of MMD-based tests for datasets of increasing dimension. Theoretical properties of kernels on function spaces and their associated MMD are established and employed to ascertain the efficacy of the newly proposed test, as well as to assess the effects of using functional reconstructions based on discretised function samples. The theoretical results are demonstrated over a range of synthetic and real-world datasets.
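
A minimal sketch of the generic construction (a Gaussian kernel on discretised function values and a permutation null; the paper's function-space kernels and bandwidth choices are not reproduced here):

```python
# MMD permutation test on functions observed on a common grid: each
# function is a vector fed to a Gaussian kernel, and the null
# distribution of the (biased, V-statistic) MMD^2 estimate is
# approximated by permuting sample labels.
import numpy as np

def mmd2(X, Y, sigma=1.0):
    Z = np.vstack([X, Y])
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2 * sigma**2))
    n = len(X)
    return K[:n, :n].mean() + K[n:, n:].mean() - 2 * K[:n, n:].mean()

def permutation_test(X, Y, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    stat = mmd2(X, Y)
    Z = np.vstack([X, Y])
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        count += mmd2(Z[idx[: len(X)]], Z[idx[len(X):]]) >= stat
    return count / n_perm  # permutation p-value

# Two samples of noisy sine curves on 50 grid points, one phase-shifted.
t = np.linspace(0, 1, 50)
X = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(2).normal(size=(30, 50))
Y = np.sin(2 * np.pi * t + 0.3) + 0.1 * np.random.default_rng(3).normal(size=(30, 50))
print(permutation_test(X, Y))  # small p-value: the distributions differ
```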

Statistics Theory

A Mean-Field Theory for Learning the Schönberg Measure of Radial Basis Functions

We develop and analyze a projected particle Langevin optimization method to learn the distribution in the Schönberg integral representation of radial basis functions from training samples. More specifically, we characterize a distributionally robust optimization method, with respect to the Wasserstein distance, for optimizing the distribution in the Schönberg integral representation. To provide theoretical performance guarantees, we analyze the scaling limits of a projected particle online (stochastic) optimization method in the mean-field regime. In particular, we prove that in the scaling limit, the empirical measure of the Langevin particles converges to the law of a reflected Itô diffusion-drift process, where the drift is itself a function of the law of the underlying process. Using Itô's lemma for semimartingales and Girsanov's change of measure for Wiener processes, we then derive a McKean-Vlasov type partial differential equation (PDE) with Robin boundary conditions that describes the evolution of the empirical measure of the projected Langevin particles in the mean-field regime. In addition, we establish the existence and uniqueness of steady-state solutions of the derived PDE in the weak sense. We apply our learning approach to train radial kernels in kernel locality-sensitive hash (LSH) functions, where the training dataset is generated via k-means clustering of a small subset of a database. We subsequently apply our kernel LSH with a trained kernel to an image retrieval task on the MNIST dataset, demonstrating the efficacy of our kernel learning approach. We also apply our kernel learning approach in conjunction with kernel support vector machines (SVMs) for the classification of benchmark datasets.
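
The core numerical ingredient, a projected particle Langevin step, can be sketched generically (the quadratic potential and the half-line constraint below are illustrative stand-ins, not the Schönberg-measure objective): each particle takes a gradient step, receives Gaussian noise of size sqrt(2·eta), and is projected back onto the constraint set.

```python
# Generic projected Langevin dynamics on a set of particles; the empirical
# measure of the particles approximates the law of the corresponding
# reflected diffusion at stationarity.
import numpy as np

def projected_langevin(grad_V, x0, eta=1e-3, n_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - eta * grad_V(x) + np.sqrt(2 * eta) * noise
        x = np.maximum(x, 0.0)  # projection onto [0, inf)
    return x

# Toy potential V(x) = (x - 1)^2 / 2: particles approximately sample
# exp(-V) restricted to the half-line [0, inf).
particles = projected_langevin(lambda x: x - 1.0, np.zeros(1000))
print(particles.mean(), particles.std())
```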

Statistics Theory

A Natural Discrete One Parameter Polynomial Exponential Distribution

In this paper, a new natural discrete version of the one-parameter polynomial exponential family of distributions has been proposed and studied. The distribution is named the Natural Discrete One Parameter Polynomial Exponential (NDOPPE) distribution. Structural and reliability properties have been studied, and an estimation procedure for the parameter of the distribution has been described. The compound NDOPPE distribution in the context of the collective risk model has been obtained in closed form. The new compound distribution has been compared with the classical compound Poisson, compound negative binomial, compound discrete Lindley, compound xgamma-I and compound xgamma-II distributions regarding suitability for modelling extreme data, with the help of some automobile claim data.
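
The collective-risk construction referred to above compounds a claim-count law with a severity law: the aggregate loss is S = X_1 + ... + X_N. A Monte Carlo sketch, with a Poisson count and exponential severities as illustrative stand-ins for the NDOPPE count distribution and real claim sizes:

```python
# Simulating a compound (aggregate-claims) distribution: draw a claim
# count N per scenario, then sum N severity draws.
import numpy as np

rng = np.random.default_rng(0)

def compound_sample(n_sims=50_000, lam=3.0, sev_mean=100.0):
    counts = rng.poisson(lam, size=n_sims)  # number of claims N
    return np.array([rng.exponential(sev_mean, n).sum() for n in counts])

S = compound_sample()
print(S.mean(), np.quantile(S, 0.99))  # mean aggregate loss, 99% quantile
```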

Statistics Theory

A New Class of Multivariate Elliptically Contoured Distributions with Inconsistency Property

We introduce a new class of multivariate elliptically symmetric distributions, including elliptically symmetric logistic distributions and Kotz-type distributions. We investigate various probabilistic properties, including marginal distributions, conditional distributions, linear transformations, characteristic functions and dependence measures, from the perspective of the inconsistency property. In addition, we provide a real-data example to show that the new distributions have reasonable flexibility.
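
Elliptically contoured laws share the standard stochastic representation X = μ + R·AU, with U uniform on the unit sphere, Σ = AAᵀ, and the law of the radial variable R selecting the family member. A minimal sampling sketch (the chi radial law below recovers the Gaussian case and is only an example, not the new class):

```python
# Sampling an elliptically contoured distribution via X = mu + R * A U.
import numpy as np

def sample_elliptical(mu, Sigma, radial_sampler, n, seed=0):
    rng = np.random.default_rng(seed)
    d = len(mu)
    A = np.linalg.cholesky(Sigma)
    U = rng.normal(size=(n, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # uniform on the sphere
    R = radial_sampler(rng, n)                     # radial law picks the family
    return mu + (R[:, None] * U) @ A.T

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
# R ~ chi with d = 2 degrees of freedom gives the multivariate normal.
X = sample_elliptical(mu, Sigma, lambda rng, n: np.sqrt(rng.chisquare(2, n)), 50_000)
print(np.cov(X.T))  # approximately Sigma
```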
