Featured Research

Statistics Theory

An efficient Averaged Stochastic Gauss-Newton algorithm for estimating parameters of nonlinear regression models

Nonlinear regression models are a standard tool for modeling real phenomena, with applications in machine learning, ecology, econometrics, and many other fields. Estimating the parameters of such models has attracted attention for many years. We focus here on recursive methods for estimating the parameters of nonlinear regressions. Methods of this kind, the most famous of which are probably the stochastic gradient algorithm and its averaged version, make it possible to deal efficiently with massive data arriving sequentially. Nevertheless, in practice they can be very sensitive to the case where the eigenvalues of the Hessian of the functional to be minimized lie on different scales. To avoid this problem, we first introduce an online Stochastic Gauss-Newton algorithm. In order to improve the behavior of the estimates in the case of poor initialization, we also introduce a new Averaged Stochastic Gauss-Newton algorithm and prove its asymptotic efficiency.
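The recursive scheme described here can be sketched in a few lines. Below is a minimal, hypothetical illustration, not the paper's exact algorithm: an online Gauss-Newton-style update for a toy exponential regression model, with Polyak-Ruppert averaging. The model f, the step-size schedule, and all constants are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta, x):
    # hypothetical nonlinear regression function: exp(theta0 + theta1 * x)
    return np.exp(theta[0] + theta[1] * x)

def grad_f(theta, x):
    g = f(theta, x)
    return np.array([g, g * x])

theta_true = np.array([0.5, -1.0])
theta = np.zeros(2)        # deliberately poor initialization
theta_bar = np.zeros(2)    # averaged iterate
H = np.eye(2)              # running estimate of the Gauss-Newton matrix E[grad_f grad_f^T]

for n in range(1, 20001):
    x = rng.uniform(0.0, 2.0)
    y = f(theta_true, x) + 0.1 * rng.standard_normal()
    g = grad_f(theta, x)
    H += (np.outer(g, g) - H) / (n + 1)     # recursive Gauss-Newton matrix update
    step = 1.0 / (n + 100)                  # decreasing step size
    residual = y - f(theta, x)
    theta = theta + step * np.linalg.solve(H + 1e-8 * np.eye(2), g * residual)
    theta_bar += (theta - theta_bar) / n    # Polyak-Ruppert averaging
```

The Gauss-Newton matrix rescales the gradient step, which is precisely what protects the recursion when the Hessian's eigenvalues sit on different scales.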

Statistics Theory

An elementary approach for minimax estimation of Bernoulli proportion in the restricted parameter space

We present an elementary mathematical method for finding the minimax estimator of the Bernoulli proportion θ under squared error loss when θ belongs to a restricted parameter space of the form Ω=[0,η] for some pre-specified constant 0≤η≤1. This problem is inspired by the problem of estimating the rate of positive COVID-19 tests. The presented results and applications provide useful material for both instructors and students when teaching point estimation in statistics or machine learning courses.
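For context, the classical unrestricted version of this problem has a closed-form answer: under squared error loss on the full space [0,1], the minimax estimator is the Bayes estimator under a Beta(√n/2, √n/2) prior, and its risk is constant in θ. The sketch below verifies that constant-risk property numerically; it illustrates the flavor of the problem, not the paper's restricted-space estimator.

```python
import numpy as np
from math import comb

def minimax_estimate(successes, n):
    # classical minimax estimator on the UNRESTRICTED space [0, 1]
    return (successes + np.sqrt(n) / 2) / (n + np.sqrt(n))

def risk(estimator, theta, n):
    # exact squared-error risk, summing over the binomial distribution
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k)
               * (estimator(k, n) - theta) ** 2
               for k in range(n + 1))

n = 25
constant = 1.0 / (4.0 * (np.sqrt(n) + 1.0) ** 2)   # known constant risk value
risks = [risk(minimax_estimate, th, n) for th in (0.1, 0.5, 0.9)]
```

On the restricted space [0, η] the minimax estimator differs, which is exactly the gap the paper's elementary method addresses.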

Statistics Theory

An information geometry approach for robustness analysis in uncertainty quantification of computer codes

Robustness analysis is an emerging field in the domain of uncertainty quantification. It consists of analysing the response of a computer model with uncertain inputs to the perturbation of one or several of its input distributions. Thus, a practical robustness analysis methodology should rely on a coherent definition of a distribution perturbation. This paper addresses the issue by exposing a rigorous way of perturbing densities. The proposed methodology is based on the Fisher distance on manifolds of probability distributions. A numerical method for computing perturbed densities in practice is presented. This method, which comes from Lagrangian mechanics, consists of solving a system of ordinary differential equations. The perturbation definition is then used to compute quantile-oriented robustness indices. The resulting Perturbed-Law based sensitivity Indices (PLI) are illustrated on several numerical models. The methodology is also applied to an industrial study (simulation of a loss-of-coolant accident in a nuclear reactor), where several tens of the model's physical parameters are uncertain, with limited knowledge concerning their distributions.
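As a concrete instance of perturbing a density along the Fisher geometry, the Gaussian family N(μ, σ²) has the explicit Fisher metric diag(1/σ², 2/σ²) in coordinates (μ, σ), and its geodesics solve a small ODE system. The sketch below integrates that system with a hand-rolled fourth-order Runge-Kutta scheme; the starting point, direction, and step size are illustrative assumptions, and this is not the paper's general-purpose method.

```python
import numpy as np

def rhs(state):
    # geodesic equations for the Fisher-Rao metric diag(1/s^2, 2/s^2)
    # on the Gaussian family, in coordinates (mu, sigma)
    mu, s, dmu, ds = state
    return np.array([dmu, ds,
                     2.0 * dmu * ds / s,                  # mu''
                     ds ** 2 / s - 0.5 * dmu ** 2 / s])   # sigma''

def integrate(state, t_end=1.0, h=1e-3):
    # classical fourth-order Runge-Kutta
    for _ in range(int(round(t_end / h))):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * h * k1)
        k3 = rhs(state + 0.5 * h * k2)
        k4 = rhs(state + h * k3)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

# start at N(0, 1) and shoot in direction (1.0, 0.5); the arc length travelled
# equals the Fisher distance between the initial and the perturbed density
mu, sigma, dmu, dsigma = integrate(np.array([0.0, 1.0, 1.0, 0.5]))
speed = dmu ** 2 / sigma ** 2 + 2.0 * dsigma ** 2 / sigma ** 2
```

A geodesic has constant Fisher-Rao speed, which gives a built-in check on the numerical integration.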

Statistics Theory

An optimal linear filter for estimation of random functions in Hilbert space

Let f be a square-integrable, zero-mean random vector with observable realizations in a Hilbert space H, and let g be an associated square-integrable, zero-mean random vector whose realizations, in a Hilbert space K, are not observable. We seek an optimal filter in the form of a closed linear operator X acting on the observable realizations of a proximate vector f_ε ≈ f that provides the best estimate ĝ_ε = X f_ε of the vector g. We assume the required covariance operators are known. The results are illustrated with a typical example.
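In finite dimensions, the optimal linear filter with known covariances takes the familiar Wiener-type form X = C_gf C_ff⁺. The sketch below checks it on a simulated linear model; the dimensions, the matrix A, and the noise level are arbitrary assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)

# finite-dimensional analogue: g = A f + independent noise, both zero mean
n_f, n_g, N = 4, 3, 200_000
A = rng.standard_normal((n_g, n_f))
f = rng.standard_normal((n_f, N))                  # observable realizations
g = A @ f + 0.1 * rng.standard_normal((n_g, N))    # unobservable target

# covariance operators (assumed known in the abstract; estimated here)
C_ff = f @ f.T / N
C_gf = g @ f.T / N

# optimal linear filter X = C_gf C_ff^+ ; pseudo-inverse for numerical safety
X = C_gf @ np.linalg.pinv(C_ff)
g_hat = X @ f                                      # best linear estimate of g
```

Because the noise is independent of f, C_gf = A C_ff here, so the recovered filter should be close to A itself.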

Statistics Theory

Analysis of Deviance for Hypothesis Testing in Generalized Partially Linear Models

In this study, we develop nonparametric analysis of deviance tools for generalized partially linear models based on local polynomial fitting. Assuming a canonical link, we propose expressions for both local and global analysis of deviance, which admit an additivity property that reduces to analysis of variance decompositions in the Gaussian case. Chi-square tests based on integrated likelihood functions are proposed to formally test whether the nonparametric term is significant. Simulation results are shown to illustrate the proposed chi-square tests and to compare them with an existing procedure based on penalized splines. The methodology is applied to German Bundesbank Federal Reserve data.
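The deviance-based chi-square idea is easiest to see in a fully parametric GLM. The sketch below fits a Poisson regression with canonical log link by IRLS, with and without a truly irrelevant covariate, and computes the drop in deviance, which is asymptotically χ²(1) under the null. The data-generating model and sample size are assumptions, and this is the parametric analogue, not the paper's nonparametric local-polynomial version.

```python
import numpy as np

rng = np.random.default_rng(2)

def poisson_irls(X, y, n_iter=50):
    # IRLS for a Poisson GLM with the canonical log link
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        W = mu                            # Poisson variance function
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

def poisson_deviance(y, mu):
    # Poisson deviance, with the 0 * log(0) = 0 convention
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(y > 0, y * np.log(y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

n = 500
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = rng.poisson(np.exp(0.3 + 0.5 * x1))       # x2 is truly irrelevant

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = X_full[:, :2]
beta_full = poisson_irls(X_full, y)
mu_full = np.exp(X_full @ beta_full)
mu_red = np.exp(X_red @ poisson_irls(X_red, y))

# analysis of deviance: drop is asymptotically chi-square(1) under the null
drop = poisson_deviance(y, mu_red) - poisson_deviance(y, mu_full)
```

Comparing `drop` to a χ²(1) quantile gives the formal significance test; the paper extends this logic to the nonparametric term of a generalized partially linear model.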

Statistics Theory

Analytical and statistical properties of local depth functions motivated by clustering applications

Local depth functions (LDFs) are used to describe the local geometric features of multivariate distributions, especially in multimodal models. In this paper, we undertake a rigorous, systematic study of LDFs and use it to develop a theoretically validated clustering algorithm. To this end, we establish several analytical and statistical properties of LDFs. First, we show that, when the underlying probability distribution is absolutely continuous, under an appropriate scaling that converges to zero (referred to as extreme localization), LDFs converge uniformly to a power of the density, and we obtain a related rate-of-convergence result. Second, we establish that the centered and scaled sample LDFs converge in distribution to a centered Gaussian process, uniformly in the space of bounded functions on ℝ^p × [0, ∞), as the sample size diverges to infinity. Third, under an extreme localization that depends on the sample size, we determine the correct centering and scaling for the sample LDFs to possess a limiting normal distribution. Fourth, invoking the above results, we develop a new clustering algorithm that uses the LDFs and their differentiability properties. Fifth, in support of this algorithm, we establish several results concerning the gradient systems related to LDFs. Finally, we illustrate the finite-sample performance of our results using simulations and apply them to two datasets.
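To make the "local features" point concrete, here is a hypothetical one-dimensional localized simplicial depth: the fraction of sample pairs that straddle a point x and lie within a window τ of each other. On a bimodal sample it is large at the two modes and small in the valley between them, which is the behavior a depth-based clustering algorithm exploits; the sample, window, and evaluation points below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def local_depth(x, sample, tau):
    # 1-D localized simplicial depth: fraction of pairs (X_i, X_j), i < j,
    # with min(X_i, X_j) <= x <= max(X_i, X_j) and |X_i - X_j| <= tau
    lo = np.minimum.outer(sample, sample)
    hi = np.maximum.outer(sample, sample)
    ok = (lo <= x) & (x <= hi) & (hi - lo <= tau)
    n = len(sample)
    return np.triu(ok, k=1).sum() / (n * (n - 1) / 2)

# bimodal sample: equal mixture of N(-2, 1) and N(2, 1)
sample = np.concatenate([rng.normal(-2.0, 1.0, 1000),
                         rng.normal(2.0, 1.0, 1000)])
tau = 0.5
depth_mode = local_depth(-2.0, sample, tau)    # at a mode
depth_valley = local_depth(0.0, sample, tau)   # in the valley between modes
```

Under extreme localization (τ → 0) this quantity behaves like a power of the density, which is the first result the abstract describes.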

Statistics Theory

Approximate Co-Sufficient Sampling for Goodness-of-fit Tests and Synthetic Data

Co-sufficient sampling refers to resampling the data conditional on a sufficient statistic, a useful technique for statistical problems such as goodness-of-fit tests, model selection, and confidence interval construction; it is also a powerful tool to generate synthetic data which limits the disclosure risk of sensitive data. However, sampling from such conditional distributions is both technically and computationally challenging, and is inapplicable in models without low-dimensional sufficient statistics. We study an indirect inference approach to approximate co-sufficient sampling, which only requires an efficient statistic rather than a sufficient statistic. Given an efficient estimator, we prove that the expected KL divergence goes to zero between the true conditional distribution and the resulting approximate distribution. We also propose a one-step approximate solution to the optimization problem that preserves the original estimator with an error of o_p(n^{-1/2}), which suffices for asymptotic optimality. The one-step method is easily implemented, highly computationally efficient, and applicable to a wide variety of models, only requiring the ability to sample from the model and compute an efficient statistic. We implement our methods via simulations to tackle problems in synthetic data, hypothesis testing, and differential privacy.
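In the simplest location model the idea can be seen exactly: the sample mean is both sufficient and efficient, so a parametric bootstrap draw shifted to match the observed mean is an exact co-sufficient sample. This toy sketch (all numbers are assumptions) shows that shift; in general the one-step correction described in the abstract only matches an efficient statistic to o_p(n^{-1/2}).

```python
import numpy as np

rng = np.random.default_rng(4)

y = rng.normal(1.3, 1.0, size=100)    # observed data, N(theta, 1) model
theta_hat = y.mean()                  # efficient (and here sufficient) statistic

# parametric bootstrap draw from the fitted model
y_syn = rng.normal(theta_hat, 1.0, size=y.size)

# shift so the statistic is preserved exactly: a co-sufficient synthetic sample
y_syn_adj = y_syn + (theta_hat - y_syn.mean())
```

The synthetic sample carries the same estimate as the original data, so releasing it reveals nothing beyond the statistic itself, which is the disclosure-limitation argument.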

Statistics Theory

Approximation of probability density functions via location-scale finite mixtures in Lebesgue spaces

The class of location-scale finite mixtures is of enduring interest both from applied and theoretical perspectives of probability and statistics. We prove the following results: to an arbitrary degree of accuracy, (a) location-scale mixtures of a continuous probability density function (PDF) can approximate any continuous PDF, uniformly, on a compact set; and (b) for any finite p ≥ 1, location-scale mixtures of an essentially bounded PDF can approximate any PDF in L_p, in the L_p norm.
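Result (b) can be illustrated directly: place Gaussian components on a grid with weights proportional to the target density and shrink the common scale as the grid refines. The target Beta(2,2) density, grid sizes, and bandwidth rule below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def target(x):
    # Beta(2, 2) density on [0, 1]
    return np.where((x >= 0) & (x <= 1), 6.0 * x * (1.0 - x), 0.0)

def mixture(x, m):
    # m-component location-scale Gaussian mixture with common scale h = 1/m,
    # centers on a regular grid, weights proportional to the target density
    centers = (np.arange(m) + 0.5) / m
    w = target(centers)
    w = w / w.sum()
    h = 1.0 / m
    comps = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / h) ** 2)
    comps /= h * np.sqrt(2.0 * np.pi)
    return comps @ w

xs = np.linspace(-0.5, 1.5, 4001)
dx = xs[1] - xs[0]
# L1 distance between target and mixture for increasing numbers of components
l1_errors = [np.sum(np.abs(target(xs) - mixture(xs, m))) * dx
             for m in (5, 20, 80)]
```

The L1 error shrinks as the number of components grows, matching the density-approximation claim in the L_p (here p = 1) norm.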

Statistics Theory

Asymptotic Analysis for Data-Driven Inventory Policies

We study periodic-review stochastic inventory control in the data-driven setting, in which the retailer makes ordering decisions based only on historical demand observations, without any knowledge of the demand's probability distribution. Since an (s,S)-policy is optimal when the demand distribution is known, we investigate the statistical properties of the data-driven (s,S)-policy obtained by recursively computing the empirical cost-to-go functions. This policy is inherently challenging to analyze because the recursion propagates estimation error backwards in time. In this work, we establish the asymptotic properties of this data-driven policy by fully accounting for the error propagation. First, we rigorously show the consistency of the estimated parameters by filling in some gaps (due to unaccounted error propagation) in the existing studies. On the other hand, empirical process theory cannot be applied directly to show asymptotic normality, because, again due to the error propagation, the empirical cost-to-go functions for the estimated parameters are not i.i.d. sums. Our main methodological innovation is an asymptotic representation of multi-sample U-processes in terms of i.i.d. sums. This representation enables us to apply empirical process theory to derive the influence functions of the estimated parameters and to establish their joint asymptotic normality. Based on these results, we also propose an entirely data-driven estimator of the optimal expected cost and derive its asymptotic distribution. We demonstrate several useful applications of our asymptotic results, including sample-size determination, as well as interval estimation and hypothesis testing on vital parameters of the inventory problem. The results of our numerical simulations conform to our theoretical analysis.
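The simplest data-driven (s,S) computation is the single-period case with a fixed ordering cost, where the expected cost function is replaced by its empirical counterpart. The sketch below does this on simulated demand; the cost parameters, demand distribution, and grid are assumptions, and the multiperiod backward recursion (with its error propagation) analyzed in the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

K, c, h, p = 10.0, 1.0, 1.0, 4.0    # fixed order cost, unit cost, holding, penalty
demand = rng.gamma(shape=4.0, scale=5.0, size=5000)   # historical demand sample

# empirical expected single-period cost of bringing inventory up to level y
grid = np.linspace(0.0, 80.0, 801)
G = np.array([c * y + np.mean(h * np.maximum(y - demand, 0.0)
                              + p * np.maximum(demand - y, 0.0))
              for y in grid])

S = grid[np.argmin(G)]                  # order-up-to level S
s = grid[np.argmax(G <= G.min() + K)]   # reorder point: first y with G(y) <= G(S) + K
```

The plug-in pair (s, S) is a statistic of the demand sample, so its consistency and asymptotic normality, which the paper establishes for the full multiperiod recursion, are the natural questions to ask about it.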

Statistics Theory

Asymptotic analysis of maximum likelihood estimation of covariance parameters for Gaussian processes: an introduction with proofs

This article provides an introduction to the asymptotic analysis of covariance parameter estimation for Gaussian processes. Maximum likelihood estimation is considered. The aim of this introduction is to be accessible to a wide audience and to present some existing results and proof techniques from the literature. The increasing-domain and fixed-domain asymptotic settings are considered. Under increasing-domain asymptotics, it is shown that in general all the components of the covariance parameter can be estimated consistently by maximum likelihood and that asymptotic normality holds. In contrast, under fixed-domain asymptotics, only some components of the covariance parameter, constituting the microergodic parameter, can be estimated consistently. Under fixed-domain asymptotics, the special case of the family of isotropic Matérn covariance functions is considered. It is shown that only a combination of the variance and spatial scale parameter is microergodic. A consistency and asymptotic normality proof is sketched for maximum likelihood estimators.
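A minimal increasing-domain illustration: simulate a one-dimensional Gaussian process with exponential covariance k(d) = v·exp(−d/r) (the Matérn family with smoothness 1/2) and recover (v, r) by maximizing the Gaussian likelihood over a crude grid. The design, true parameters, and grid are assumptions for the demo, and grid search stands in for a proper optimizer.

```python
import numpy as np

rng = np.random.default_rng(6)

n = 300
x = np.linspace(0.0, 100.0, n)                 # increasing-domain design
d = np.abs(x[:, None] - x[None, :])
v_true, r_true = 2.0, 3.0
C = v_true * np.exp(-d / r_true) + 1e-10 * np.eye(n)
y = np.linalg.cholesky(C) @ rng.standard_normal(n)   # one GP realization

def nll(v, r):
    # negative log-likelihood (up to an additive constant), via Cholesky
    K = v * np.exp(-d / r) + 1e-10 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return 2.0 * np.log(np.diag(L)).sum() + alpha @ alpha

# crude grid search for illustration only
vs = np.linspace(0.5, 4.0, 15)
rs = np.linspace(1.0, 6.0, 11)
v_hat, r_hat = min(((v, r) for v in vs for r in rs), key=lambda t: nll(*t))
```

Under increasing-domain asymptotics both parameters are consistently estimable, as in this setup; under fixed-domain asymptotics only the microergodic combination (here v/r for this covariance in one dimension) could be recovered.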

