Featured Research

Statistics Theory

Discrete Max-Linear Bayesian Networks

Discrete max-linear Bayesian networks are directed graphical models specified by the same recursive structural equations as max-linear models but with discrete innovations. When all of the random variables in the model are binary, these models are isomorphic to the conjunctive Bayesian network (CBN) models of Beerenwinkel, Eriksson, and Sturmfels. Many of the techniques used to study CBN models can be extended to discrete max-linear models, and similar results can be obtained. In particular, we extend to all discrete max-linear models the fact that CBN models are toric varieties after a linear change of coordinates.
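As a minimal sketch of the recursive structural equations mentioned above, assume the usual max-linear form, in which each node takes the maximum of its own innovation and its weighted parents. The three-node graph, the unit edge weights, and the binary innovation support below are hypothetical choices for illustration and are not taken from the paper.

```python
import random

def sample_max_linear(parents, weights, innovations, rng):
    """Draw one sample from a max-linear network.

    parents: dict node -> list of parent nodes (keys listed in topological order)
    weights: dict (parent, child) -> nonnegative edge weight
    innovations: dict node -> list of possible discrete innovation values
    """
    x = {}
    for node in parents:  # relies on dict insertion order being topological
        z = rng.choice(innovations[node])
        candidates = [weights[(p, node)] * x[p] for p in parents[node]]
        x[node] = max(candidates + [z])
    return x

# Hypothetical DAG with edges 1 -> 2, 1 -> 3, 2 -> 3 and binary innovations.
parents = {1: [], 2: [1], 3: [1, 2]}
weights = {(1, 2): 1, (1, 3): 1, (2, 3): 1}
innovations = {1: [0, 1], 2: [0, 1], 3: [0, 1]}

rng = random.Random(0)
samples = [sample_max_linear(parents, weights, innovations, rng) for _ in range(1000)]
```

With unit weights, every sample satisfies the order constraint that a child is at least as large as each of its parents, so in this binary example only a restricted set of 0/1 patterns can occur.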

Statistics Theory

Discrete convolution statistic for hypothesis testing

The question of testing for equality in distribution between two linear models, each consisting of sums of distinct discrete independent random variables with unequal numbers of observations, has emerged from biological research. In this case, computing the classical χ² statistic, which would not include all observations, results in a loss of power, especially when sample sizes are small. Here, as an alternative that uses all the data, the nonparametric maximum likelihood estimator for the distribution of a sum of independent discrete random variables, which we call the convolution statistic, is proposed and its limiting normal covariance matrix determined. To test null hypotheses about the distribution of this sum, the generalized Wald method is applied to define a test statistic whose asymptotic distribution is χ² with as many degrees of freedom as the rank of this covariance matrix. Rank analysis also reveals a connection with the roots of the probability generating functions associated with the addend variables of the linear models. A simulation study is performed to compare the convolution test with Pearson's χ² and to provide usage guidelines.
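The idea of using all observations can be sketched as follows, under the assumption that the estimator of the distribution of the sum is obtained by convolving the empirical probability mass functions of the addends. The function names and the two small samples are hypothetical. The limiting covariance matrix and the Wald-type test described above are not implemented here.

```python
from collections import Counter

def empirical_pmf(sample):
    """Relative frequencies of the observed integer values."""
    n = len(sample)
    return {value: count / n for value, count in Counter(sample).items()}

def convolve_pmfs(p, q):
    """pmf of X + Y for independent X with pmf p and Y with pmf q."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

# Hypothetical samples of unequal size for the two addends.
x_obs = [0, 1, 1, 2]
y_obs = [0, 0, 1]

sum_pmf = convolve_pmfs(empirical_pmf(x_obs), empirical_pmf(y_obs))
```

Because each addend is estimated from its own sample, no observations have to be discarded when the sample sizes differ.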

Statistics Theory

Discrete-time inference for slow-fast systems driven by fractional Brownian motion

We study statistical inference for small-noise-perturbed multiscale dynamical systems where the slow motion is driven by fractional Brownian motion. We develop statistical estimators for both the Hurst index and a vector of unknown parameters in the model, based on a single time series of observations from the slow process only. We prove that these estimators are both consistent and asymptotically normal as the amplitude of the perturbation and the time-scale separation parameter go to zero. Numerical simulations illustrate the theoretical results.

Statistics Theory

Distribution sensitive estimators of the index of regular variation based on ratios of order statistics

Ratios of central order statistics appear to be very useful for estimating the tails of distributions and, therefore, quantiles outside the range of the data. In 1995 Isabel Fraga Alves investigated the rate of convergence of three semi-parametric estimators of the tail index in the case where the cumulative distribution function of the observed random variable belongs to the max-domain of attraction of a fixed Generalized Extreme Value distribution. These estimators are based on ratios of specific linear transformations of two extreme order statistics. In 2019 we considered the Pareto case and found two very simple and unbiased estimators of the index of regular variation. Using central order statistics, we showed that these estimators have many good properties: unbiasedness, asymptotic consistency, asymptotic normality and asymptotic efficiency. We also observed that, although the assumptions are different, one of them is equivalent to one of Alves's estimators. Here we again use central order statistics and a parametric approach to obtain distribution-sensitive estimators of the index of regular variation in some particular cases. We then find conditions which guarantee that these estimators are unbiased, consistent and asymptotically normal. The results are illustrated via a simulation study.

Statistics Theory

Distribution-Free Conditional Median Inference

We consider the problem of constructing confidence intervals for the median of a response Y ∈ ℝ conditional on features X = x ∈ ℝ^d in a situation where we are not willing to make any assumption whatsoever on the underlying distribution of the data (X, Y). We propose a method based upon ideas from conformal prediction and establish a theoretical guarantee of coverage while also going over particular distributions where its performance is sharp. Further, we provide a lower bound on the length of any possible conditional median confidence interval. This lower bound is independent of sample size and holds for all distributions with no point masses.

Statistics Theory

Distribution-Free Robust Linear Regression

We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. When learning without assumptions on the covariates, we establish boundedness of the conditional second moment of the response variable as a necessary and sufficient condition for achieving deviation-optimal excess risk rate of convergence. In particular, combining the ideas of truncated least squares, median-of-means procedures and aggregation theory, we construct a non-linear estimator achieving excess risk of order d/n with the optimal sub-exponential tail. While the existing approaches to learning linear classes under heavy-tailed distributions focus on proper estimators, we highlight that the improperness of our estimator is necessary for attaining non-trivial guarantees in the distribution-free setting considered in this work. Finally, as a byproduct of our analysis, we prove an optimal version of the classical bound for the truncated least squares estimator due to Györfi, Kohler, Krzyzak, and Walk.
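Of the ingredients named above, the median-of-means idea is the simplest to illustrate. The following is a minimal sketch of the basic median-of-means estimate of a mean, not the estimator constructed in the paper. The function name, the block count, and the data are hypothetical.

```python
from statistics import mean, median

def median_of_means(values, n_blocks):
    """Split values into n_blocks consecutive blocks, average each, take the median.

    If len(values) is not divisible by n_blocks, the trailing values are dropped.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    block_size = len(values) // n_blocks
    block_means = [
        mean(values[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return median(block_means)

# One extreme observation dominates the plain average but not the block median.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
estimate = median_of_means(data, 3)  # block means are 1.5, 3.5, 502.5
```

Here the estimate is 3.5, while the ordinary sample mean is about 169, which shows why such procedures are attractive for heavy-tailed responses.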

Statistics Theory

Double-Loop Unadjusted Langevin Algorithm

A well-known first-order method for sampling from log-concave probability distributions is the Unadjusted Langevin Algorithm (ULA). This work proposes a new annealing step-size schedule for ULA, which allows us to prove new convergence guarantees for sampling from a smooth log-concave distribution that are not covered by existing state-of-the-art convergence guarantees. To establish this result, we derive a new theoretical bound that relates the Wasserstein distance to the total variation distance between any two log-concave distributions and that complements the reach of Talagrand's T2 inequality. Moreover, applying this new step-size schedule to an existing constrained sampling algorithm, we show state-of-the-art convergence rates for sampling from a constrained log-concave distribution, as well as improved dimension dependence.
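For orientation, the basic ULA iteration can be sketched as below. This is a plain one-dimensional version with a constant step size, not the annealing schedule or the double-loop scheme proposed in the paper. The target (a standard Gaussian), the step size, and the burn-in length are assumptions chosen for illustration.

```python
import math
import random

def ula(grad_potential, x0, step, n_steps, rng):
    """Unadjusted Langevin Algorithm in one dimension with a constant step size.

    Iterates x <- x - step * grad_potential(x) + sqrt(2 * step) * N(0, 1).
    """
    x = x0
    chain = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * noise
        chain.append(x)
    return chain

# Target: standard Gaussian, potential U(x) = x^2 / 2, so grad U(x) = x.
rng = random.Random(0)
chain = ula(lambda x: x, x0=3.0, step=0.1, n_steps=20000, rng=rng)
kept = chain[1000:]  # discard burn-in
sample_mean = sum(kept) / len(kept)
sample_var = sum((v - sample_mean) ** 2 for v in kept) / len(kept)
```

For this Gaussian target the constant-step chain has stationary variance 1 / (1 - step / 2), slightly above the true value of 1. This discretization bias is the kind of error that step-size schedules aim to control.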

Statistics Theory

Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes

This paper presents a unified framework for supervised learning and inference procedures using the divide-and-conquer approach for high-dimensional correlated outcomes. We propose a general class of estimators that can be implemented in a fully distributed and parallelized computational scheme. Modelling, computational and theoretical challenges related to high-dimensional correlated outcomes are overcome by dividing data at both outcome and subject levels, estimating the parameter of interest from blocks of data using a broad class of supervised learning procedures, and combining block estimators in a closed-form meta-estimator. This meta-estimator is asymptotically equivalent to estimates obtained by Hansen's (1982) generalized method of moments (GMM) and does not require the entire data to be reloaded on a common server. We provide rigorous theoretical justifications for the use of distributed estimators with correlated outcomes by studying the asymptotic behaviour of the combined estimator with fixed and diverging numbers of data divisions. Simulations illustrate the finite sample performance of the proposed method, and we provide an R package for ease of implementation.

Statistics Theory

Doubly robust estimation for conditional treatment effect: a study on asymptotics

In this paper, we apply a doubly robust approach to estimate, when some covariates are given, the conditional average treatment effect under parametric, semiparametric and nonparametric structures of the nuisance propensity score and outcome regression models. We then conduct a systematic study of the asymptotic distributions of nine estimators with different combinations of estimated propensity score and outcome regressions. The study covers the asymptotic properties with all models correctly specified; with either the propensity score or the outcome regressions locally or globally misspecified; and with all models locally or globally misspecified. The asymptotic variances are compared, and the asymptotic bias correction under model misspecification is discussed. We also explore the phenomenon that the asymptotic variance under model misspecification can sometimes be even smaller than that with all models correctly specified. Finally, we conduct a numerical study to examine the theoretical results.
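The doubly robust construction can be illustrated with the standard augmented inverse probability weighting (AIPW) pseudo-outcome, which combines the propensity score with the two outcome regressions. This is a sketch of the generic form only. The function name and the numbers are hypothetical, and the nine estimators studied in the paper differ in how the nuisance quantities are estimated.

```python
def dr_pseudo_outcome(y, t, e, m1, m0):
    """Doubly robust (AIPW) pseudo-outcome for one unit.

    y: observed outcome; t: treatment indicator (0 or 1)
    e: propensity score P(T = 1 | X)
    m1, m0: outcome regressions E[Y | X, T = 1] and E[Y | X, T = 0]
    """
    return m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)

# Hypothetical units given as (y, t, e, m1, m0).
units = [
    (3.0, 1, 0.5, 2.5, 1.0),
    (1.0, 0, 0.5, 2.0, 1.5),
]
pseudo = [dr_pseudo_outcome(*u) for u in units]  # [2.5, 1.5]
```

Averaging or smoothing such pseudo-outcomes over the given covariates would then yield an estimate of the conditional effect. The correction terms vanish in expectation when the outcome regressions are correct, and they repair a wrong outcome model when the propensity score is correct.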

Statistics Theory

Edgeworth approximations for distributions of symmetric statistics

We study the distribution of a general class of asymptotically linear statistics which are symmetric functions of N independent observations. The distribution functions of these statistics are approximated by an Edgeworth expansion with a remainder of order o(N^{-1}). The Edgeworth expansion is based on Hoeffding's decomposition, which provides a stochastic expansion into a linear part, a quadratic part, and smaller higher-order parts. The validity of this Edgeworth expansion is proved under Cramér's condition on the linear part, moment assumptions for all parts of the statistic, and an optimal dimensionality requirement for the nonlinear part.
