Featured Research

Statistics Theory

Better understanding of the multivariate hypergeometric distribution with implications in design-based survey sampling

The multivariate hypergeometric distribution arises frequently in elementary statistics and probability courses, for simultaneously studying the occurrence law of specified events when sampling without replacement from a finite population with a fixed number of classes. The covariance matrix of this distribution is well known to be identical to its multinomial counterpart multiplied by 1-(n-1)/(N-1), with N and n being the population and sample sizes, respectively. However, the meaning of this relationship, especially the specific form of the multiplier, appears to have been less discussed in the literature. Based on an augmenting argument together with probabilistic symmetry, we present a more transparent understanding of the covariance structure of the multivariate hypergeometric distribution. We discuss implications of these combined techniques and provide a unified description of the relative efficiency for estimating the population mean based on simple random sampling, probability proportional-to-size sampling and adaptive cluster sampling, with versus without replacement. We also provide insight into the classic random group method for variance estimation.
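The covariance relationship stated in this abstract can be checked numerically on a small case. The following is a minimal sketch, not the authors' argument: it enumerates every outcome of a multivariate hypergeometric draw and compares the exact covariance matrix with the multinomial covariance scaled by 1-(n-1)/(N-1). The class sizes and sample size are illustrative assumptions, and the function names are hypothetical.

```python
from itertools import product
from math import comb, isclose


def mvhg_covariance(class_sizes, n):
    """Exact covariance of the class counts when n units are drawn
    without replacement, computed by brute-force enumeration."""
    N = sum(class_sizes)
    k = len(class_sizes)
    total = comb(N, n)
    mean = [0.0] * k
    second = [[0.0] * k for _ in range(k)]
    for x in product(*(range(min(c, n) + 1) for c in class_sizes)):
        if sum(x) != n:
            continue
        # Probability of the count vector x: product of binomials over C(N, n).
        weight = 1
        for c, xi in zip(class_sizes, x):
            weight *= comb(c, xi)
        p = weight / total
        for i in range(k):
            mean[i] += p * x[i]
            for j in range(k):
                second[i][j] += p * x[i] * x[j]
    return [[second[i][j] - mean[i] * mean[j] for j in range(k)] for i in range(k)]


def multinomial_covariance(class_sizes, n):
    """Covariance of the class counts when n units are drawn with replacement."""
    N = sum(class_sizes)
    probs = [c / N for c in class_sizes]
    k = len(probs)
    return [[n * (probs[i] * (i == j) - probs[i] * probs[j]) for j in range(k)]
            for i in range(k)]


# Illustrative (assumed) population: three classes of sizes 3, 4 and 5.
class_sizes = [3, 4, 5]
n = 5
N = sum(class_sizes)
factor = 1 - (n - 1) / (N - 1)

exact = mvhg_covariance(class_sizes, n)
scaled = [[factor * v for v in row] for row in multinomial_covariance(class_sizes, n)]

matches = all(
    isclose(exact[i][j], scaled[i][j], abs_tol=1e-12)
    for i in range(len(class_sizes))
    for j in range(len(class_sizes))
)
print("covariances agree:", matches)
```

Enumeration is only practical for small populations, but it avoids simulation noise, so the comparison holds up to floating-point rounding rather than up to Monte Carlo error.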

Statistics Theory

Bi-$s^*$-Concave Distributions

We introduce new shape-constrained classes of distribution functions on $\mathbb{R}$, the bi-$s^*$-concave classes. In parallel to results of Dümbgen, Kolesnyk, and Wilke (2017) for what they called the class of bi-log-concave distribution functions, we show that every $s$-concave density $f$ has a bi-$s^*$-concave distribution function $F$ for $s^* \le s/(s+1)$. Confidence bands building on existing nonparametric bands, but accounting for the shape constraint of bi-$s^*$-concavity, are also considered. The new bands extend those developed by Dümbgen et al. (2017) for the constraint of bi-log-concavity. We also make connections between bi-$s^*$-concavity and finiteness of the Csörgő-Révész constant of $F$, which plays an important role in the theory of quantile processes.

Statistics Theory

Bi-invariant Two-Sample Tests in Lie Groups for Shape Analysis

We propose generalizations of Hotelling's $T^2$ statistic and the Bhattacharyya distance for data taking values in Lie groups. A key feature of the derived measures is that they are compatible with the group structure even for manifolds that do not admit any bi-invariant metric. This property ensures, for example, that the analysis does not depend on the reference shape, thus preventing bias due to arbitrary choices thereof. Furthermore, the generalizations agree with the common definitions in the special case of flat vector spaces, guaranteeing consistency. Employing a permutation test setup, we further obtain nonparametric two-sample testing procedures that are themselves bi-invariant and consistent. We validate our method in group tests revealing significant differences in hippocampal shape between individuals with mild cognitive impairment and normal controls.

Statistics Theory

Bias corrected estimators for proportion of true null hypotheses under exponential model: Application of adaptive FDR-controlling in segmented failure data

Two recently introduced model-based bias-corrected estimators for the proportion of true null hypotheses ($\pi_0$) in a multiple hypothesis testing scenario have been restructured for exponentially distributed random observations available for each of the common hypotheses. Based on stochastic ordering, a new motivation behind the formulation of some related estimators for $\pi_0$ is given. The reduction of bias for the model-based estimators is theoretically justified, and algorithms for computing the estimators are also presented. The estimators are also used to formulate a popular adaptive multiple testing procedure. An extensive numerical study supports the superiority of the bias-corrected estimators. We also point out the adverse effect of using the model-based bias correction method without proper assessment of the underlying distribution. A case study is done with a synthetic dataset in connection with reliability and warranty studies to demonstrate the applicability of the procedure in a non-Gaussian setup. The results obtained are in line with the intuition and experience of the subject expert. The article concludes with a discussion that also indicates the future scope of study.

Statistics Theory

Bootstrap method for misspecified ergodic Lévy driven stochastic differential equation models

In this paper, we consider possibly misspecified stochastic differential equation models driven by Lévy processes. Regardless of whether the driving noise is Gaussian or not, the Gaussian quasi-likelihood estimator can estimate unknown parameters in the drift and scale coefficients. However, in the misspecified case, the asymptotic distribution of the estimator varies by the correction of the misspecification bias, and consistent estimators for the asymptotic variance proposed in the correctly specified case may lose theoretical validity. As one solution, we propose a bootstrap method for approximating the asymptotic distribution. We show that our bootstrap method is theoretically valid in both the correctly specified case and the misspecified case, without assuming the precise distribution of the driving noise.

Statistics Theory

Bootstrapping $\ell_p$-Statistics in High Dimensions

This paper considers a new bootstrap procedure to estimate the distribution of high-dimensional $\ell_p$-statistics, i.e. the $\ell_p$-norms of the sum of $n$ independent $d$-dimensional random vectors with $d \gg n$ and $p \in [1, \infty]$. We provide a non-asymptotic characterization of the sampling distribution of $\ell_p$-statistics based on Gaussian approximation and show that the bootstrap procedure is consistent in the Kolmogorov-Smirnov distance under mild conditions on the covariance structure of the data. As an application of the general theory, we propose a bootstrap hypothesis test for simultaneous inference on high-dimensional mean vectors. We establish its asymptotic correctness and consistency under high-dimensional alternatives, and discuss the power of the test as well as the size of the associated confidence sets. We illustrate the bootstrap and testing procedure numerically on simulated data.
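To make the objects in this abstract concrete, here is a minimal sketch of an $\ell_p$-statistic together with a generic Gaussian multiplier bootstrap for its null distribution. The abstract does not specify the paper's procedure, so the multiplier scheme, the function names, the simulated data and the 95% level are all assumptions made for illustration.

```python
import math
import random


def lp_norm(v, p):
    """l_p norm of a vector for p in [1, infinity]."""
    if math.isinf(p):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def lp_statistic(rows, p):
    """l_p norm of the scaled sum n^{-1/2} * sum_i X_i."""
    n = len(rows)
    d = len(rows[0])
    sums = [sum(row[j] for row in rows) for j in range(d)]
    return lp_norm([s / math.sqrt(n) for s in sums], p)


def multiplier_bootstrap(rows, p, n_boot=500, seed=0):
    """Sorted bootstrap draws of the l_p statistic, using i.i.d. N(0, 1)
    multipliers on the centered observations (an assumed generic scheme)."""
    rng = random.Random(seed)
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    draws = []
    for _ in range(n_boot):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        combo = [
            sum(g[i] * centered[i][j] for i in range(n)) / math.sqrt(n)
            for j in range(d)
        ]
        draws.append(lp_norm(combo, p))
    return sorted(draws)


# Simulated mean-zero data with d > n (illustrative sizes only).
data_rng = random.Random(1)
n, d = 20, 50
data = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

stat = lp_statistic(data, math.inf)
draws = multiplier_bootstrap(data, math.inf, n_boot=200)
critical_value = draws[int(0.95 * len(draws)) - 1]  # empirical 95% quantile
reject = stat > critical_value
print("reject H0 (mean zero):", reject)
```

Passing `math.inf` gives the max-norm statistic, while any finite `p >= 1` uses the same code path through `lp_norm`.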

Statistics Theory

Calibrating the scan statistic: finite sample performance vs. asymptotics

We consider the problem of detecting an elevated mean on an interval with unknown location and length in the univariate Gaussian sequence model. Recent results have shown that using scale-dependent critical values for the scan statistic makes it possible to attain asymptotically optimal detection simultaneously for all signal lengths, thereby improving on the traditional scan, but this procedure has been criticized for losing too much power for short signals. We explain this discrepancy by showing that these asymptotic optimality results will necessarily be too imprecise to discern the performance of scan statistics in a practically relevant way, even in a large sample context. Instead, we propose to assess the performance with a new finite sample criterion. We then present three calibrations for scan statistics that perform well across a range of relevant signal lengths. The first calibration uses a particular adjustment to the critical values and is therefore tailored to the Gaussian case. The second calibration uses a scale-dependent adjustment to the significance levels and is therefore applicable to arbitrary known null distributions. The third calibration restricts the scan to a particular sparse subset of the scan windows and then applies a weighted Bonferroni adjustment to the corresponding test statistics. This calibration is also applicable to arbitrary null distributions and, in addition, is very simple to implement.
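As a rough illustration of the difference between the traditional scan and a scale-dependent variant, the sketch below computes both over all intervals of a sequence. The length penalty sqrt(2 log(e n / length)) is one commonly used form and is an assumption here; it is not claimed to match any of the three calibrations in the paper. The function name and the toy sequence are hypothetical.

```python
import math


def scan_statistics(y):
    """Return (traditional, penalized) scan results as (value, (start, end)),
    where each interval covers y[start:end]."""
    n = len(y)
    prefix = [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)

    best_plain = (-math.inf, None)
    best_penalized = (-math.inf, None)
    for start in range(n):
        for end in range(start + 1, n + 1):
            length = end - start
            # Standardized interval sum: N(0, 1) under the null of no signal.
            z = (prefix[end] - prefix[start]) / math.sqrt(length)
            # Assumed scale-dependent penalty: larger for shorter intervals,
            # since there are many more of them to search over.
            penalty = math.sqrt(2.0 * math.log(math.e * n / length))
            if z > best_plain[0]:
                best_plain = (z, (start, end))
            if z - penalty > best_penalized[0]:
                best_penalized = (z - penalty, (start, end))
    return best_plain, best_penalized


# Noise-free toy sequence with an elevated mean on positions 10..14.
y = [0.0] * 10 + [2.0] * 5 + [0.0] * 10
plain, penalized = scan_statistics(y)
print("traditional scan interval:", plain[1])
print("penalized scan interval:", penalized[1])
```

The double loop visits all O(n^2) intervals, which is fine for a sketch; restricting the scan to a sparse subset of windows, as the third calibration in the abstract does, is one way to reduce that cost.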

Statistics Theory

Canonical thresholding for non-sparse high-dimensional linear regression

We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regression (PCR). A theoretical analysis for both fixed design and random design settings is provided. The obtained bounds on the mean squared error and the prediction error of a specific estimator from the family allow us to state clearly sufficient conditions on the decay of eigenvalues that ensure convergence. In addition, we promote the use of relative errors, which are strongly linked with the out-of-sample $R^2$. The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously, and describes the complexity of a linear regression problem. Numerical simulations confirm the good performance of the proposed estimators compared to previously developed methods.

Statistics Theory

Central Limit Theorem and Bootstrap Approximation in High Dimensions with Near $1/\sqrt{n}$ Rates

Non-asymptotic bounds for Gaussian and bootstrap approximation have recently attracted significant interest in high-dimensional statistics. This paper studies Berry-Esseen bounds for such approximations (with respect to the multivariate Kolmogorov distance), in the context of a sum of $n$ random vectors that are $p$-dimensional and i.i.d. Up to now, a growing line of work has established bounds with mild logarithmic dependence on $p$. However, the problem of developing corresponding bounds with near $n^{-1/2}$ dependence on $n$ has remained largely unresolved. Within the setting of random vectors that have sub-Gaussian entries, this paper establishes bounds with near $n^{-1/2}$ dependence, for both Gaussian and bootstrap approximation. In addition, the proofs are considerably distinct from other recent approaches.

Statistics Theory

Central Limit Theorems for General Transportation Costs

We consider the problem of optimal transportation with general cost between an empirical measure and a general target probability on $\mathbb{R}^d$, with $d \ge 1$. We extend results in [19] and prove asymptotic stability of both optimal transport maps and potentials for a large class of costs in $\mathbb{R}^d$. We derive a central limit theorem (CLT) towards a Gaussian distribution for the empirical transportation cost under minimal assumptions, with a new proof based on the Efron-Stein inequality and on the sequential compactness of the closed unit ball in $L^2(P)$ for the weak topology. We also provide CLTs for empirical Wasserstein distances in the special case of potential costs $|\cdot|^p$, $p > 1$.
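For intuition about the empirical transportation cost with a potential cost, the sketch below covers only the simplest special case: two samples of equal size on the real line with cost |x - y|^p, p > 1, where matching sorted values gives an optimal coupling. This is a simplification assumed for illustration; the abstract concerns an empirical measure against a general target on $\mathbb{R}^d$. The function names are hypothetical.

```python
def empirical_transport_cost(xs, ys, p=2.0):
    """Optimal transport cost between two equal-size samples on the real
    line for the cost |x - y|**p with p > 1 (sorted values are matched)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) ** p for a, b in pairs) / len(xs)


def wasserstein_distance_1d(xs, ys, p=2.0):
    """Empirical p-Wasserstein distance: the p-th root of the optimal cost."""
    return empirical_transport_cost(xs, ys, p) ** (1.0 / p)


# Every point has to move one unit to the right, whatever the input order.
cost = empirical_transport_cost([0.0, 1.0, 2.0], [3.0, 1.0, 2.0], p=2.0)
distance = wasserstein_distance_1d([0.0, 0.0], [3.0, 3.0], p=2.0)
print("transport cost:", cost)
print("W_2 distance:", distance)
```

The sorting shortcut is specific to one dimension with a convex cost; in higher dimensions the optimal coupling generally has to be found by solving an assignment or linear programming problem.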
