Featured Research

Statistics Theory

Edgeworth corrections for spot volatility estimator

We develop Edgeworth expansion theory for the spot volatility estimator under general assumptions on the log-price process that allow for drift and the leverage effect. Compared with the existing second-order asymptotic normality result, our expansion additionally relies on estimates of the skewness and kurtosis. Our theory therefore provides a refined approximation to the finite-sample distribution of the spot volatility estimator. We also construct feasible one-sided and two-sided confidence intervals for spot volatility based on the Edgeworth expansion. A Monte Carlo simulation study shows that the intervals based on the Edgeworth expansion perform better than conventional intervals based on the normal approximation, which supports our theoretical conclusions.
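
The sketch below shows, in generic form, how skewness and kurtosis estimates can correct a normal-based interval. It is not the paper's spot-volatility expansion. It uses the standard fourth-order Cornish-Fisher expansion of the quantiles of a standardized statistic T = (estimate - theta) / se, and it assumes that the skewness and excess kurtosis of T are already available as numbers. How those cumulants are estimated for the spot volatility estimator is not shown here.

```python
from statistics import NormalDist


def cornish_fisher_quantile(alpha, skew, exkurt):
    """Approximate alpha-quantile of a standardized statistic with the given
    skewness and excess kurtosis (fourth-order Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(alpha)
    return (z
            + (z ** 2 - 1) * skew / 6
            + (z ** 3 - 3 * z) * exkurt / 24
            - (2 * z ** 3 - 5 * z) * skew ** 2 / 36)


def edgeworth_interval(estimate, se, skew, exkurt, level=0.95):
    """Two-sided interval for theta based on T = (estimate - theta) / se."""
    a = (1 - level) / 2
    q_lo = cornish_fisher_quantile(a, skew, exkurt)
    q_hi = cornish_fisher_quantile(1 - a, skew, exkurt)
    # q_lo <= T <= q_hi  <=>  estimate - se * q_hi <= theta <= estimate - se * q_lo
    return estimate - se * q_hi, estimate - se * q_lo


def edgeworth_lower_bound(estimate, se, skew, exkurt, level=0.95):
    """One-sided lower confidence bound: P(T <= q_level) is approximately level."""
    return estimate - se * cornish_fisher_quantile(level, skew, exkurt)
```

With zero skewness and zero excess kurtosis, the functions reduce to the usual normal-approximation intervals. A positive skewness makes the interval asymmetric around the estimate.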

Statistics Theory

Efficiency Loss of Asymptotically Efficient Tests in an Instrumental Variables Regression

In an instrumental variable model, the score statistic can be stochastically bounded for any alternative in parts of the parameter space. These regions involve a constraint on the first-stage regression coefficients and the reduced-form covariance matrix. As a consequence, the Lagrange Multiplier (LM) test can have power close to size, despite being efficient under standard asymptotics. This loss of information limits the power of conditional tests that use only the Anderson-Rubin (AR) and score statistics. In particular, the conditional quasi-likelihood ratio (CQLR) test also suffers severe losses because its power can be bounded for any alternative. A necessary condition for drastic power loss to occur is that the Hermitian of the reduced-form covariance matrix has eigenvalues of opposite signs. These cases are denoted impossibility designs or impossibility DGPs (ID). This restriction cannot be satisfied with homoskedastic errors, but it can be with heteroskedastic, autocorrelated, and/or clustered (HAC) errors. We show that these situations can arise in practice by applying our theory to the problem of inference on the intertemporal elasticity of substitution (IES) with weak instruments. Out of eleven countries studied by Yogo (2004) and Andrews (2016), the data for nine are consistent with impossibility designs at the 95% confidence level. For these countries, the noncentrality parameter of the score statistic can be very close to zero. The power loss is therefore extensive enough to dissuade practitioners from blindly using LM-based tests with HAC errors.
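
For orientation, the sketch below computes the textbook Anderson-Rubin statistic for H0: beta = beta0 in a model with one endogenous regressor, y = Y * beta + u and Y = Z @ pi + v. It assumes homoskedastic errors. As the abstract notes, impossibility designs cannot arise in that case, so the sketch does not reproduce the HAC setting the paper studies. The function name and the simulated design are hypothetical.

```python
import numpy as np


def anderson_rubin(y, Y, Z, beta0):
    """Homoskedastic AR statistic for H0: beta = beta0 (single endogenous regressor).
    Under H0 it is approximately chi-square(k) / k for k instruments."""
    n, k = Z.shape
    e = y - Y * beta0                                   # structural residual under H0
    proj = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]     # P_Z e
    resid = e - proj                                    # M_Z e
    num = proj @ proj / k                               # e' P_Z e / k
    den = resid @ resid / (n - k)                       # e' M_Z e / (n - k)
    return num / den


# Hypothetical simulated design: strong instruments, correlated errors (endogeneity).
rng = np.random.default_rng(0)
n, k = 500, 3
Z = rng.standard_normal((n, k))
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
u, v = errors[:, 0], errors[:, 1]
Y = Z @ np.ones(k) + v
y = 2.0 * Y + u
```

At the true value beta0 = 2.0 the statistic behaves like an F-type draw, while at a distant value it becomes large because the residual then contains a multiple of Z @ pi.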

Statistics Theory

Efficient Least Squares for Estimating Total Effects under Linearity and Causal Sufficiency

Recursive linear structural equation models are widely used to postulate causal mechanisms underlying observational data. In these models, each variable equals a linear combination of a subset of the remaining variables plus an error term. When there is no unobserved confounding or selection bias, the error terms are assumed to be independent. We consider estimating a total causal effect in this setting. The causal structure is assumed to be known only up to a maximally oriented partially directed acyclic graph (MPDAG), a general class of graphs that can represent a Markov equivalence class of directed acyclic graphs (DAGs) with added background knowledge. We propose a simple estimator based on recursive least squares, which can consistently estimate any identified total causal effect, under point or joint intervention. We show that this estimator is the most efficient among all regular estimators that are based on the sample covariance, which includes covariate adjustment and the estimators employed by the joint-IDA algorithm. Notably, our result holds without assuming Gaussian errors.
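
The abstract compares the proposed estimator with covariate adjustment. The sketch below illustrates only that baseline, not the paper's recursive least squares estimator. In a hypothetical linear SEM with uniform (non-Gaussian) errors and edges Z -> X, Z -> Y, X -> M -> Y and X -> Y, the total effect of X on Y is 0.3 + 1.0 * 0.7 = 1.0. Regressing Y on X and the parent set {Z} of X recovers it, while the unadjusted regression is confounded by Z.

```python
import numpy as np


def adjusted_total_effect(x, y, adjust):
    """OLS coefficient of x in the regression of y on (1, x, adjust)."""
    design = np.column_stack([np.ones_like(x), x, adjust])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate_sem(n, rng):
    """Hypothetical recursive linear SEM with independent uniform errors."""
    def noise():
        return rng.uniform(-1.0, 1.0, n)
    z = noise()
    x = 0.8 * z + noise()
    m = 1.0 * x + noise()
    y = 0.5 * z + 0.3 * x + 0.7 * m + noise()
    return z, x, y


z, x, y = simulate_sem(100_000, np.random.default_rng(1))
adjusted = adjusted_total_effect(x, y, z)                # adjusts for pa(X) = {Z}
naive = np.polyfit(x, y, 1)[0]                           # ignores the confounder Z
```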

Statistics Theory

Efficient computational algorithms for approximate optimal designs

In this paper, we propose two simple yet efficient computational algorithms to obtain approximate optimal designs for multi-dimensional linear regression on a large variety of design spaces. We focus on the two commonly used optimality criteria, D- and A-optimality. For D-optimality, we provide an alternative proof of the monotonic convergence of the D-optimal criterion and propose an efficient computational algorithm to obtain the approximate D-optimal design. We further show that the proposed algorithm converges to the D-optimal design, and then prove that the approximate D-optimal design converges to the continuous D-optimal design under certain conditions. For A-optimality, we provide an efficient algorithm to obtain the approximate A-optimal design and conjecture that the proposed algorithm is monotonic. Numerical comparisons suggest that the proposed algorithms perform well and are comparable or superior to some existing algorithms.
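
As background for D-optimality on a discretized design space, the sketch below implements the classical multiplicative algorithm. Each weight is rescaled by d(x_i, w) / p, where d is the variance function f(x)' M(w)^{-1} f(x). This is a well-known baseline, not necessarily either of the algorithms proposed in the paper. For quadratic regression on [-1, 1], the D-optimal design puts weight 1/3 on each of -1, 0 and 1, which gives a simple check. The grid size and iteration count are arbitrary choices.

```python
import numpy as np


def multiplicative_d_optimal(F, n_iter=5000):
    """Classical multiplicative algorithm for approximate D-optimal weights.
    F: (n_points, p) matrix whose rows are regression vectors f(x_i)."""
    n, p = F.shape
    w = np.full(n, 1.0 / n)                          # start from the uniform design
    for _ in range(n_iter):
        M = F.T @ (w[:, None] * F)                   # information matrix M(w)
        # Variance function d(x_i, w) = f(x_i)' M(w)^{-1} f(x_i)
        d = np.sum(F * np.linalg.solve(M, F.T).T, axis=1)
        w = w * d / p                                # sum_i w_i d_i = p keeps sum(w) = 1
    return w


x = np.linspace(-1.0, 1.0, 21)
F = np.column_stack([np.ones_like(x), x, x ** 2])    # quadratic regression on [-1, 1]
w = multiplicative_d_optimal(F)
```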

Statistics Theory

Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation

This work studies the spectral convergence of the graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from $N$ random samples on a $d$-dimensional manifold embedded in a possibly high-dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with a Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log N / N)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $N^{-1/(d/2+2)}$ and the eigenvector convergence in 2-norm has rate $N^{-1/(d+4)}$; when $\epsilon \sim N^{-1/(d/2+3)}$, both eigenvalue and eigenvector rates are $N^{-1/(d/2+3)}$. These rates hold up to a $\log N$ factor and are proved for finitely many low-lying eigenvalues. The result holds for un-normalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as for the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix from both sides) with non-uniformly sampled data. As an intermediate result, we prove new point-wise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.
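
The sketch below builds a random-walk graph Laplacian (I - D^{-1} W) / eps from uniform samples on the unit circle (d = 1) and returns its low-lying eigenvalues. It assumes the kernel W_ij = exp(-|x_i - x_j|^2 / eps) and sets the bandwidth to (log N / N)^{1/(d/2+2)} with the proportionality constant taken to be 1. Both are assumptions, not the paper's exact normalization. On the unit circle the Laplace-Beltrami eigenvalues are 0, 1, 1, 4, 4, ..., so the third nonzero eigenvalue should be roughly four times the first. A finite bandwidth biases this ratio somewhat.

```python
import numpy as np


def sample_unit_circle(n, rng):
    """Uniform samples on the unit circle (a 1-dimensional manifold in R^2)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([np.cos(theta), np.sin(theta)])


def random_walk_laplacian_eigs(X, eps, k=6):
    """Smallest k eigenvalues of (I - D^{-1} W) / eps with a Gaussian kernel."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-dist2 / eps)
    deg = W.sum(axis=1)
    # D^{-1} W is similar to the symmetric D^{-1/2} W D^{-1/2}, so eigvalsh applies.
    S = W / np.sqrt(np.outer(deg, deg))
    mu = np.linalg.eigvalsh(S)[::-1][:k]             # largest eigenvalues of S
    return (1.0 - mu) / eps                          # ascending Laplacian eigenvalues


N, d = 1000, 1
eps = (np.log(N) / N) ** (1.0 / (d / 2 + 2))          # bandwidth scaling, constant assumed 1
eigs = random_walk_laplacian_eigs(sample_unit_circle(N, np.random.default_rng(0)), eps)
```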

Statistics Theory

Ellipse Combining with Unknown Cross Ellipse Correlations

We discuss combining measurements when the covariance of each single measurement is given but the joint measurement covariance is unknown. In this paper, we assume that the mapping of a single measurement to the solution space is the identity matrix. We first examine the solution when all measurements are assumed to be uncorrelated. We then present a way to parameterize the joint measurement covariance based on pairwise correlation coefficients. Finally, we discuss how to use this parameterization to combine the measurements.
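
The sketch below combines measurements of the same state by generalized least squares. With identity mappings, the stacked observation matrix H is a column of identity blocks, and the estimate is (H' S^{-1} H)^{-1} H' S^{-1} z for a joint covariance S. When S is block-diagonal (the uncorrelated case), this reduces to the familiar information-weighted fusion. The one-parameter cross covariance P12 = rho * L1 @ L2' built from Cholesky factors is a hypothetical parameterization chosen because it stays positive semidefinite for |rho| <= 1. It is not the paper's construction.

```python
import numpy as np


def combine(measurements, covs, joint_cov=None):
    """GLS combination of m measurements of one p-vector (identity mappings).
    If joint_cov is None, the measurements are treated as uncorrelated."""
    m, p = len(measurements), len(measurements[0])
    z = np.concatenate(measurements)
    if joint_cov is None:
        joint_cov = np.zeros((m * p, m * p))
        for i, c in enumerate(covs):
            joint_cov[i * p:(i + 1) * p, i * p:(i + 1) * p] = c
    H = np.tile(np.eye(p), (m, 1))                   # stacked identity mappings
    Sinv_H = np.linalg.solve(joint_cov, H)           # S^{-1} H
    cov = np.linalg.inv(H.T @ Sinv_H)                # (H' S^{-1} H)^{-1}
    est = cov @ (Sinv_H.T @ z)                       # uses the symmetry of S
    return est, cov


def pairwise_joint_cov(P1, P2, rho):
    """Hypothetical parameterization: P12 = rho * L1 @ L2.T (PSD for |rho| <= 1)."""
    L1, L2 = np.linalg.cholesky(P1), np.linalg.cholesky(P2)
    P12 = rho * L1 @ L2.T
    return np.block([[P1, P12], [P12.T, P2]])
```

Sweeping rho over a range of values shows how sensitive the combined ellipse is to the unknown correlation.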

Statistics Theory

Empirical Bayes cumulative ℓ-value multiple testing procedure for sparse sequences

In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate, for the first time, its behaviour from the frequentist point of view. Given a spike-and-slab prior on the high-dimensional sparse unknown parameter, one can easily compute posterior probabilities of coming from the spike, which correspond to the well-known local-fdr values, also called ℓ-values. The spike-and-slab weight parameter is calibrated in an empirical Bayes fashion, using marginal maximum likelihood. The multiple testing procedure under study, called here the cumulative ℓ-value procedure, ranks coordinates according to their empirical ℓ-values and thresholds so that the cumulative ranked sum does not exceed a user-specified level t. We validate the use of this method from the multiple testing perspective: for alternatives of appropriately large signal strength, the false discovery rate (FDR) of the procedure is shown to converge to the target level t, while its false negative rate (FNR) goes to 0. We complement this study by providing convergence rates for the method. Additionally, we prove that the q-value multiple testing procedure shares similar convergence rates in this model.
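
The sketch below illustrates the three steps described above: computing ℓ-values, calibrating the weight by marginal maximum likelihood, and applying the cumulative threshold. Two assumptions are made that the abstract does not fix. First, the slab is Gaussian, so X_i follows (1 - w) N(0, 1) + w N(0, 1 + tau2) marginally. Second, the "cumulative ranked sum" is normalized by the number of rejections, so the running mean of the sorted ℓ-values is compared with t. The grid search for w is a simple stand-in for a proper optimizer.

```python
import numpy as np


def log_normal_pdf(x, var):
    return -0.5 * (np.log(2 * np.pi * var) + x ** 2 / var)


def ell_values(x, w, tau2):
    """Posterior spike probabilities under an assumed Gaussian slab N(0, tau2)."""
    # r = log of the slab-versus-spike posterior odds
    r = (np.log(w) + log_normal_pdf(x, 1.0 + tau2)) - (np.log1p(-w) + log_normal_pdf(x, 1.0))
    return np.exp(-np.logaddexp(0.0, r))             # 1 / (1 + e^r), computed stably


def mml_weight(x, tau2, grid_size=999):
    """Empirical Bayes weight: maximize the marginal log-likelihood over a grid of w."""
    grid = np.linspace(1e-3, 1 - 1e-3, grid_size)
    loglik = [np.sum(np.logaddexp(np.log1p(-w) + log_normal_pdf(x, 1.0),
                                  np.log(w) + log_normal_pdf(x, 1.0 + tau2)))
              for w in grid]
    return grid[int(np.argmax(loglik))]


def cumulative_ell_procedure(ell, t):
    """Reject the k smallest ell-values, with k maximal such that their mean is <= t."""
    order = np.argsort(ell)
    running_mean = np.cumsum(ell[order]) / np.arange(1, len(ell) + 1)
    k = int(np.sum(running_mean <= t))               # running_mean is nondecreasing
    return np.sort(order[:k])
```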

Statistics Theory

Empirical Likelihood Ratio Test on quantiles under a Density Ratio Model

Population quantiles are important parameters in many applications. Enthusiasm for the development of effective statistical inference procedures for quantiles and their functions has been high for the past decade. In this article, we study inference methods for quantiles when multiple samples from linked populations are available. The research problems we consider have a wide range of applications. For example, to study the evolution of the economic status of a country, economists monitor changes in the quantiles of annual household incomes, based on multiple survey datasets collected annually. Even with multiple samples, a routine approach would estimate the quantiles of different populations separately. Such approaches ignore the fact that these populations are linked and share some intrinsic latent structure. Recently, many researchers have advocated the use of the density ratio model (DRM) to account for this latent structure and have developed more efficient procedures based on pooled data. The nonparametric empirical likelihood (EL) is subsequently employed. Interestingly, there has been no discussion in this context of the EL-based likelihood ratio test (ELRT) for population quantiles. We explore the use of the ELRT for hypotheses concerning quantiles and confidence regions under the DRM. We show that the ELRT statistic has a chi-square limiting distribution under the null hypothesis. Simulation experiments show that the chi-square distributions approximate the finite-sample distributions well and lead to accurate tests and confidence regions. The DRM helps to improve statistical efficiency. We also give a real-data example to illustrate the efficiency of the proposed method.
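
To illustrate the chi-square calibration of an EL ratio test for a quantile, the sketch below uses the simplest single-sample case with no density ratio model and no pooling. Under H0: F(xi) = p, the empirical likelihood places mass p / m on the m observations at or below xi and (1 - p) / (n - m) on the rest. This gives -2 log R = 2[m log(m / (n p)) + (n - m) log((n - m) / (n (1 - p)))], which is compared with a chi-square(1) distribution. The function names are hypothetical.

```python
import math


def el_quantile_stat(data, xi, p):
    """-2 log EL ratio for H0: F(xi) = p (single sample, 0 < p < 1)."""
    n = len(data)
    m = sum(1 for x in data if x <= xi)
    if m == 0 or m == n:
        return math.inf                              # constraint infeasible: EL ratio is 0
    return 2.0 * (m * math.log(m / (n * p))
                  + (n - m) * math.log((n - m) / (n * (1 - p))))


def chi2_1_pvalue(stat):
    """Upper tail of chi-square(1): P(Z^2 > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))
```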

Statistics Theory

Empirical MSE Minimization to Estimate a Scalar Parameter

We consider the estimation of a scalar parameter when two estimators are available. The first is always consistent. The second is inconsistent in general but has a smaller asymptotic variance than the first, and it may be consistent if an assumption is satisfied. We propose to use the weighted combination of the two estimators whose weight minimizes an estimate of the mean-squared error (MSE). We show that this third estimator dominates the other two from a minimax-regret perspective: the maximum asymptotic-MSE gain one may achieve by using this estimator rather than one of the other two is larger than the maximum asymptotic-MSE loss one may incur.
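
A minimal sketch of choosing the weight is given below. It uses the combination w * theta1 + (1 - w) * theta2 and a plug-in MSE in which the squared bias of theta2 is estimated by (theta2 - theta1)^2. Writing u = 1 - w, the estimated MSE is u^2 b^2 + (1 - u)^2 V1 + u^2 V2 + 2u(1 - u)C, which is convex in u and is minimized at u = (V1 - C) / (b^2 + V1 + V2 - 2C). The result is then clipped to [0, 1]. This plug-in choice is an assumption and may differ from the paper's MSE estimator.

```python
def mse_weighted_combination(theta1, theta2, v1, v2, c12=0.0):
    """Combine consistent theta1 and lower-variance, possibly biased theta2.
    v1, v2: variance estimates; c12: covariance estimate. Returns (estimate, w)."""
    bias2 = (theta2 - theta1) ** 2                   # assumed plug-in squared-bias estimate
    denom = bias2 + v1 + v2 - 2.0 * c12              # equals b^2 + Var(theta1 - theta2)
    u = (v1 - c12) / denom if denom > 0 else 0.0     # minimizer in u, the weight on theta2
    u = min(max(u, 0.0), 1.0)
    w = 1.0 - u                                      # weight on theta1
    return w * theta1 + u * theta2, w
```

When the two estimates agree, the weights reduce to inverse-variance weights (with c12 = 0). When they differ sharply, almost all weight moves to the consistent estimator. Because (theta2 - theta1)^2 overestimates the squared bias by Var(theta2 - theta1) on average, a bias-corrected plug-in is a natural variant.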

Statistics Theory

Empirical process theory for locally stationary processes

We provide a framework for empirical process theory of locally stationary processes using the functional dependence measure. Our results extend known results for stationary mixing sequences to another common way of measuring dependence, and they allow for additional time dependence. We develop maximal inequalities for expectations and provide functional limit theorems and Bernstein-type inequalities. We show their applicability to a variety of situations; for instance, we prove the weak functional convergence of the empirical distribution function and uniform convergence rates for kernel density and regression estimation when the observations are locally stationary processes.
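
As a concrete example of a locally stationary process, the sketch below simulates a time-varying AR(1) process X_t = a(t/n) X_{t-1} + eps_t. It then computes a kernel-weighted empirical distribution function at rescaled time u. The coefficient curve, the Epanechnikov kernel and the bandwidth are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def simulate_tvar1(n, a, rng):
    """Time-varying AR(1): X_t = a(t/n) X_{t-1} + eps_t with a coefficient curve a(.)."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = a(t / n) * x[t - 1] + eps[t]
    return x


def local_ecdf(x, u, grid, bandwidth):
    """Kernel-weighted empirical distribution function at rescaled time u."""
    times = np.arange(len(x)) / len(x)
    k = np.maximum(1.0 - ((times - u) / bandwidth) ** 2, 0.0)   # Epanechnikov weights
    weights = k / k.sum()
    indicators = (x[:, None] <= grid[None, :]).astype(float)   # 1{X_t <= g}
    return weights @ indicators


x = simulate_tvar1(5000, lambda s: 0.9 * s, np.random.default_rng(2))
grid = np.linspace(-3.0, 3.0, 13)
F_local = local_ecdf(x, 0.5, grid, 0.1)              # local marginal CDF near t/n = 0.5
```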
