Featured Research

Methodology

Controlling False Discovery Rates under Cross-Sectional Correlations

We consider controlling the false discovery rate for testing many time series with an unknown cross-sectional correlation structure. Given a large number of hypotheses, false and missing discoveries can plague an analysis. While many procedures have been proposed to control false discovery, most of them either assume independent hypotheses or lack statistical power. A problem of particular interest is in financial asset pricing, where the goal is to determine which "factors" lead to excess returns out of a large number of potential factors. Our contribution is two-fold. First, we show the consistency of Fama and French's prominent method under multiple testing. Second, we propose a novel method for false discovery control using double bootstrapping. We achieve superior statistical power to existing methods and prove that the false discovery rate is controlled. Simulations and a real data application illustrate the efficacy of our method over existing methods.
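
The double-bootstrap procedure itself is not reproduced here. For orientation only, below is a minimal Python sketch of the classical Benjamini-Hochberg step-up procedure, a standard false discovery rate baseline in this area, applied to a hypothetical array of p-values from factor tests; the p-values and the FDR level are placeholders.

import numpy as np

def benjamini_hochberg(pvalues, alpha=0.10):
    """Classical BH step-up procedure; returns a boolean mask of discoveries.
    A standard baseline only, not the paper's double-bootstrap method."""
    p = np.asarray(pvalues)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    discoveries = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])        # largest rank passing its threshold
        discoveries[order[: k + 1]] = True    # reject all hypotheses up to that rank
    return discoveries

# hypothetical p-values from t-tests of factor alphas
pvals = np.array([0.001, 0.20, 0.03, 0.004, 0.60])
print(benjamini_hochberg(pvals, alpha=0.10))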

Read more
Methodology

Controlling the EWMA S^2 control chart false alarm behavior when the in-control variance level must be estimated

Investigating the problem of setting control limits in the case of parameter uncertainty is more accessible when monitoring the variance because only one parameter has to be estimated. Simply ignoring the induced uncertainty frequently leads to control charts with poor false alarm performance. Adjusting the unconditional in-control (IC) average run length (ARL) makes the situation even worse. Guaranteeing a minimum conditional IC ARL with some given probability is another very popular approach to solving these difficulties. However, it is very conservative, more complex, and more difficult to communicate. We utilize the probability of a false alarm within the planned number of points to be plotted on the control chart. It turns out that adjusting this probability produces notably different limit adjustments compared to controlling the unconditional IC ARL. We then develop numerical algorithms to determine the respective modifications of the upper and two-sided exponentially weighted moving average (EWMA) charts based on the sample variance for normally distributed data. These algorithms are made available within an R package. Finally, the impacts of the EWMA smoothing constant and the size of the preliminary sample on the control chart design and its performance are studied.
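
As a rough illustration of the quantity being controlled, and not of the paper's numerical algorithms or its R package, the following Monte Carlo sketch estimates the probability of a false alarm within a planned number of plotted points for an upper EWMA chart of the sample variance, with the in-control variance estimated from a preliminary sample. The smoothing constant, limit factor, subgroup size, and sample sizes are arbitrary placeholders.

import numpy as np

rng = np.random.default_rng(1)

def false_alarm_prob(lam=0.1, limit_factor=1.5, n=5, m_points=100,
                     prelim_size=50, sigma=1.0, reps=2000):
    """Monte Carlo estimate of P(false alarm within m_points) for an upper
    EWMA chart of S^2 when the in-control variance is estimated from a
    preliminary sample. Illustrative only; the paper determines adjusted
    limits via numerical algorithms rather than brute-force simulation."""
    alarms = 0
    for _ in range(reps):
        # estimate the in-control variance from a Phase I sample
        sigma2_hat = np.var(rng.normal(0, sigma, prelim_size), ddof=1)
        ucl = limit_factor * sigma2_hat           # naive, unadjusted upper limit
        z = sigma2_hat                            # start the EWMA at the estimate
        for _ in range(m_points):
            s2 = np.var(rng.normal(0, sigma, n), ddof=1)   # in-control subgroup
            z = (1 - lam) * z + lam * s2                   # EWMA recursion
            if z > ucl:
                alarms += 1
                break
    return alarms / reps

print(false_alarm_prob())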

Read more
Methodology

Copula-based Sensitivity Analysis for Multi-Treatment Causal Inference with Unobserved Confounding

Recent work has focused on the potential and pitfalls of causal identification in observational studies with multiple simultaneous treatments. On the one hand, a latent variable model fit to the observed treatments can identify essential aspects of the distribution of unobserved confounders. On the other hand, it has been shown that even when the latent confounder distribution is known exactly, causal effects are still not point identifiable. Thus, the practical benefits of latent variable modeling in multi-treatment settings remain unclear. We clarify these issues with a sensitivity analysis method that can be used to characterize the range of causal effects that are compatible with the observed data. Our method is based on a copula factorization of the joint distribution of outcomes, treatments, and confounders, and can be layered on top of arbitrary observed data models. We propose a practical implementation of this approach making use of the Gaussian copula, and establish conditions under which causal effects can be bounded. We also describe approaches for reasoning about effects, including calibrating sensitivity parameters, quantifying robustness of effect estimates, and selecting models which are most consistent with prior hypotheses.
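
The full sensitivity analysis is not reproduced here, but the basic building block the abstract mentions, a Gaussian copula that couples two marginals through a single correlation parameter, can be sketched as follows; the marginal distributions, the correlation value, and the variable roles are placeholders rather than the paper's model.

import numpy as np
from scipy import stats

def gaussian_copula_sample(n, rho, marginal_u, marginal_y, rng):
    """Draw (U, Y) pairs whose dependence is a Gaussian copula with
    correlation rho; here U plays the role of an unobserved confounder and
    Y an outcome. Purely illustrative, not the paper's estimator."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    u = stats.norm.cdf(z)                 # uniforms with Gaussian-copula dependence
    return marginal_u.ppf(u[:, 0]), marginal_y.ppf(u[:, 1])

rng = np.random.default_rng(0)
conf, outcome = gaussian_copula_sample(
    1000, rho=0.4,
    marginal_u=stats.norm(), marginal_y=stats.gamma(a=2.0), rng=rng)
print(np.corrcoef(conf, outcome)[0, 1])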

Read more
Methodology

Copula-based measures of asymmetry between the lower and upper tail probabilities

We propose a copula-based measure of asymmetry between the lower and upper tail probabilities of bivariate distributions. The proposed measure has a simple form and possesses some desirable properties as a measure of asymmetry. The limit of the proposed measure as the index goes to the boundary of its domain can be expressed in a simple form under certain conditions on copulas. A sample analogue of the proposed measure for a sample from a copula is presented and its weak convergence to a Gaussian process is shown. Another sample analogue of the presented measure, which is based on a sample from a distribution on R^2, is given. Simple methods for interval estimation and nonparametric testing based on the two sample analogues are presented. As an example, the presented measure is applied to daily returns of S&P500 and Nikkei225.
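
The exact measure is defined in the paper; as a rough empirical analogue of the quantity it targets, one can compare the lower and upper joint tail probabilities of a bivariate sample through its pseudo-observations at a chosen level p. The difference computed below is an illustration of this comparison, not necessarily the authors' measure, and the tail level and simulated data are placeholders.

import numpy as np
from scipy import stats

def tail_asymmetry(x, y, p=0.05):
    """Compare empirical lower-tail and upper-tail joint probabilities of a
    bivariate sample via its ranks (pseudo-observations). Illustrative
    analogue of lower/upper tail asymmetry, not the paper's exact measure."""
    n = len(x)
    u = stats.rankdata(x) / (n + 1)             # pseudo-observations in (0, 1)
    v = stats.rankdata(y) / (n + 1)
    lower = np.mean((u <= p) & (v <= p))        # both in the lower tail
    upper = np.mean((u > 1 - p) & (v > 1 - p))  # both in the upper tail
    return lower - upper

rng = np.random.default_rng(0)
x = rng.standard_t(df=4, size=2000)
y = 0.6 * x + rng.standard_t(df=4, size=2000)
print(tail_asymmetry(x, y, p=0.05))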

Read more
Methodology

Correlation Based Principal Loading Analysis

Principal loading analysis is a dimension reduction method that discards variables which have only a small distorting effect on the covariance matrix. We complement principal loading analysis and propose to use a mix of both the correlation and covariance matrices instead. Further, we suggest using rescaled eigenvectors and provide updated algorithms for all proposed changes.
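
As a loose sketch of the general idea of discarding variables with only a small distorting effect, here applied to the correlation matrix, the code below flags variables whose removal barely changes the leading eigenvalues. The screening rule, the number of leading eigenvalues, and the tolerance are crude placeholders and do not reproduce the authors' algorithm or their rescaled-eigenvector criterion.

import numpy as np

def low_impact_variables(X, k=2, tol=0.05):
    """Flag variables whose removal barely changes the k leading eigenvalues
    of the correlation matrix -- a crude stand-in for having only a 'small
    distorting effect', not the authors' algorithm."""
    R = np.corrcoef(X, rowvar=False)
    lead = np.sort(np.linalg.eigvalsh(R))[::-1][:k]
    flagged = []
    for j in range(R.shape[1]):
        keep = np.delete(np.arange(R.shape[1]), j)
        lead_j = np.sort(np.linalg.eigvalsh(R[np.ix_(keep, keep)]))[::-1][:k]
        if np.max(np.abs(lead - lead_j)) < tol * lead[0]:
            flagged.append(j)
    return flagged

rng = np.random.default_rng(0)
f = rng.normal(size=(500, 1))
block = f + 0.5 * rng.normal(size=(500, 3))   # three strongly correlated variables
noise = rng.normal(size=(500, 3))             # three unrelated variables
print(low_impact_variables(np.column_stack([block, noise])))  # likely flags the unrelated ones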

Read more
Methodology

Couplings of the Random-Walk Metropolis algorithm

Couplings play a central role in contemporary Markov chain Monte Carlo methods and in the analysis of their convergence to stationarity. In most cases, a coupling must induce relatively fast meeting between chains to ensure good performance. In this paper we fix attention on the random walk Metropolis algorithm and examine a range of coupling design choices. We introduce proposal and acceptance step couplings based on geometric, optimal transport, and maximality considerations. We consider the theoretical properties of these choices and examine their implication for the meeting time of the chains. We conclude by extracting a few general principles and hypotheses on the design of effective couplings.
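
One standard building block in this literature is the maximal coupling of two proposal distributions, which makes the two chains propose the same point with the highest possible probability. Below is a minimal sketch of the textbook rejection construction for two univariate Gaussian proposals; it illustrates a maximal coupling in general, not the specific coupling designs examined in the paper.

import numpy as np
from scipy import stats

def maximal_coupling(p, q, rng):
    """Sample (X, Y) with X ~ p and Y ~ q such that P(X == Y) is maximal.
    p and q are frozen scipy.stats distributions. Standard rejection
    construction; illustrative only."""
    x = p.rvs(random_state=rng)
    if rng.uniform(0, p.pdf(x)) <= q.pdf(x):
        return x, x                       # the two chains propose the same point
    while True:
        y = q.rvs(random_state=rng)
        if rng.uniform(0, q.pdf(y)) > p.pdf(y):
            return x, y                   # distinct proposals

rng = np.random.default_rng(0)
p = stats.norm(loc=0.0, scale=1.0)        # proposal centred at chain 1's state
q = stats.norm(loc=1.0, scale=1.0)        # proposal centred at chain 2's state
draws = [maximal_coupling(p, q, rng) for _ in range(5000)]
print(np.mean([x == y for x, y in draws]))   # close to 1 minus the total variation distance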

Read more
Methodology

Covariate Balancing by Uniform Transformer

In observational studies, it is important to balance covariates in different treatment groups in order to estimate treatment effects. One of the most commonly used methods for this purpose is the weighting method. The performance of this method usually depends on either correct model specification for the propensity score or strong regularity conditions for the underlying model, which might not hold in practice. In this paper, we introduce a new robust and computationally efficient framework of weighting methods for covariate balancing, which allows us to conduct model-free inference for the sake of robustness and to integrate an extra "unlabeled" data set if available. Unlike existing methods, the new framework reduces the weight construction problem to a classical density estimation problem by applying a data-driven transformation to the observed covariates. We characterize the theoretical properties of the new estimators of the average treatment effect under a nonparametric setting and show that they work robustly under low regularity conditions. The new framework is also applied to several numerical examples using both simulated and real datasets to demonstrate its practical merits.
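
The uniform-transformer construction itself is not sketched here. What any such weighting method targets, however, can be checked with a standard diagnostic: the weighted standardized mean difference of each covariate between treatment groups. The sketch below is that generic diagnostic on simulated placeholder data, not the paper's estimator.

import numpy as np

def standardized_mean_differences(X, treat, weights):
    """Weighted standardized mean difference per covariate; values near zero
    indicate good balance. Generic diagnostic, not the paper's method."""
    t, c = treat == 1, treat == 0
    mean_t = np.average(X[t], axis=0, weights=weights[t])
    mean_c = np.average(X[c], axis=0, weights=weights[c])
    var_t = np.average((X[t] - mean_t) ** 2, axis=0, weights=weights[t])
    var_c = np.average((X[c] - mean_c) ** 2, axis=0, weights=weights[c])
    return (mean_t - mean_c) / np.sqrt((var_t + var_c) / 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))        # confounded assignment
print(standardized_mean_differences(X, treat, np.ones(500)))  # unweighted imbalance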

Read more
Methodology

Covariate adjustment in subgroup analyses of randomized clinical trials: A propensity score approach

Background: Subgroup analyses are frequently conducted in randomized clinical trials (RCTs) to assess evidence of heterogeneous treatment effects across patient subpopulations. Although randomization balances covariates within subgroups in expectation, chance imbalance may be amplified in small subgroups and harm precision. Covariate adjustment in the overall analysis of an RCT is often conducted via either ANCOVA or propensity score weighting, but it has rarely been discussed for subgroup analyses. In this article, we develop propensity score weighting methodology for covariate adjustment to improve the precision and power of subgroup analyses in RCTs. Methods: We extend the propensity score weighting methodology to subgroup analyses by fitting a logistic propensity model with pre-specified covariate-subgroup interactions. We show that, by construction, overlap weighting exactly balances the covariates with interaction terms in each subgroup. Extensive simulations were performed to compare the operating characteristics of the unadjusted estimator, several propensity score weighting estimators, and the ANCOVA estimator. We apply these methods to the HF-ACTION trial to evaluate the effect of exercise training on the 6-minute walk test in pre-specified subgroups. Results: Standard errors of the adjusted estimators are smaller than those of the unadjusted estimator. The propensity score weighting estimator is as efficient as ANCOVA, and is often more efficient when the subgroup sample size is small (e.g., <125) and/or when the outcome model is misspecified. The weighting estimators with the full-interaction propensity model consistently outperform those based on the standard main-effects propensity model. Conclusion: Propensity score weighting is a transparent and objective method to adjust for chance imbalance of important covariates in subgroup analyses of RCTs. It is crucial to include the full covariate-subgroup interactions in the propensity score model.
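
A minimal sketch of the central step described in the abstract, fitting a logistic propensity model with covariate-by-subgroup interactions and forming overlap weights (treated units weighted by 1 - e(x), controls by e(x)), might look like the following. The simulated data, the use of scikit-learn's penalized logistic regression, and the simple weighted-mean contrast are placeholders rather than the HF-ACTION analysis.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
subgroup = rng.binomial(1, 0.3, n)                  # pre-specified subgroup indicator
X = rng.normal(size=(n, 2))                         # baseline covariates
treat = rng.binomial(1, 0.5, n)                     # randomized treatment
y = 1.0 * treat + 0.8 * X[:, 0] + rng.normal(size=n)

# design matrix with full covariate-by-subgroup interactions
design = np.column_stack([X, subgroup[:, None] * X, subgroup])
e = LogisticRegression(max_iter=1000).fit(design, treat).predict_proba(design)[:, 1]

# overlap weights: 1 - e for treated units, e for controls
w = np.where(treat == 1, 1 - e, e)

mask = subgroup == 1                                # subgroup-specific estimate
t, c = mask & (treat == 1), mask & (treat == 0)
print(np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c]))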

Read more
Methodology

Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions

The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience, however. As such, a variety of estimators have been derived to overcome the shortcomings of the sample covariance matrix in these settings. Yet, the question of selecting an optimal estimator from among the plethora available remains largely unaddressed. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. In particular, we propose a general class of loss functions for covariance matrix estimation and establish finite-sample risk bounds and conditions for the asymptotic optimality of the cross-validated estimator selector with respect to these loss functions. We evaluate our proposed approach via a comprehensive set of simulation experiments and demonstrate its practical benefits by application in the exploratory analysis of two single-cell transcriptome sequencing datasets. A free and open-source software implementation of the proposed methodology, the cvCovEst R package, is briefly introduced.
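
The cvCovEst R package implements the full procedure; the Python sketch below conveys only the basic idea of cross-validated selection, scoring two candidate estimators by a Frobenius-type loss against the held-out sample covariance. The candidate set, the loss, and the data are illustrative and do not reproduce the paper's loss class or optimality results.

import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance
from sklearn.model_selection import KFold

def cv_select(X, n_splits=5):
    """Score candidate covariance estimators by the average squared Frobenius
    distance between the estimate fit on training folds and the held-out
    sample covariance; return the lowest-risk candidate."""
    candidates = {
        "sample": lambda A: empirical_covariance(A),
        "ledoit_wolf": lambda A: LedoitWolf().fit(A).covariance_,
    }
    risks = {name: 0.0 for name in candidates}
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        held_out = empirical_covariance(X[test])
        for name, fit in candidates.items():
            risks[name] += np.linalg.norm(fit(X[train]) - held_out, "fro") ** 2
    return min(risks, key=risks.get), risks

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 100))          # high-dimensional regime: p > n
print(cv_select(X)[0])                  # shrinkage typically wins here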

Read more
Methodology

Cumulative deviation of a subpopulation from the full population

Assessing equity in treatment of a subpopulation often involves assigning numerical "scores" to all individuals in the full population such that similar individuals get similar scores; matching via propensity scores or appropriate covariates is common, for example. Given such scores, individuals with similar scores may or may not attain similar outcomes independently of the individuals' membership in the subpopulation. The traditional graphical methods for visualizing inequities are known as "reliability diagrams" or "calibration plots," which bin the scores into a partition of all possible values, and for each bin plot both the average outcomes for only individuals in the subpopulation as well as the average outcomes for all individuals; comparing the graph for the subpopulation with that for the full population gives some sense of how the averages for the subpopulation deviate from the averages for the full population. Unfortunately, real data sets contain only finitely many observations, limiting the usable resolution of the bins, and so the conventional methods can obscure important variations due to the binning. Fortunately, plotting cumulative deviation of the subpopulation from the full population as proposed in this paper sidesteps the problematic coarse binning. The cumulative plots encode subpopulation deviation directly as the slopes of secant lines for the graphs. Slope is easy to perceive even when the constant offsets of the secant lines are irrelevant. The cumulative approach avoids binning that smooths over deviations of the subpopulation from the full population. Such cumulative aggregation furnishes both high-resolution graphical methods and simple scalar summary statistics (analogous to those of Kuiper and of Kolmogorov and Smirnov used in statistical significance testing for comparing probability distributions).
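
A minimal sketch of this kind of cumulative plot, accumulating the difference between subpopulation outcomes and a full-population reference at comparable scores, could look like the following. The nearest-score matching step, the normalization, and the simulated data are placeholders rather than the paper's exact construction.

import numpy as np
import matplotlib.pyplot as plt

def cumulative_deviation(scores_sub, y_sub, scores_full, y_full):
    """For subpopulation members ordered by score, accumulate the difference
    between their outcomes and the outcome of the nearest-score individual in
    the full population. Illustrative, not the paper's exact statistic."""
    order = np.argsort(scores_sub)
    scores_sub, y_sub = scores_sub[order], y_sub[order]
    full_order = np.argsort(scores_full)
    scores_full, y_full = scores_full[full_order], y_full[full_order]
    idx = np.clip(np.searchsorted(scores_full, scores_sub), 0, len(y_full) - 1)
    return scores_sub, np.cumsum(y_sub - y_full[idx]) / len(y_sub)

rng = np.random.default_rng(0)
scores_full = rng.uniform(0, 1, 5000)
y_full = (rng.uniform(0, 1, 5000) < scores_full).astype(float)   # well-calibrated outcomes
scores_sub = rng.uniform(0, 1, 400)
y_sub = (rng.uniform(0, 1, 400) < scores_sub).astype(float)      # no real deviation

x, cum = cumulative_deviation(scores_sub, y_sub, scores_full, y_full)
print(np.max(np.abs(cum)))            # Kolmogorov-Smirnov-style scalar summary
plt.plot(x, cum)                      # slopes of secant lines show local deviations
plt.xlabel("score"); plt.ylabel("cumulative deviation")
plt.savefig("cumulative_deviation.png")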

Read more
