Featured Research

Econometrics

Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice

The importance of asymmetries in prediction problems arising in economics has long been recognized. In this paper, we focus on binary choice problems in a data-rich environment with general loss functions. In contrast to asymmetric regression problems, binary choice with general loss functions and high-dimensional datasets is challenging and not well understood. Econometricians have studied binary choice problems for a long time, but the literature does not offer computationally attractive solutions in data-rich environments. The machine learning literature, by contrast, has many computationally attractive algorithms that form the basis for many of the automated procedures implemented in practice, but it focuses on symmetric loss functions that are independent of individual characteristics. One of the main contributions of our paper is to show that theoretically valid predictions of binary outcomes with arbitrary loss functions can be achieved via a very simple reweighting of logistic regression, or of other state-of-the-art machine learning techniques such as boosting or (deep) neural networks. We apply our analysis to racial justice in pretrial detention.
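
As a concrete illustration of the reweighting idea, here is a minimal sketch of cost-weighted logistic regression under an asymmetric loss, using simulated data and hypothetical misclassification costs. The specific scheme below (weight each observation by the cost of misclassifying it, and classify at the cost-implied threshold) is a standard device and should not be read as the paper's exact construction.

```python
# Minimal sketch: cost-weighted logistic regression under asymmetric loss.
# The weighting scheme and costs are illustrative assumptions, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated data: covariates X and a binary outcome y.
n, p = 1000, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.25, 0.0, 0.0])
y = (X @ beta + rng.logistic(size=n) > 0).astype(int)

# Hypothetical asymmetric costs: a false negative is 4x as costly as a false positive.
c_fn, c_fp = 4.0, 1.0
sample_weight = np.where(y == 1, c_fn, c_fp)

model = LogisticRegression().fit(X, y, sample_weight=sample_weight)

# Classify using the cost-minimizing threshold implied by the same costs.
threshold = c_fp / (c_fp + c_fn)
pred = (model.predict_proba(X)[:, 1] >= threshold).astype(int)
print("share predicted positive:", pred.mean())
```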

Econometrics

Binary Classification Tests, Imperfect Standards, and Ambiguous Information

New binary classification tests are often evaluated relative to a pre-established test. For example, rapid antigen tests for the detection of SARS-CoV-2 are assessed relative to more established PCR tests. In this paper, I argue that the new test can be described as producing ambiguous information when the pre-established test is imperfect. This allows for a phenomenon called dilation -- an extreme form of non-informativeness. As an example, I present hypothetical test data satisfying the WHO's minimum quality requirement for rapid antigen tests that nevertheless lead to dilation. The ambiguity in the information arises from a missing data problem caused by the imperfection of the established test: the joint distribution of true infection status and test results is not observed. Using results from copula theory, I construct the (usually non-singleton) set of all possible joint distributions, which allows me to assess the new test's informativeness. This analysis leads to a simple sufficient condition ensuring that a new test is not a dilation. I illustrate my approach with applications to data from three COVID-19-related tests. Two rapid antigen tests satisfy my sufficient condition easily and are therefore informative. However, less accurate procedures, such as chest CT scans, may exhibit dilation.
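
To make the missing-data point concrete, the sketch below computes Fréchet-Hoeffding bounds on the joint distribution of two binary variables from their marginals alone. The marginal probabilities are made up for illustration, and the code is not the paper's construction of the identified set.

```python
# Illustrative sketch: Fréchet-Hoeffding bounds on the joint distribution of two
# binary variables when only their marginals are known. Numbers are hypothetical.

def frechet_bounds_binary(p, q):
    """Bounds on P(A=1, B=1) given marginals P(A=1)=p and P(B=1)=q."""
    lower = max(0.0, p + q - 1.0)
    upper = min(p, q)
    return lower, upper

# Hypothetical marginals: true infection rate and positivity rate of a new test.
p_infected = 0.10
p_test_positive = 0.12
lo, hi = frechet_bounds_binary(p_infected, p_test_positive)
print(f"P(infected and test positive) lies in [{lo:.3f}, {hi:.3f}]")
```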

Econometrics

Binary Response Models for Heterogeneous Panel Data with Interactive Fixed Effects

In this paper, we investigate binary response models for heterogeneous panel data with interactive fixed effects, allowing both the cross-sectional and the temporal dimension to diverge. From a practical point of view, the proposed framework can be applied to predict the probability of corporate failure, conduct credit rating analysis, and so on. Theoretically and methodologically, we establish a link between maximum likelihood estimation and a least squares approach, provide a simple information criterion to detect the number of factors, and derive the corresponding asymptotic distributions. In addition, we conduct extensive simulations to examine the theoretical findings. In the empirical study, we focus on predicting the sign of stock returns and then use the sign forecasts to conduct portfolio analysis. Implementing rolling-window out-of-sample forecasts, we document the finite-sample performance and demonstrate the practical relevance of the proposed model and estimation method.
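
For intuition on the empirical exercise, here is a stylized sketch of rolling-window out-of-sample sign forecasting. It uses an ordinary logistic regression on lagged returns for a single simulated series, whereas the paper's estimator is a heterogeneous panel binary response model with interactive fixed effects; the window length and lag choices below are arbitrary.

```python
# Stylized rolling-window out-of-sample sign forecast with simulated returns.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
T, n_lags, window = 600, 3, 250
returns = rng.normal(scale=0.01, size=T)

# Feature matrix: lags 1..n_lags of returns; target: indicator of a positive return.
X = np.column_stack([returns[n_lags - l - 1:T - l - 1] for l in range(n_lags)])
y = (returns[n_lags:] > 0).astype(int)

hits = []
for t in range(window, len(y)):
    # Re-estimate on the most recent `window` observations, then forecast one step ahead.
    model = LogisticRegression().fit(X[t - window:t], y[t - window:t])
    hits.append(int(model.predict(X[t:t + 1])[0] == y[t]))

print("out-of-sample sign hit rate:", np.mean(hits))
```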

Econometrics

Blocked Clusterwise Regression

A recent literature in econometrics models unobserved cross-sectional heterogeneity in panel data by assigning each cross-sectional unit a one-dimensional, discrete latent type. Such models have been shown to allow estimation and inference by regression clustering methods. This paper is motivated by the finding that the clustered heterogeneity models studied in this literature can be badly misspecified, even when the panel has significant discrete cross-sectional structure. To address this issue, we generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly correlated latent variables that describe its response type to different covariates. We give inference results for a k-means style estimator of our model and develop information criteria to jointly select the number of clusters for each latent variable. Monte Carlo simulations confirm our theoretical results and give intuition about the finite-sample performance of estimation and model selection. We also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting. Our results suggest that over-fitting can be severe in k-means style estimators when the number of clusters is over-specified.
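
For readers unfamiliar with regression clustering, the sketch below implements the single-latent-type benchmark that the paper generalizes: a k-means style loop that alternates between assigning units to groups and re-estimating group-specific OLS coefficients. It is a simplified illustration on simulated data, not the blocked (multi-latent-variable) estimator proposed in the paper.

```python
# K-means style regression clustering with a single discrete latent type per unit.
import numpy as np

def grouped_regression_kmeans(X, Y, n_groups, n_iter=50, seed=0):
    """X has shape (n_units, T, p); Y has shape (n_units, T)."""
    rng = np.random.default_rng(seed)
    groups = rng.integers(n_groups, size=Y.shape[0])
    betas = np.zeros((n_groups, X.shape[2]))
    for _ in range(n_iter):
        # (i) Group-specific OLS on the pooled observations of each group.
        for g in range(n_groups):
            Xg = X[groups == g].reshape(-1, X.shape[2])
            Yg = Y[groups == g].reshape(-1)
            if len(Yg):
                betas[g] = np.linalg.lstsq(Xg, Yg, rcond=None)[0]
        # (ii) Reassign each unit to the group whose coefficients fit it best.
        sse = np.stack([((Y - X @ betas[g]) ** 2).sum(axis=1)
                        for g in range(n_groups)], axis=1)
        new_groups = sse.argmin(axis=1)
        if np.array_equal(new_groups, groups):
            break
        groups = new_groups
    return groups, betas

# Example with simulated two-group data.
rng = np.random.default_rng(1)
n, T, p = 100, 30, 2
X = rng.normal(size=(n, T, p))
true_groups = rng.integers(2, size=n)
true_betas = np.array([[1.0, -1.0], [-1.0, 1.0]])
Y = np.einsum('ntp,np->nt', X, true_betas[true_groups]) + rng.normal(size=(n, T))
groups, betas = grouped_regression_kmeans(X, Y, n_groups=2)
print("estimated group coefficients:\n", betas)
```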

Econometrics

Bootstrap Inference for Quantile Treatment Effects in Randomized Experiments with Matched Pairs

This paper examines methods of inference concerning quantile treatment effects (QTEs) in randomized experiments with matched-pairs designs (MPDs). Standard multiplier bootstrap inference fails to capture the negative dependence of observations within each pair and is therefore conservative. Analytical inference involves estimating multiple functional quantities that require several tuning parameters. Instead, this paper proposes two bootstrap methods that consistently approximate the limit distribution of the original QTE estimator and lessen the burden of choosing tuning parameters. In particular, the inverse-propensity-score-weighted multiplier bootstrap can be implemented without knowledge of pair identities.
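
As background, the snippet below sketches the standard multiplier bootstrap for a QTE: each observation receives an i.i.d. mean-one weight and the difference in weighted quantiles between treated and control units is recomputed. This is the baseline procedure that the paper shows must be modified under matched pairs; the data are simulated and the implementation details (exponential weights, a simple weighted quantile) are choices made here for illustration.

```python
# Generic multiplier bootstrap for a quantile treatment effect (baseline, not
# the paper's matched-pairs procedure). All data are simulated.
import numpy as np

def weighted_quantile(x, w, tau):
    order = np.argsort(x)
    x, w = x[order], w[order]
    cum = np.cumsum(w) / w.sum()
    return x[min(np.searchsorted(cum, tau), len(x) - 1)]

def qte(y, d, tau, w=None):
    w = np.ones_like(y) if w is None else w
    return (weighted_quantile(y[d == 1], w[d == 1], tau)
            - weighted_quantile(y[d == 0], w[d == 0], tau))

rng = np.random.default_rng(0)
n, tau = 400, 0.5
d = rng.integers(2, size=n)
y = rng.normal(size=n) + 0.5 * d          # true QTE of 0.5 at every quantile

# Multiplier bootstrap: perturb observations with i.i.d. mean-one exponential weights.
boot = [qte(y, d, tau, w=rng.exponential(size=n)) for _ in range(999)]
lo, hi = np.quantile(boot, [0.025, 0.975])
print(f"QTE estimate {qte(y, d, tau):.3f}, 95% bootstrap interval [{lo:.3f}, {hi:.3f}]")
```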

Econometrics

Bootstrapping Non-Stationary Stochastic Volatility

In this paper we investigate how the bootstrap can be applied to time series regressions when the volatility of the innovations is random and non-stationary. The volatility of many economic and financial time series displays persistent changes and possible non-stationarity. However, the theory of the bootstrap for such models has focused on deterministic changes in the unconditional variance, and little is known about the performance and validity of the bootstrap when the volatility is driven by a non-stationary stochastic process, including near-integrated volatility processes and near-integrated GARCH processes. This paper develops conditions for bootstrap validity in time series regressions with non-stationary, stochastic volatility. We show that in such cases the distribution of bootstrap statistics (conditional on the data) is random in the limit. Consequently, the conventional approaches to proving bootstrap validity, which involve weak convergence in probability of the bootstrap statistic, fail to deliver the required results. Instead, we use the concept of 'weak convergence in distribution' to establish novel conditions for validity of the wild bootstrap, conditional on the volatility process. We apply our results to several testing problems in the presence of non-stationary stochastic volatility, including testing in a location model, testing for structural change, and testing for an autoregressive unit root. Sufficient conditions for bootstrap validity include the absence of statistical leverage effects, i.e., correlation between the error process and its future conditional variance. The results are illustrated using Monte Carlo simulations, which indicate that the wild bootstrap leads to size control even in small samples.
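
The following sketch shows the wild bootstrap in the simplest setting mentioned above, a location model, with innovations driven by a simulated near-integrated stochastic volatility process. The Gaussian multiplier and the volatility specification are choices made for illustration, not the paper's exact design.

```python
# Wild bootstrap test of H0: mu = 0 in a location model with non-stationary
# stochastic volatility (illustrative specification).
import numpy as np

rng = np.random.default_rng(0)
T, B = 500, 999

# Non-stationary stochastic volatility: sigma_t = exp of a scaled random walk.
log_sigma = np.cumsum(rng.normal(scale=1.0 / np.sqrt(T), size=T))
u = np.exp(log_sigma) * rng.normal(size=T)
y = 0.0 + u                               # data generated under the null

def t_stat(x):
    return np.sqrt(len(x)) * x.mean() / x.std(ddof=1)

t_obs = t_stat(y)
resid = y - y.mean()
# Wild bootstrap: multiply demeaned data by i.i.d. N(0,1) draws, preserving the
# volatility path, and recompute the t-statistic.
t_boot = np.array([t_stat(resid * rng.normal(size=T)) for _ in range(B)])
p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
print(f"wild bootstrap p-value for H0: mu = 0 is {p_value:.3f}")
```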

Econometrics

Bounding Infection Prevalence by Bounding Selectivity and Accuracy of Tests: With Application to Early COVID-19

I propose novel partial identification bounds on infection prevalence from information on the test rate and test yield. The approach utilizes user-specified bounds on (i) test accuracy and (ii) the extent to which tests are targeted, formalized as a restriction on the effect of true infection status on the odds ratio of getting tested, and thereby embeddable in logit specifications. The motivating application is the COVID-19 pandemic, but the strategy may also be useful elsewhere. Evaluated on data from the pandemic's early stage, even the weakest of the novel bounds are reasonably informative. Notably, and in contrast to speculations that were widely reported at the time, they place the infection fatality rate for Italy well above that of influenza by mid-April.
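
The toy calculation below conveys the flavor of such bounds: it combines assumed bounds on test sensitivity and specificity with a crude targeting restriction (prevalence among the untested is no higher than among the tested) to bound population prevalence from a test rate and test yield. All numbers are hypothetical, and the paper's actual bounds rest on a sharper odds-ratio formalization of targeting rather than this simplification.

```python
# Stylized prevalence bounds from test rate, test yield, and assumed accuracy
# bounds. This is an illustration of the bounding logic, not the paper's bounds.
import numpy as np

test_rate, test_yield = 0.02, 0.20        # hypothetical: 2% tested, 20% positive
sens_range = (0.70, 0.95)                 # assumed bounds on sensitivity
spec_range = (0.95, 1.00)                 # assumed bounds on specificity

# Corrected prevalence among the tested (Rogan-Gladen correction) over a grid of
# admissible accuracies, clipped to [0, 1].
prev_tested = []
for se in np.linspace(*sens_range, 50):
    for sp in np.linspace(*spec_range, 50):
        prev_tested.append(np.clip((test_yield + sp - 1) / (se + sp - 1), 0, 1))
pt_lo, pt_hi = min(prev_tested), max(prev_tested)

# Crude targeting restriction: prevalence among the untested lies between 0 and
# the prevalence among the tested.
pop_lo = test_rate * pt_lo
pop_hi = test_rate * pt_hi + (1 - test_rate) * pt_hi
print(f"population prevalence bounds: [{pop_lo:.4f}, {pop_hi:.4f}]")
```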

Econometrics

Bounds on Distributional Treatment Effect Parameters using Panel Data with an Application on Job Displacement

This paper develops new techniques to bound distributional treatment effect parameters that depend on the joint distribution of potential outcomes -- an object not identified by standard identifying assumptions such as selection on observables or even when treatment is randomly assigned. I show that panel data and an additional assumption on the dependence between untreated potential outcomes for the treated group over time (i) provide more identifying power for distributional treatment effect parameters than existing bounds and (ii) provide a more plausible set of conditions than existing methods that obtain point identification. I apply these bounds to study heterogeneity in the effect of job displacement during the Great Recession. Using standard techniques, I find that workers who were displaced during the Great Recession lost on average 34% of their earnings relative to their counterfactual earnings had they not been displaced. Using the methods developed in the current paper, I also show that the average effect masks substantial heterogeneity across workers.
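
For context, the snippet below computes classical worst-case (Makarov-type) bounds on P(Y1 - Y0 <= 0) from the two marginal outcome distributions alone, using simulated outcomes. Bounds of this kind are among the existing bounds referenced above; the paper's contribution is to tighten such bounds by exploiting panel data and the dependence assumption on untreated potential outcomes.

```python
# Worst-case (Makarov-type) bounds on P(Y1 - Y0 <= delta) from marginals alone.
# Simulated, purely illustrative outcome distributions.
import numpy as np

def makarov_bounds(y1, y0, delta, grid_size=200):
    grid = np.linspace(min(y1.min(), y0.min() + delta),
                       max(y1.max(), y0.max() + delta), grid_size)
    F1 = np.array([(y1 <= g).mean() for g in grid])
    F0 = np.array([(y0 <= g - delta).mean() for g in grid])
    lower = np.max(np.maximum(F1 - F0, 0.0))
    upper = 1.0 + np.min(np.minimum(F1 - F0, 0.0))
    return lower, upper

rng = np.random.default_rng(0)
y0 = rng.normal(loc=0.0, size=2000)       # hypothetical untreated outcomes
y1 = rng.normal(loc=0.3, size=2000)       # hypothetical treated outcomes
lo, hi = makarov_bounds(y1, y0, delta=0.0)
print(f"P(Y1 - Y0 <= 0) is bounded in [{lo:.3f}, {hi:.3f}]")
```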

Econometrics

Bounds on direct and indirect effects under treatment/mediator endogeneity and outcome attrition

Causal mediation analysis aims at disentangling a treatment effect into an indirect mechanism operating through an intermediate outcome or mediator and the direct effect of the treatment on the outcome of interest. However, the evaluation of direct and indirect effects is frequently complicated by non-ignorable selection into the treatment and/or mediator, even after controlling for observables, as well as by sample selection/outcome attrition. We propose a method for bounding direct and indirect effects in the presence of such complications, based on a sequence of linear programming problems. Considering inverse probability weighting by propensity scores, we compute the weights that would yield identification in the absence of complications and perturb them by an entropy parameter reflecting a specific amount of propensity score misspecification in order to set-identify the effects of interest. We apply our method to data from the National Longitudinal Survey of Youth 1979 to derive bounds on the explained and unexplained components of a gender wage gap decomposition that is likely prone to non-ignorable mediator selection and outcome attrition.
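
To illustrate the weight-perturbation logic in its simplest form, the sketch below takes a non-normalized IPW estimator of E[Y(1)] on simulated data and traces the range of estimates when each observation's weight is scaled by a factor between 1/lam and lam; because the estimator is linear in the weights, the extremes sit at per-observation corners. This is only a simplified stand-in: the paper targets direct and indirect effects under mediator selection and attrition and uses an entropy-constrained perturbation solved by linear programming.

```python
# Toy illustration of set identification via perturbed IPW weights.
# Data, the target parameter E[Y(1)], and the multiplicative band are all
# simplifying assumptions made here, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
pscore = 1 / (1 + np.exp(-x))             # true propensity score
d = rng.binomial(1, pscore)
y = 1.0 + 0.5 * d + x + rng.normal(size=n)

terms = d * y / pscore                    # IPW terms for E[Y(1)]
point = terms.mean()

lam = 1.2                                 # assumed misspecification band
# Linear-in-weights estimator: extremes are attained at per-observation corners.
lower = np.where(terms >= 0, terms / lam, terms * lam).mean()
upper = np.where(terms >= 0, terms * lam, terms / lam).mean()
print(f"point estimate {point:.3f}, identified set under perturbation [{lower:.3f}, {upper:.3f}]")
```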

Econometrics

Breaking Ties: Regression Discontinuity Design Meets Market Design

Many schools in large urban districts have more applicants than seats. Centralized school assignment algorithms ration seats at over-subscribed schools using randomly assigned lottery numbers, non-lottery tie-breakers like test scores, or both. The New York City public high school match illustrates the latter, using test scores and other criteria to rank applicants at "screened" schools, combined with lottery tie-breaking at unscreened "lottery" schools. We show how to identify causal effects of school attendance in such settings. Our approach generalizes regression discontinuity methods to allow for multiple treatments and multiple running variables, some of which are randomly assigned. The key to this generalization is a local propensity score that quantifies the school assignment probabilities induced by lottery and non-lottery tie-breakers. The local propensity score is applied in an empirical assessment of the predictive value of New York City's school report cards. Schools that receive a high grade indeed improve SAT math scores and increase graduation rates, though by much less than OLS estimates suggest. Selection bias in OLS estimates is egregious for screened schools.
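
As a toy illustration of an assignment probability induced by tie-breakers, the simulation below repeatedly redraws lottery numbers for a single over-subscribed school in which a hypothetical priority criterion is applied first and ties are broken by lottery, recording each applicant's offer frequency. The actual New York City match involves many schools, preference lists, screened tie-breakers, and deferred acceptance, so this is a conceptual sketch rather than the paper's local propensity score.

```python
# Toy simulation of lottery-induced offer probabilities at one over-subscribed
# school. The priority criterion and all numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_applicants, n_seats, n_draws = 20, 5, 10_000
priority = rng.integers(2, size=n_applicants)       # 1 = priority group

offers = np.zeros(n_applicants)
for _ in range(n_draws):
    lottery = rng.random(n_applicants)               # random tie-breaker
    # Rank by priority first, then by lottery number; top-ranked applicants get seats.
    ranking = np.lexsort((lottery, -priority))
    offers[ranking[:n_seats]] += 1

print("simulated offer probabilities:", np.round(offers / n_draws, 3))
```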

