Featured Research


Applying Data Synthesis for Longitudinal Business Data across Three Countries

Data on businesses collected by statistical agencies are challenging to protect. Many businesses have unique characteristics, and distributions of employment, sales, and profits are highly skewed. Attackers wishing to conduct identification attacks often have access to much more information about a business than about any individual. As a consequence, most disclosure avoidance mechanisms fail to strike an acceptable balance between usefulness and confidentiality protection. Detailed aggregate statistics by geography or detailed industry classes are rare, public-use microdata on businesses are virtually nonexistent, and access to confidential microdata can be burdensome. Synthetic microdata have been proposed as a secure mechanism to publish microdata, as part of a broader discussion of how to give researchers wider access to such data sets. In this article, we document an experiment to create analytically valid synthetic data, using the exact same model and methods previously employed for the United States, for data from two different countries: Canada (LEAP) and Germany (BHP). We assess utility and protection, and provide an assessment of the feasibility of extending such an approach in a cost-effective way to other data.

Approximate Bayes factors for unit root testing

This paper introduces a feasible and practical Bayesian method for unit root testing in financial time series. We propose a convenient approximation of the Bayes factor in terms of the Bayesian Information Criterion as a straightforward and effective strategy for testing the unit root hypothesis. Our approximate approach relies on few assumptions, is of general applicability, and preserves a satisfactory error rate. Among its advantages, it does not require the prior distribution on the model's parameters to be specified. Our simulation study and empirical application to real exchange rates show close agreement between the suggested simple approach and both Bayesian and non-Bayesian alternatives.
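
As a rough illustration of the idea, the sketch below uses the standard Schwarz approximation, log BF_01 ≈ (BIC_1 − BIC_0) / 2, to compare a random walk with drift (M0) against a stationary AR(1) alternative (M1). The model pair, the function names, and the parameter counts are assumptions made for this example; the paper's exact construction may differ.

```python
import math
import random


def bic_gaussian(rss, n_obs, n_params):
    """BIC of a Gaussian regression, up to a constant shared by all models."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def log_bf_unit_root(y):
    """Approximate log Bayes factor of M0 (unit root) against M1 (AR(1)).

    M0: dy_t = c + e_t                  (random walk with drift)
    M1: dy_t = c + gamma * y_{t-1} + e_t
    Positive values favour the unit root, negative values favour M1.
    """
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    lag = y[:-1]
    n = len(dy)
    mean_dy = sum(dy) / n
    mean_lag = sum(lag) / n
    syy = sum((d - mean_dy) ** 2 for d in dy)
    sxx = sum((x - mean_lag) ** 2 for x in lag)
    sxy = sum((x - mean_lag) * (d - mean_dy) for x, d in zip(lag, dy))
    rss0 = syy                    # residual sum of squares under M0
    rss1 = syy - sxy ** 2 / sxx   # residual sum of squares under M1 (OLS)
    bic0 = bic_gaussian(rss0, n, 2)  # parameters: c, sigma^2
    bic1 = bic_gaussian(rss1, n, 3)  # parameters: c, gamma, sigma^2
    return 0.5 * (bic1 - bic0)


if __name__ == "__main__":
    rng = random.Random(1)
    walk = [0.0]
    for _ in range(300):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))  # simulated random walk
    print("approximate log BF (unit root vs. AR(1)):", log_bf_unit_root(walk))
```

Because only BIC values enter the calculation, no prior on the model parameters has to be specified, which is the practical advantage the abstract highlights.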

Approximate Maximum Likelihood for Complex Structural Models

Indirect Inference (I-I) is a popular technique for estimating complex parametric models whose likelihood function is intractable; however, the statistical efficiency of I-I estimation is questionable. While the efficient method of moments (Gallant and Tauchen, 1996) promises efficiency, the price to pay for this efficiency is a loss of parsimony and thereby a potential lack of robustness to model misspecification. This stands in contrast to simpler I-I estimation strategies, which are known to display less sensitivity to model misspecification precisely due to their focus on specific elements of the underlying structural model. In this research, we propose a new simulation-based approach that maintains the parsimony of I-I estimation, which is often critical in empirical applications, but can also deliver estimators that are nearly as efficient as maximum likelihood. This new approach is based on a constrained approximation to the structural model, which ensures identification. We demonstrate the approach through several examples and show that it can deliver estimators that are nearly as efficient as maximum likelihood, when maximum likelihood is feasible, but can also be employed in many situations where maximum likelihood is infeasible.

Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis

On September 15, 2020, Arctic sea ice extent (SIE) ranked second-lowest in history, and it keeps trending downward. The understanding of how feedback loops amplify the effects of external CO2 forcing is still limited. We propose the VARCTIC, which is a Vector Autoregression (VAR) designed to capture and extrapolate Arctic feedback loops. VARs are dynamic simultaneous systems of equations, routinely estimated to predict and understand the interactions of multiple macroeconomic time series. The VARCTIC is a parsimonious compromise between full-blown climate models and purely statistical approaches that usually offer little explanation of the underlying mechanism. Our completely unconditional forecast has SIE hitting 0 in September by the 2060s. Impulse response functions reveal that anthropogenic CO2 emission shocks have an unusually durable effect on SIE, a property shared by no other shock. We find Albedo- and Thickness-based feedbacks to be the main amplification channels through which CO2 anomalies impact SIE in the short/medium run. Further, conditional forecast analyses reveal that the future path of SIE crucially depends on the evolution of CO2 emissions, with outcomes ranging from recovering SIE to it reaching 0 in the 2050s. Finally, Albedo and Thickness feedbacks are shown to play an important role in accelerating the speed at which predicted SIE is heading towards 0.
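
To make the notion of an impulse response function concrete, here is a minimal sketch for a two-variable VAR(1), y_t = A y_{t-1} + e_t, where the response at horizon h to a one-off shock is A^h applied to the shock vector. The variable ordering and the coefficient matrix are hypothetical values chosen only for illustration; they are not estimates from the VARCTIC, and the responses are reduced-form rather than orthogonalized.

```python
def impulse_response(A, shock, horizons):
    """Return the list [A^0 s, A^1 s, ..., A^horizons s] for shock vector s."""
    responses = [list(shock)]
    for _ in range(horizons):
        prev = responses[-1]
        # One step ahead: multiply the coefficient matrix by the last response.
        responses.append(
            [sum(a_ij * p_j for a_ij, p_j in zip(row, prev)) for row in A]
        )
    return responses


if __name__ == "__main__":
    # Hypothetical VAR(1) coefficients, ordering: [co2, sie].
    # A persistent CO2 process (0.9) feeds negatively into sea ice extent.
    A = [[0.9, 0.0],
         [-0.2, 0.5]]
    unit_co2_shock = [1.0, 0.0]
    for h, (co2, sie) in enumerate(impulse_response(A, unit_co2_shock, 10)):
        print(f"h={h:2d}  co2={co2:7.4f}  sie={sie:7.4f}")
```

In this toy system the SIE response dies out slowly because the CO2 process itself is persistent, which is the kind of durable effect an impulse response plot is meant to reveal.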

Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy

Despite their popularity, machine learning predictions are sensitive to potential unobserved predictors. This paper proposes a general algorithm that assesses how the omission of an unobserved variable with high explanatory power could affect the predictions of the model. Moreover, the algorithm extends the usage of machine learning from pointwise predictions to inference and sensitivity analysis. In the application, we show how the framework can be applied to data with inherent uncertainty, such as students' scores in a standardized assessment on financial literacy. First, using Bayesian Additive Regression Trees (BART), we predict students' financial literacy scores (FLS) for a subgroup of students with missing FLS. Then, we assess the sensitivity of predictions by comparing the predictions and performance of models with and without a highly explanatory synthetic predictor. We find no significant difference in the predictions and performance of the augmented model (i.e., the model with the synthetic predictor) and the original model. This evidence sheds light on the stability of the predictive model used in the application. The proposed methodology can be used, above and beyond our motivating empirical example, in a wide range of machine learning applications in social and health sciences.

Assessing Sensitivity to Unconfoundedness: Estimation and Inference

This paper provides a set of methods for quantifying the robustness of treatment effects estimated using the unconfoundedness assumption (also known as selection on observables or conditional independence). Specifically, we estimate and conduct inference on bounds on various treatment effect parameters, like the average treatment effect (ATE) and the average effect of treatment on the treated (ATT), under nonparametric relaxations of the unconfoundedness assumption indexed by a scalar sensitivity parameter c. These relaxations allow for limited selection on unobservables, depending on the value of c. For large enough c, these bounds equal the no-assumptions bounds. Using a non-standard bootstrap method, we show how to construct confidence bands for these bound functions that are uniform over all values of c. We illustrate these methods with an empirical application to the effects of the National Supported Work Demonstration program. We implement these methods in a companion Stata module for easy use in practice.

Assessing the Sensitivity of Synthetic Control Treatment Effect Estimates to Misspecification Error

We propose a sensitivity analysis for Synthetic Control (SC) treatment effect estimates to interrogate the assumption that the SC method is well-specified, namely that choosing weights to minimize pre-treatment prediction error yields accurate predictions of counterfactual post-treatment outcomes. Our data-driven procedure recovers the set of treatment effects consistent with the assumption that the misspecification error incurred by the SC method is at most the observable misspecification error incurred when using the SC estimator to predict the outcomes of some control unit. We show that under one definition of misspecification error, our procedure provides a simple, geometric motivation for comparing the estimated treatment effect to the distribution of placebo residuals to assess estimate credibility. When we apply our procedure to several canonical studies that report SC estimates, we broadly confirm the conclusions drawn by the source papers.

Asset Prices and Capital Share Risks: Theory and Evidence

An asset pricing model using long-run capital share growth risk has recently been found to successfully explain U.S. stock returns. Our paper adopts a recursive preference utility framework to derive a heterogeneous asset pricing model with capital share risks. While modeling capital share risks, we account for the elevated consumption volatility of high-income stockholders. Capital risks have strong volatility effects in our recursive asset pricing model. We present empirical evidence that capital share growth is also a source of risk for stock return volatility. We uncover contrasting unconditional and conditional asset pricing evidence for capital share risks.

Asymptotic Normality for Multivariate Random Forest Estimators

Regression trees and random forests are popular and effective non-parametric estimators in practical applications. A recent paper by Athey and Wager shows that the random forest estimate at any point is asymptotically Gaussian; in this paper, we extend this result to the multivariate case and show that the vector of estimates at multiple points is jointly normal. Specifically, the covariance matrix of the limiting normal distribution is diagonal, so that the estimates at any two points are independent in sufficiently deep trees. Moreover, the off-diagonal term is bounded by quantities capturing how likely two points are to belong to the same partition of the resulting tree. Our results rely on a certain stability property when constructing splits, and we give examples of splitting rules for which this assumption is and is not satisfied. We test our proposed covariance bound and the associated coverage rates of confidence intervals in numerical simulations.

Asymptotic Properties of the Maximum Likelihood Estimator in Endogenous Regime-Switching Models

This study proves the asymptotic properties of the maximum likelihood estimator (MLE) in a wide range of endogenous regime-switching models. This class of models extends the constant state transition probability in Markov-switching models to a time-varying probability that includes information from observations. An important feature of the proof is the mixing rate of the state process conditional on the observations, which is time-varying owing to the time-varying transition probabilities. Consistency and asymptotic normality follow from the almost deterministic, geometrically decaying bound on the mixing rate. Relying on low-level assumptions that have been shown to hold in general, this study provides theoretical foundations for statistical inference in most endogenous regime-switching models in the literature. As an empirical application, an endogenous regime-switching autoregressive conditional heteroscedasticity (ARCH) model is estimated and analyzed with the obtained inferential results.
