Featured Research

Econometrics

Recent Developments on Factor Models and its Applications in Econometric Learning

This paper provides a selective survey of recent developments in factor models and their applications in statistical learning. We focus on the low-rank structure of factor models and pay particular attention to estimating the model from the low-rank recovery point of view. The survey consists of three parts: the first reviews new factor estimation methods based on modern techniques for recovering the low-rank structure of high-dimensional models; the second discusses statistical inference for several factor-augmented models and applications in econometric learning models; the final part summarizes new developments for handling unbalanced panels from the matrix completion perspective.
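
As a minimal illustration of the low-rank recovery view described above, the principal-components estimator of a factor model is simply the truncated SVD of the data panel. The following is a toy sketch with simulated data; the dimensions and noise level are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, k = 200, 50, 3                   # periods, series, number of factors

F = rng.normal(size=(T, k))            # latent factors
L = rng.normal(size=(N, k))            # factor loadings
X = F @ L.T + 0.1 * rng.normal(size=(T, N))   # observed high-dimensional panel

# Principal-components estimator: keep the top-k singular triplets of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
common_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]   # low-rank estimate of the common component

rel_err = np.linalg.norm(common_hat - F @ L.T) / np.linalg.norm(F @ L.T)
```

With a weak noise level, the rank-k reconstruction recovers the common component almost exactly, which is the sense in which factor estimation is a low-rank recovery problem.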

Econometrics

Recovering Network Structure from Aggregated Relational Data using Penalized Regression

Social network data can be expensive to collect. Breza et al. (2017) propose aggregated relational data (ARD) as a low-cost substitute that can be used to recover the structure of a latent social network when it is generated by a specific parametric random effects model. Our main observation is that many economic network formation models produce networks that are effectively low-rank. As a consequence, network recovery from ARD is generally possible without parametric assumptions using a nuclear-norm penalized regression. We demonstrate how to implement this method and provide finite-sample bounds on the mean squared error of the resulting estimator for the distribution of network links. Computation takes seconds for samples with hundreds of observations. Easy-to-use code in R and Python can be found at this https URL.
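
The nuclear-norm penalty at the heart of this approach has a closed-form proximal operator, singular value soft-thresholding. The toy sketch below applies it to denoise a simulated low-rank link-probability matrix; the data-generating process and threshold value are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def svt(M, tau):
    # Singular value soft-thresholding: the proximal operator of the
    # nuclear norm, argmin_X 0.5*||X - M||_F^2 + tau*||X||_*.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(1)
n, r = 80, 2
P = rng.uniform(size=(n, r)) @ rng.uniform(size=(r, n)) / r  # low-rank link probabilities
noisy = P + 0.05 * rng.normal(size=(n, n))                   # noisy observations

P_hat = svt(noisy, tau=1.0)            # shrink small singular values to zero
err_denoised = np.linalg.norm(P_hat - P)
err_raw = np.linalg.norm(noisy - P)
```

Because the noise spreads its energy across many small singular values while the signal concentrates in a few large ones, thresholding yields an estimate that is both low rank and substantially closer to the truth than the raw observations.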

Econometrics

Recurrent Conditional Heteroskedasticity

We propose a new class of financial volatility models, which we call the REcurrent Conditional Heteroskedastic (RECH) models, to improve both the in-sample analysis and out-of-sample forecast performance of traditional conditional heteroskedastic models. In particular, we incorporate auxiliary deterministic processes, governed by recurrent neural networks, into the conditional variance of traditional conditional heteroskedastic models, e.g. GARCH-type models, to flexibly capture the dynamics of the underlying volatility. The RECH models can detect interesting effects in financial volatility that are overlooked by existing conditional heteroskedastic models such as GARCH (Bollerslev, 1986), GJR (Glosten et al., 1993) and EGARCH (Nelson, 1991). The new models often deliver good out-of-sample forecasts while still explaining the stylized facts of financial volatility well, by retaining the well-established structures of the econometric GARCH-type models. These properties are illustrated through simulation studies and applications to four real stock index datasets. A user-friendly software package, together with the examples reported in the paper, is available at this https URL.
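
A stripped-down sketch of the idea, assuming a GARCH(1,1) baseline whose intercept is driven by a single recurrent unit. All parameter values and the particular recurrent form here are invented for illustration; the actual RECH specification and its estimation are in the paper:

```python
import numpy as np

def rech_variance(eps, alpha=0.1, beta=0.85, b0=0.05, b1=0.05,
                  w_in=0.5, w_rec=0.3):
    # GARCH(1,1)-style recursion whose intercept omega_t is driven by a
    # single recurrent unit; choosing b1 <= b0 keeps omega_t (and hence
    # the variance) positive. Parameter values are illustrative only.
    T = len(eps)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(eps)
    h = 0.0                                          # recurrent hidden state
    for t in range(1, T):
        h = np.tanh(w_in * eps[t - 1] + w_rec * h)   # recurrent update
        omega_t = b0 + b1 * h                        # time-varying intercept
        sigma2[t] = omega_t + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(2)
eps = rng.normal(size=500)                           # placeholder return shocks
sig2 = rech_variance(eps)
```

The recurrent state lets the intercept react to the recent history of shocks, which is how the auxiliary deterministic process adds flexibility beyond the fixed GARCH recursion.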

Econometrics

Regression Discontinuity Design with Many Thresholds

Numerous empirical studies employ regression discontinuity designs with multiple cutoffs and heterogeneous treatments. A common practice is to normalize all the cutoffs to zero and estimate one effect. This procedure identifies the average treatment effect (ATE) on the observed distribution of individuals local to existing cutoffs. However, researchers often want to make inferences on more meaningful ATEs, computed over general counterfactual distributions of individuals, rather than simply the observed distribution of individuals local to existing cutoffs. This paper proposes a consistent and asymptotically normal estimator for such ATEs when heterogeneity follows a non-parametric function of cutoff characteristics in the sharp case. The proposed estimator converges at the minimax optimal rate of root-n for a specific choice of tuning parameters. Identification in the fuzzy case, with multiple cutoffs, is impossible unless heterogeneity follows a finite-dimensional function of cutoff characteristics. Under parametric heterogeneity, this paper proposes an ATE estimator for the fuzzy case that optimally combines observations to maximize its precision.
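
The normalize-and-pool practice described above can be sketched as follows, assuming a sharp design, a common treatment effect, and a uniform kernel. This is a toy illustration, not the paper's proposed estimator:

```python
import numpy as np

def pooled_rdd_ate(x, y, cutoffs, cutoff_id, bandwidth=1.0):
    # Recenter each unit's running variable at its own cutoff, pool all
    # units, then fit a local linear regression on each side (uniform
    # kernel) and take the jump in intercepts at zero.
    xc = x - cutoffs[cutoff_id]

    def intercept(mask):
        Xd = np.column_stack([np.ones(mask.sum()), xc[mask]])
        coef, *_ = np.linalg.lstsq(Xd, y[mask], rcond=None)
        return coef[0]

    left = (xc >= -bandwidth) & (xc < 0)
    right = (xc >= 0) & (xc <= bandwidth)
    return intercept(right) - intercept(left)

rng = np.random.default_rng(3)
n = 10_000
cutoffs = np.array([2.0, 5.0])                 # two cutoffs, normalized to zero below
cid = rng.integers(0, 2, size=n)               # each unit's assigned cutoff
x = cutoffs[cid] + rng.uniform(-1.5, 1.5, size=n)
tau = 1.0                                      # common (homogeneous) treatment effect
y = 0.3 * x + tau * (x >= cutoffs[cid]) + rng.normal(scale=0.2, size=n)

est = pooled_rdd_ate(x, y, cutoffs, cid)
```

With a homogeneous effect, the pooled estimator recovers tau; the paper's contribution concerns the heterogeneous case, where the pooled estimand is only a particular weighted average over units local to the existing cutoffs.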

Econometrics

Regression Discontinuity Design with Multivalued Treatments

We study identification and estimation in the Regression Discontinuity Design (RDD) with a multivalued treatment variable. We also allow for the inclusion of covariates. We show that without additional information, treatment effects are not identified. We give necessary and sufficient conditions that lead to identification of LATEs as well as of weighted averages of the conditional LATEs. We show that if the first stage discontinuities of the multiple treatments conditional on covariates are linearly independent, then it is possible to identify multivariate weighted averages of the treatment effects with convenient identifiable weights. If, moreover, treatment effects do not vary with some covariates or a flexible parametric structure can be assumed, it is possible to identify (in fact, over-identify) all the treatment effects. The over-identification can be used to test these assumptions. We propose a simple estimator, which can be programmed in packaged software as a Two-Stage Least Squares regression, and packaged standard errors and tests can also be used. Finally, we implement our approach to identify the effects of different types of insurance coverage on health care utilization, as in Card, Dobkin and Maestas (2008).
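
Since the proposed estimator can be programmed as a Two-Stage Least Squares regression, a generic 2SLS routine conveys the mechanics. The instrument, treatment, and data-generating process below are invented for illustration and are unrelated to the paper's application:

```python
import numpy as np

def tsls(y, X, Z):
    # First stage: fitted values of each column of X from instruments Z.
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # Second stage: regress y on the first-stage fitted values.
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)                  # instrument (e.g. a discontinuity indicator)
u = rng.normal(size=n)                  # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)    # endogenous treatment
y = 2.0 * d + u + rng.normal(size=n)    # true effect of d is 2.0

X = np.column_stack([np.ones(n), d])
Z = np.column_stack([np.ones(n), z])
beta_iv = tsls(y, X, Z)                         # consistent for the true effect
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)    # biased by the confounder
```

Because the whole procedure is ordinary 2SLS, the packaged standard errors and tests of any standard software implementation apply directly, as the abstract notes.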

Econometrics

Regularized Solutions to Linear Rational Expectations Models

This paper proposes an algorithm for computing regularized solutions to linear rational expectations models. The algorithm allows for regularization cross-sectionally as well as across frequencies. A variety of numerical examples illustrate the advantage of regularization.

Econometrics

Reproducing Kernel Methods for Nonparametric and Semiparametric Treatment Effects

We propose a family of reproducing kernel ridge estimators for nonparametric and semiparametric policy evaluation. The framework includes (i) treatment effects of the population, of subpopulations, and of alternative populations; (ii) the decomposition of a total effect into a direct effect and an indirect effect (mediated by a particular mechanism); and (iii) effects of sequences of treatments. Treatment and covariates may be discrete or continuous, and low, high, or infinite dimensional. We consider estimation of means, increments, and distributions of counterfactual outcomes. Each estimator is an inner product in a reproducing kernel Hilbert space (RKHS), with a one line, closed form solution. For the nonparametric case, we prove uniform consistency and provide finite sample rates of convergence. For the semiparametric case, we prove root n consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. We evaluate our estimators in simulations then estimate continuous, heterogeneous, incremental, and mediated treatment effects of the US Jobs Corps training program for disadvantaged youth.
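
The one-line closed-form solution referred to above is the standard kernel ridge formula. A minimal sketch with an RBF kernel and simulated data follows; the kernel choice and tuning values are assumptions for illustration, not the paper's:

```python
import numpy as np

def kernel_ridge_predict(X, y, X_new, lam=0.001, gamma=2.0):
    # Closed-form kernel ridge regression with an RBF kernel:
    # f(x) = K(x, X) @ (K(X, X) + n*lam*I)^{-1} y  -- a one-line solution.
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    n = len(y)
    alpha = np.linalg.solve(rbf(X, X) + n * lam * np.eye(n), y)
    return rbf(X_new, X) @ alpha

rng = np.random.default_rng(5)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=200)

X_grid = np.linspace(-2, 2, 50)[:, None]
f_hat = kernel_ridge_predict(X, y, X_grid)
mse = np.mean((f_hat - np.sin(2 * X_grid[:, 0])) ** 2)
```

The estimator is an inner product of kernel evaluations with a coefficient vector obtained from a single linear solve, which is what makes the counterfactual-mean estimators in the paper computable in closed form.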

Econometrics

Rethinking travel behavior modeling representations through embeddings

This paper introduces the concept of travel behavior embeddings, a method for re-representing the discrete variables typically used in travel demand modeling, such as mode, trip purpose, education level, family type or occupation. This re-representation maps those variables into a latent space called the \emph{embedding space}. The benefit is that such spaces allow for richer nuances than the transformations typically applied to categorical variables (e.g. dummy encoding, contrast encoding, principal components analysis). While latent variable representations are not new per se in travel demand modeling, the idea presented here brings several innovations: it is an entirely data-driven algorithm; it is informative and consistent, since the latent space can be visualized and interpreted based on distances between different categories; it preserves the interpretability of coefficients, despite being based on neural network principles; and it is transferable, in that embeddings learned from one dataset can be reused for others, as long as travel behavior remains consistent across the datasets. The idea is strongly inspired by natural language processing techniques, namely the word2vec algorithm, which underlies recent advances such as automatic translation and next-word prediction. Our method is demonstrated using a mode choice model, and shows improvements of up to 60% with respect to initial likelihood, and up to 20% with respect to the likelihood of the corresponding traditional model (i.e. using dummy variables) in out-of-sample evaluation. We provide a new Python package, called PyTre (PYthon TRavel Embeddings), that others can straightforwardly use to replicate our results or improve their own models. Our experiments are themselves based on an open dataset (swissmetro).
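
A toy version of the embedding idea: replace one-hot dummies for a categorical variable with a low-dimensional embedding table, learned jointly with a linear head by gradient descent. The data, dimensions, and training loop below are invented for illustration and are unrelated to the PyTre implementation:

```python
import numpy as np

rng = np.random.default_rng(6)
n_cat, emb_dim, n = 5, 2, 2000

cat = rng.integers(0, n_cat, size=n)            # observed category per trip
true_effect = np.arange(n_cat, dtype=float)     # category effects 0..4
y = true_effect[cat] + 0.1 * rng.normal(size=n)

E = rng.normal(size=(n_cat, emb_dim))           # embedding table (learned)
w = rng.normal(size=emb_dim)                    # linear head (learned)

lr = 0.1
for _ in range(5000):
    resid = E[cat] @ w - y                      # look up embeddings, predict
    w -= lr * (E[cat].T @ resid) / n            # gradient step for the head
    g_E = np.zeros_like(E)
    np.add.at(g_E, cat, np.outer(resid, w))     # scatter per-row gradients
    E -= lr * g_E / n

mse = np.mean((E[cat] @ w - y) ** 2)
```

After training, categories with similar effects end up close in the embedding space, which is the property that makes the latent space interpretable via distances between categories.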

Econometrics

Revealing Cluster Structures Based on Mixed Sampling Frequencies

This paper proposes a new linearized mixed data sampling (MIDAS) model and develops a framework to infer clusters in a panel regression with mixed frequency data. The linearized MIDAS estimation method is more flexible and substantially simpler to implement than competing approaches. We show that the proposed clustering algorithm successfully recovers true membership in the cross-section, both in theory and in simulations, without requiring prior knowledge of the number of clusters. This methodology is applied to a mixed-frequency Okun's law model for state-level data in the U.S. and uncovers four meaningful clusters based on the dynamic features of state-level labor markets.

Econometrics

Revealing gender-specific costs of STEM in an extended Roy model of major choice

We derive sharp bounds on the non-consumption utility component in an extended Roy model of sector selection. We interpret this non-consumption utility component as a compensating wage differential. The bounds are derived under the assumption that potential wages in each sector are (jointly) stochastically monotone with respect to an observed selection shifter. The lower bound can also be interpreted as the minimum cost subsidy necessary to change sector choices and make them observationally indistinguishable from choices made under the classical Roy model of sorting on potential wages only. The research is motivated by the analysis of women's choice of university major and their underrepresentation in mathematics-intensive fields. With data from a German graduate survey, and using the proportion of women on the STEM faculty at the time of major choice as our selection shifter, we find high costs of choosing the STEM sector for women from the former West Germany, especially at low realized incomes and a low proportion of women on the STEM faculty, which we interpret as a scarce presence of role models.

