Debiased/Double Machine Learning for Instrumental Variable Quantile Regressions

Abstract

In this study, we investigate estimation and inference on a low-dimensional causal parameter in the presence of high-dimensional controls in an instrumental variable quantile regression. Our proposed econometric procedure builds on the Neyman-type orthogonal moment conditions of Chernozhukov, Hansen and Wüthrich (2018) and is thus relatively insensitive to the estimation of the nuisance parameters. The Monte Carlo experiments show that the estimator copes well with high-dimensional controls. We also apply the procedure to empirically reinvestigate the quantile treatment effect of 401(k) participation on accumulated wealth.


Debiased/Double Machine Learning for Instrumental Variable Quantile Regressions

Jau-er Chen (a, b) and Jia-Jyun Tien (c)

(a) Institute for International Strategy, Tokyo International University.
(b) Center for Research in Econometric Theory and Applications, National Taiwan University.
(c) Department of Economics, National Taiwan University.

Abstract

The aim of this paper is to investigate estimation and inference on a low-dimensional causal parameter in the presence of high-dimensional controls in an instrumental variable quantile regression. The estimation and inference are based on the Neyman-type orthogonal moment conditions, which are relatively insensitive to the estimation of the nuisance parameters. The Monte Carlo experiments show that the econometric procedure performs well. We also apply the procedure to reinvestigate two empirical studies: the quantile treatment effect of 401(k) participation on accumulated wealth, and the distributional effect of job-training program participation on trainee earnings.

Keywords: instrumental variable, quantile regression, treatment effect, LASSO, double machine learning.

JEL Classification: C21; C26.

Correspondence: Jau-er Chen. E-mail: [email protected] Address: 1-13-1 Matobakita Kawagoe, Saitama 350-1197, Japan. This version: September 2019. We are grateful to Masayuki Hirukawa, Tsung-Chih Lai, and Hsin-Yi Lin for discussions and comments. This paper has benefited from presentations at the 2nd International Conference on Econometrics and Statistics (EcoSta 2018) and at Ryukoku University. The authors declare no conflict of interest. The usual disclaimer applies.

Funding: This research was partly funded by the personal research fund from Tokyo International University, and financially supported by the Center for Research in Econometric Theory and Applications (Grant no. 107L900203) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan.

arXiv preprint, econ.EM.

Introduction

Model selection and variable selection are widely discussed in the area of prediction. Much less attention, however, has been paid to the modification of prediction methods in the context of causal machine learning in economics, cf. Athey (2017) and Athey (2018). As one of the pioneering papers, within the linear framework of instrumental variable estimation, Belloni et al. (2014) proposed a double-selection procedure to correct for omitted variable bias in a high-dimensional framework. Constructing a general framework encompassing the results of Belloni et al. (2014), Chernozhukov et al. (2015) and Chernozhukov et al. (2018a) proposed a unified procedure, double/debiased machine learning (DML), which remains valid for nonlinear or semi-nonparametric models. The aim of this paper is to investigate estimation and inference on a low-dimensional causal parameter in the presence of high-dimensional controls in an instrumental variable quantile regression. In particular, our procedure follows the idea outlined by Chernozhukov et al. (2018b). To the best of our knowledge, the present study is the first to investigate the Monte Carlo performance and empirical applications of the double machine learning procedure within the framework of instrumental variable quantile regressions. The Monte Carlo experiments show that our econometric procedure performs well.

Causal machine learning has been actively studied in economics in recent years, mainly through two approaches: double machine learning, cf. Chernozhukov et al. (2018a), and generalized random forests, cf. Athey, Tibshirani and Wager (2019). Chen and Hsiang (2019) investigate the generalized random forests model using instrumental variable quantile regression. In contrast to the DML for instrumental variable quantile regressions, their econometric procedure yields a measure of variable importance in terms of heterogeneity among control variables. Although related to our paper, Chen and Hsiang (2019) do not consider the setting of high-dimensional controls.

We apply the proposed procedure to empirically investigate the causal quantile effects of 401(k) participation on net financial assets. Our empirical results indicate that 401(k) participants with low savings propensity are more strongly associated with the nonlinear income effect, which complements the findings in Chernozhukov et al. (2018a) and Chiou et al. (2018). A second empirical example, participation in a job training program, is investigated as well.

The rest of the paper is organized as follows. The model specification and estimation procedure are introduced in Section 2. Section 3 presents Monte Carlo experiments. Section 4 presents two empirical applications. Section 5 concludes the paper.

The Model

We briefly review the conventional instrumental variable quantile regression (IVQR), and then the IVQR within the framework of high-dimensional controls. Our DML procedure for the IVQR, introduced in this section, is constructed based on a tentative procedure suggested by Chernozhukov et al. (2018b).

The following conditional moment restriction yields an IVQR estimator:

$$ P[\,Y \le q(\tau, D, X) \mid X, Z\,] = \tau, \tag{1} $$

where $q(\cdot)$ is the structural quantile function, $\tau$ stands for the quantile index, and $D$, $X$ and $Z$ are, respectively, the target variable, control variables and instruments. Condition (1) and a linear structural quantile specification lead to the following unconditional moment restriction:

$$ E\big[\big(\tau - 1(Y - D'\alpha - X'\beta \le 0)\big)\,\Psi\big] = 0, \tag{2} $$

where $\Psi := \Psi(X, Z)$ is a vector of functions of the instruments and control variables. The parameters depend on the quantile of interest, but we suppress the $\tau$ associated with $\alpha$ and $\beta$ for simplicity of presentation. Equation (2) leads to a particular moment condition for partialling out:

$$ g_\tau(V, \alpha; \beta, \delta) = \big(\tau - 1(Y \le D'\alpha + X'\beta)\big)\,\Psi(\alpha, \delta(\alpha)) \tag{3} $$

with "instrument"

$$ \Psi(\alpha, \delta(\alpha)) := Z - \delta(\alpha) X, \tag{4} $$

$$ \delta(\alpha) = M(\alpha)\, J^{-1}(\alpha), $$

where $\delta$ is a matrix parameter,

$$ M(\alpha) = E[\,Z X' f_\varepsilon(0 \mid X, Z)\,], \qquad J(\alpha) = E[\,X X' f_\varepsilon(0 \mid X, Z)\,], $$

and $f_\varepsilon(0 \mid X, Z)$ is the conditional density of $\varepsilon = Y - D'\alpha - X'\beta(\alpha)$, with $\beta(\alpha)$ defined by

$$ E\big[\big(\tau - 1(Y \le D'\alpha + X'\beta(\alpha))\big) X\big] = 0. \tag{5} $$

We construct the grid search interval for $\alpha$ first and, for each $a$ in the interval, profile out the coefficients on the exogenous variables by equation (5). That is,

$$ \hat\beta(a) = \arg\min_{b \in \mathcal{B}} \frac{1}{N} \sum_{i=1}^{N} \rho_\tau(Y_i - D_i' a - X_i' b). $$

We build the sample counterpart of the population moment condition based on equations (2)–(5). That is,

$$ \hat g_N(a) = \frac{1}{N} \sum_{i=1}^{N} g_\tau\big(V_i, a, \hat\beta(a), \hat\delta(a)\big), \tag{6} $$

where $\hat\delta(a) = \hat M(a)\, \hat J^{-1}(a)$ for

$$ \hat M(a) = \frac{1}{N h_N} \sum_{i=1}^{N} Z_i X_i'\, K_{h_N}\big(Y_i - D_i' a - X_i' \hat\beta(a)\big), $$

$$ \hat J(a) = \frac{1}{N h_N} \sum_{i=1}^{N} X_i X_i'\, K_{h_N}\big(Y_i - D_i' a - X_i' \hat\beta(a)\big), $$

where $K_{h_N}$ is a kernel function with bandwidth $h_N$. We can thus solve for the parameters by optimizing the GMM criterion function. Specifically,

$$ \hat\alpha(\tau) = \arg\min_{a \in \mathcal{A}} N\, \hat g_N(a)'\, \hat\Sigma(a, a)^{-1}\, \hat g_N(a), \tag{7} $$

$$ \hat\Sigma(a_1, a_2) = \frac{1}{N} \sum_{i=1}^{N} g_\tau\big(V_i, a_1, \hat\beta(a_1)\big)\, g_\tau\big(V_i, a_2, \hat\beta(a_2)\big)', $$

where $\hat\Sigma(a_1, a_2)$ is a weighting matrix used in the GMM estimation. Notice that the estimator $\hat\alpha$ based on the inverse quantile regression (i.e. IVQR) is first-order equivalent to the estimator defined by the GMM.

We modify the procedure introduced in Subsection 2.1 in order to deal with a dataset of high-dimensional control variables. We construct the grid search interval for $\alpha$ and profile out the coefficients on the exogenous variables using the $L_1$-norm penalized quantile regression estimator:

$$ \hat\beta(a) = \arg\min_{b \in \mathcal{B}} \frac{1}{N} \sum_{i=1}^{N} \rho_\tau(Y_i - D_i' a - X_i' b) + \lambda \sum_{j=1}^{\dim(b)} |b_j|. \tag{8} $$

In addition, we estimate

$$ \hat M(a) = \frac{1}{N h_N} \sum_{i=1}^{N} Z_i X_i'\, K_{h_N}\big(Y_i - D_i' a - X_i' \hat\beta(a)\big), $$

$$ \hat J(a) = \frac{1}{N h_N} \sum_{i=1}^{N} X_i X_i'\, K_{h_N}\big(Y_i - D_i' a - X_i' \hat\beta(a)\big). $$

We also perform dimension reduction on $J$ because of the large dimension of $X$. In particular, we implement the following regularization:

$$ \hat\delta_j(a) = \arg\min_{\delta} \tfrac{1}{2}\, \delta' \hat J(a)\, \delta - \hat M_j(a)\, \delta + \vartheta \|\delta\|_1. $$

The regularization above performs a weighted LASSO of each instrument on the control variables, and consequently the $L_1$-norm optimization obeys the Karush-Kuhn-Tucker condition

$$ \big\| \hat\delta_j(a)' \hat J(a) - \hat M_j(a) \big\|_\infty \le \vartheta, \quad \forall j. \tag{9} $$

After implementing the double machine learning procedure outlined above for the IVQR, we can now solve for the low-dimensional causal parameter $\alpha$ by optimizing the GMM criterion defined as follows. The sample counterpart of the moment condition is

$$ \hat g_N(a) = \frac{1}{N} \sum_{i=1}^{N} \big(\tau - 1\big(Y_i - D_i' a - X_i' \hat\beta(a) \le 0\big)\big)\, \Psi(a, \hat\delta(a)). \tag{10} $$

Accordingly,

$$ \hat\alpha = \arg\min_{a \in \mathcal{A}} N\, \hat g_N(a)'\, \hat\Sigma(a, a)^{-1}\, \hat g_N(a). $$

More importantly, the aforementioned double machine learning procedure (DML-IVQR hereafter) satisfies the Neyman orthogonality conditions, cf. Chernozhukov et al. (2018b).
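The partialling-out step in equations (4) and (6) can be illustrated with a short numerical sketch. It is not the authors' implementation: the Gaussian kernel, the convention $K_h(u) = K(u/h)$, the function names and the simulated inputs are all assumptions made for illustration, and no LASSO regularization of $\hat\delta$ is applied. The residuals are taken as given, as if they came from a fitted quantile regression at a grid value $a$.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def partial_out_instrument(Z, X, resid, h):
    """Return (Psi, delta_hat, w) given resid_i = Y_i - D_i'a - X_i'beta_hat(a).

    Z: (N, dz) instruments, X: (N, dx) controls, h: bandwidth h_N.
    """
    N = X.shape[0]
    w = gaussian_kernel(resid / h)                 # kernel weight per observation
    M_hat = (Z * w[:, None]).T @ X / (N * h)       # dz x dx
    J_hat = (X * w[:, None]).T @ X / (N * h)       # dx x dx, symmetric
    delta_hat = np.linalg.solve(J_hat, M_hat.T).T  # equals M_hat @ inv(J_hat)
    Psi = Z - X @ delta_hat.T                      # rows are Z_i - delta_hat X_i
    return Psi, delta_hat, w


def sample_moment(tau, resid, Psi):
    """g_hat_N(a): average of (tau - 1{resid_i <= 0}) * Psi_i."""
    score = tau - (resid <= 0).astype(float)
    return (score[:, None] * Psi).mean(axis=0)


# Hypothetical data, for illustration only.
rng = np.random.default_rng(0)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
Z = rng.normal(size=(N, 1)) + X[:, [1]]
resid = rng.normal(size=N)

Psi, delta_hat, w = partial_out_instrument(Z, X, resid, h=0.5)
g_hat = sample_moment(0.5, resid, Psi)
```

By construction the kernel-weighted cross moment between `Psi` and `X` vanishes, since $\hat M - \hat\delta \hat J = 0$; this is the sense in which the controls are partialled out of the instrument.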

Under the regularity conditions listed in Chernozhukov and Hansen (2008), the asymptotic normality of the GMM estimator with a nonsmooth objective function is guaranteed. We have

$$ \sqrt{N}\, \hat g_N(a) \xrightarrow{d} N(0, \Sigma(a, a)). \tag{11} $$

Consequently,

$$ N\, \hat g_N(a)'\, \hat\Sigma(a, a)^{-1}\, \hat g_N(a) \xrightarrow{d} \chi^2_{\dim(Z)}. $$

We define

$$ W_N(a) \equiv N\, \hat g_N(a)'\, \hat\Sigma(a, a)^{-1}\, \hat g_N(a). $$

It then follows that a valid $100(1 - p)$ percent confidence region for the true parameter $\alpha_0$ may be constructed as the set

$$ CR := \{\alpha \in \mathcal{A} : W_N(\alpha) \le c_{1-p}\}, $$

where $c_{1-p}$ is the critical point such that $P[\chi^2_{\dim(Z)} > c_{1-p}] = p$, and $\mathcal{A}$ can be numerically approximated by the grid $\{\alpha_j,\ j = 1, \ldots, J\}$.

The suggested double machine learning algorithm involves solving an L1-norm optimization problem, which is a nontrivial task. Researchers often represent the L1-norm penalized quantile objective function as a linear programming problem. Specifically, the problem

$$ \min_{\theta_0 \in \mathbb{R},\ \theta \in \mathbb{R}^p} \sum_{i=1}^{N} \rho_\tau(Y_i - \theta_0 - W_i'\theta) + \lambda \|\theta\|_1 \tag{12} $$

can be written as

$$ \min_{\theta_0 \in \mathbb{R},\ \theta \in \mathbb{R}^p,\ \xi \in \mathbb{R}^N} \sum_{i=1}^{N} \{\tau (\xi_i)^{+} + (1 - \tau)(\xi_i)^{-}\} + \lambda \|\theta\|_1 \quad \text{subject to} \quad \theta_0 + W_i'\theta + \xi_i = Y_i, \quad i = 1, \ldots, N. $$

Splitting each variable into its positive and negative parts and stacking

$$ z := [\,\theta_0^{+},\ \theta_0^{-},\ (\theta^{+})',\ (\theta^{-})',\ (\xi^{+})',\ (\xi^{-})'\,]', $$
$$ c := [\,0,\ 0,\ \mathbf{0}',\ \mathbf{0}',\ \tau \mathbf{1}',\ (1 - \tau)\mathbf{1}'\,]', $$
$$ a := [\,0,\ 0,\ \mathbf{1}',\ \mathbf{1}',\ \mathbf{0}',\ \mathbf{0}'\,]', $$
$$ A := [\,\mathbf{1},\ -\mathbf{1},\ W,\ -W,\ I,\ -I\,], \qquad b := Y, $$

the problem becomes the linear program of minimizing $c'z + \lambda a'z$ subject to $Az = b$ and $z \ge 0$, where $\theta = [\alpha', \beta']'$ and $W = [D', X']'$; in $A$ and $b$, $W$ and $Y$ denote the stacked $N \times p$ and $N \times 1$ data matrices.

However, it turns out that the computation is challenging and time-consuming. For instance, it often encounters a singular design within the high-dimensional framework. As an alternative, we utilize the algorithm developed by Yi and Huang (2017), who use the Huber loss function to approximate the quantile loss function.
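As a minimal sketch of the test inversion behind the confidence region CR defined earlier in this section, the snippet below keeps every grid point whose statistic $W_N$ does not exceed the chi-square critical value. It assumes a single instrument, so that $\dim(Z) = 1$ and the critical value can be obtained from the standard normal quantile; the function names and the $W_N$ values are hypothetical.

```python
from statistics import NormalDist


def chi2_critical_value_df1(p):
    """Critical value c with P[chi2_1 > c] = p, using chi2_1 = (standard normal)^2."""
    return NormalDist().inv_cdf(1.0 - p / 2.0) ** 2


def confidence_region(grid, w_stats, p=0.05):
    """Grid points a whose statistic W_N(a) does not exceed the critical value."""
    c = chi2_critical_value_df1(p)
    return [a for a, w in zip(grid, w_stats) if w <= c]


# Hypothetical grid and statistics, for illustration only.
grid = [0.0, 0.5, 1.0, 1.5]
w_stats = [10.0, 3.0, 0.1, 4.0]
region = confidence_region(grid, w_stats)
```

With more than one instrument the critical value would instead come from a chi-square distribution with $\dim(Z)$ degrees of freedom. Because the region is the set of non-rejected grid points, it need not be an interval and can be wide or unbounded on the grid when identification is weak.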
In equation (12), $\rho_\tau$ is not differentiable, and

$$ \rho_\tau(t) = (1 - \tau)\, t^{-} + \tau\, t^{+} = \tfrac{1}{2}\{|t| + (2\tau - 1)t\}. $$

Since $h_\gamma(t) \to |t|$ as $\gamma \to 0^{+}$, where $h_\gamma(t)$ is the Huber loss function of $t$ with threshold $\gamma$ defined in Yi and Huang (2017), we have $\rho_\tau(t) \approx \tfrac{1}{2}\{h_\gamma(t) + (2\tau - 1)t\}$ for small $\gamma$. Therefore equation (12) can be approximated by

$$ \min_{\theta_0 \in \mathbb{R},\ \theta \in \mathbb{R}^p} \sum_{i=1}^{N} \tfrac{1}{2}\{h_\gamma(Y_i - \theta_0 - W_i'\theta) + (2\tau - 1)(Y_i - \theta_0 - W_i'\theta)\} + \lambda \|\theta\|_1. \tag{13} $$

The optimization above is the Huber approximation. This problem is computationally more tractable because the loss function is differentiable.

We evaluate the finite-sample performance, in terms of RMSE and MAD, of the double machine learning for the IVQR. The following data generating process is modified from the one considered in Chen and Lee (2018):

$$ \begin{bmatrix} u_i \\ \epsilon_i \end{bmatrix} \sim N\big(0,\ [\,\ldots\,]\big), \qquad \begin{bmatrix} x_i \\ z_i \\ v_i \end{bmatrix} \sim N(0, I), $$

$$ Z_i = z_i + v_i + x_i, \qquad D_i = \Phi(z_i + \epsilon_i), \qquad X_i = \Phi(x_i), $$

$$ Y_i = 1 + D_i + X_i^{T} [\,\ldots\,] + D_i u_i, $$

where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable. Consequently,

$$ \alpha(\tau) = 1 + F_\epsilon^{-1}(\tau), $$

where $F_\epsilon(\cdot)$ is the cumulative distribution function of $\epsilon$.

3.1 Partialing out and not partialing out Z on X

We focus on comparing the MAD and RMSE resulting from different models under the exact specification (10 control variables). po-GMM stands for partialing out Z on X. GMM stands for not partialing out Z on X. Table 1 shows that partialing out Z on X leads to an efficiency gain across quantiles, especially when the sample size is moderate.

Table 1: Partialing out and not partialing out Z on X

                        n = 500            n = 1000
                     RMSE     MAD       RMSE     MAD
α_τ (po-GMM)        0.1888   0.1510    0.1219   0.0950
α_τ (GMM)           0.4963   0.2559    0.1631   0.1138

α_τ (po-GMM)        0.1210   0.0966    0.0812   0.0654
α_τ (GMM)           0.1782   0.1179    0.0963   0.0754

α_τ (po-GMM)        0.0989   0.0716    0.0689   0.0436
α_τ (GMM)           0.1436   0.1016    0.0801   0.0542

α_τ (po-GMM)        0.1374   0.1066    0.0828   0.0676
α_τ (GMM)           0.2403   0.1710    0.1146   0.0848

α_τ (po-GMM)        0.2437   0.1839    0.1391   0.1067
α_τ (GMM)           0.8483   0.5340    0.3481   0.1967

The data generating process considers ten control variables. po-GMM stands for partialing out Z on X. GMM stands for not partialing out Z on X. Each block of rows corresponds to one quantile index τ, in the order of the source.

We now evaluate the finite-sample performance of the IVQR with high-dimensional controls. The data generating process involves 100 control variables with an approximate sparsity structure. In particular, the exact model (true model) depends only on 10 relevant control variables out of the 100 controls. GMM uses 100 control variables without regularization. Table 2 shows that the RMSE and MAD from the DML-IVQR are close to those from the exact model. In addition, Figure 1 plots the distributions of the IVQR estimator with and without double machine learning. The DML-IVQR stands for the double machine learning for the IVQR with high-dimensional controls. The histograms indicate that the DML-IVQR estimator is more efficient and less biased than the IVQR using many control variables. Since a weak-identification robust inference procedure results naturally from the IVQR, cf. Chernozhukov and Hansen (2008), we construct the robust confidence regions for the GMM and the DML-IVQR estimators. Figure 2 indicates that, across quantiles, the weak-identification (or weak-instrument) robust confidence region based on the DML-IVQR is relatively sharp. The Monte Carlo experiments show that the DML-IVQR procedure performs well.

Table 2

                        n = 500            n = 1000
                     RMSE     MAD       RMSE     MAD
α_τ (GMM)           0.7648   0.6645    0.3917   0.3442
α_τ (exact-GMM)     0.1888   0.1510    0.1219   0.0950
α_τ (DML-IVQR)      0.3112   0.2389    0.1376   0.1085

α_τ (GMM)           0.2712   0.2212    0.1646   0.1361
α_τ (exact-GMM)     0.1210   0.0966    0.0812   0.0654
α_τ (DML-IVQR)      0.1562   0.1254    0.0991   0.0804

α_τ (GMM)           0.1627   0.1234    0.1038   0.0754
α_τ (exact-GMM)     0.0989   0.0716    0.0689   0.0436
α_τ (DML-IVQR)      0.1168   0.0846    0.0775   0.0510

α_τ (GMM)           0.3421   0.2806    0.1747   0.1452
α_τ (exact-GMM)     0.1374   0.1066    0.0828   0.0676
α_τ (DML-IVQR)      0.1495   0.1167    0.0930   0.0741

α_τ (GMM)           0.9449   0.8032    0.4320   0.3681
α_τ (exact-GMM)     0.2437   0.1839    0.1391   0.1067
α_τ (DML-IVQR)      0.3567   0.2608    0.1649   0.1231

Note to the figures: DML-IVQR results are plotted in green. Results from the GMM with many controls are in orange.
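The two summary measures reported in Tables 1 and 2 can be computed from the Monte Carlo replications as sketched below. This is a sketch under an assumption: MAD is read here as the mean absolute deviation from the true parameter value, which the text does not spell out; a median-based variant would be computed analogously. The function names and the replication values are hypothetical.

```python
import math


def rmse(estimates, truth):
    """Root mean squared error of the replicated estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def mean_abs_dev(estimates, truth):
    """Mean absolute deviation of the replicated estimates from the true value."""
    return sum(abs(e - truth) for e in estimates) / len(estimates)


# Hypothetical estimates from four replications with true value 1.0.
replications = [1.1, 0.9, 1.2, 0.8]
rmse_value = rmse(replications, 1.0)
mad_value = mean_abs_dev(replications, 1.0)
```

Under this reading the RMSE is never smaller than the MAD, which agrees with every row of the two tables.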

We reinvestigate the impact of 401(k) participation on accumulated wealth. Total wealth or net financial assets is the outcome variable Y. The treatment variable D is a binary variable standing for participation in the 401(k) plan. The instrument Z is an indicator for being eligible to enroll in the 401(k) plan. The vector of covariates X consists of income, age, family size, marital status, an individual retirement account (IRA) indicator, a defined benefit status indicator, a home ownership indicator and the different education-year indicator variables. The data consist of 9915 observations.

Following the regression specification in Chernozhukov and Hansen (2004), Table 3 presents quantile treatment effects obtained from the estimation procedures defined in the previous section, including IVQR, po-GMM and GMM. The corresponding results are similar. As to the high-dimensional analysis, we create 119 technical control variables including those constructed from polynomial bases, interaction terms, and cubic splines (thresholds). To ensure each basis has equal length, we apply min-max normalization to all technical control variables. We use the plug-in method to determine the value of the penalty when doing the LASSO under the moment condition, and tune the penalty in the quantile L1-norm objective function based on the Huber approximation by 5-fold cross validation. The DML-IVQR also normalizes the outcome variable for the sake of computational efficiency. To make the estimated treatment effects across the different estimation procedures roughly comparable, Table 4 shows the effect obtained through the DML-IVQR multiplied by the standard deviation of the outcome variable. Weak identification/instrument robust inference on quantile treatment effects is depicted in Figures 4 and 5. The robust confidence intervals widen at the upper quantiles, where observations are fewer, yet the estimated quantile treatment effects remain significantly different from zero. We could use the result from the DML-IVQR as a data-driven robustness check on those summarized in Table 3.

Tables 5 and 6 present the selected important variables across different quantiles. The approximate sparsity is asymmetric across the conditional distribution in the sense that the number of selected variables decreases as the quantile index τ increases. However, this also hinges on the relatively small number of observations at the upper quantiles. Our empirical results also indicate that 401(k) participants with low savings propensity are more strongly associated with the nonlinear income effect than those with high savings propensity, which complements the results in Chernozhukov et al. (2018a) and Chiou et al. (2018). In this particular example, τ captures the rank variable which governs the unobservable heterogeneity: savings propensity. Small values of τ represent participants with low savings propensity. The nonlinear income effects, across quantiles in (0, 0.5], are picked up by selected variables such as the threshold terms max(0, inc − c) at several knots c. Technical variables in terms of age, education, family size, and income are more frequently selected. In addition, these four variables are also identified as important variables in the context of the generalized random forests, cf. Chen and Hsiang (2019).

Note: We create 119 technical control variables including those constructed from polynomial bases, interaction terms, and cubic splines (thresholds). The DML-IVQR estimates the distributional effect, which shows an asymmetric pattern similar to the one identified in Chernozhukov and Hansen (2004).
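The construction of technical controls described above (polynomial terms, interactions, threshold terms and min-max normalization) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the variable names, the knot value, the choice of squares as the only polynomial terms, and the decision to normalize after building the terms are all hypothetical, and the sketch does not reproduce the 119 controls used in the paper.

```python
def min_max(column):
    """Rescale a column to [0, 1]; a constant column is mapped to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def hinge(column, knot):
    """Threshold term max(0, x - knot), applied element by element."""
    return [max(0.0, v - knot) for v in column]


def technical_controls(data, knots):
    """data: dict name -> list of floats; knots: dict name -> list of knot values."""
    names = sorted(data)
    out = {}
    for name in names:
        out[name] = data[name]
        out[name + "^2"] = [v * v for v in data[name]]
        for k in knots.get(name, []):
            out["max(0," + name + "-" + str(k) + ")"] = hinge(data[name], k)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            out[a + "*" + b] = [x * y for x, y in zip(data[a], data[b])]
    # Min-max normalization puts every generated column on the same [0, 1] scale.
    return {key: min_max(col) for key, col in out.items()}


# Hypothetical raw covariates, for illustration only.
data = {"inc": [0.0, 1.0, 2.0, 4.0], "age": [30.0, 40.0, 50.0, 60.0]}
controls = technical_controls(data, {"inc": [1.0]})
```

Putting all columns on a common scale matters because the L1 penalty in equation (8) treats every coefficient symmetrically.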

Figure 4: Weak Instrument Robust Inference, P401(K) on TW with hqreg L1-norm

Figure 5: Weak Instrument Robust Inference, P401(K) on NFTA with hqreg L1-norm

Table 5: Total Wealth

Quantile   Selected Variables
0.15       ira, educ, educ, age∗ira, age∗inc, fsize∗educ, fsize∗hmort, ira∗educ, ira∗inc, hval∗inc, marr, male, i…, a…, twoearn, marr∗fsize, pira∗inc, max(0, age − …), max(0, educ − …), max(0, educ − …), max(0, age − …)
…          ira, age∗fsize, age∗ira, age∗inc, fsize∗educ, ira∗educ, ira∗inc, hval∗inc, marr, male, i…, twoearn, marr∗fsize, pira∗inc, twoearn∗fsize, max(0, inc − …)
…          inc, age∗fsize, age∗ira, age∗inc, fsize∗educ, ira∗educ, ira∗hval, ira∗inc, hval∗inc, male, a…, a…, pira∗inc, twoearn∗age, twoearn∗fsize, twoearn∗hmort, twoearn∗educ, max(0, educ − …)
…          inc, ira, age∗ira, age∗hval, age∗inc, educ∗inc, hval∗inc, pira∗inc, pira∗age
…          inc, ira, age∗hval, age∗inc, ira∗educ, educ∗inc, hval∗inc, pira∗inc, pira∗hval

Selected variables across τ, tuned via cross validation. ira: individual retirement account (IRA), inc: income, fsize: family size, hequity: home equity, hval: home value, educ: education years, marr: married, smcol: college, db: defined benefit pension, hown: home owner, hmort: home mortgage, a1: less than 30 years old, a2: 30-35 years old, a3: 36-44 years old, a4: 45-54 years old, a5: 55 years old or older, i1: < $10K, i2: $10-20K, i3: $20-30K, i4: $30-40K, i5: $40-50K, i6: $50-75K, i7: $75K+.

Table 6: Net Financial Assets

Quantile   Selected Variables
…          ira, educ, fsize, hval, educ, age∗educ, age∗hmort, age∗inc, fsize∗hmort, fsize∗inc, ira∗educ, ira∗inc, hval∗inc, marr, db, male, i…, i…, i…, i…, twoearn, marr∗fsize, pira∗inc, pira∗educ, twoearn∗inc, twoearn∗ira, max(0, age − …), max(0, age − …), max(0, age − …), max(0, inc − …), max(0, inc − …), max(0, educ − …)
…          ira, hmort, age∗hmort, age∗inc, fsize∗hmort, fsize∗inc, ira∗educ, ira∗inc, hval∗inc, db, smcol, male, i…, i…, i…, i…, a…, a…, twoearn, pira∗inc, pira∗age, pira∗fsize, twoearn∗inc, twoearn∗ira, twoearn∗hmort, max(0, age − …), max(0, age − …), max(0, inc − …), max(0, inc − …), max(0, inc − …), max(0, educ − …)
…          age, ira, age∗fsize, age∗ira, age∗inc, fsize∗educ, fsize∗hmort, ira∗educ, ira∗inc, hval∗inc, hown, male, i…, i…, a…, a…, a…, pira∗inc, pira∗fsize, twoearn∗inc, twoearn∗fsize, twoearn∗hmort, twoearn∗educ, max(0, inc − …)
…          ira, age∗inc, hval∗inc, pira∗inc, pira∗age
…          ira, age∗inc, educ∗inc, hval∗inc, pira∗inc

Selected variables across τ, tuned via cross validation. Variable definitions are as in Table 5.

4.2 Effects of subsidized training on male and female trainee earnings

Abadie, Angrist and Imbens (2002) use the Job Training Partnership Act (JTPA) data to estimate the quantile treatment effect of job training on the earnings distribution. The data are from Title II of the JTPA in the early 1990s and consist of 11,204 observations, of which 5,102 are male and 6,102 are female. In estimation, they take thirty-month earnings as the outcome variable, enrollment for JTPA service as the treatment variable, and a randomized offer of JTPA enrollment as the instrumental variable. The control variables include binary variables for black and Hispanic applicants, high-school graduates, married applicants, five age groups, AFDC receipt (for women), whether the applicant worked at least 12 weeks in the 12 months preceding random assignment, dummies for the original recommended service strategy (classroom, OJT/JSA, other) and a dummy for whether earnings data are from the second follow-up survey.

Table 7 presents quantile treatment effects for the male and female groups, respectively, obtained from several estimation procedures including IVQR, po-GMM, and GMM. As to the high-dimensional analysis, we create 85 technical control variables including those constructed from polynomial bases, interaction terms, and cubic splines (thresholds). Table 8 shows the quantile treatment effect obtained through the DML-IVQR. Table 7 together with the existing findings in the literature suggests that, for females only, the job training program generates a significantly positive treatment effect on earnings at the 0.5 and 0.75 quantiles. The DML-IVQR yields similar results, which can be confirmed by the identification-robust confidence intervals depicted in Figures 6 and 7. The selected variables are collected in the online appendix. Thus, the existing empirical conclusions in the literature are reinforced by the IVQR using the double machine learning procedure.

Online appendix. Selected variables for the male group: https://github.com/FieldTien/DML-QR/blob/master/Empirical_work/hqreg_data/selected_male.csv ; selected variables for the female group: https://github.com/FieldTien/DML-QR/blob/master/Empirical_work/hqreg_data/selected_female.csv

Table 7: Estimations with Abadie et al. (2002)'s Specification

Quantiles          0.1    0.15    0.25    0.5    0.75    0.85    0.9
Male (IVQR)          0    -200     400    500    3300    3100   1700
Male (po-GMM)        0    -100     500   1900    5000    6800   7800
Male (GMM)           0    -100     500   1600    5100    5800   7200
Female (IVQR)        0       0     400   1600    2500    1900   1400
Female (po-GMM)      0     200     700   3300    5200    6500   6900
Female (GMM)       100     200     700   3200    5200    6500   6900

Note: We create 85 technical control variables including those constructed from the polynomial bases, interaction terms, and cubic splines (thresholds).

Figure 6: Weak Instrument Robust Inference. The male group.

Figure 7: Weak Instrument Robust Inference. The female group.

Conclusion

The performance of a debiased/double machine learning algorithm within the framework of high-dimensional IVQR is investigated. The simulation results indicate that the proposed procedure is more efficient than the conventional estimator with many controls. Furthermore, we evaluate the corresponding weak-identification robust confidence interval of the low-dimensional causal parameter. Given a large number of technical controls, we reinvestigate the quantile treatment effects of 401(k) participation on accumulated wealth and then highlight the nonlinear income effects driven by the savings propensity.

References

Abadie, A., Angrist, J., and G. Imbens. 2002. "Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings," Econometrica, 70(1): 91–117.

Athey, S. 2017. "Beyond Prediction: Using Big Data for Policy Problems," Science, 355: 483–485.

Athey, S. 2018. "The Impact of Machine Learning on Economics," working paper, Stanford GSB.

Athey, S., Tibshirani, J., and S. Wager. 2019. "Generalized Random Forests," The Annals of Statistics, 47(2): 1148–1178.

Belloni, A., Chernozhukov, V., and C. Hansen. 2014. "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, 28: 29–50.

Chen, J.-E., and C.-W. Hsiang. 2019. "Causal Random Forests Model using Instrumental Variable Quantile Regression," working paper, Center for Research in Econometric Theory and Applications, National Taiwan University.

Chen, L.-Y., and S. Lee. 2018. "Exact Computation of GMM Estimators for Instrumental Variable Quantile Regression Models," Journal of Applied Econometrics, forthcoming.

Chernozhukov, V., and C. Hansen. 2004. "The Impact of 401(k) Participation on the Wealth Distribution: An Instrumental Quantile Regression Analysis," Review of Economics and Statistics, 86: 735–751.

Chernozhukov, V., and C. Hansen. 2008. "Instrumental Variable Quantile Regression: A Robust Inference Approach," Journal of Econometrics, 142: 379–398.

Chernozhukov, V., Hansen, C., and M. Spindler. 2015. "Valid Post-Selection and Post-Regularization Inference: An Elementary, General Approach," Annual Review of Economics, 7: 649–688.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and J. Robins. 2018a. "Double/debiased Machine Learning for Treatment and Structural Parameters," Econometrics Journal, 21: C1–C68.

Chernozhukov, V., Hansen, C., and K. Wüthrich. 2018b. "Instrumental Variable Quantile Regression," Handbook of Quantile Regression.

Chiou, Y.-Y., Chen, M.-Y., and J.-E. Chen. 2018. "Nonparametric Regression with Multiple Thresholds: Estimation and Inference," Journal of Econometrics, 206(2): 472–514.

Yi, C., and J. Huang. 2017. "Semismooth Newton Coordinate Descent Algorithm for Elastic-net Penalized Huber Loss Regression and Quantile Regression," Journal of Computational and Graphical Statistics, 26(3): 547–557.