Entropy Balancing for Continuous Treatments∗

Stefan Tübbicke†

Discussion Paper
This version: May 29, 2020
Abstract
Interest in evaluating the effects of continuous treatments has been on the rise recently. To facilitate the estimation of causal effects in this setting, the present paper introduces entropy balancing for continuous treatments (EBCT) by extending the original entropy balancing methodology of Hainmueller (2012). In order to estimate balancing weights, the proposed approach solves a globally convex constrained optimization problem, allowing for computationally efficient implementation. EBCT weights reliably eradicate Pearson correlations between covariates and the continuous treatment variable. This is the case even when other methods based on the generalized propensity score tend to yield insufficient balance due to strong selection into different treatment intensities. Moreover, the optimization procedure is more successful in avoiding extreme weights attached to a single unit. Extensive Monte-Carlo simulations show that treatment effect estimates using EBCT display similar or lower bias and uniformly lower root mean squared error. These properties make EBCT an attractive method for the evaluation of continuous treatments. Software implementation is available for Stata and R.
Keywords:
Balancing weights, Continuous Treatment, Monte-Carlo simulation, Observational studies
JEL codes:
C14, C21, C87

∗ The author would like to thank Marco Caliendo, Guido Imbens, Martin Lange, Cosima Obst and Sylvi Rzepka as well as participants of the 2019 annual conference of the European Economic Association for helpful comments.
† University of Potsdam, e-mail: [email protected]. Corresponding address: University of Potsdam, Chair of Empirical Economics, August-Bebel-Str. 89, 14482 Potsdam, Germany. Tel: +49 331 977 3781. Fax: +49 331 977 3210.

Introduction
Methods for balancing covariate distributions have become an essential part of the tool-kit to control for confounding due to observed covariates. While binary treatments are the most common case encountered in practice, situations in which all units receive some treatment with different intensity or dose are also pervasive in economics and other disciplines. The evaluation of such continuous treatments has gained more attention recently. Examples include the evaluation of job training programs with varying duration (Choe et al., 2015; Flores et al., 2012; Kluve et al., 2012) and subsidies of different magnitude to firms or entire regions (Becker et al., 2012; Bia and Mattei, 2012; Mitze et al., 2015). Similar to the binary case, many covariate balancing methods based on the generalized propensity score (GPS, Imbens, 2000) require an iterative estimation procedure until satisfactory balance is achieved. This is due to the fact that the GPS balances covariates only asymptotically (see Hirano and Imbens, 2004; Imai and van Dyk, 2004; Robins et al., 2000). Therefore, recent developments such as the covariate balancing generalized propensity score (CBGPS, Fong et al., 2018) and the generalized boosted modeling approach (GBM, Zhu et al., 2015) aim to simplify the estimation process by means of algorithmic optimization.

This paper makes three main contributions to this literature. First, it extends the non-parametric entropy balancing approach by Hainmueller (2012) for the estimation of balancing weights from the binary treatment framework to the context of continuous treatments. Similar to the original approach, the entropy balancing for continuous treatments (EBCT) algorithm solves a globally convex optimization problem. Balancing weights are obtained by minimizing the deviation from (uniform) base weights subject to zero-correlation and normalization constraints.
The convex nature of the optimization approach allows for efficient software implementation, converging much faster than the other non-parametric balancing methods considered. To facilitate application of the method, a software implementation for Stata is provided by the author in the EBCT ado-package. (To install the package, type “ssc install EBCT, replace” in the command window.) An implementation for R is also available through the WeightIt package (Greifer, 2020). Second, the paper shows that the proposed EBCT method delivers superior finite sample balance in terms of correlations between the treatment variable and covariates in comparison to other re-weighting approaches based on the GPS. In fact, EBCT consistently delivers perfect balance even when other methods tend to fail in this regard due to relatively strong selection into different treatment intensities.
To analyze causal effects in the context of continuous treatments, it is useful to discuss matters in terms of the potential outcomes framework, mainly attributed to Roy (1951) and Rubin (1974). Following the notation of Imbens (2000) and Hirano and Imbens (2004), let us assume that we observe an i.i.d. sample of $N$ individuals $i$ with a vector of pre-treatment covariates $X_i \in \mathbb{R}^K$, where $K$ is the number of covariates. Furthermore, we have information on a post-treatment outcome $Y_i$ and some treatment received with a certain intensity measured by $T_i$ with possible values in $\mathcal{T}$. The potential outcomes are given by $Y_i(t)$ – also often called the unit-dose response – denoting the outcome that would have been observed had the unit received treatment with intensity $t$. Aggregating these unit-level responses leads to the dose-response function (DRF) $E[Y_i(t)]$. Along with its derivative $dE[Y_i(t)]/dt$, the DRF represents the key relationship to be estimated in practice. If treatment intensities were randomly assigned, comparisons of average outcomes between individuals with different treatment intensities would directly give consistent estimates of these quantities. However, as this is mostly not the case even in experimental settings, the following three identifying assumptions need to be invoked in order to obtain consistent estimates in observational studies.

Identifying Assumptions
First, conditional on observed pre-treatment covariates $X$, potential outcomes must be independent of the treatment intensity received, i.e.

$$Y_i(t) \perp\!\!\!\perp T_i \mid X_i \quad \forall t \in \mathcal{T}. \qquad (1)$$

This assumption is called the conditional independence assumption (CIA, Lechner, 2001), also known as the selection-on-observables assumption (Heckman and Robb, 1985), and requires that the researcher observes all covariates $X$ that simultaneously determine the selection into different treatment intensities as well as the outcome of interest. This is potentially a very strong assumption and needs to be discussed on a case-by-case basis for the application at hand. The second assumption requires there to be common support, i.e. the conditional density of treatment needs to be positive over $\mathcal{T}$:

$$f_{T|X}(T = t \mid X_i) > 0 \quad \forall t \in \mathcal{T}. \qquad (2)$$

If the common support assumption is violated, the sample needs to be trimmed and the DRF is estimated on the subset of observations in order to avoid extrapolation (Crump et al., 2009; Lechner and Strittmatter, 2019). Lastly, one needs to assume the so-called stable-unit treatment value assumption (SUTVA, see Rubin, 1980), requiring that each individual's outcome only depends on their own level of treatment intensity. Essentially, this rules out general equilibrium and spill-over effects of treatment (see Imbens and Wooldridge, 2009; Manski, 2013, for examples).

While not the focus of this paper, the estimation of effects of continuous treatments may be combined with other identification approaches than selection-on-observables. For example, the methods described may be applied to the estimation of treatment effects in a conditional Difference-in-Differences setting (Abadie, 2005) where all units are affected by some natural experiment to a different degree. Alternatively, estimating DRFs of continuous instrumental variables (Angrist et al., 1996) that are only valid conditional on covariates is likely to expand knowledge about which units are actually induced to receive some treatment by the instrument. Moreover, the outcome model also needs to be specified correctly, calling for flexible functional forms when estimating the DRF parametrically or using non-parametric techniques to model this relationship.

(Re-weighting) Methods based on the Generalized Propensity Score
Non-parametric estimation of DRFs by comparing outcomes of individuals with exactly the same set of $X$ but different $T$ quickly becomes infeasible with growing dimension of $X$. To avoid this curse of dimensionality, Rosenbaum and Rubin (1983) show for the binary treatment case that it is sufficient to condition on the scalar propensity score instead of the multidimensional vector $X$ in order to control for confounding due to observed covariates. Similarly, Hirano and Imbens (2004) show that the conditional independence assumption also holds by conditioning on the generalized propensity score (GPS) $R = f_{T|X}(T \mid X)$, i.e. the conditional density of the treatment intensity evaluated at $T$ and $X$. In order to estimate the GPS, Hirano and Imbens (2004) assume the treatment follows a normal distribution and perform an Ordinary Least Squares (OLS) regression of $T$ on $X$, obtaining the GPS as

$$\hat{R}_i = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}} \exp\left\{ -\frac{1}{2\hat{\sigma}^2}\left( T_i - \hat{\beta}' X_i \right)^2 \right\}, \qquad (3)$$

where $\hat{\beta}$ is the regression coefficient vector and $\hat{\sigma}$ is the standard error of the disturbance term. Based on this estimated GPS, Hirano and Imbens (2004) advocate estimating the DRF by controlling for the GPS via flexible parametric regression.

As conditioning on the correctly specified GPS balances $X$ across different levels of $T$ only in expectation, (iteratively) checking covariate balance is a necessary step in the estimation of DRFs. Hirano and Imbens (2004) suggest to do this by conducting GPS-adjusted $t$-tests on the equality of means across strata in $T$. As this can be somewhat cumbersome and the stratification may lead to information loss (Austin, 2019), a different strand of literature followed the idea of estimating balancing weights instead. These weights allow to directly assess the resulting balancing quality by comparing (absolute) Pearson correlations in the raw data and in the re-weighted sample. One such approach – originating from inverse probability weighting (IPW, Horvitz and Thompson, 1952) – is provided by Robins et al. (2000) who generalize IPW and show that weights defined as

$$w_i = \frac{f_T(T_i)}{f_{T|X}(T_i \mid X_i)} \qquad (4)$$

can be used to adjust for confounding due to observed covariates. Robins et al. (2000) estimate the (un-)conditional densities $f_T(T)$ and $f_{T|X}(T \mid X)$ based on OLS regressions and the normality assumption. With the goal of avoiding iterative balance-checking, re-specification and estimation of the GPS, Fong et al. (2018) generalize the covariate balancing propensity score (CBPS) methodology of Imai and Ratkovic (2014) to include continuous treatments. For their parametric approach – henceforth covariate balancing generalized propensity score (CBGPS) – they derive the parametric structure of balancing weights based on the normality assumption. Parameters are estimated using the generalized method of moments by minimizing squared Pearson correlations between the treatment and covariates in the re-weighted sample. Moreover, Fong et al. (2018) also provide a non-parametric version – denoted as npCBGPS for the remainder of the paper.

Software is provided by Bia and Mattei (2008) in the doseresponse ado-package for Stata. The sub-classification approach by Imai and van Dyk (2004) faces similar issues and is therefore not discussed at this point.
This approach obviates the need to specify a parametric structure for balancing weights by maximizing the empirical likelihood (Owen, 1990; Qin and Lawless, 1994) subject to imbalance constraints. The imbalance constraints are chosen to allow for some finite sample imbalance in order to improve the convergence properties of the proposed algorithm. Compared to the CBGPS, the non-parametric approach is likely to come with a computational cost which may be quite substantial for datasets with a large number of observations and/or covariates that need to be balanced.

Another non-parametric approach for the estimation of balancing weights is provided by Zhu et al. (2015). Their approach is based on machine-learning techniques and adapts generalized boosted models (GBM) of McCaffrey et al. (2005) to the context of continuous treatments. GBM uses a boosting algorithm to estimate the GPS, plugging the resulting estimates into equation (4). Balance in terms of absolute Pearson correlations is optimized via the number of regression trees grown.

As will become clear in the remainder of the paper, existing automated balancing approaches based on the GPS mostly improve upon balancing quality relative to IPW. However, the re-weighting procedures tend not to achieve satisfactory balance when selection into treatment is strong.

For an assessment of the performance of other IPW methods for continuous exposures with different distribution assumptions and estimation approaches, see Naimi et al. (2014). Huffman and van Gameren (2018) further generalize the CBGPS approach of Fong et al. (2018) to allow for time-varying interventions. The pre-specified degree of imbalance in these constraints is left as a tuning parameter for the researcher; for the purpose of this paper, the tuning parameter will be left at its pre-specified level. For simulations, the maximum number of iterations is set to 20,000 with a shrinkage of 0.05%.
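The normal-density GPS of equation (3) and the stabilized weights of equation (4) can be sketched as follows. This is an illustration only; the function names and simulated data are assumptions and are not taken from the doseresponse or other packages mentioned above.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Normal density evaluated point-wise."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gps_and_ipw_weights(T, X):
    """Estimate the GPS by OLS under normality (eq. 3) and form the
    stabilized IPW weights f_T(T_i) / f_{T|X}(T_i | X_i) (eq. 4)."""
    X1 = np.column_stack([np.ones(len(T)), X])      # add an intercept
    beta, *_ = np.linalg.lstsq(X1, T, rcond=None)   # OLS coefficients
    resid = T - X1 @ beta
    gps = normal_pdf(T, X1 @ beta, resid.var())     # conditional density at T_i
    w = normal_pdf(T, T.mean(), T.var()) / gps      # stabilized weights
    return gps, w / w.sum()                         # normalize weights to one

# Illustrative data: treatment linear in three confounders plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
T = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=500)
gps, w = gps_and_ipw_weights(T, X)
```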
The original entropy balancing (EB) method by Hainmueller (2012) is a non-parametric pre-processing tool to estimate balancing weights for binary treatments, i.e. it re-weights control units to exactly match pre-specified covariate moments of the treatment group. The convex nature of the optimization problem solved by EB guarantees excellent balancing properties of the resulting weights. Moreover, Zhao and Percival (2017) show that EB is doubly-robust (Robins and Rotnitzky, 1995) and that it reaches the semi-parametric efficiency bound derived by Hahn (1998). These properties make EB an attractive candidate for the extension to the context of continuous treatments in order to improve upon existing balancing approaches. The remainder of this section introduces the proposed entropy balancing for continuous treatments (EBCT) approach.
Entropy Balancing for Continuous Treatments
For notational convenience, assume that the treatment intensity and covariates are standardized to mean zero. Furthermore, define the column vector $g(T_i, X_i) = [T_i, X_i^{T}, T_i X_i^{T}]^{T}$. The EBCT method aims to solve the following constrained minimization problem:

$$\min_{w} H(w) = \sum_{i=1}^{N} h(w_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} w_i\, g(T_i, X_i) = 0, \quad \sum_{i=1}^{N} w_i = 1, \quad w_i > 0 \;\; \forall i \qquad (5)$$

EBCT minimizes the loss function $H(w)$ subject to the balancing constraints and the normalizing constraints that weights have to sum up to one and be strictly positive. Weights that satisfy (5) retain unconditional means of covariates as well as the treatment variable and, most importantly, they purge the treatment variable of its correlation with covariates. The inclusion of higher-order or interaction terms in the list of covariates allows the researcher to achieve balance not just regarding the mean of covariates, but also regarding higher and cross-moments. Compared to the original EB method, the optimization problem (5) differs in terms of the balancing constraints imposed and the set of units for which balancing weights are being estimated. Essentially, EBCT re-weights all units to achieve zero correlations between the treatment variable and covariates.

Implementation
In order to implement the EBCT approach, one needs to decide upon the loss function $H(w)$. Following Hainmueller (2012), this paper uses a loss function based on the Kullback (1959) entropy metric $h(w_i) = w_i \ln(w_i/q_i)$, where $q_i$ are some base weights chosen by the analyst. If no base weights are specified, uniform weights $q_i = 1/N \;\forall i$ are used. This implies that EBCT chooses balancing weights such that they differ as little as possible from baseline weights while achieving zero correlation in the re-weighted sample. Notice that the loss function attains a minimum at $w_i = q_i \;\forall i$ and is undefined for non-positive weights. The latter property allows to drop the positivity constraint on weights, reducing the optimization problem to one with only equality constraints. Using the Lagrange method, the constrained optimization can be re-written as an unconstrained optimization:

$$\min_{w, \lambda, \gamma} \mathcal{L}(w, \lambda, \gamma) = \sum_{i=1}^{N} w_i \ln(w_i/q_i) - \lambda \left\{ \sum_{i=1}^{N} w_i - 1 \right\} - \gamma^{T} \left\{ \sum_{i=1}^{N} w_i\, g(T_i, X_i) \right\}, \qquad (6)$$

where $\lambda$ and $\gamma$ are Lagrange multipliers on the constraints. As $\partial^2 \mathcal{L}/\partial w_i^2 > 0$ for $w_i > 0$, the optimization problem (6) has a global minimum if the constraints are consistent (Boyd and Vandenberghe, 2004, chapter 5). In order to reduce the dimensionality of the optimization problem, the implied structure of balancing weights is obtained by re-arranging the first-order condition $\partial \mathcal{L}/\partial w_i = 0$ and plugging the result into the condition $\partial \mathcal{L}/\partial \lambda = 0$. This yields the weighting function in terms of the Lagrange multipliers $\gamma$, $q_i$ and $g(T_i, X_i)$ as

$$w_i = \frac{q_i \exp\{\gamma^{T} g(T_i, X_i)\}}{\sum_{i=1}^{N} q_i \exp\{\gamma^{T} g(T_i, X_i)\}}, \qquad (7)$$

where $\lambda$ has been cancelled out.
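As a minimal numerical sketch (not the author's EBCT package), the multipliers $\gamma$ in (7) can be found by minimizing the log of the normalizing constant, whose gradient is exactly the weighted balance condition, so the zero-correlation constraints hold at the optimum. All names below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def ebct_weights(T, X, q=None):
    """Solve for EBCT-style weights via the dual: minimize
    log sum_i q_i exp(gamma' g_i), whose gradient is sum_i w_i g(T_i, X_i)."""
    n = len(T)
    q = np.full(n, 1.0 / n) if q is None else q / q.sum()  # uniform base weights
    Tc = (T - T.mean())[:, None]            # standardize to mean zero
    Xc = X - X.mean(axis=0)
    g = np.column_stack([Tc, Xc, Tc * Xc])  # [T_i, X_i', T_i X_i']

    def dual(gamma):
        z = q * np.exp(g @ gamma)
        w = z / z.sum()
        return np.log(z.sum()), g.T @ w     # objective value and its gradient

    res = minimize(dual, np.zeros(g.shape[1]), jac=True, method="BFGS")
    z = q * np.exp(g @ res.x)
    return z / z.sum()                      # weights from equation (7)

# Illustrative data with selection into treatment.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
T = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)
w = ebct_weights(T, X)
```

At the solution, the weighted means of $T$, $X$ and their products are (numerically) zero, so all weighted Pearson correlations between treatment and covariates vanish.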
Hence, weights implied by EBCT are a log-linear function of a linear index containing covariates, the treatment intensity and their cross-products. Substituting this expression into the Lagrange function yields the dual $\mathcal{L}^d$ as

$$\mathcal{L}^d(\gamma) = -\ln\left( \sum_{i=1}^{N} q_i \exp\{\gamma^{T} g(T_i, X_i)\} \right). \qquad (8)$$

Differentiating $\mathcal{L}^d$ with respect to $\gamma$ yields the $2K+1$ first-order conditions in $2K+1$ unknowns

$$\sum_{i=1}^{N} \frac{\exp\{\gamma^{*T} g(T_i, X_i)\}\, g(T_i, X_i)}{\sum_{j=1}^{N} \exp\{\gamma^{*T} g(T_j, X_j)\}} = 0, \qquad (9)$$

where $\gamma^*$ refers to the multiplier values at the optimum. As equations (9) are non-linear in those multipliers, they have to be solved for numerically. This is done using a quasi-Newton optimization approach. Due to the convexity of the optimization problem, the algorithm tends to converge much faster than GBM and npCBGPS, especially in large datasets. Once values for $\gamma^*$ are obtained, balancing weights are backed out using (7) for subsequent analysis. As noted by Hainmueller (2012), optimization can be performed iteratively to limit the influence of units with potentially extreme weights. To do so, the researcher estimates EBCT weights and truncates excessive weights beyond some threshold, e.g. 4% as suggested by Imbens (2004). Then, the estimation is repeated with the truncated weights as base weights. The resulting weights still lead to finite sample balance but display smaller maximum weights.

Note that the first-order conditions have been multiplied by -1. In comparison, finding the optimum of (6) requires choosing $N + 2(K+1)$ parameters in total. Computation times for the estimation of balancing weights using the approaches described are given in section 4.

In this section, the finite sample properties of weighting approaches in terms of balancing outcomes as well as resulting effect estimates are compared using Monte-Carlo simulations. In general, the simulation design is chosen to mimic relevant features of datasets encountered in empirical practice and is similar in spirit to the design by Hainmueller (2012). Sample sizes of N = 200, 500 and 1,000 are considered and R = 1,000 independent replications are performed. As simulation results are quite similar across sample sizes, only results for N = 200 are presented in the main text. Additional results for larger sample sizes can be found in Appendix A. Because the Hirano and Imbens (2004) and the Imai and van Dyk (2004) procedures do not allow to directly assess covariate balance in terms of correlations, their approaches are excluded from the comparison. See also Austin (2018) for additional evidence on finite sample performance of existing estimators based on the GPS.

Simulation Design
Each simulated dataset consists of ten (partially correlated) covariates $X_1, \ldots, X_{10}$ entering the selection equation: a uniformly distributed $X_1$, a $\chi^2$-distributed $X_2$, binary indicators $X_3$ to $X_5$ based on one underlying standard normal variable with cut-offs of $(-\infty, -1]$, $(-1, 0]$ and $(0, 1]$ (the interval $(1, +\infty)$ serves as a reference category with a coefficient of zero), $X_6 \sim B(p = 0.5)$, and $X_7$ to $X_{10}$, which are jointly standard normal with covariance of 0.2. The treatment equation is specified as the linear index

$$T_i = \sum_{k=1}^{10} \beta_k X_{ki} + \sigma \varepsilon_i, \qquad (10)$$

where $\varepsilon$ is a standard normal error term and $\sigma$ is the scale parameter governing the standard deviation of the composite error term. Based on this equation, moderate selectivity into treatment is generated with $\sigma = 4$ and strong selection is obtained by setting $\sigma = 2$. While these labels are obviously quite arbitrary, the values of $\sigma$ have been chosen to roughly mirror the selectivity patterns of the empirical applications presented in section 4. To investigate the robustness of the estimation approaches to mis-specification of the selection equation, three different specifications for the estimation of balancing weights are assumed:

• Specification 1: $\hat{E}[T \mid X] = \alpha_0 + \sum_k \alpha_k X_k$
• Specification 2: as Specification 1, but with two covariates entering in transformed form (one as a square root) rather than linearly
• Specification 3: as Specification 2, but additionally with one relevant covariate falsely omitted and replaced by another

Specification one is correctly specified and is thus expected to yield the lowest bias and root mean squared error (RMSE). Specification two introduces mild mis-specification via the mis-measurement of two covariates. Finally, specification three further increases bias through the false omission of a relevant covariate.

The outcome equation is modeled as

$$Y_i = (X_{\cdot i} + X_{\cdot i})^{\eta} + X_{\cdot i} + X_{\cdot i} + X_{\cdot i} + T_i + \xi_i, \qquad (11)$$

where $\xi$ is a normally distributed error term and the exponent $\eta$ governs the degree of non-linearity and non-additivity in covariates. Three scenarios are considered: in outcome design one, $\eta$ is set to one, yielding a linear specification of $Y$ in $X$; mild deviations from linearity and additivity are generated in design two with $\eta = 1.25$; and moderate non-linearity/additivity is obtained with a larger $\eta$ in design three.

Balancing Quality
First and foremost, the different estimation procedures aim to balance covariates across different treatment intensities. Hence, an important criterion regarding the empirical performance of these procedures is the degree to which they actually deliver finite sample balance. Simulation results on the distribution of balancing quality indicators for both the moderate and strong selection into treatment under correct specification can be found in the left and the right panel of Figure 1, respectively. Results for the mis-specified cases are not presented as they yield the same conclusions. In the spirit of Diamond and Sekhon (2013), the largest absolute Pearson correlation coefficient is used as a balancing indicator, putting more focus on the least balanced covariate instead of the average balancing quality, recognizing that even small imbalances may lead to substantial bias if the covariate is a strong predictor of the outcome.

[Insert Figure 1 about here]

When selection into treatment is moderate, i.e. initial maximum absolute Pearson correlations are around 35%, all methods tend to improve balancing in the re-weighted dataset to some degree. IPW tends to result in highly variable balancing quality and sometimes even leads to an increase in imbalance. Similar results are obtained for the GBM approach. The parametric CBGPS reduces maximum correlations much closer towards zero but still displays substantial variability in balancing outcomes. The non-parametric CBGPS approach outperforms its parametric counterpart by consistently delivering correlations near zero in the re-weighted data when selection into treatment is moderate. However, when selection into treatment is strong, i.e. when initial absolute correlations attain a maximum of around 45%, the simulation results show that now even npCBGPS yields more variable balancing quality, frequently surpassing the 0.1 rule-of-thumb threshold proposed by Zhu et al. (2015).
While balancing quality tends to deteriorate with initial imbalance for the other approaches, EBCT effectively eliminates correlations in the re-weighted simulation data independent of the magnitude of initial correlations.
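The balance indicator used here, the largest absolute (weighted) Pearson correlation between the treatment and any covariate, can be sketched as follows; function and variable names are illustrative.

```python
import numpy as np

def max_abs_weighted_corr(T, X, w):
    """Largest absolute weighted Pearson correlation between T and columns of X."""
    w = w / w.sum()
    Tc = T - w @ T                       # weighted de-meaning
    Xc = X - w @ X
    cov = (w * Tc) @ Xc                  # weighted covariance, one per covariate
    sd = np.sqrt(w @ Tc**2) * np.sqrt(w @ Xc**2)
    return np.max(np.abs(cov / sd))

# With uniform weights this reduces to the raw maximum absolute correlation.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
T = X[:, 0] + rng.normal(size=200)
raw = max_abs_weighted_corr(T, X, np.ones(200))
```

Comparing this statistic before and after re-weighting summarizes balancing quality in a single number focused on the least balanced covariate.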
Distribution of Balancing Weights
While the balancing quality of weights is certainly an important criterion, it may be the case that finite sample balance comes at the cost of overly large weights for just a few units when estimating treatment effects. This is likely to substantially reduce the performance of resulting estimates and should be avoided (Robins and Wang, 2000; Kang and Schafer, 2007). To provide some evidence of the performance of the balancing approaches in this regard, Figure 2 displays the distribution of the maximum weight share held by a single unit across all simulations with correctly specified balancing weights, again split by the degree of selection into treatment.

[Insert Figure 2 about here]

The results suggest that maximum weight shares for IPW, the parametric and the non-parametric CBGPS as well as GBM are fairly similar most of the time. Especially IPW and GBM often lead to larger weight shares in the case of strong selection into treatment. The graph only shows maximum weights up to a value of 40% in order to enhance the visibility of the weight distribution. Similar to the analysis of balancing quality, an analysis based on the mis-specified $E[T|X]$ gives rise to the same conclusions.

Bias and Root Mean Squared Error
Turning to the finite sample properties of effect estimates based on the re-weighting approaches, Table 1 compares Monte-Carlo results on absolute bias and the root mean squared error (RMSE).

[Insert Table 1 about here]

As expected, mis-specification of the selection equation generally increases bias and RMSE for all estimators. Similarly, increases in the degree of non-linearity/additivity in the outcome equation tend to lead to larger bias and RMSE. Regarding the individual performance of estimators, IPW, the parametric CBGPS and GBM tend to display the highest absolute bias and RMSE. The non-parametric CBGPS reduces bias and RMSE compared to its parametric counterpart in all simulation scenarios. However, when faced with strong selection into treatment, the npCBGPS yields biased estimates even when the selection equation is correctly specified. If the outcome is sufficiently non-linear and non-additive in covariates, bias is substantial. EBCT, on the other hand, consistently delivers essentially unbiased estimates even when selection into treatment is strong, as long as all relevant and correctly measured covariates are included in the specification of balancing weights. Moreover, EBCT yields the lowest RMSE across all simulation scenarios, independent of whether the selection equation is correctly specified or not.
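The two performance criteria compared here are straightforward to compute from the Monte-Carlo output; a minimal sketch with made-up numbers:

```python
import numpy as np

def bias_and_rmse(estimates, truth):
    """Absolute bias and root mean squared error across replications."""
    estimates = np.asarray(estimates, dtype=float)
    bias = abs(estimates.mean() - truth)
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    return bias, rmse

# Illustrative: four replications of an effect estimate with true value 1.
b, r = bias_and_rmse([1.1, 0.9, 1.2, 0.8], 1.0)
```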
In this section, the EBCT methodology and comparison methods are applied to the estimation of dose-response functions using real data from two well-known examples on the size of lottery winnings and labor earnings as well as smoking intensity and medical expenditures. An additional empirical example on the evaluation of a place-based development subsidy is presented in Appendix B. Note that the analysis in this section remains agnostic about the validity of the conditional independence assumption and hence, estimates are only interpreted as conditional associations. For simplicity, outcome regressions are performed via weighted least squares regression based on the estimated balancing weights using a cubic polynomial in the respective treatment variable. From these estimates, the dose-response functions $E[Y_i(t)]$ and their derivatives $dE[Y_i(t)]/dt$ are obtained. Standard errors are estimated using 1,000 bootstrap replications (Efron and Tibshirani, 1986; MacKinnon, 2006).

Lottery Winnings and Earnings
First, the association between the size of lottery winnings and subsequent labor market earnings is re-analyzed using the Hirano and Imbens (2004) survey data on Megabucks lottery winners in Massachusetts from the mid-1980s. The dataset contains information on the prize amount measured in $1,000, labor earnings six years after winning the lottery, as well as some covariates (age, winning year, working status when winning the lottery, years of high school, years of college, an indicator for being male, the number of tickets bought and previous earnings in the years one to six prior to winning). While the prize amount is randomly assigned, survey and item non-response lead to non-zero correlations of covariates with the treatment variable. The estimation sample consists of N = 201 lottery winners. To make the normality assumption used by IPW and CBGPS more credible, the treatment variable T is log(prize amount). The same specification as used in Hirano and Imbens (2004) is employed to estimate balancing weights. An overview of Pearson correlations as well as mean absolute correlations before and after weighting can be found in Table 2.

[Insert Table 2 about here]

Raw Pearson correlations range in absolute value from close to zero for the number of tickets bought to almost 30% in the case of the male indicator. Correlation coefficients based on the re-weighted samples show that all balancing approaches lead to substantial improvements in overall covariate balance as indicated by the mean absolute correlation. The non-parametric CBGPS and EBCT perform best, delivering essentially perfect finite sample balance. For years in high school and winning year, however, re-weighting can even increase imbalance. Table 2 also provides the maximum weight assigned to a single individual, which ranges from 1.35% (EBCT) to 2.94% (GBM).

[Insert Figure 3 about here]

Figure 3 shows the estimated DRFs based on the different weighting approaches. In accordance with the results of Hirano and Imbens (2004), there is a clear downward-sloping relationship between the prize amount and subsequent labor earnings. The general patterns obtained across methods are quite similar, with slightly more variation in DRF estimates in the tails of the prize distribution. However, derivatives of the DRFs are only significantly different from zero at the 10% significance level (indicated by a +) for log(prize amount) in the range of 3 to 4.5 log-points. Hence, despite the differences, none of the slopes of the DRFs are statistically different from zero in the tails.

The data were originally analyzed by Imbens et al. (2001). Compared to Hirano and Imbens (2004), a complete case analysis is performed, i.e. individuals with missing data on subsequent labor market earnings were dropped. Moreover, one individual with much lower lottery winnings than the rest of the sample was excluded. Computation times to obtain weights are far below one second for IPW, CBPS, and EBCT. The npCBGPS (GBM) algorithm takes about 3.5 (6.5) seconds to converge. All computations were performed on a computer with a 2.7 GHz Intel Core i5 CPU and 8 GB 1600 MHz DDR3 RAM.
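The outcome step used in these applications, a weighted least squares regression of the outcome on a cubic polynomial in the treatment followed by the fitted dose-response curve and its derivative, can be sketched as follows; names and data are illustrative.

```python
import numpy as np

def cubic_drf(T, Y, w, grid):
    """Weighted least squares of Y on (1, T, T^2, T^3); returns the fitted
    dose-response function E[Y(t)] and its derivative on the grid."""
    P = np.column_stack([np.ones_like(T), T, T**2, T**3])
    s = np.sqrt(w / w.sum())                 # WLS via square-root rescaling
    b, *_ = np.linalg.lstsq(s[:, None] * P, s * Y, rcond=None)
    drf = b[0] + b[1] * grid + b[2] * grid**2 + b[3] * grid**3
    deriv = b[1] + 2 * b[2] * grid + 3 * b[3] * grid**2
    return drf, deriv

# Illustrative data with a known cubic dose-response.
rng = np.random.default_rng(3)
T = rng.uniform(0, 2, 400)
Y = 1.0 + 0.5 * T - 0.25 * T**3 + rng.normal(scale=0.1, size=400)
w = np.full(400, 1 / 400)                    # e.g. estimated balancing weights
grid = np.linspace(0.1, 1.9, 50)
drf, deriv = cubic_drf(T, Y, w, grid)
```

Bootstrap standard errors as described above can then be obtained by repeating the weight estimation and this regression on resampled data.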
Smoking Intensity and Medical Expenditures
Next, the relationship between smoking intensity and medical expenditures is revisited using data of Imai and van Dyk (2004), originally analyzed by Johnson et al. (2003). The data stem from the National Medical Expenditure Survey 1987, covering current or previous cigarette smokers along with information on their smoking behavior, (validated) medical expenditures and some background characteristics. Based on the data on past cigarette consumption, the number of pack years smoked (= smoking duration in years · cigarette packs smoked per day) is generated as a measure of smoking intensity. Available background characteristics are the continuous (starting) age, a male indicator and categorical variables on race, seatbelt usage, education, marital status, census region of residence and poverty status. For the estimation of balancing weights, squared and cubed terms in (starting) age are also included in the specification. The treatment variable is again taken in logarithmic terms to reduce the skewness of the distribution and make the normality assumption of IPW and CBGPS more plausible. The estimation sample consists of N = 9,408 individuals.

(Computation times to obtain weights are, in ascending order: 0.8 seconds (EBCT), 3.5 seconds (IPW), 8.5 seconds (CBGPS), 4.5 minutes (GBM) and 15 minutes (npCBGPS). Compared to the analysis by Imai and van Dyk (2004), individuals below the 1st percentile and individuals above the 99th percentile of the pack years distribution are dropped, as the density of smokers in this region of the distribution is extremely small. Moreover, individuals above the 99th percentile of the medical expenditure distribution are dropped to reduce problems with extreme outliers.)

An overview of (mean absolute) Pearson correlations before and after weighting is given by Table 3.

[Insert Table 3 about here]

Before weighting, there is a substantial absolute correlation between log(pack years) and the polynomials in age (38 to 47%). Correlations with the other covariates are smaller in magnitude but often substantive nonetheless. Compared to the unweighted sample, IPW increases the mean absolute correlation between covariates and the smoking intensity from 12% to 16%, leading to higher correlations in magnitude for starting age, the male indicator, seatbelt usage, region of residence and income. Clearly, this is a very unfavorable balancing outcome. The parametric CBGPS reduces correlations for almost all variables, or leads to slight increases in correlations for variables that were initially almost completely uncorrelated with the treatment. Its non-parametric counterpart is about as successful in reducing absolute correlations for covariates that were initially heavily related to the treatment. However, it leads to larger increases in imbalance for variables with lower initial correlations. For example, the correlation between the male indicator and the treatment increases from 14% to 20% after weighting using the npCBGPS. GBM is almost as effective in alleviating correlations between covariates and the treatment as the parametric CBGPS. Only the absolute correlation between the male indicator and the treatment of 14% remains above the 0.1 threshold suggested by Zhu et al. (2015). In contrast to the other weighting approaches, EBCT yields perfect finite sample balance in terms of correlations in this application as well. Moreover, it also produces the smallest maximum weight share with 0.41%, compared to 0.65% (GBM), 6.6% (npCBGPS), 7.1% (IPW) and almost 13% (CBGPS). Hence, the parametric CBGPS method in particular achieves better balance only by allowing for relatively extreme weights in this setting.

[Insert Figure 4 about here]

Estimated DRFs are plotted in Figure 4.
The estimates suggest a slightly negative relationship between smoking intensity and medical expenditures up to about 2.5 log(pack years), after which there is a clear increase in medical expenditures with increased smoking intensity. Again, estimated DRFs differ mostly in the tails of the treatment variable distribution. However, differences are much larger in this application, especially for low levels of the smoking intensity, where some estimates suggest significant non-zero derivatives of the DRF.

Conclusions
This paper extends the entropy balancing methodology for the estimation of covariate balancing weights to the context of continuous treatments. Owing to its flexibility and globally convex optimization problem, EBCT achieves superior finite sample balance compared to other re-weighting methods based on the GPS. In fact, EBCT eradicates correlations between covariates and the continuous treatment variable even when selection into treatment is strong. At the same time, EBCT effectively avoids assigning extreme weights to single units. Extensive Monte-Carlo simulations show that effect estimates using EBCT display similar or lower bias and uniformly lower RMSE compared to the other weighting methods considered. All in all, these properties make the proposed EBCT method an attractive approach for the estimation of dose-response functions.

References
Abadie, A. (2005). Semiparametric difference-in-differences estimators. Review of Economic Studies, (1), 1–19.
Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, (434), 444–472.
Austin, P. (2018). Assessing the performance of the generalized propensity score for estimating the effect of quantitative or continuous exposures on binary outcomes. Statistics in Medicine, (1), 1874–1894.
Austin, P. C. (2019). Assessing covariate balance when using the generalized propensity score with quantitative or continuous exposures. Statistical Methods in Medical Research, (5), 1365–1377, PMID: 29415624.
Becker, S. O., Egger, P. H. and von Ehrlich, M. (2012). Too much of a good thing? On the growth effects of the EU's regional policy. European Economic Review, (4), 648–668.
Bia, M. and Mattei, A. (2008). A Stata package for the estimation of the dose-response function through adjustment for the generalized propensity score. Stata Journal, (3), 354–373.
— and — (2012). Assessing the effect of the amount of financial aids to Piedmont firms using the generalized propensity score. Statistical Methods & Applications, (4), 485–516.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Choe, C., Flores-Lagunes, A. and Lee, S.-J. (2015). Do dropouts with longer training exposure benefit from training programs? Korean evidence employing methods for continuous treatments. Empirical Economics, (2), 849–881.
Crump, R., Hotz, V. J., Imbens, G. W. and Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, (1), 187–199.
Diamond, A. and Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, (3), 932–945.
Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, (1), 54–75.
Flores, C. A., Flores-Lagunes, A., Gonzalez, A. and Neumann, T. C. (2012). Estimating the effects of length of exposure to instruction in a training program: The case of Job Corps. Review of Economics and Statistics, (1), 153–171.
Fong, C., Hazlett, C. and Imai, K. (2018). Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements. The Annals of Applied Statistics, (1), 156–177.
Greifer, N. (2020). WeightIt: Weighting for Covariate Balance in Observational Studies. R package version 0.9.0.
Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, (2), 315–331.
Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, (1), 25–46.
Heckman, J. J. and Robb, R. (1985). Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics, (1), 239–267.
Hirano, K. and Imbens, G. W. (2004). The propensity score with continuous treatments. In A. Gelman and X.-L. Meng (eds.), Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives: An Essential Journey with Donald Rubin's Statistical Family, Chichester, UK: John Wiley & Sons, Ltd, pp. 73–84.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, (260), 663–685.
Huffman, C. and van Gameren, E. (2018). Covariate balancing inverse probability weights for time-varying continuous interventions. Journal of Causal Inference, (2), 1–17.
Imai, K. and Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology), (1), 243–263.
— and van Dyk, D. A. (2004). Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association, (467), 854–866.
Imbens, G. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. The Review of Economics and Statistics, (1), 4–29.
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, (3), 706–710.
—, Rubin, D. B. and Sacerdote, B. I. (2001). Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players. American Economic Review, (4), 778–794.
— and Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, (1), 5–86.
Johnson, E., Dominici, F., Griswold, M. and Zeger, S. L. (2003). Disease cases and their medical costs attributable to smoking: An analysis of the National Medical Expenditure Survey. Journal of Econometrics, (1), 135–151.
Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, (4), 523–539.
Kluve, J., Schneider, H., Uhlendorff, A. and Zhao, Z. (2012). Evaluating continuous training programs using the generalized propensity score. Journal of the Royal Statistical Society: Series A (Statistics in Society), (2), 587–617.
Kullback, S. (1959). Information Theory and Statistics. Chichester, UK: John Wiley & Sons, Ltd.
Lechner, M. (2001). Identification and estimation of causal effects of multiple treatments under the conditional independence assumption. In M. Lechner and F. Pfeiffer (eds.), Econometric Evaluation of Labour Market Policies, Heidelberg: Physica-Verlag HD, pp. 43–58.
— and Strittmatter, A. (2019). Practical procedures to deal with common support problems in matching estimation. Econometric Reviews, (2), 193–207.
MacKinnon, J. G. (2006). Bootstrap methods in econometrics. Economic Record, 2–18.
Manski, C. F. (2013). Identification of treatment response with social interactions. Econometrics Journal, (1), 1–23.
McCaffrey, D., Ridgeway, G. and Morral, A. (2005). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 403–425.
Mitze, T., Paloyo, A. R. and Alecke, B. (2015). Is there a purchase limit on regional growth? A quasi-experimental evaluation of investment grants using matching techniques. International Regional Science Review, (4), 388–412.
Naimi, A., Moodie, E., Auger, N. and Kaufman, J. (2014). Constructing inverse probability weights for continuous exposures: A comparison of methods. Epidemiology, (1), 292–299.
Owen, A. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, (1), 90–120.
Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, (1), 300–325.
Robins, J., Hernán, M. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, (5), 550–560.
Robins, J. M. and Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, (429), 122–129.
— and Wang, N. (2000). Inference for imputation estimators. Biometrika, (1), 113–124.
Rosenbaum, P. and Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, (1), 41–55.
Roy, A. D. (1951). Some thoughts on the distribution of earnings. Oxford Economic Papers, (2), 135–146.
Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, (5), 688–701.
— (1980). Comment on Basu, D. – Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, (371), 591–593.
Zhao, Q. and Percival, D. (2017). Entropy balancing is doubly robust. Journal of Causal Inference.
Zhu, Y., Coffman, D. L. and Ghosh, D. (2015). A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of Causal Inference, (1), 25–40.

Tables and Figures

Table 1: Simulation Results – Bias and Root Mean Squared Error (N = 200)

Degree of non-linearity/additivity in Y|X: None / Mild / Moderate

                 Moderate Selection (σ = 4)              Strong Selection (σ = 2)
                 None        Mild        Moderate        None        Mild         Moderate
                 Bias RMSE   Bias RMSE   Bias  RMSE      Bias RMSE   Bias  RMSE   Bias   RMSE
Specification 1: Correctly specified E[T|X]
IPW              4.1  15.8   6.1  20.1   8.6   26.4      18.8 37.0   30.1  47.8   50.5   71.2
CBGPS            5.8  13.8   9.4  16.6   14.2  21.7      25.4 32.8   42.5  48.0   70.8   77.1
npCBGPS          0.6  12.9   0.9  13.0   0.7   13.2      4.5  31.8   6.7   33.0   11.5   35.9
GBM              9.5  15.6   14.9 21.6   24.2  31.8      30.9 42.1   55.3  70.4   101.0  132.1
EBCT             0.4  11.9   0.6  11.9   0.3   12.1      0.6  30.8   0.6   31.8   1.0    31.6
Specification 2: Mildly mis-specified E[T|X]
IPW              7.6  16.5   10.1 20.5   14.8  29.7      28.3 41.2   40.4  53.2   61.4   78.9
CBGPS            8.8  15.1   12.9 18.3   19.5  25.3      33.4 39.1   48.2  53.2   76.9   81.9
npCBGPS          4.6  13.6   5.8  13.9   7.3   15.0      19.2 36.3   21.2  36.6   27.8   43.2
GBM              10.6 16.6   16.4 22.0   26.4  34.0      35.3 43.6   54.9  67.6   96.8   119.3
EBCT             4.7  12.9   5.7  13.1   6.9   14.3      18.2 34.0   18.8  33.9   22.1   37.3
Specification 3: Strongly mis-specified E[T|X]
IPW              19.9 24.7   33.6 39.3   60.8  71.1      62.2 71.4   113.5 128.6  220.9  259.8
CBGPS            19.0 22.5   32.2 35.9   57.5  62.3      54.3 58.5   94.6  99.4   173.7  183.6
npCBGPS          17.9 21.9   30.8 34.5   56.0  61.2      53.9 60.3   94.3  100.8  169.0  179.1
GBM              19.9 23.7   34.0 37.9   60.2  66.8      57.6 63.2   99.7  107.4  189.9  208.1
EBCT             17.8 21.3   30.7 34.0   55.8  60.5      54.0 59.5   93.4  99.1   168.3  177.4

Note: This table shows absolute bias and root mean squared error (RMSE) measured in percent of the true treatment effect from the 18 Monte-Carlo simulation scenarios for N = 200. For each scenario, R = 1,000 independent replications are performed. Effect estimates are obtained through weighted least squares regression of the outcome Y on the treatment intensity T. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).
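The bias and RMSE entries in the simulation tables are computed across replications and expressed in percent of the true treatment effect; a minimal sketch (illustrative helper name):

```python
import numpy as np

def bias_rmse_percent(estimates, true_effect):
    """Absolute bias and RMSE of replication estimates, in percent of the true effect."""
    estimates = np.asarray(estimates, dtype=float)
    bias = abs(estimates.mean() - true_effect)
    rmse = np.sqrt(np.mean((estimates - true_effect) ** 2))
    return 100 * bias / abs(true_effect), 100 * rmse / abs(true_effect)
```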
Table 2: (Weighted) Corr(log(prize amount), X_k)

Covariate                  Unweighted  IPW    CBGPS  npCBGPS  GBM    EBCT
Age                        0.21        0.06   0.02   0.00     0.11   0.00
Years in high school       -0.03       0.00   0.00   0.00     -0.07  0.00
Years in college           0.04        0.05   0.03   0.00     0.01   0.00
Male                       0.29        0.07   0.03   0.00     0.12   0.00
Tickets bought             0.03        0.01   0.00   0.00     0.02   0.00
Working then               0.07        0.02   0.02   0.00     0.03   0.00
Winning year               0.03        -0.01  0.01   0.00     0.10   0.00
Earnings year -1           0.13        0.01   0.02   0.00     -0.02  0.00
Earnings year -2           0.19        0.02   0.02   0.00     0.00   0.00
Earnings year -3           0.21        0.02   0.02   0.00     0.01   0.00
Earnings year -4           0.21        0.02   0.03   0.00     0.03   0.00
Earnings year -5           0.16        0.02   0.02   0.00     0.00   0.00
Earnings year -6           0.17        0.06   0.04   0.00     0.02   0.00
Mean absolute correlation  0.14        0.03   0.02   0.00     0.04   0.00
Maximum weight in %                    1.68   1.68   2.39     2.94   1.35

Note: The table shows Pearson correlations between the treatment variable t = log(prize amount) and covariates in the raw sample of Hirano and Imbens (2004) as well as in the re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).
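The EBCT columns in the balance tables reflect weights chosen to zero out these correlations exactly. The following is a rough sketch of the underlying convex program (uniform base weights with zero-correlation and normalization constraints), solved here with SciPy's generic SLSQP routine rather than the dedicated solver of the Stata/R packages; `ebct_weights` is an illustrative name, not the packaged API:

```python
import numpy as np
from scipy.optimize import minimize

def ebct_weights(T, X):
    """Sketch of EBCT-style balancing weights for a continuous treatment.

    Minimizes the Kullback-Leibler divergence from uniform base weights
    subject to (i) normalization, (ii) the weighted means of T and of each
    covariate matching their sample means, and (iii) zero weighted
    covariance between T and each covariate.
    """
    T = np.asarray(T, dtype=float)
    X = np.asarray(X, dtype=float)
    n = T.shape[0]
    Tc = T - T.mean()
    Xc = X - X.mean(axis=0)
    # One column of A per linear constraint on the weights.
    A = np.column_stack([np.ones(n), Tc, Xc, Tc[:, None] * Xc])
    b = np.zeros(A.shape[1])
    b[0] = 1.0  # weights sum to one; every other constraint equals zero

    def kl_from_uniform(w):
        return np.sum(w * np.log(np.maximum(w, 1e-12) * n))

    def kl_grad(w):
        return np.log(np.maximum(w, 1e-12) * n) + 1.0

    res = minimize(kl_from_uniform, np.full(n, 1.0 / n), jac=kl_grad,
                   method="SLSQP", bounds=[(1e-10, 1.0)] * n,
                   constraints=[{"type": "eq",
                                 "fun": lambda w: A.T @ w - b,
                                 "jac": lambda w: A.T}],
                   options={"maxiter": 1000, "ftol": 1e-12})
    return res.x
```

Because the weighted means of T and X are pinned to their sample means, the zero-covariance constraints translate directly into zero weighted Pearson correlations, which is the balance criterion reported in the tables.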
Table 3: (Weighted) Corr(log(pack years), X_k)

Covariate X_k              Unweighted  IPW    CBGPS  npCBGPS  GBM    EBCT
Starting age
  Linear                   -0.14       0.20   -0.07  -0.06    -0.08  0.00
  Squared                  -0.13       0.19   -0.04  -0.05    -0.07  0.00
  Cubed                    -0.11       0.16   -0.02  -0.04    -0.06  0.00
Age
  Linear                   0.47        -0.34  0.06   0.07     0.01   0.00
  Squared                  0.43        -0.32  0.06   0.07     0.01   0.00
  Cubed                    0.38        -0.29  0.05   0.08     0.01   0.00
Male                       0.14        0.23   0.04   0.20     0.14   0.00
Race
  Black                    -0.14       0.19   -0.01  -0.06    -0.01  0.00
  Other                    0.19        -0.24  0.02   -0.10    0.03   0.00
Seatbelt usage
  Sometimes                -0.01       0.20   -0.03  -0.03    0.05   0.00
  Often                    -0.04       -0.18  0.05   0.02     -0.11  0.00
Education
  High school              0.01        -0.04  0.02   0.10     0.06   0.00
  Some college             -0.04       -0.02  -0.03  -0.01    0.04   0.00
  College degree           0.11        -0.03  0.00   -0.10    -0.01  0.00
  Other                    0.11        -0.18  -0.01  0.02     -0.02  0.00
Marital status
  Widowed                  0.04        0.07   -0.02  0.13     0.05   0.00
  Separated                -0.02       0.11   -0.02  -0.03    0.04   0.00
  Never married            -0.26       0.17   0.07   -0.09    -0.01  0.00
Census region
  Mid-west                 0.02        0.16   -0.03  0.07     0.05   0.00
  South                    -0.02       -0.20  0.05   -0.07    -0.08  0.00
  West                     -0.01       0.15   -0.02  -0.02    0.01   0.00
Poverty status
  Poor                     -0.02       -0.03  -0.02  0.10     -0.01  0.00
  Low income               -0.01       -0.04  -0.01  -0.09    0.00   0.00
  Middle income            0.00        -0.15  -0.03  0.10     -0.01  0.00
  High income              0.03        0.18   -0.02  -0.02    0.00   0.00
Mean absolute correlation  0.12        0.16   0.03   0.08     0.04   0.00
Maximum weight in %                    7.1    12.9   6.6      0.7    0.4
Note: The table shows Pearson correlations between the treatment variable t = log(pack years) and covariates in the raw sample of Imai and van Dyk (2004) as well as in the re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Figure 1
[Plot omitted: distributions of the maximum absolute Pearson correlation; categories Unweighted, IPW, CBGPS, npCBGPS, GBM, EBCT; panels Moderate Selection and Strong Selection.]
Note: This graph plots the distribution of the maximum absolute correlation coefficients between the treatment intensity T and the covariates X for Monte-Carlo simulation designs with correctly specified E[T|X], split by the degree of selection into treatment, both in the raw sample (unweighted) as well as in re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Figure 2
[Plot omitted: distributions of the maximum weight share in %; categories IPW, CBGPS, npCBGPS, GBM, EBCT; panels Moderate Selection and Strong Selection.]
Note: This graph plots the distribution of the maximum weight shares for Monte-Carlo simulation designs with correctly specified E[T|X], split by the degree of selection into treatment, in re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Figure 3
[Plot omitted: x-axis t = log(prize amount), y-axis E[Earnings(t)]; curves for IPW, CBGPS, npCBGPS, GBM and EBCT.]
Note: This graph shows the estimated dose-response function (DRF) between log(prize amount) and subsequent labor earnings, based on a weighted least squares regression using a cubic specification. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT). Values of the DRF are marked with a + if the derivative is significantly different from zero at least at the 10% level, based on bootstrapped standard errors obtained using R = 1,000 replications.

Figure 4
[Plot omitted: x-axis t = log(pack years), y-axis E[Medical expenditure(t)]; curves for IPW, CBGPS, npCBGPS, GBM and EBCT.]
Note: This graph shows the estimated dose-response function (DRF) between log(pack years) and medical expenditures, based on a weighted least squares regression using a cubic specification. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT). Values of the DRF are marked with a + if the derivative is significantly different from zero at least at the 10% level, based on bootstrapped standard errors obtained using R = 1,000 replications.

Appendix A

Table A.1: Simulation Results – Bias and Root Mean Squared Error (N = 500)

Degree of non-linearity/additivity in Y|X: None / Mild / Moderate

                 Moderate Selection (σ = 4)              Strong Selection (σ = 2)
                 None        Mild        Moderate        None        Mild         Moderate
                 Bias RMSE   Bias RMSE   Bias  RMSE      Bias RMSE   Bias  RMSE   Bias   RMSE
Specification 1: Correctly specified E[T|X]
IPW              1.7  12.3   4.5  15.2   5.4   22.3      16.5 32.3   26.6  42.0   46.4   64.6
CBGPS            2.5  9.1    5.1  10.9   7.4   14.1      22.5 28.2   36.3  40.5   61.4   66.3
npCBGPS          0.1  9.0    0.5  9.3    0.4   9.3       2.5  25.7   3.3   26.1   6.0    27.2
GBM              4.7  10.3   9.0  14.0   13.5  19.5      21.5 30.9   36.8  49.3   64.2   79.1
EBCT             0.0  7.4    0.6  7.7    0.1   7.5       0.6  19.3   0.1   18.9   0.4    20.2
Specification 2: Mildly mis-specified E[T|X]
IPW              6.6  12.7   8.3  16.6   11.0  22.0      25.7 36.1   35.8  48.0   55.7   68.4
CBGPS            6.7  10.8   8.6  12.7   12.5  16.8      30.1 34.1   43.8  47.8   69.3   73.1
npCBGPS          4.9  10.0   5.8  10.7   7.0   11.4      16.7 28.4   19.5  30.1   24.5   33.4
GBM              8.2  11.9   11.5 15.3   16.7  21.1      28.5 35.3   42.1  50.5   66.5   79.3
EBCT             5.2  9.0    5.6  9.4    6.6   9.9       17.2 25.0   19.7  26.7   22.7   28.9
Specification 3: Strongly mis-specified E[T|X]
IPW              19.7 22.5   34.5 37.4   62.0  72.5      71.6 80.2   128.8 142.8  263.8  306.0
CBGPS            18.7 20.5   32.7 34.6   57.4  60.7      59.2 62.0   104.9 108.5  198.7  207.7
npCBGPS          18.5 20.5   32.1 34.1   56.7  60.3      56.6 59.9   100.2 103.8  186.5  193.8
GBM              19.5 21.7   33.7 35.9   59.6  64.1      63.1 67.1   113.9 121.2  217.3  236.7
EBCT             18.3 19.8   31.7 33.2   55.9  58.3      56.1 58.3   97.3  99.8   179.5  184.7

Note: This table shows absolute bias and root mean squared error (RMSE) measured in percent of the true treatment effect from the 18 Monte-Carlo simulation scenarios for N = 500. For each scenario, R = 1,000 independent replications are performed. Effect estimates are obtained through weighted least squares regression of the outcome Y on the treatment intensity T. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Table A.2: Simulation Results – Bias and Root Mean Squared Error (N = 1,000)

Degree of non-linearity/additivity in Y|X: None / Mild / Moderate

                 Moderate Selection (σ = 4)              Strong Selection (σ = 2)
                 None        Mild        Moderate        None        Mild         Moderate
                 Bias RMSE   Bias RMSE   Bias  RMSE      Bias RMSE   Bias  RMSE   Bias   RMSE
Specification 1: Correctly specified E[T|X]
IPW              1.1  10.6   2.4  12.1   4.9   16.8      13.7 26.8   22.4  38.5   36.7   58.3
CBGPS            1.1  6.9    1.7  7.3    3.7   9.5       18.6 23.7   31.6  36.0   52.8   57.3
npCBGPS          0.1  7.5    0.1  7.4    0.8   7.8       1.0  22.8   2.7   22.8   4.0    26.0
GBM              3.6  8.0    5.8  9.7    10.2  14.8      17.2 25.2   29.3  37.3   50.9   75.1
EBCT             0.1  5.4    0.0  5.3    0.4   5.5       0.8  14.3   0.2   14.3   0.6    14.4
Specification 2: Mildly mis-specified E[T|X]
IPW              6.0  10.1   7.2  13.6   9.8   23.3      24.8 33.5   33.8  42.5   49.3   65.0
CBGPS            5.7  8.4    6.8  9.5    8.8   11.9      27.8 31.3   39.9  42.9   61.5   65.0
npCBGPS          5.2  8.9    5.6  9.1    6.7   10.1      16.7 26.8   19.4  27.7   23.2   31.6
GBM              7.1  9.9    9.4  11.8   12.8  16.2      26.3 31.7   35.3  40.1   53.1   62.8
EBCT             5.1  7.3    5.6  7.7    6.4   8.4       17.5 21.8   19.8  23.5   22.0   25.7
Specification 3: Strongly mis-specified E[T|X]
IPW              19.7 21.3   34.5 36.9   61.7  66.4      73.5 80.8   146.0 162.1  288.1  326.2
CBGPS            18.8 19.8   33.0 34.1   58.7  60.5      62.6 65.0   115.3 118.9  215.0  222.8
npCBGPS          18.9 20.2   32.7 34.2   58.2  60.4      58.1 60.9   105.8 109.0  190.5  196.5
GBM              19.5 20.5   33.4 34.9   60.2  62.9      65.8 69.1   122.4 128.2  240.0  255.4
EBCT             18.5 19.2   31.8 32.6   56.4  57.6      56.7 58.1   101.2 102.8  181.0  184.5

Note: This table shows absolute bias and root mean squared error (RMSE) measured in percent of the true treatment effect from the 18 Monte-Carlo simulation scenarios for N = 1,000. For each scenario, R = 1,000 independent replications are performed. Effect estimates are obtained through weighted least squares regression of the outcome Y on the treatment intensity T. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Appendix B: Additional Application

This appendix provides an additional empirical application of the EBCT methodology using data kindly provided by Mitze et al. (2015). They evaluate the effects of the largest German place-based regional development subsidy (RDS) program ("Gemeinschaftsaufgabe Verbesserung der regionalen Wirtschaftsstruktur") on regional productivity growth. Since German re-unification in 1990, the program granted subsidies in excess of €60 billion to lagging regions.

The data stem from the Federal statistics agency and are measured at the county level. The dataset contains information on counties' subsidy receipt per capita, regional productivity growth and several covariates (lagged productivity, lagged labor productivity growth, lagged employment, lagged employment growth, local investment intensity, average firm size, turnover from exports, human capital, population density, a net-migration indicator, an urban indicator, information on settlement structure and time dummies). The period of observation covers the years 1993-2008. All variables are measured in 3-year intervals. In total, there are 869 treated county-period observations. The treatment variable has been Box-Cox transformed with λ ≈ 0.15 to reduce its skewness. For the estimation of balancing weights, the same specification as in Mitze et al. (2015) is used.

Table B.1 displays the correlations between covariates and the treatment variable before and after weighting. Before weighting, there is a substantial negative correlation of about -0.6 between lagged labor productivity and treatment. Associations of similar magnitude but of opposite sign are given for lagged productivity growth and human capital endowment. This suggests that highly subsidized regions would have performed better than other regions even without the subsidy due to catch-up growth. Neglecting these differences, or failing to achieve balance, is thus likely to overstate the effects of the subsidy on regional development. To adjust for this divergence in pre-treatment characteristics, balancing weights are estimated. All approaches reduce mean absolute correlations between covariates and the treatment variable. However, only EBCT can reduce these correlations to zero. For all other approaches, there remain sizable correlations, especially with respect to lagged labor productivity (growth). This lack of balance was also documented by Mitze et al. (2015) in their original analysis using the Hirano-Imbens approach.

(Computation times are, in ascending order: 0.05 seconds (IPW), 0.25 seconds (CBGPS), 0.6 seconds (EBCT), 22 seconds (GBM) and 1.6 minutes (npCBGPS).)

Table B.1: (Weighted) Corr((Subsidy amount^λ − 1)/λ, X_k)

Covariate X_k                    Unweighted  IPW    CBGPS  npCBGPS  GBM    EBCT
Log(lagged labor productivity)   -0.62       -0.26  -0.27  -0.06    -0.51  0.00
Lagged labor prod. growth        0.52        0.33   0.34   0.31     0.11   0.00
Log(lagged employment)           -0.05       0.00   0.00   -0.08    -0.02  0.00
Lagged employment growth         -0.06       -0.02  -0.06  -0.16    0.13   0.00
Log(investment intensity)        0.50        0.19   0.21   0.06     0.18   0.00
Log(average firm size)           -0.37       -0.19  -0.11  -0.02    -0.14  0.00
Log(foreign turnover)            -0.40       -0.14  -0.11  -0.04    -0.22  0.00
Log(share manufacturing sector)  -0.42       -0.25  -0.16  -0.01    -0.23  0.00
Log(human capital)               0.52        0.27   0.20   0.03     0.25   0.00
Log(population density)          -0.16       -0.05  -0.06  0.06     0.09   0.00
Net-migration indicator          -0.29       -0.20  -0.15  -0.10    -0.14  0.00
Urban indicator                  0.07        -0.02  0.01   0.08     -0.15  0.00
Settlement structure             0.21        0.01   0.04   0.07     -0.10  0.00
Time dummy 1                     0.01        0.05   0.04   -0.10    0.10   0.00
Time dummy 2                     0.06        0.07   0.05   -0.03    0.12   0.00
Time dummy 3                     -0.01       -0.02  -0.02  -0.04    0.07   0.00
Time dummy 4                     -0.05       0.02   -0.01  0.04     -0.17  0.00
Mean absolute correlation        0.25        0.12   0.11   0.08     0.16   0.00
Maximum weight in %                          5.92   17.5   28.7     9.35   5.44

Note: The table shows Pearson correlations between the treatment variable t = (Subsidy amount^λ − 1)/λ with λ ≈ 0.15 and covariates in the raw sample of Mitze et al. (2015) as well as in the re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Estimated DRFs are plotted in Figure B.1. Most estimates suggest a positive relationship between the subsidy and regional productivity growth for medium values of the treatment intensity. IPW weights yield unrealistic estimates with excessively large gains from the subsidy in the upper tail of the distribution. While point estimates are relatively similar across regressions based on (np-)CBGPS, GBM and EBCT, none of the derivatives of the DRF using EBCT are statistically significant at the 10% level. This contrasts with the findings of Mitze et al. (2015), who report significantly positive derivatives of the DRF for treatment intensities around the center of the distribution. As EBCT estimates are the most credible due to their superior balancing quality, one should be skeptical of the validity of these findings.

Figure B.1: Empirical Application – Regional Development
[Plot omitted: x-axis t = (Subsidy amount^λ − 1)/λ, y-axis E[Productivity growth in % (t)]; curves for IPW, CBGPS, npCBGPS, GBM and EBCT.]
Note: This graph shows the estimated dose-response function (DRF) between t = (Subsidy amount^λ − 1)/λ with λ ≈ 0.15 and productivity growth, based on a weighted least squares regression using a cubic specification. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT). Values of the DRF are marked with a + if the derivative is significantly different from zero at least at the 10% level, based on bootstrapped standard errors obtained using R = 1,000 replications.
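The figure notes describe DRFs estimated by weighted least squares with a cubic specification and derivative significance assessed via a unit-level bootstrap. A minimal sketch under those assumptions (function names are illustrative; for brevity the balancing weights are held fixed across bootstrap replications, whereas re-estimating them inside each draw would mimic the full procedure more closely):

```python
import numpy as np

def drf_cubic(t, y, w):
    """Weighted least squares fit of a cubic dose-response curve.

    Returns coefficients (b0, b1, b2, b3) of E[Y(t)] ~ b0 + b1 t + b2 t^2 + b3 t^3.
    """
    D = np.column_stack([np.ones_like(t), t, t ** 2, t ** 3])
    W = np.asarray(w, dtype=float)
    W = W / W.sum()
    return np.linalg.solve(D.T @ (W[:, None] * D), D.T @ (W * y))

def drf_derivative(beta, t0):
    """Slope of the fitted cubic DRF at treatment level t0."""
    return beta[1] + 2 * beta[2] * t0 + 3 * beta[3] * t0 ** 2

def bootstrap_derivative_se(t, y, w, t0, reps=1000, seed=0):
    """Bootstrap standard error of the DRF derivative at t0, resampling units."""
    rng = np.random.default_rng(seed)
    n = len(t)
    draws = np.empty(reps)
    for r in range(reps):
        idx = rng.integers(0, n, size=n)
        draws[r] = drf_derivative(drf_cubic(t[idx], y[idx], w[idx]), t0)
    return draws.std(ddof=1)
```

A derivative estimate more than roughly 1.64 bootstrap standard errors away from zero would receive the "+" marker at the 10% level used in the figures.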