Entropy Balancing for Continuous Treatments∗

Stefan Tübbicke†

Discussion Paper
This version: May 29, 2020
Abstract
Interest in evaluating the effects of continuous treatments has been on the rise recently. To facilitate the estimation of causal effects in this setting, the present paper introduces entropy balancing for continuous treatments (EBCT) by extending the original entropy balancing methodology of Hainmueller (2012). In order to estimate balancing weights, the proposed approach solves a globally convex constrained optimization problem, allowing for computationally efficient implementation. EBCT weights reliably eradicate Pearson correlations between covariates and the continuous treatment variable. This is the case even when other methods based on the generalized propensity score tend to yield insufficient balance due to strong selection into different treatment intensities. Moreover, the optimization procedure is more successful in avoiding extreme weights attached to a single unit. Extensive Monte-Carlo simulations show that treatment effect estimates using EBCT display similar or lower bias and uniformly lower root mean squared error. These properties make EBCT an attractive method for the evaluation of continuous treatments. Software implementation is available for Stata and R.
Keywords:
Balancing weights, Continuous Treatment, Monte-Carlo simulation, Observational studies
JEL codes:
C14, C21, C87

∗ The author would like to thank Marco Caliendo, Guido Imbens, Martin Lange, Cosima Obst and Sylvi Rzepka as well as participants of the 2019 annual conference of the European Economic Association for helpful comments.
† University of Potsdam, e-mail: [email protected]. Corresponding address: University of Potsdam, Chair of Empirical Economics, August-Bebel-Str. 89, 14482 Potsdam, Germany. Tel: +49 331 977 3781. Fax: +49 331 977 3210.

Introduction
Methods for balancing covariate distributions have become an essential part of the tool-kit to control for confounding due to observed covariates. While binary treatments are the most common case encountered in practice, situations in which all units receive some treatment with different intensity or dose are also pervasive in economics and other disciplines. The evaluation of such continuous treatments has gained more attention recently. Examples include the evaluation of job training programs with varying duration (Choe et al., 2015; Flores et al., 2012; Kluve et al., 2012) and subsidies of different magnitude to firms or entire regions (Becker et al., 2012; Bia and Mattei, 2012; Mitze et al., 2015). Similar to the binary case, many covariate balancing methods based on the generalized propensity score (GPS, Imbens, 2000) require an iterative estimation procedure until satisfactory balance is achieved. This is due to the fact that the GPS balances covariates only asymptotically (see Hirano and Imbens, 2004; Imai and van Dyk, 2004; Robins et al., 2000). Therefore, recent developments such as the covariate balancing generalized propensity score (CBGPS, Fong et al., 2018) and the generalized boosted modeling approach (GBM, Zhu et al., 2015) aim to simplify the estimation process by means of algorithmic optimization.

This paper makes three main contributions to this literature. First, it extends the non-parametric entropy balancing approach by Hainmueller (2012) for the estimation of balancing weights from the binary treatment framework to the context of continuous treatments. Similar to the original approach, the entropy balancing for continuous treatments (EBCT) algorithm solves a globally convex optimization problem. Balancing weights are obtained by minimizing the deviation from (uniform) base weights subject to zero-correlation and normalization constraints.
The convex nature of the optimization approach allows for efficient software implementation, converging much faster than the other non-parametric balancing methods considered. To facilitate application of the method, a software implementation for Stata is provided by the author in the EBCT ado-package. (To install the package, type “ssc install EBCT, replace” in the command window.) An implementation for R is also available through the WeightIt package (Greifer, 2020). Second, the paper shows that the proposed EBCT method delivers superior finite sample balance in terms of correlations between the treatment variable and covariates in comparison to other re-weighting approaches based on the GPS. In fact, EBCT consistently delivers perfect balance even when other methods tend to fail in this regard due to relatively strong selection into different treatment intensities.
To analyze causal effects in the context of continuous treatments, it is useful to discuss matters in terms of the potential outcomes framework, mainly attributed to Roy (1951) and Rubin (1974). Following the notation of Imbens (2000) and Hirano and Imbens (2004), let us assume that we observe an i.i.d. sample of $N$ individuals $i$ with a vector of pre-treatment covariates $X_i \in \mathbb{R}^K$, where $K$ is the number of covariates. Furthermore, we have information on a post-treatment outcome $Y_i$ and some treatment received with a certain intensity measured by $T_i$ with possible values in $\mathcal{T}$. The potential outcomes are given by $Y_i(t)$ – also often called the unit-dose response – denoting the outcome that would have been observed had the unit received treatment with intensity $t$. Aggregating these unit-level responses leads to the dose-response function (DRF) $E[Y_i(t)]$. Along with its derivative $dE[Y_i(t)]/dt$, the DRF represents the key relationship to be estimated in practice. If treatment intensities were randomly assigned, comparisons of average outcomes between individuals with different treatment intensities would directly give consistent estimates of these quantities. However, as this is mostly not the case even in experimental settings, the following three identifying assumptions need to be invoked in order to obtain consistent estimates in observational studies.

Identifying Assumptions
First, conditional on observed pre-treatment covariates $X$, potential outcomes must be independent of the treatment intensity received, i.e.

$$Y_i(t) \perp\!\!\!\perp T_i \mid X_i \quad \forall t \in \mathcal{T}. \qquad (1)$$

This assumption is called the conditional independence assumption (CIA, Lechner, 2001), also known as the selection-on-observables assumption (Heckman and Robb, 1985), and requires that the researcher observes all covariates $X$ that simultaneously determine the selection into different treatment intensities as well as the outcome of interest. This is potentially a very strong assumption and needs to be discussed on a case-by-case basis for the application at hand. The second assumption requires there to be common support, i.e. the conditional density of treatment needs to be positive over $\mathcal{T}$:

$$f_{T|X}(T = t \mid X_i) > 0 \quad \forall t \in \mathcal{T}. \qquad (2)$$

If the common support assumption is violated, the sample needs to be trimmed and the DRF is estimated on the subset of observations in order to avoid extrapolation (Crump et al., 2009; Lechner and Strittmatter, 2019). Lastly, one needs to assume the so-called stable-unit treatment value assumption (SUTVA, see Rubin, 1980), requiring that each individual's outcome only depends on their own level of treatment intensity. Essentially, this rules out general equilibrium and spill-over effects of treatment (see Imbens and Wooldridge, 2009; Manski, 2013, for examples).

While not the focus of this paper, the estimation of effects of continuous treatments may be combined with other identification approaches than selection-on-observables. For example, the methods described may be applied to the estimation of treatment effects in a conditional Difference-in-Differences setting (Abadie, 2005) where all units are affected by some natural experiment to a different degree. Alternatively, estimating DRFs of continuous instrumental variables (Angrist et al., 1996) that are only valid conditional on covariates is likely to expand knowledge about which units are actually induced to receive some treatment by the instrument. Moreover, the outcome model also needs to be specified correctly, calling for flexible functional forms when estimating the DRF parametrically or using non-parametric techniques to model this relationship.

(Re-weighting) Methods based on the Generalized Propensity Score
Non-parametric estimation of DRFs by comparing outcomes of individuals with exactly the same set of $X$ but different $T$ quickly becomes infeasible with growing dimension of $X$. To avoid this curse of dimensionality, Rosenbaum and Rubin (1983) show for the binary treatment case that it is sufficient to condition on the scalar propensity score instead of the multidimensional vector $X$ in order to control for confounding due to observed covariates. Similarly, Hirano and Imbens (2004) show that the conditional independence assumption also holds by conditioning on the generalized propensity score (GPS) $R = f_{T|X}(T \mid X)$, i.e. the conditional density of the treatment intensity evaluated at $T$ and $X$. In order to estimate the GPS, Hirano and Imbens (2004) assume the treatment follows a normal distribution and perform an Ordinary Least Squares (OLS) regression of $T$ on $X$, obtaining the GPS as

$$\hat{R}_i = \frac{1}{\sqrt{2\pi\hat{\sigma}^2}} \exp\left\{ -\frac{1}{2\hat{\sigma}^2}\left( T_i - \hat{\beta}' X_i \right)^2 \right\}, \qquad (3)$$

where $\hat{\beta}$ is the regression coefficient vector and $\hat{\sigma}$ is the standard error of the disturbance term. Based on this estimated GPS, Hirano and Imbens (2004) advocate estimating the DRF by controlling for the GPS via flexible parametric regression.

As conditioning on the correctly specified GPS balances $X$ across different levels of $T$ only in expectation, (iteratively) checking covariate balance is a necessary step in the estimation of DRFs. Hirano and Imbens (2004) suggest to do this by conducting GPS-adjusted $t$-tests on the equality of means across strata in $T$. As this can be somewhat cumbersome and the stratification may lead to information loss (Austin, 2019), a different strand of literature followed the idea of estimating balancing weights instead. These weights allow to directly assess the resulting balancing quality by comparing (absolute) Pearson correlations in the raw data and in the re-weighted sample. One such approach – originating from inverse probability weighting (IPW, Horvitz and Thompson, 1952) – is provided by Robins et al. (2000) who generalize IPW and show that weights defined as

$$w_i = \frac{f_T(T_i)}{f_{T|X}(T_i \mid X_i)} \qquad (4)$$

can be used to adjust for confounding due to observed covariates. Robins et al. (2000) estimate the (un-)conditional densities $f_T(T)$ and $f_{T|X}(T \mid X)$ based on OLS regressions and the normality assumption. With the goal of avoiding iterative balance-checking, re-specification and estimation of the GPS, Fong et al. (2018) generalize the covariate balancing propensity score (CBPS) methodology of Imai and Ratkovic (2014) to include continuous treatments. For their parametric approach – henceforth covariate balancing generalized propensity score (CBGPS) – they derive the parametric structure of balancing weights based on the normality assumption. Parameters are estimated using the generalized method of moments by minimizing squared Pearson correlations between the treatment and covariates in the re-weighted sample. Moreover, Fong et al. (2018) also provide a non-parametric version – denoted as npCBGPS for the remainder of the paper.

Software is provided by Bia and Mattei (2008) in the doseresponse ado-package for Stata. The sub-classification approach by Imai and van Dyk (2004) faces similar issues and is therefore not discussed at this point.
This approach obviates the need to specify a parametric structure for balancing weights by maximizing the empirical likelihood (Owen, 1990; Qin and Lawless, 1994) subject to imbalance constraints. The imbalance constraints are chosen to allow for some finite sample imbalance in order to improve the convergence properties of the proposed algorithm. Compared to the CBGPS, the non-parametric approach is likely to come with a computational cost which may be quite substantial for datasets with a large number of observations and/or covariates that need to be balanced.

Another non-parametric approach for the estimation of balancing weights is provided by Zhu et al. (2015). Their approach is based on machine-learning techniques and adapts generalized boosted models (GBM) of McCaffrey et al. (2005) to the context of continuous treatments. GBM uses a boosting algorithm to estimate the GPS, plugging the resulting estimates into equation (4). Balance in terms of absolute Pearson correlations is optimized via the number of regression trees grown.

As will become clear in the remainder of the paper, existing automated balancing approaches based on the GPS mostly improve upon balancing quality relative to IPW. However, the re-weighting procedures tend not to achieve satisfactory balance when selection into treatment is strong.

For an assessment of the performance of other IPW methods for continuous exposures with different distribution assumptions and estimation approaches, see Naimi et al. (2014). Huffman and van Gameren (2018) further generalize the CBGPS approach of Fong et al. (2018) to allow for time-varying interventions. The pre-specified degree of imbalance in these constraints is left as a tuning parameter for the researcher; for the purpose of this paper, the tuning parameter will be left at its pre-specified level. For simulations, the maximum number of iterations is set to 20,000 with a shrinkage of 0.05%.
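The normal-density GPS of equation (3) and the stabilized weights of equation (4) can be sketched as follows. This is an illustration only; the function names and simulated data are assumptions and are not taken from the doseresponse or other packages mentioned above.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """Normal density evaluated point-wise."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def gps_and_ipw_weights(T, X):
    """Estimate the GPS by OLS under normality (eq. 3) and form the
    stabilized IPW weights f_T(T_i) / f_{T|X}(T_i | X_i) (eq. 4)."""
    X1 = np.column_stack([np.ones(len(T)), X])      # add an intercept
    beta, *_ = np.linalg.lstsq(X1, T, rcond=None)   # OLS coefficients
    resid = T - X1 @ beta
    gps = normal_pdf(T, X1 @ beta, resid.var())     # conditional density at T_i
    w = normal_pdf(T, T.mean(), T.var()) / gps      # stabilized weights
    return gps, w / w.sum()                         # normalize weights to one

# Illustrative data: treatment linear in three confounders plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
T = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=500)
gps, w = gps_and_ipw_weights(T, X)
```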
The original entropy balancing (EB) method by Hainmueller (2012) is a non-parametric pre-processing tool to estimate balancing weights for binary treatments, i.e. it re-weights control units to exactly match pre-specified covariate moments of the treatment group. The convex nature of the optimization problem solved by EB guarantees excellent balancing properties of the resulting weights. Moreover, Zhao and Percival (2017) show that EB is doubly-robust (Robins and Rotnitzky, 1995) and that it reaches the semi-parametric efficiency bound derived by Hahn (1998). These properties make EB an attractive candidate for the extension to the context of continuous treatments in order to improve upon existing balancing approaches. The remainder of this section introduces the proposed entropy balancing for continuous treatments (EBCT) approach.
Entropy Balancing for Continuous Treatments
For notational convenience, assume that the treatment intensity and covariates are standardized to mean zero. Furthermore, define the column vector $g(T_i, X_i) = [T_i, X_i^{T}, T_i X_i^{T}]^{T}$. The EBCT method aims to solve the following constrained minimization problem:

$$\min_{w} H(w) = \sum_{i=1}^{N} h(w_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} w_i\, g(T_i, X_i) = 0, \quad \sum_{i=1}^{N} w_i = 1, \quad w_i > 0 \;\; \forall i \qquad (5)$$

EBCT minimizes the loss function $H(w)$ subject to the balancing constraints and the normalizing constraints that weights have to sum up to one and be strictly positive. Weights that satisfy (5) retain unconditional means of covariates as well as the treatment variable and, most importantly, they purge the treatment variable of its correlation with covariates. The inclusion of higher-order or interaction terms in the list of covariates allows the researcher to achieve balance not just regarding the mean of covariates, but also regarding higher and cross-moments. Compared to the original EB method, the optimization problem (5) differs in terms of the balancing constraints imposed and the set of units for which balancing weights are being estimated. Essentially, EBCT re-weights all units to achieve zero correlations between the treatment variable and covariates.

Implementation
In order to implement the EBCT approach, one needs to decide upon the loss function $H(w)$. Following Hainmueller (2012), this paper uses a loss function based on the Kullback (1959) entropy metric $h(w_i) = w_i \ln(w_i/q_i)$, where $q_i$ are some base weights chosen by the analyst. If no base weights are specified, uniform weights $q_i = 1/N \;\forall i$ are used. This implies that EBCT chooses balancing weights such that they differ as little as possible from baseline weights while achieving zero correlation in the re-weighted sample. Notice that the loss function attains a minimum at $w_i = q_i \;\forall i$ and is undefined for non-positive weights. The latter property allows to drop the positivity constraint on weights, reducing the optimization problem to one with only equality constraints. Using the Lagrange method, the constrained optimization can be re-written as an unconstrained optimization:

$$\min_{w, \lambda, \gamma} \mathcal{L}(w, \lambda, \gamma) = \sum_{i=1}^{N} w_i \ln(w_i/q_i) - \lambda \left\{ \sum_{i=1}^{N} w_i - 1 \right\} - \gamma^{T} \left\{ \sum_{i=1}^{N} w_i\, g(T_i, X_i) \right\}, \qquad (6)$$

where $\lambda$ and $\gamma$ are Lagrange multipliers on the constraints. As $\partial^2 \mathcal{L}/\partial w_i^2 > 0$ for $w_i > 0$, the optimization problem (6) has a global minimum if the constraints are consistent (Boyd and Vandenberghe, 2004, chapter 5). In order to reduce the dimensionality of the optimization problem, the implied structure of balancing weights is obtained by re-arranging the first-order condition $\partial \mathcal{L}/\partial w_i = 0$ and plugging the result into the condition $\partial \mathcal{L}/\partial \lambda = 0$. This yields the weighting function in terms of the Lagrange multipliers $\gamma$, $q_i$ and $g(T_i, X_i)$ as

$$w_i = \frac{q_i \exp\{\gamma^{T} g(T_i, X_i)\}}{\sum_{i=1}^{N} q_i \exp\{\gamma^{T} g(T_i, X_i)\}}, \qquad (7)$$

where $\lambda$ has been cancelled out.
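As a minimal numerical sketch (not the author's EBCT package), the multipliers $\gamma$ in (7) can be found by minimizing the log of the normalizing constant, whose gradient is exactly the weighted balance condition, so the zero-correlation constraints hold at the optimum. All names below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def ebct_weights(T, X, q=None):
    """Solve for EBCT-style weights via the dual: minimize
    log sum_i q_i exp(gamma' g_i), whose gradient is sum_i w_i g(T_i, X_i)."""
    n = len(T)
    q = np.full(n, 1.0 / n) if q is None else q / q.sum()  # uniform base weights
    Tc = (T - T.mean())[:, None]            # standardize to mean zero
    Xc = X - X.mean(axis=0)
    g = np.column_stack([Tc, Xc, Tc * Xc])  # [T_i, X_i', T_i X_i']

    def dual(gamma):
        z = q * np.exp(g @ gamma)
        w = z / z.sum()
        return np.log(z.sum()), g.T @ w     # objective value and its gradient

    res = minimize(dual, np.zeros(g.shape[1]), jac=True, method="BFGS")
    z = q * np.exp(g @ res.x)
    return z / z.sum()                      # weights from equation (7)

# Illustrative data with selection into treatment.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
T = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)
w = ebct_weights(T, X)
```

At the solution, the weighted means of $T$, $X$ and their products are (numerically) zero, so all weighted Pearson correlations between treatment and covariates vanish.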
Hence, weights implied by EBCT are a log-linear function of a linear index containing covariates, the treatment intensity and their cross-products. Substituting this expression into the Lagrange function yields the dual $\mathcal{L}^d$ as

$$\mathcal{L}^d(\gamma) = -\ln\left( \sum_{i=1}^{N} q_i \exp\{\gamma^{T} g(T_i, X_i)\} \right). \qquad (8)$$

Differentiating $\mathcal{L}^d$ with respect to $\gamma$ yields the $2K+1$ first-order conditions in $2K+1$ unknowns

$$\sum_{i=1}^{N} \frac{\exp\{\gamma^{*T} g(T_i, X_i)\}\, g(T_i, X_i)}{\sum_{j=1}^{N} \exp\{\gamma^{*T} g(T_j, X_j)\}} = 0, \qquad (9)$$

where $\gamma^*$ refers to the multiplier values at the optimum. As equations (9) are non-linear in those multipliers, they have to be solved for numerically. This is done using a quasi-Newton optimization approach. Due to the convexity of the optimization problem, the algorithm tends to converge much faster than GBM and npCBGPS, especially in large datasets. Once values for $\gamma^*$ are obtained, balancing weights are backed out using (7) for subsequent analysis. As noted by Hainmueller (2012), optimization can be performed iteratively to limit the influence of units with potentially extreme weights. To do so, the researcher estimates EBCT weights and truncates excessive weights beyond some threshold, e.g. 4% as suggested by Imbens (2004). Then, the estimation is repeated with the truncated weights as base weights. The resulting weights still lead to finite sample balance but display smaller maximum weights.

Note that the first-order conditions have been multiplied by -1. In comparison, finding the optimum of (6) requires choosing $N + 2(K+1)$ parameters in total. Computation times for the estimation of balancing weights using the approaches described are given in section 4.

In this section, the finite sample properties of weighting approaches in terms of balancing outcomes as well as resulting effect estimates are compared using Monte-Carlo simulations. In general, the simulation design is chosen to mimic relevant features of datasets encountered in empirical practice and is similar in spirit to the design by Hainmueller (2012). Sample sizes of N = 200, 500 and 1,000 are considered and R = 1,000 independent replications are performed. As simulation results are quite similar across sample sizes, only results for N = 200 are presented in the main text. Additional results for larger sample sizes can be found in Appendix A. Because the Hirano and Imbens (2004) and the Imai and van Dyk (2004) procedures do not allow to directly assess covariate balance in terms of correlations, their approaches are excluded from the comparison. See also Austin (2018) for additional evidence on finite sample performance of existing estimators based on the GPS.

Simulation Design
Each simulated dataset consists of ten (partially correlated) covariates $X_1, \ldots, X_{10}$ entering the selection equation: a uniformly distributed $X_1$, a $\chi^2$-distributed $X_2$, binary indicators $X_3$ to $X_5$ based on one underlying standard normal variable with cut-offs of $(-\infty, -1]$, $(-1, 0]$ and $(0, 1]$ (the interval $(1, +\infty)$ serves as a reference category with a coefficient of zero), $X_6 \sim B(p = 0.5)$, and $X_7$ to $X_{10}$, which are jointly standard normal with covariance of 0.2. The treatment equation is specified as the linear index

$$T_i = \sum_{k=1}^{10} \beta_k X_{ki} + \sigma \varepsilon_i, \qquad (10)$$

where $\varepsilon$ is a standard normal error term and $\sigma$ is the scale parameter governing the standard deviation of the composite error term. Based on this equation, moderate selectivity into treatment is generated with $\sigma = 4$ and strong selection is obtained by setting $\sigma = 2$. While these labels are obviously quite arbitrary, the values of $\sigma$ have been chosen to roughly mirror the selectivity patterns of the empirical applications presented in section 4. To investigate the robustness of the estimation approaches to mis-specification of the selection equation, three different specifications for the estimation of balancing weights are assumed:

• Specification 1: $\hat{E}[T \mid X] = \alpha_0 + \sum_k \alpha_k X_k$
• Specification 2: as Specification 1, but with two covariates entering in transformed form (one as a square root) rather than linearly
• Specification 3: as Specification 2, but additionally with one relevant covariate falsely omitted and replaced by another

Specification one is correctly specified and is thus expected to yield the lowest bias and root mean squared error (RMSE). Specification two introduces mild mis-specification via the mis-measurement of two covariates. Finally, specification three further increases bias through the false omission of a relevant covariate.

The outcome equation is modeled as

$$Y_i = (X_{\cdot i} + X_{\cdot i})^{\eta} + X_{\cdot i} + X_{\cdot i} + X_{\cdot i} + T_i + \xi_i, \qquad (11)$$

where $\xi$ is a normally distributed error term and the exponent $\eta$ governs the degree of non-linearity and non-additivity in covariates. Three scenarios are considered: in outcome design one, $\eta$ is set to one, yielding a linear specification of $Y$ in $X$; mild deviations from linearity and additivity are generated in design two with $\eta = 1.25$; and moderate non-linearity/additivity is obtained with a larger $\eta$ in design three.

Balancing Quality
First and foremost, the different estimation procedures aim to balance covariates across different treatment intensities. Hence, an important criterion regarding the empirical performance of these procedures is the degree to which they actually deliver finite sample balance. Simulation results on the distribution of balancing quality indicators for both the moderate and strong selection into treatment under correct specification can be found in the left and the right panel of Figure 1, respectively. Results for the mis-specified cases are not presented as they yield the same conclusions. In the spirit of Diamond and Sekhon (2013), the largest absolute Pearson correlation coefficient is used as a balancing indicator, putting more focus on the least balanced covariate instead of the average balancing quality, recognizing that even small imbalances may lead to substantial bias if the covariate is a strong predictor of the outcome.

[Insert Figure 1 about here]

When selection into treatment is moderate, i.e. initial maximum absolute Pearson correlations are around 35%, all methods tend to improve balancing in the re-weighted dataset to some degree. IPW tends to result in highly variable balancing quality and sometimes even leads to an increase in imbalance. Similar results are obtained for the GBM approach. The parametric CBGPS reduces maximum correlations much closer towards zero but still displays substantial variability in balancing outcomes. The non-parametric CBGPS approach outperforms its parametric counterpart by consistently delivering correlations near zero in the re-weighted data when selection into treatment is moderate. However, when selection into treatment is strong, i.e. when initial absolute correlations attain a maximum of around 45%, the simulation results show that now even npCBGPS yields more variable balancing quality, frequently surpassing the 0.1 rule-of-thumb threshold proposed by Zhu et al. (2015).
While balancing quality tends to deteriorate with initial imbalance for the other approaches, EBCT effectively eliminates correlations in the re-weighted simulation data independent of the magnitude of initial correlations.
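The balance indicator used here, the largest absolute (weighted) Pearson correlation between the treatment and any covariate, can be sketched as follows; function and variable names are illustrative.

```python
import numpy as np

def max_abs_weighted_corr(T, X, w):
    """Largest absolute weighted Pearson correlation between T and columns of X."""
    w = w / w.sum()
    Tc = T - w @ T                       # weighted de-meaning
    Xc = X - w @ X
    cov = (w * Tc) @ Xc                  # weighted covariance, one per covariate
    sd = np.sqrt(w @ Tc**2) * np.sqrt(w @ Xc**2)
    return np.max(np.abs(cov / sd))

# With uniform weights this reduces to the raw maximum absolute correlation.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
T = X[:, 0] + rng.normal(size=200)
raw = max_abs_weighted_corr(T, X, np.ones(200))
```

Comparing this statistic before and after re-weighting summarizes balancing quality in a single number focused on the least balanced covariate.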
Distribution of Balancing Weights
While the balancing quality of weights is certainly an important criterion, it may be the case that finite sample balance comes at the cost of overly large weights for just a few units when estimating treatment effects. This is likely to substantially reduce the performance of resulting estimates and should be avoided (Robins and Wang, 2000; Kang and Schafer, 2007). To provide some evidence of the performance of the balancing approaches in this regard, Figure 2 displays the distribution of the maximum weight share held by a single unit across all simulations with correctly specified balancing weights, again split by the degree of selection into treatment.

[Insert Figure 2 about here]

The results suggest that maximum weight shares for IPW, the parametric and the non-parametric CBGPS as well as GBM are fairly similar most of the time. Especially IPW and GBM often lead to larger weight shares in the case of strong selection into treatment. The graph only shows maximum weights up to a value of 40% in order to enhance the visibility of the weight distribution. Similar to the analysis of balancing quality, an analysis based on the mis-specified $E[T|X]$ gives rise to the same conclusions.

Bias and Root Mean Squared Error
Turning to the finite sample properties of effect estimates based on the re-weighting approaches, Table 1 compares Monte-Carlo results on absolute bias and the root mean squared error (RMSE).

[Insert Table 1 about here]

As expected, mis-specification of the selection equation generally increases bias and RMSE for all estimators. Similarly, increases in the degree of non-linearity/additivity in the outcome equation tend to lead to larger bias and RMSE. Regarding the individual performance of estimators, IPW, the parametric CBGPS and GBM tend to display the highest absolute bias and RMSE. The non-parametric CBGPS reduces bias and RMSE compared to its parametric counterpart in all simulation scenarios. However, when faced with strong selection into treatment, the npCBGPS yields biased estimates even when the selection equation is correctly specified. If the outcome is sufficiently non-linear and non-additive in covariates, bias is substantial. EBCT, on the other hand, consistently delivers essentially unbiased estimates even when selection into treatment is strong, as long as all relevant and correctly measured covariates are included in the specification of balancing weights. Moreover, EBCT yields the lowest RMSE across all simulation scenarios, independent of whether the selection equation is correctly specified or not.
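The two performance criteria compared here are straightforward to compute from the Monte-Carlo output; a minimal sketch with made-up numbers:

```python
import numpy as np

def bias_and_rmse(estimates, truth):
    """Absolute bias and root mean squared error across replications."""
    estimates = np.asarray(estimates, dtype=float)
    bias = abs(estimates.mean() - truth)
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    return bias, rmse

# Illustrative: four replications of an effect estimate with true value 1.
b, r = bias_and_rmse([1.1, 0.9, 1.2, 0.8], 1.0)
```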
In this section, the EBCT methodology and comparison methods are applied to the estimation of dose-response functions using real data from two well-known examples on the size of lottery winnings and labor earnings as well as smoking intensity and medical expenditures. An additional empirical example on the evaluation of a place-based development subsidy is presented in Appendix B. Note that the analysis in this section remains agnostic about the validity of the conditional independence assumption and hence, estimates are only interpreted as conditional associations. For simplicity, outcome regressions are performed via weighted least squares regression based on the estimated balancing weights using a cubic polynomial in the respective treatment variable. From these estimates, the dose-response functions $E[Y_i(t)]$ and their derivatives $dE[Y_i(t)]/dt$ are obtained. Standard errors are estimated using 1,000 bootstrap replications (Efron and Tibshirani, 1986; MacKinnon, 2006).

Lottery Winnings and Earnings
First, the association between the size of lottery winnings and subsequent labor market earnings is re-analyzed using the Hirano and Imbens (2004) survey data on Megabucks lottery winners in Massachusetts from the mid-1980s. The dataset contains information on the prize amount measured in $1,000, labor earnings six years after winning the lottery, as well as some covariates (age, winning year, working status when winning the lottery, years of high school, years of college, an indicator for being male, the number of tickets bought and previous earnings in the years one to six prior to winning). While the prize amount is randomly assigned, survey and item non-response lead to non-zero correlations of covariates with the treatment variable. The estimation sample consists of N = 201 lottery winners. To make the normality assumption used by IPW and CBGPS more credible, the treatment variable T is log(prize amount). The same specification as used in Hirano and Imbens (2004) is employed to estimate balancing weights. An overview of Pearson correlations as well as mean absolute correlations before and after weighting can be found in Table 2.

[Insert Table 2 about here]

Raw Pearson correlations range in absolute value from close to zero for the number of tickets bought to almost 30% in the case of the male indicator. Correlation coefficients based on the re-weighted samples show that all balancing approaches lead to substantial improvements in overall covariate balance as indicated by the mean absolute correlation. The non-parametric CBGPS and EBCT perform best, delivering essentially perfect finite sample balance. For years in high school and winning year, however, re-weighting can even increase imbalance. Table 2 also provides the maximum weight assigned to a single individual, which ranges from 1.35% (EBCT) to 2.94% (GBM).

[Insert Figure 3 about here]

Figure 3 shows the estimated DRFs based on the different weighting approaches. In accordance with the results of Hirano and Imbens (2004), there is a clear downward-sloping relationship between the prize amount and subsequent labor earnings. The general patterns obtained across methods are quite similar, with slightly more variation in DRF estimates in the tails of the prize distribution. However, derivatives of the DRFs are only significantly different from zero at the 10% significance level (indicated by a +) for log(prize amount) in the range of 3 to 4.5 log-points. Hence, despite the differences, none of the slopes of the DRFs are statistically different from zero in the tails.

The data were originally analyzed by Imbens et al. (2001). Compared to Hirano and Imbens (2004), a complete case analysis is performed, i.e. individuals with missing data on subsequent labor market earnings were dropped. Moreover, one individual with much lower lottery winnings than the rest of the sample was excluded. Computation times to obtain weights are far below one second for IPW, CBPS, and EBCT. The npCBGPS (GBM) algorithm takes about 3.5 (6.5) seconds to converge. All computations were performed on a computer with a 2.7 GHz Intel Core i5 CPU and 8 GB 1600 MHz DDR3 RAM.
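The outcome step used in these applications, a weighted least squares regression of the outcome on a cubic polynomial in the treatment followed by the fitted dose-response curve and its derivative, can be sketched as follows; names and data are illustrative.

```python
import numpy as np

def cubic_drf(T, Y, w, grid):
    """Weighted least squares of Y on (1, T, T^2, T^3); returns the fitted
    dose-response function E[Y(t)] and its derivative on the grid."""
    P = np.column_stack([np.ones_like(T), T, T**2, T**3])
    s = np.sqrt(w / w.sum())                 # WLS via square-root rescaling
    b, *_ = np.linalg.lstsq(s[:, None] * P, s * Y, rcond=None)
    drf = b[0] + b[1] * grid + b[2] * grid**2 + b[3] * grid**3
    deriv = b[1] + 2 * b[2] * grid + 3 * b[3] * grid**2
    return drf, deriv

# Illustrative data with a known cubic dose-response.
rng = np.random.default_rng(3)
T = rng.uniform(0, 2, 400)
Y = 1.0 + 0.5 * T - 0.25 * T**3 + rng.normal(scale=0.1, size=400)
w = np.full(400, 1 / 400)                    # e.g. estimated balancing weights
grid = np.linspace(0.1, 1.9, 50)
drf, deriv = cubic_drf(T, Y, w, grid)
```

Bootstrap standard errors as described above can then be obtained by repeating the weight estimation and this regression on resampled data.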
Smoking Intensity and Medical Expenditures
Next, the relationship between smoking intensity and medical expenditures is revisited using data of Imai and van Dyk (2004), originally analyzed by Johnson et al. (2003). The data stem from the National Medical Expenditure Survey 1987, covering current or previous cigarette smokers along with information on their smoking behavior, (validated) medical expenditures and some background characteristics. Based on the data on past cigarette consumption, the number of pack years smoked (= smoking duration in years · cigarette packs smoked per day) is generated as a measure of smoking intensity. Available background characteristics are the continuous (starting) age, a male indicator and categorical variables on race, seatbelt usage, education, marital status, census region of residence and poverty status. For the estimation of balancing weights, squared and cubed terms in (starting) age are also included in the specification. The treatment variable is again taken in logarithmic terms to reduce the skewness of the distribution and make the normality assumption of IPW and CBGPS more plausible. The estimation sample consists of N = 9,408 individuals.

(Computation times to obtain weights are, in ascending order: 0.8 seconds (EBCT), 3.5 seconds (IPW), 8.5 seconds (CBGPS), 4.5 minutes (GBM) and 15 minutes (npCBGPS). Compared to the analysis by Imai and van Dyk (2004), individuals below the 1st percentile and individuals above the 99th percentile of the pack years distribution are dropped, as the density of smokers in this region of the distribution is extremely small. Moreover, individuals above the 99th percentile of the medical expenditure distribution are dropped to reduce problems with extreme outliers.)

An overview of (mean absolute) Pearson correlations before and after weighting is given by Table 3.

[Insert Table 3 about here]

Before weighting, there is a substantial absolute correlation between log(pack years) and the polynomials in age (38 to 47%). Correlations with the other covariates are smaller in magnitude but often substantive nonetheless. Compared to the unweighted sample, IPW increases the mean absolute correlation between covariates and the smoking intensity from 12% to 16%, leading to higher correlations in magnitude for starting age, the male indicator, seatbelt usage, region of residence and income. Clearly, this is a very unfavorable balancing outcome. The parametric CBGPS reduces correlations for almost all variables, or leads to slight increases in correlations for variables that were initially almost completely uncorrelated with the treatment. Its non-parametric counterpart is about as successful in reducing absolute correlations for covariates that were initially heavily related to the treatment. However, it leads to larger increases in imbalance for variables with lower initial correlations. For example, the correlation between the male indicator and the treatment increases from 14% to 20% after weighting using the npCBGPS. GBM is almost as effective in alleviating correlations between covariates and the treatment as the parametric CBGPS. Only the absolute correlation between the male indicator and the treatment of 14% remains above the 0.1 threshold suggested by Zhu et al. (2015). In contrast to the other weighting approaches, EBCT yields perfect finite sample balance in terms of correlations in this application as well. Moreover, it also produces the smallest maximum weight share with 0.41%, compared to 0.65% (GBM), 6.6% (npCBGPS), 7.1% (IPW) and almost 13% (CBGPS). Hence, the parametric CBGPS method in particular achieves better balance only by allowing for relatively extreme weights in this setting.

[Insert Figure 4 about here]

Estimated DRFs are plotted in Figure 4.
The estimates suggest a slightly negative relationship between smoking intensity and medical expenditures up to about 2.5 log(pack years), after which there is a clear increase in medical expenditures with increased smoking intensity. Again, estimated DRFs differ mostly in the tails of the treatment variable distribution. However, differences are much larger in this application, especially for low levels of the smoking intensity, where some estimates suggest significant non-zero derivatives of the DRF.

Conclusions
This paper extends the entropy balancing methodology for the estimation of covariate balancing weights to the context of continuous treatments. Owing to its flexibility and globally convex optimization problem, EBCT achieves superior finite sample balance compared to other re-weighting methods based on the GPS. In fact, EBCT eradicates correlations between covariates and the continuous treatment variable even when selection into treatment is strong. At the same time, EBCT effectively avoids assigning extreme weights to single units. Extensive Monte-Carlo simulations show that effect estimates using EBCT display similar or lower bias and uniformly lower RMSE compared to the other weighting methods considered. All in all, these properties make the proposed EBCT method an attractive approach for the estimation of dose-response functions.

References
Abadie, A. (2005). Semiparametric difference-in-differences estimators. Review of Economic Studies, (1), 1–19.
Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, (434), 444–472.
Austin, P. (2018). Assessing the performance of the generalized propensity score for estimating the effect of quantitative or continuous exposures on binary outcomes. Statistics in Medicine, (1), 1874–1894.
Austin, P. C. (2019). Assessing covariate balance when using the generalized propensity score with quantitative or continuous exposures. Statistical Methods in Medical Research, (5), 1365–1377, PMID: 29415624.
Becker, S. O., Egger, P. H. and von Ehrlich, M. (2012). Too much of a good thing? On the growth effects of the EU's regional policy. European Economic Review, (4), 648–668.
Bia, M. and Mattei, A. (2008). A Stata package for the estimation of the dose-response function through adjustment for the generalized propensity score. Stata Journal, (3), 354–373.
— and — (2012). Assessing the effect of the amount of financial aids to Piedmont firms using the generalized propensity score. Statistical Methods & Applications, (4), 485–516.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Choe, C., Flores-Lagunes, A. and Lee, S.-J. (2015). Do dropouts with longer training exposure benefit from training programs? Korean evidence employing methods for continuous treatments. Empirical Economics, (2), 849–881.
Crump, R., Hotz, V. J., Imbens, G. W. and Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, (1), 187–199.
Diamond, A. and Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, (3), 932–945.
Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, (1), 54–75.
Flores, C. A., Flores-Lagunes, A., Gonzalez, A. and Neumann, T. C. (2012). Estimating the effects of length of exposure to instruction in a training program: The case of Job Corps. Review of Economics and Statistics, (1), 153–171.
Fong, C., Hazlett, C. and Imai, K. (2018). Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements. The Annals of Applied Statistics, (1), 156–177.
Greifer, N. (2020). WeightIt: Weighting for Covariate Balance in Observational Studies. R package version 0.9.0.
Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, (2), 315–331.
Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, (1), 25–46.
Heckman, J. J. and Robb, R. (1985). Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics, (1), 239–267.
Hirano, K. and Imbens, G. W. (2004). The propensity score with continuous treatments. In A. Gelman and X.-L. Meng (eds.), Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives: An Essential Journey with Donald Rubin's Statistical Family, Chichester, UK: John Wiley & Sons, Ltd, pp. 73–84.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, (260), 663–685.
Huffman, C. and van Gameren, E. (2018). Covariate balancing inverse probability weights for time-varying continuous interventions. Journal of Causal Inference, (2), 1–17.
Imai, K. and Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology), (1), 243–263.
— and van Dyk, D. A. (2004). Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association, (467), 854–866.
Imbens, G. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. The Review of Economics and Statistics, (1), 4–29.
Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika, (3), 706–710.
—, Rubin, D. B. and Sacerdote, B. I. (2001). Estimating the effect of unearned income on labor earnings, savings, and consumption: Evidence from a survey of lottery players. American Economic Review, (4), 778–794.
— and Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, (1), 5–86.
Johnson, E., Dominici, F., Griswold, M. and Zeger, S. L. (2003). Disease cases and their medical costs attributable to smoking: An analysis of the National Medical Expenditure Survey. Journal of Econometrics, (1), 135–151.
Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, (4), 523–539.
Kluve, J., Schneider, H., Uhlendorff, A. and Zhao, Z. (2012). Evaluating continuous training programs using the generalized propensity score. Journal of the Royal Statistical Society: Series A (Statistics in Society), (2), 587–617.
Kullback, S. (1959). Information Theory and Statistics. Chichester, UK: John Wiley & Sons, Ltd.
Lechner, M. (2001). Identification and estimation of causal effects of multiple treatments under the conditional independence assumption. In M. Lechner and F. Pfeiffer (eds.), Econometric Evaluation of Labour Market Policies, Heidelberg: Physica-Verlag HD, pp. 43–58.
— and Strittmatter, A. (2019). Practical procedures to deal with common support problems in matching estimation. Econometric Reviews, (2), 193–207.
MacKinnon, J. G. (2006). Bootstrap methods in econometrics. Economic Record, 2–18.
Manski, C. F. (2013). Identification of treatment response with social interactions. Econometrics Journal, (1), 1–23.
McCaffrey, D., Ridgeway, G. and Morral, A. (2005). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 403–425.
Mitze, T., Paloyo, A. R. and Alecke, B. (2015). Is there a purchase limit on regional growth? A quasi-experimental evaluation of investment grants using matching techniques. International Regional Science Review, (4), 388–412.
Naimi, A., Moodie, E., Auger, N. and Kaufman, J. (2014). Constructing inverse probability weights for continuous exposures: A comparison of methods. Epidemiology, (1), 292–299.
Owen, A. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, (1), 90–120.
Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, (1), 300–325.
Robins, J., Hernán, M. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, (5), 550–560.
Robins, J. M. and Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, (429), 122–129.
— and Wang, N. (2000). Inference for imputation estimators. Biometrika, (1), 113–124.
Rosenbaum, P. and Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, (1), 41–55.
Roy, A. D. (1951). Some thoughts on the distribution of earnings. Oxford Economic Papers, (2), 135–146.
Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, (5), 688–701.
— (1980). Comment on Basu, D. – Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, (371), 591–593.
Zhao, Q. and Percival, D. (2017). Entropy balancing is doubly robust. Journal of Causal Inference.
Zhu, Y., Coffman, D. L. and Ghosh, D. (2015). A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of Causal Inference, (1), 25–40.

Tables and Figures

Table 1: Simulation Results – Bias and Root Mean Squared Error (N = 200)

Degree of non-linearity/additivity in Y|X: None / Mild / Moderate

                 Moderate Selection (σ = 4)              Strong Selection (σ = 2)
                 None        Mild        Moderate        None        Mild         Moderate
                 Bias RMSE   Bias RMSE   Bias  RMSE      Bias RMSE   Bias  RMSE   Bias   RMSE
Specification 1: Correctly specified E[T|X]
IPW              4.1  15.8   6.1  20.1   8.6   26.4      18.8 37.0   30.1  47.8   50.5   71.2
CBGPS            5.8  13.8   9.4  16.6   14.2  21.7      25.4 32.8   42.5  48.0   70.8   77.1
npCBGPS          0.6  12.9   0.9  13.0   0.7   13.2      4.5  31.8   6.7   33.0   11.5   35.9
GBM              9.5  15.6   14.9 21.6   24.2  31.8      30.9 42.1   55.3  70.4   101.0  132.1
EBCT             0.4  11.9   0.6  11.9   0.3   12.1      0.6  30.8   0.6   31.8   1.0    31.6
Specification 2: Mildly mis-specified E[T|X]
IPW              7.6  16.5   10.1 20.5   14.8  29.7      28.3 41.2   40.4  53.2   61.4   78.9
CBGPS            8.8  15.1   12.9 18.3   19.5  25.3      33.4 39.1   48.2  53.2   76.9   81.9
npCBGPS          4.6  13.6   5.8  13.9   7.3   15.0      19.2 36.3   21.2  36.6   27.8   43.2
GBM              10.6 16.6   16.4 22.0   26.4  34.0      35.3 43.6   54.9  67.6   96.8   119.3
EBCT             4.7  12.9   5.7  13.1   6.9   14.3      18.2 34.0   18.8  33.9   22.1   37.3
Specification 3: Strongly mis-specified E[T|X]
IPW              19.9 24.7   33.6 39.3   60.8  71.1      62.2 71.4   113.5 128.6  220.9  259.8
CBGPS            19.0 22.5   32.2 35.9   57.5  62.3      54.3 58.5   94.6  99.4   173.7  183.6
npCBGPS          17.9 21.9   30.8 34.5   56.0  61.2      53.9 60.3   94.3  100.8  169.0  179.1
GBM              19.9 23.7   34.0 37.9   60.2  66.8      57.6 63.2   99.7  107.4  189.9  208.1
EBCT             17.8 21.3   30.7 34.0   55.8  60.5      54.0 59.5   93.4  99.1   168.3  177.4

Note: This table shows absolute bias and root mean squared error (RMSE) measured in percent of the true treatment effect from the 18 Monte-Carlo simulation scenarios for N = 200. For each scenario, R = 1,000 independent replications are performed. Effect estimates are obtained through weighted least squares regression of the outcome Y on the treatment intensity T. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).
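The bias and RMSE entries in the simulation tables are computed across replications and expressed in percent of the true treatment effect; a minimal sketch (illustrative helper name):

```python
import numpy as np

def bias_rmse_percent(estimates, true_effect):
    """Absolute bias and RMSE of replication estimates, in percent of the true effect."""
    estimates = np.asarray(estimates, dtype=float)
    bias = abs(estimates.mean() - true_effect)
    rmse = np.sqrt(np.mean((estimates - true_effect) ** 2))
    return 100 * bias / abs(true_effect), 100 * rmse / abs(true_effect)
```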
Table 2: (Weighted) Corr(log(prize amount), X_k)

Covariate                  Unweighted  IPW    CBGPS  npCBGPS  GBM    EBCT
Age                        0.21        0.06   0.02   0.00     0.11   0.00
Years in high school       -0.03       0.00   0.00   0.00     -0.07  0.00
Years in college           0.04        0.05   0.03   0.00     0.01   0.00
Male                       0.29        0.07   0.03   0.00     0.12   0.00
Tickets bought             0.03        0.01   0.00   0.00     0.02   0.00
Working then               0.07        0.02   0.02   0.00     0.03   0.00
Winning year               0.03        -0.01  0.01   0.00     0.10   0.00
Earnings year -1           0.13        0.01   0.02   0.00     -0.02  0.00
Earnings year -2           0.19        0.02   0.02   0.00     0.00   0.00
Earnings year -3           0.21        0.02   0.02   0.00     0.01   0.00
Earnings year -4           0.21        0.02   0.03   0.00     0.03   0.00
Earnings year -5           0.16        0.02   0.02   0.00     0.00   0.00
Earnings year -6           0.17        0.06   0.04   0.00     0.02   0.00
Mean absolute correlation  0.14        0.03   0.02   0.00     0.04   0.00
Maximum weight in %                    1.68   1.68   2.39     2.94   1.35

Note: The table shows Pearson correlations between the treatment variable t = log(prize amount) and covariates in the raw sample of Hirano and Imbens (2004) as well as in the re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).
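The EBCT columns in the balance tables reflect weights chosen to zero out these correlations exactly. The following is a rough sketch of the underlying convex program (uniform base weights with zero-correlation and normalization constraints), solved here with SciPy's generic SLSQP routine rather than the dedicated solver of the Stata/R packages; `ebct_weights` is an illustrative name, not the packaged API:

```python
import numpy as np
from scipy.optimize import minimize

def ebct_weights(T, X):
    """Sketch of EBCT-style balancing weights for a continuous treatment.

    Minimizes the Kullback-Leibler divergence from uniform base weights
    subject to (i) normalization, (ii) the weighted means of T and of each
    covariate matching their sample means, and (iii) zero weighted
    covariance between T and each covariate.
    """
    T = np.asarray(T, dtype=float)
    X = np.asarray(X, dtype=float)
    n = T.shape[0]
    Tc = T - T.mean()
    Xc = X - X.mean(axis=0)
    # One column of A per linear constraint on the weights.
    A = np.column_stack([np.ones(n), Tc, Xc, Tc[:, None] * Xc])
    b = np.zeros(A.shape[1])
    b[0] = 1.0  # weights sum to one; every other constraint equals zero

    def kl_from_uniform(w):
        return np.sum(w * np.log(np.maximum(w, 1e-12) * n))

    def kl_grad(w):
        return np.log(np.maximum(w, 1e-12) * n) + 1.0

    res = minimize(kl_from_uniform, np.full(n, 1.0 / n), jac=kl_grad,
                   method="SLSQP", bounds=[(1e-10, 1.0)] * n,
                   constraints=[{"type": "eq",
                                 "fun": lambda w: A.T @ w - b,
                                 "jac": lambda w: A.T}],
                   options={"maxiter": 1000, "ftol": 1e-12})
    return res.x
```

Because the weighted means of T and X are pinned to their sample means, the zero-covariance constraints translate directly into zero weighted Pearson correlations, which is the balance criterion reported in the tables.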
Table 3: (Weighted) Corr(log(pack years), X_k)

Covariate X_k              Unweighted  IPW    CBGPS  npCBGPS  GBM    EBCT
Starting age
  Linear                   -0.14       0.20   -0.07  -0.06    -0.08  0.00
  Squared                  -0.13       0.19   -0.04  -0.05    -0.07  0.00
  Cubed                    -0.11       0.16   -0.02  -0.04    -0.06  0.00
Age
  Linear                   0.47        -0.34  0.06   0.07     0.01   0.00
  Squared                  0.43        -0.32  0.06   0.07     0.01   0.00
  Cubed                    0.38        -0.29  0.05   0.08     0.01   0.00
Male                       0.14        0.23   0.04   0.20     0.14   0.00
Race
  Black                    -0.14       0.19   -0.01  -0.06    -0.01  0.00
  Other                    0.19        -0.24  0.02   -0.10    0.03   0.00
Seatbelt usage
  Sometimes                -0.01       0.20   -0.03  -0.03    0.05   0.00
  Often                    -0.04       -0.18  0.05   0.02     -0.11  0.00
Education
  High school              0.01        -0.04  0.02   0.10     0.06   0.00
  Some college             -0.04       -0.02  -0.03  -0.01    0.04   0.00
  College degree           0.11        -0.03  0.00   -0.10    -0.01  0.00
  Other                    0.11        -0.18  -0.01  0.02     -0.02  0.00
Marital status
  Widowed                  0.04        0.07   -0.02  0.13     0.05   0.00
  Separated                -0.02       0.11   -0.02  -0.03    0.04   0.00
  Never married            -0.26       0.17   0.07   -0.09    -0.01  0.00
Census region
  Mid-west                 0.02        0.16   -0.03  0.07     0.05   0.00
  South                    -0.02       -0.20  0.05   -0.07    -0.08  0.00
  West                     -0.01       0.15   -0.02  -0.02    0.01   0.00
Poverty status
  Poor                     -0.02       -0.03  -0.02  0.10     -0.01  0.00
  Low income               -0.01       -0.04  -0.01  -0.09    0.00   0.00
  Middle income            0.00        -0.15  -0.03  0.10     -0.01  0.00
  High income              0.03        0.18   -0.02  -0.02    0.00   0.00
Mean absolute correlation  0.12        0.16   0.03   0.08     0.04   0.00
Maximum weight in %                    7.1    12.9   6.6      0.7    0.4
Note: The table shows Pearson correlations between the treatment variable t = log(pack years) and covariates in the raw sample of Imai and van Dyk (2004) as well as in the re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Figure 1
[Plot omitted: distributions of the maximum absolute Pearson correlation; categories Unweighted, IPW, CBGPS, npCBGPS, GBM, EBCT; panels Moderate Selection and Strong Selection.]
Note: This graph plots the distribution of the maximum absolute correlation coefficients between the treatment intensity T and the covariates X for Monte-Carlo simulation designs with correctly specified E[T|X], split by the degree of selection into treatment, both in the raw sample (unweighted) as well as in re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Figure 2
[Plot omitted: distributions of the maximum weight share in %; categories IPW, CBGPS, npCBGPS, GBM, EBCT; panels Moderate Selection and Strong Selection.]
Note: This graph plots the distribution of the maximum weight shares for Monte-Carlo simulation designs with correctly specified E[T|X], split by the degree of selection into treatment, in re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Figure 3
[Plot omitted: x-axis t = log(prize amount), y-axis E[Earnings(t)]; curves for IPW, CBGPS, npCBGPS, GBM and EBCT.]
Note: This graph shows the estimated dose-response function (DRF) between log(prize amount) and subsequent labor earnings, based on a weighted least squares regression using a cubic specification. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT). Values of the DRF are marked with a + if the derivative is significantly different from zero at least at the 10% level, based on bootstrapped standard errors obtained using R = 1,000 replications.

Figure 4
[Plot omitted: x-axis t = log(pack years), y-axis E[Medical expenditure(t)]; curves for IPW, CBGPS, npCBGPS, GBM and EBCT.]
Note: This graph shows the estimated dose-response function (DRF) between log(pack years) and medical expenditures, based on a weighted least squares regression using a cubic specification. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT). Values of the DRF are marked with a + if the derivative is significantly different from zero at least at the 10% level, based on bootstrapped standard errors obtained using R = 1,000 replications.

Appendix A

Table A.1: Simulation Results – Bias and Root Mean Squared Error (N = 500)

Degree of non-linearity/additivity in Y|X: None / Mild / Moderate

                 Moderate Selection (σ = 4)              Strong Selection (σ = 2)
                 None        Mild        Moderate        None        Mild         Moderate
                 Bias RMSE   Bias RMSE   Bias  RMSE      Bias RMSE   Bias  RMSE   Bias   RMSE
Specification 1: Correctly specified E[T|X]
IPW              1.7  12.3   4.5  15.2   5.4   22.3      16.5 32.3   26.6  42.0   46.4   64.6
CBGPS            2.5  9.1    5.1  10.9   7.4   14.1      22.5 28.2   36.3  40.5   61.4   66.3
npCBGPS          0.1  9.0    0.5  9.3    0.4   9.3       2.5  25.7   3.3   26.1   6.0    27.2
GBM              4.7  10.3   9.0  14.0   13.5  19.5      21.5 30.9   36.8  49.3   64.2   79.1
EBCT             0.0  7.4    0.6  7.7    0.1   7.5       0.6  19.3   0.1   18.9   0.4    20.2
Specification 2: Mildly mis-specified E[T|X]
IPW              6.6  12.7   8.3  16.6   11.0  22.0      25.7 36.1   35.8  48.0   55.7   68.4
CBGPS            6.7  10.8   8.6  12.7   12.5  16.8      30.1 34.1   43.8  47.8   69.3   73.1
npCBGPS          4.9  10.0   5.8  10.7   7.0   11.4      16.7 28.4   19.5  30.1   24.5   33.4
GBM              8.2  11.9   11.5 15.3   16.7  21.1      28.5 35.3   42.1  50.5   66.5   79.3
EBCT             5.2  9.0    5.6  9.4    6.6   9.9       17.2 25.0   19.7  26.7   22.7   28.9
Specification 3: Strongly mis-specified E[T|X]
IPW              19.7 22.5   34.5 37.4   62.0  72.5      71.6 80.2   128.8 142.8  263.8  306.0
CBGPS            18.7 20.5   32.7 34.6   57.4  60.7      59.2 62.0   104.9 108.5  198.7  207.7
npCBGPS          18.5 20.5   32.1 34.1   56.7  60.3      56.6 59.9   100.2 103.8  186.5  193.8
GBM              19.5 21.7   33.7 35.9   59.6  64.1      63.1 67.1   113.9 121.2  217.3  236.7
EBCT             18.3 19.8   31.7 33.2   55.9  58.3      56.1 58.3   97.3  99.8   179.5  184.7

Note: This table shows absolute bias and root mean squared error (RMSE) measured in percent of the true treatment effect from the 18 Monte-Carlo simulation scenarios for N = 500. For each scenario, R = 1,000 independent replications are performed. Effect estimates are obtained through weighted least squares regression of the outcome Y on the treatment intensity T. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Table A.2: Simulation Results – Bias and Root Mean Squared Error (N = 1,000)

Degree of non-linearity/additivity in Y|X: None / Mild / Moderate

                 Moderate Selection (σ = 4)              Strong Selection (σ = 2)
                 None        Mild        Moderate        None        Mild         Moderate
                 Bias RMSE   Bias RMSE   Bias  RMSE      Bias RMSE   Bias  RMSE   Bias   RMSE
Specification 1: Correctly specified E[T|X]
IPW              1.1  10.6   2.4  12.1   4.9   16.8      13.7 26.8   22.4  38.5   36.7   58.3
CBGPS            1.1  6.9    1.7  7.3    3.7   9.5       18.6 23.7   31.6  36.0   52.8   57.3
npCBGPS          0.1  7.5    0.1  7.4    0.8   7.8       1.0  22.8   2.7   22.8   4.0    26.0
GBM              3.6  8.0    5.8  9.7    10.2  14.8      17.2 25.2   29.3  37.3   50.9   75.1
EBCT             0.1  5.4    0.0  5.3    0.4   5.5       0.8  14.3   0.2   14.3   0.6    14.4
Specification 2: Mildly mis-specified E[T|X]
IPW              6.0  10.1   7.2  13.6   9.8   23.3      24.8 33.5   33.8  42.5   49.3   65.0
CBGPS            5.7  8.4    6.8  9.5    8.8   11.9      27.8 31.3   39.9  42.9   61.5   65.0
npCBGPS          5.2  8.9    5.6  9.1    6.7   10.1      16.7 26.8   19.4  27.7   23.2   31.6
GBM              7.1  9.9    9.4  11.8   12.8  16.2      26.3 31.7   35.3  40.1   53.1   62.8
EBCT             5.1  7.3    5.6  7.7    6.4   8.4       17.5 21.8   19.8  23.5   22.0   25.7
Specification 3: Strongly mis-specified E[T|X]
IPW              19.7 21.3   34.5 36.9   61.7  66.4      73.5 80.8   146.0 162.1  288.1  326.2
CBGPS            18.8 19.8   33.0 34.1   58.7  60.5      62.6 65.0   115.3 118.9  215.0  222.8
npCBGPS          18.9 20.2   32.7 34.2   58.2  60.4      58.1 60.9   105.8 109.0  190.5  196.5
GBM              19.5 20.5   33.4 34.9   60.2  62.9      65.8 69.1   122.4 128.2  240.0  255.4
EBCT             18.5 19.2   31.8 32.6   56.4  57.6      56.7 58.1   101.2 102.8  181.0  184.5

Note: This table shows absolute bias and root mean squared error (RMSE) measured in percent of the true treatment effect from the 18 Monte-Carlo simulation scenarios for N = 1,000. For each scenario, R = 1,000 independent replications are performed. Effect estimates are obtained through weighted least squares regression of the outcome Y on the treatment intensity T. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Appendix B: Additional Application

This appendix provides an additional empirical application of the EBCT methodology using data kindly provided by Mitze et al. (2015). They evaluate the effects of the largest German place-based regional development subsidy (RDS) program ("Gemeinschaftsaufgabe Verbesserung der regionalen Wirtschaftsstruktur") on regional productivity growth. Since German re-unification in 1990, the program granted subsidies in excess of €60 billion to lagging regions.

The data stem from the Federal statistics agency and are measured at the county level. The dataset contains information on counties' subsidy receipt per capita, regional productivity growth and several covariates (lagged productivity, lagged labor productivity growth, lagged employment, lagged employment growth, local investment intensity, average firm size, turnover from exports, human capital, population density, a net-migration indicator, an urban indicator, information on settlement structure and time dummies). The period of observation covers the years 1993-2008. All variables are measured in 3-year intervals. In total, there are 869 treated county-period observations. The treatment variable has been Box-Cox transformed with λ ≈ 0.15 to reduce its skewness. For the estimation of balancing weights, the same specification as in Mitze et al. (2015) is used.

Table B.1 displays the correlations between covariates and the treatment variable before and after weighting. Before weighting, there is a substantial negative correlation of about -0.6 between lagged labor productivity and treatment. Associations of similar magnitude but of opposite sign are given for lagged productivity growth and human capital endowment. This suggests that highly subsidized regions would have performed better than other regions even without the subsidy due to catch-up growth. Neglecting these differences, or failing to achieve balance, is thus likely to overstate the effects of the subsidy on regional development. To adjust for this divergence in pre-treatment characteristics, balancing weights are estimated. All approaches reduce mean absolute correlations between covariates and the treatment variable. However, only EBCT can reduce these correlations to zero. For all other approaches, there remain sizable correlations, especially with respect to lagged labor productivity (growth). This lack of balance was also documented by Mitze et al. (2015) in their original analysis using the Hirano-Imbens approach.

(Computation times are, in ascending order: 0.05 seconds (IPW), 0.25 seconds (CBGPS), 0.6 seconds (EBCT), 22 seconds (GBM) and 1.6 minutes (npCBGPS).)

Table B.1: (Weighted) Corr((Subsidy amount^λ − 1)/λ, X_k)

Covariate X_k                    Unweighted  IPW    CBGPS  npCBGPS  GBM    EBCT
Log(lagged labor productivity)   -0.62       -0.26  -0.27  -0.06    -0.51  0.00
Lagged labor prod. growth        0.52        0.33   0.34   0.31     0.11   0.00
Log(lagged employment)           -0.05       0.00   0.00   -0.08    -0.02  0.00
Lagged employment growth         -0.06       -0.02  -0.06  -0.16    0.13   0.00
Log(investment intensity)        0.50        0.19   0.21   0.06     0.18   0.00
Log(average firm size)           -0.37       -0.19  -0.11  -0.02    -0.14  0.00
Log(foreign turnover)            -0.40       -0.14  -0.11  -0.04    -0.22  0.00
Log(share manufacturing sector)  -0.42       -0.25  -0.16  -0.01    -0.23  0.00
Log(human capital)               0.52        0.27   0.20   0.03     0.25   0.00
Log(population density)          -0.16       -0.05  -0.06  0.06     0.09   0.00
Net-migration indicator          -0.29       -0.20  -0.15  -0.10    -0.14  0.00
Urban indicator                  0.07        -0.02  0.01   0.08     -0.15  0.00
Settlement structure             0.21        0.01   0.04   0.07     -0.10  0.00
Time dummy 1                     0.01        0.05   0.04   -0.10    0.10   0.00
Time dummy 2                     0.06        0.07   0.05   -0.03    0.12   0.00
Time dummy 3                     -0.01       -0.02  -0.02  -0.04    0.07   0.00
Time dummy 4                     -0.05       0.02   -0.01  0.04     -0.17  0.00
Mean absolute correlation        0.25        0.12   0.11   0.08     0.16   0.00
Maximum weight in %                          5.92   17.5   28.7     9.35   5.44

Note: The table shows Pearson correlations between the treatment variable t = (Subsidy amount^λ − 1)/λ with λ ≈ 0.15 and covariates in the raw sample of Mitze et al. (2015) as well as in the re-weighted samples. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT).

Estimated DRFs are plotted in Figure B.1. Most estimates suggest a positive relationship between the subsidy and regional productivity growth for medium values of the treatment intensity. IPW weights yield unrealistic estimates with excessively large gains from the subsidy in the upper tail of the distribution. While point estimates are relatively similar across regressions based on (np-)CBGPS, GBM and EBCT, none of the derivatives of the DRF using EBCT are statistically significant at the 10% level. This contrasts with the findings of Mitze et al. (2015), who report significantly positive derivatives of the DRF for treatment intensities around the center of the distribution. As EBCT estimates are the most credible due to their superior balancing quality, one should be skeptical of the validity of these findings.

Figure B.1: Empirical Application – Regional Development
[Plot omitted: x-axis t = (Subsidy amount^λ − 1)/λ, y-axis E[Productivity growth in % (t)]; curves for IPW, CBGPS, npCBGPS, GBM and EBCT.]
Note: This graph shows the estimated dose-response function (DRF) between t = (Subsidy amount^λ − 1)/λ with λ ≈ 0.15 and productivity growth, based on a weighted least squares regression using a cubic specification. Re-weighting approaches employed are inverse probability weighting estimated via OLS (IPW, see Robins et al., 2000), (non-)parametric covariate balancing generalized propensity scores (np-/CBGPS, see Fong et al., 2018), generalized boosted modeling (GBM, see Zhu et al., 2015) as well as the novel entropy balancing for continuous treatments (EBCT). Values of the DRF are marked with a + if the derivative is significantly different from zero at least at the 10% level, based on bootstrapped standard errors obtained using R = 1,000 replications.
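The figure notes describe DRFs estimated by weighted least squares with a cubic specification and derivative significance assessed via a unit-level bootstrap. A minimal sketch under those assumptions (function names are illustrative; for brevity the balancing weights are held fixed across bootstrap replications, whereas re-estimating them inside each draw would mimic the full procedure more closely):

```python
import numpy as np

def drf_cubic(t, y, w):
    """Weighted least squares fit of a cubic dose-response curve.

    Returns coefficients (b0, b1, b2, b3) of E[Y(t)] ~ b0 + b1 t + b2 t^2 + b3 t^3.
    """
    D = np.column_stack([np.ones_like(t), t, t ** 2, t ** 3])
    W = np.asarray(w, dtype=float)
    W = W / W.sum()
    return np.linalg.solve(D.T @ (W[:, None] * D), D.T @ (W * y))

def drf_derivative(beta, t0):
    """Slope of the fitted cubic DRF at treatment level t0."""
    return beta[1] + 2 * beta[2] * t0 + 3 * beta[3] * t0 ** 2

def bootstrap_derivative_se(t, y, w, t0, reps=1000, seed=0):
    """Bootstrap standard error of the DRF derivative at t0, resampling units."""
    rng = np.random.default_rng(seed)
    n = len(t)
    draws = np.empty(reps)
    for r in range(reps):
        idx = rng.integers(0, n, size=n)
        draws[r] = drf_derivative(drf_cubic(t[idx], y[idx], w[idx]), t0)
    return draws.std(ddof=1)
```

A derivative estimate more than roughly 1.64 bootstrap standard errors away from zero would receive the "+" marker at the 10% level used in the figures.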