Identifying causal channels of policy reforms with multiple treatments and different types of selection

Abstract

We study the identification of channels of policy reforms with multiple treatments and different types of selection for each treatment. We disentangle reform effects into policy effects, selection effects, and time effects under the assumption of conditional independence, common trends, and an additional exclusion restriction on the non-treated. Furthermore, we show the identification of direct- and indirect policy effects after imposing additional sequential conditional independence assumptions on mediating variables. We illustrate the approach using the German reform of the allocation system of vocational training for unemployed persons. The reform changed the allocation of training from a mandatory system to a voluntary voucher system. Simultaneously, the selection criteria for participants changed, and the reform altered the composition of course types. We consider the course composition as a mediator of the policy reform. We show that the empirical evidence from previous studies reverses when considering the course composition. This has important implications for policy conclusions.


Annabelle Doerr

UC Berkeley, University of Basel

Anthony Strittmatter

University of St. Gallen

October 13, 2020

JEL-Classification: C21, J68, H43

Keywords: Difference-in-Differences, Mediation Analysis, Treatment Effects Evaluation, Administrative Data, Training Voucher

This study is part of the project “Regional Allocation Intensities, Effectiveness and Reform Effects of Training Vouchers in Active Labor Market Policies”, IAB project 1155. This is a joint project of the Institute for Employment Research (IAB) and the University of Freiburg. We gratefully acknowledge financial and material support from the IAB. The paper was presented at ESPE in Aarhus, CAFE Workshop in Børkop, SOLE in Washington, EALE in Ljubljana, Joint Research Centre of the European Commission, Centre for European Economic Research, and the University of Bern. We thank participants for helpful comments, in particular Hugo Bodory, Bernd Fitzenberger, Hans Fricke, Michael Lechner, Michael Knaus, Thomas Kruppe, Marie Paul, and Gesine Stephan. We are particularly grateful for detailed comments and remarks from Conny Wunsch. Furthermore, we thank two anonymous referees. The usual disclaimer applies. Correspondence: [email protected], [email protected]

Introduction

A popular approach to evaluating the effectiveness of policy reforms in quasi-experimental settings is the Difference-in-Differences (DiD) method. The baseline version of DiD requires observing one treatment group and one control group. Both groups are untreated before the policy reform. After the reform, the treatment group receives the treatment while the control group remains untreated. The effectiveness of the reform can be estimated by comparing the differences in outcomes of both groups before and after the reform implementation. This comparison leads to unbiased estimates under the common trend assumption, i.e., when the outcomes of both groups would have developed parallel to each other in the absence of the policy reform.

Often the evaluation of policy instruments is not that simple. For this reason, the literature proposes several extensions of the baseline DiD method. Some studies impose conditional independence assumptions to account for selection into treatment based on observable characteristics (e.g., Abadie, 2005, Heckman, Ichimura, and Todd, 1997, Lechner, 2010). Other studies consider multiple treatments (e.g., Fricke, 2017). This captures situations in which the policy of interest is the reform of an existing policy instrument, for example an increase in treatment intensity, instead of the implementation of a new instrument. There are studies that combine both extensions (e.g., Felfe, Nollenberger, and Rodriguez-Planas, 2014, Havnes and Mogstad, 2011a,b). Some studies even consider multiple treatments and different types of selection for each treatment (e.g., Card and Hyslop, 2005, Rinne, Uhlendorff, and Zhao, 2013).
However, the results from these studies are not formally identified without the implementation of a structural model for the specific policy question.

As the first contribution of our paper, we formally show how policy reforms can be non-parametrically decomposed into effects of changing the policy instrument and selection effects, as well as other time-varying factors such as business cycle effects. We mainly rely on conditional independence and common trend assumptions. We highlight that an additional exclusion restriction on the untreated is sufficient to identify the policy and time effects, a result that the previous literature has not recognised. The imposed assumptions are necessary and sufficient to achieve additive separability, which is, for example, imposed by Rinne, Uhlendorff, and Zhao (2013).

Second, we focus on the direct and indirect channels through which changes in a policy instrument may unfold their effects. Relying on mediation analysis (see, e.g., Huber, Lechner, and Mellace, 2017, for a review), we are the first to explore direct and indirect effects of a quasi-experimental policy reform. The existing approaches in the mediation literature investigate direct and indirect treatment effects instead of policy reform effects (see, e.g., Flores and Flores-Lagunes, 2009, Huber, Lechner, and Strittmatter, 2018, Imai, Keele, and Yamamoto, 2010, Imai, Keele, Tingley, and Yamamoto, 2011, Petersen, Sinisi, and van der Laan, 2006, Van der Weele, 2009). Closely related is the study by Deuchert, Huber, and Schelker (2019), who use common trend assumptions to identify direct and indirect policy effects.
In contrast, we rely on common trend assumptions to identify the policy effects and impose additional sequential independence assumptions on the mediators to identify the direct and indirect effects of the policy reform on the outcome of interest. (Relatedly, Huber, Schelker, and Strittmatter (2019) use a changes-in-changes framework to identify direct and indirect channels.)

Third, we illustrate our approach using a large-scale reform of the allocation system of unemployed individuals to vocational training in Germany. The reform replaced the existing mandatory allocation system with a voucher allocation system. The voucher system offers voluntary participation, and participants have (some) influence on the course choice (see the detailed discussion in Doerr, Fitzenberger, Kruppe, Paul, and Strittmatter, 2017). Under the mandatory system, participation was compulsory and caseworkers in local employment agencies allocated participants to specific courses. Additionally, the reform changed the criteria for selecting unemployed persons into training programmes. Under the pre-reform system, caseworkers assigned training based on subjective criteria, whereas the new selection rule focuses on predicted future employment outcomes. Caseworkers were incentivised to select unemployed persons with an expected re-employment probability of at least 70% within six months after the end of training. Accordingly, this reform offers a setting in which the overall reform effect is a composition of time effects, [...]

[...] However, it is unexplored whether training is more effective under voluntary or mandatory participation, net of the course composition that might change under different allocation systems. Our study also helps to close this research gap.

We build on the work of Rinne, Uhlendorff, and Zhao (2013), who investigate the same reform.
They disentangle the effects of the reform of the allocation systems from the changing selection criteria and find positive but mostly insignificant short-term effects of the voucher reform. We replicate their results using a larger data set and an efficient estimation method. Rinne, Uhlendorff, and Zhao (2013) observe 1,319 training participants after the reform and match control observations by single-nearest-neighbour matching. In contrast, we obtained administrative data from the Federal Employment Agency of Germany, which contain the population of vocational training participants during the years 2001-2004. Our evaluation sample consists of more than 26,000 training participants in each time period. We apply the doubly robust and locally efficient auxiliary-to-study tilting estimator proposed in Graham, De Xavier Pinto, and Egel (2016).

Furthermore, our data allow us to consider long-term effects over a time period of more than seven years after programme entry (in contrast to 1.5 years in Rinne, Uhlendorff, and Zhao, 2013). Qualitatively, we confirm the findings for the time period under consideration in Rinne, Uhlendorff, and Zhao (2013). Our results suggest positive effects of the reform of the allocation system in the short-term. Moreover, we find that the reform of the allocation system reduces the re-employment probabilities between the first and second year after the start of training. After three years, the effects turn positive and remain at an approximately stable level until seven years after the training started. This suggests that it is crucial to consider long-term reform effects.

(For example, Perez-Johnson, Moore, and Santillano (2011) provide experimental evidence for the relative effectiveness of different degrees of participants' influence on the course choice under voluntary participation in training. They find that increasing participants' course choices has no effects on their re-employment probability and negative effects on their earnings.)

In contrast to Rinne, Uhlendorff, and Zhao (2013), we consider the type and duration of training as mediators, i.e., intermediate outcomes on the causal path from the assignment system to the individual labour market outcomes. Our results show that the short-term positive effects of the reform are mainly driven by a different composition of training course types and durations after the reform. More individuals participate in shorter courses in the post-reform period, which leads to an improvement of labour market outcomes in the short-term but not in the long-term. This is almost a mechanical effect, because participants in courses with short durations are distracted from intensive job search for a shorter time period.

This is an example of Manski's (1997) mixing problem in programme evaluations. Treatment variation occurs because participants can self-select into different types and durations of training. This makes the evaluation of the treatment particularly complicated, because it is difficult to disentangle variation in the allocation system from variation in the treatment. Manski (1997) suggests a partial identification approach to address the mixing problem (see also the discussion in Gundersen, Kreider, Pepper, and Tarasuk, 2017). We follow a different strategy and use a mediation analysis framework (e.g., Imai, Keele, Tingley, and Yamamoto, 2011) to separate the effects of the voluntary allocation system from the variation in the types and durations of training.

We are particularly interested in the effects of voluntary participation net of the effects from a changing course composition. We find negative employment effects during the first three years after programme entry. During the lock-in period, the re-employment chances decrease by up to four percentage points. A possible explanation for this result is lower job search intensity under voluntary participation, which may be explained by a higher motivation to attend and complete the courses. The effects tend to turn positive in the long-term. Possibly, unemployed individuals accumulate more human capital under the voluntary system than under the mandatory system, which pays off in the long-term. These results show that causal channels largely affect the policy conclusions. From a policy maker's perspective, voluntary participation should only be offered when the programme's objective is a long-term investment in human capital. Mandatory assignment appears to be more successful in the short-term. Accordingly, this allocation system should be used when fast reintegration is the major programme goal.

The remainder of this paper is structured as follows. In the next section, we show the identification of the policy effect of a reform and its causal channels in a setting with multiple treatments and selection. We discuss the parameter of interest, identification, and estimation strategy. A detailed illustration of this approach using the example of the allocation reform of vocational training in Germany follows in Section 3. The final section concludes. Additional information is provided in Online Appendices A-E.

We define the parameter of interest within the potential outcome framework proposed by Rubin (1974). We denote random variables by capital letters and realized values by small letters. Assume we have a random sample of individuals from a large population. For each individual in the sample, we observe the treatment state $D = d \in \{0, 1\}$, which indicates whether the individual receives a treatment ($D = 1$) or not ($D = 0$). Furthermore, we assume that a reform of the policy instrument took place at some point in time. Let $T$ be an indicator for the time period that takes the values $t \in \{0, 1\}$ for the pre-reform and post-reform time period, respectively. Finally, we consider a policy system indicator $S = s \in \{b, a\}$ that is $b$ before the reform was implemented and $a$ afterwards.

We indicate the potential outcomes by $Y^d_t(s)$. They can be stratified into eight groups: $Y^1_0(b)$ and $Y^1_1(b)$ indicate the potential outcomes that would be observed under pre-reform system treatment in the pre- or post-reform period, respectively. $Y^1_0(a)$ and $Y^1_1(a)$ are the potential outcomes under post-reform system treatment in the pre- or post-reform period. $Y^0_0(b)$ and $Y^0_1(b)$ are the potential outcomes under pre-reform system non-treatment before or after the reform. $Y^0_0(a)$ and $Y^0_1(a)$ are the analogous potential outcomes under post-reform system non-treatment in both time periods.

We only observe one potential outcome for each individual. We never observe pre-reform system treatments after the reform took place ($Y^1_1(b)$, $Y^0_1(b)$). Similarly, we never observe the post-reform system treatments in the pre-reform period ($Y^1_0(a)$, $Y^0_0(a)$) because the post-reform policy system was implemented as part of the reform.
The observed outcome equals

$$Y = \sum_{d \in \{0,1\}} \sum_{t \in \{0,1\}} \sum_{s \in \{b,a\}} G(d, t, s)\, Y^d_t(s),$$

where $G(d, t, s)$ is an indicator function with $G(d, t, s) = 1\{D = d, T = t, S = s\}$ for $d, t \in \{0, 1\}$ and $s \in \{b, a\}$. This observation rule is the stable unit treatment value assumption (SUTVA) (e.g., Cox, 1958).

In our application, $D$ specifies whether an unemployed individual participates in a vocational training programme and $S$ specifies the training allocation system before (mandatory system $m$) and after the reform (voucher system $v$). The outcome $Y$ measures different labour market outcomes.

We are primarily interested in the policy effect, i.e., the effect of the reform of the allocation into training from a mandatory to a voucher system. Policy effects are the expected difference of potential outcomes under the voucher and mandatory systems, holding treatment status and time period constant. In particular, we focus in our application on the policy effects under treatment in the post-reform period,

$$\gamma_p = E[Y^1_1(v) - Y^1_1(m) \mid D = 1, T = 1].$$

(Alternatively, the policy effect could be defined under non-treatment status or for the pre-reform period.) Consider the following thought experiment to clarify the interpretation of this policy effect: Compare the employment outcomes of training participants who receive a training voucher after the reform with the employment outcomes that they would obtain if they had been assigned to training under the mandatory system in the same period.

The overall before-after reform effect is

$$\gamma_{ba} = E[Y^1_1(v) - Y^0_1(v) \mid D = 1, T = 1] - E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0].$$

We show below how to decompose the overall effect into the selection effect, the time effect, and the policy effect. It is often the case that reforms of policy instruments also affect the selection of those who are treated with the policy instrument. In our example, part of the reform was the implementation of stricter selection criteria for participants. As a consequence, treated individuals before and after the reform may differ in their observed characteristics.
The selection effect under the mandatory system in the pre-reform period can be formalised as

$$\gamma_{sel} = E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 1] - E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0].$$

The treated population may change before and after the reform, but the policy system and time effects are held constant. The following thought experiment may clarify the interpretation of the selection effect: Assign participants from the post-reform period to training in the pre-reform period. Then, compare them to actually observed participants in the pre-reform period.

Furthermore, the labour market outcomes of individuals could differ before and after the reform because of time effects, even after controlling for treatment state and policy system. In our setting, it is likely that business cycle effects occur. In our application, we define the business cycle effects under the mandatory system for the treated population after the reform, which we formalise as

$$\gamma^1_{bc} = E[Y^1_1(m) - Y^1_0(m) \mid D = 1, T = 1] \quad \text{and} \quad \gamma^0_{bc} = E[Y^0_1(m) - Y^0_0(m) \mid D = 1, T = 1].$$

The parameter $\gamma^1_{bc}$ defines business cycle effects under treatment in the mandatory system, and $\gamma^0_{bc}$ defines business cycle effects under non-treatment in the mandatory system. In the following, we discuss the sufficient assumptions to identify the effects of interest.

The identification of the overall reform effect $\gamma_{ba}$ and the selection effect $\gamma_{sel}$ from the joint distribution of the random variables $(Y, G(d, t, s), X)$ can be achieved by controlling for a large set of $K$ confounding pre-treatment variables $X$ with support $\mathcal{X} \subseteq \mathbb{R}^K$ to account for the possibility of selection into treatment based on observed characteristics.

Assumption 1a (Conditional Mean Independence)

For all $d, d', t, t' \in \{0, 1\}$, $s \in \{m, v\}$ and $x \in \mathcal{X}$,

$$E[Y^d_t(s) \mid D = d', T = t', X = x] = E[Y^d_t(s) \mid D = d, T = t, X = x]$$

and all necessary moments exist.

This assumption implies that the expected potential outcomes are independent of the treatment $D$ and time period $T$ after controlling for the pre-treatment control variables $X$. All confounding variables that jointly influence the expected potential outcomes and treatment status must be included in the vector $X$. Note that Assumption 1a also includes a time dimension, i.e., we assume that individuals being treated in $t = 1$ would have the same expected potential outcomes as treated individuals in $t = 0$ if they were treated under the pre-reform policy system before the reform (conditional on $X$). This assumption holds if those treated before and after the reform do not differ systematically in unobserved characteristics that influence both the treatment probability and potential outcomes.

Assumption 2a (Support). $0 < Pr(G(d, t, s) = 1 \mid X = x) < 1$ $\forall\, d, t \in \{0, 1\}$ for the subpopulation with $G(d', t', s) = 1$ $\forall\, d', t' \in \{0, 1\}$.

Assumption 2a requires overlap in the propensity score distributions of the different sub-populations, which can be tested in the data (see the discussion in Lechner and Strittmatter, 2019).

Under Assumptions 1a and 2a, for all $d, d', t, t' \in \{0, 1\}$ and $s \in \{m, v\}$,

$$E[Y^d_t(s) \mid D = d', T = t'] = E\left[\frac{p_{d',t',s}(X)}{p_{d',t',s} \cdot p_{d,t,s}(X)}\, G(d, t, s)\, Y\right] \qquad (1)$$

is identified from observed data on the joint distribution of $(Y, G(d, t, s), G(d', t', s), X)$, with $p_{k,l,s}(x) = Pr(G(k, l, s) = 1 \mid X = x)$ and $p_{k,l,s} = Pr(G(k, l, s) = 1)$ for $k \in \{d, d'\}$ and $l \in \{t, t'\}$ (see, e.g., Rosenbaum and Rubin, 1983).
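The reweighting in expression (1) can be illustrated with a minimal Python sketch of its sample analogue. The function name `reweighted_mean`, the helper `cell_share`, and the toy population are hypothetical and only serve as illustration; propensity scores are taken as cell frequencies of a binary covariate, which is an assumption made for simplicity and not the estimator applied in the paper.

```python
import numpy as np

def reweighted_mean(y, g_source, g_target, p_source_x, p_target_x):
    """Sample analogue of E[ p_target(X) / (p_target * p_source(X)) * G_source * Y ].

    g_source marks units in the group whose outcomes are observed,
    g_target marks units in the group whose covariate distribution we target.
    """
    y = np.asarray(y, dtype=float)
    g_source = np.asarray(g_source, dtype=float)
    g_target = np.asarray(g_target, dtype=float)
    p_target = g_target.mean()  # unconditional share of the target group
    weights = np.asarray(p_target_x, dtype=float) / (
        p_target * np.asarray(p_source_x, dtype=float)
    )
    return float(np.mean(weights * g_source * y))

def cell_share(g, x):
    """Propensity score as the group share within each unit's covariate cell (assumption)."""
    return np.array([g[x == xi].mean() for xi in x])

# Hypothetical toy population: 20 units, binary covariate X.
x = np.repeat([0, 1], 10)
g_source = np.array([1] * 4 + [0] * 6 + [1] * 2 + [0] * 8)
g_target = np.array([0] * 4 + [1] * 2 + [0] * 4 + [0] * 2 + [1] * 6 + [0] * 2)
y = np.where(x == 0, 1.0, 3.0)  # outcome depends on X only

estimate = reweighted_mean(
    y, g_source, g_target, cell_share(g_source, x), cell_share(g_target, x)
)
print(estimate)  # source-group outcomes averaged over the target group's X distribution
```

In this toy population the target group has two units with X = 0 and six with X = 1, so the reweighted mean equals (2 · 1 + 6 · 3) / 8. When source and target group coincide, the weight collapses to the inverse group share, which is the first term of the ATT expressions.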
For completeness, a formal proof of (1) can be found in Online Appendix B.1.

Accordingly, the before-after effect $\gamma_{ba}$ can be calculated as the difference between the average treatment effects on the treated (ATT) before and after the reform. The pre-reform ATT can be formalised as

$$\gamma_{pre} = E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0].$$

The expected potential outcome $E[Y^1_0(m) \mid D = 1, T = 0]$ is directly observed from the data. $E[Y^0_0(m) \mid D = 1, T = 0]$ is the counterfactual expected potential outcome, because $Y^0_0(m)$ is never observed for treated individuals before the reform. In our setting, $\gamma_{pre}$ is the average effect of training participation under the mandatory system in the pre-reform period for unemployed persons who mandatorily participate. The pre-reform ATT is identified from observed data as

$$\gamma_{pre} \overset{A1a,A2a}{=} E\left[\frac{1}{p_{1,0,m}}\, G(1, 0, m)\, Y\right] - E\left[\frac{p_{1,0,m}(X)}{p_{1,0,m} \cdot p_{0,0,m}(X)}\, G(0, 0, m)\, Y\right].$$

The post-reform ATT can be indicated by

$$\gamma_{post} = E[Y^1_1(v) - Y^0_1(v) \mid D = 1, T = 1].$$

The expected potential outcome $E[Y^1_1(v) \mid D = 1, T = 1]$ is directly observed from the data. $E[Y^0_1(v) \mid D = 1, T = 1]$ is a counterfactual expected potential outcome, because $Y^0_1(v)$ is never observed for treated individuals in the post-reform period. Here, the parameter $\gamma_{post}$ is the average effect of participation in the post-reform period for participants under the voucher system. The post-reform ATT is identified from observed data as

$$\gamma_{post} \overset{A1a,A2a}{=} E\left[\frac{1}{p_{1,1,v}}\, G(1, 1, v)\, Y\right] - E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,1,v}(X)}\, G(0, 1, v)\, Y\right].$$

Next, we focus on the selection effect. In our setting, programme participants before and after the reform are likely to differ in their observed characteristics due to changes in the selection criteria.
We are interested in the differences in the effectiveness of training that stem solely from the changing characteristics of participants, holding everything else constant at the pre-reform situation,

$$\gamma_{sel} = E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 1] - E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0].$$

The expected potential outcome $E[Y^1_0(m) \mid D = 1, T = 0]$ is directly observed from the data. The selection effect is identified under Assumptions 1a and 2a by

$$\gamma_{sel} \overset{A1a,A2a}{=} E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{1,0,m}(X)}\, G(1, 0, m)\, Y\right] - E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,0,m}(X)}\, G(0, 0, m)\, Y\right] - \left[ E\left[\frac{1}{p_{1,0,m}}\, G(1, 0, m)\, Y\right] - E\left[\frac{p_{1,0,m}(X)}{p_{1,0,m} \cdot p_{0,0,m}(X)}\, G(0, 0, m)\, Y\right] \right].$$

The identification of business cycle effects and the policy effect requires two additional assumptions because we never observe the pre-reform policy system after the reform and the post-reform policy system before the reform. First, we assume that potential outcomes of the non-treated are independent of the policy system, i.e., we assume that the reform has no effects on the outcomes of the untreated. This is a plausible assumption if only a relatively small fraction of the population is affected by the policy system such that general equilibrium effects can be neglected.

Assumption 3 (Exclusion Restriction on Untreated)

$$E[Y^0_1(v) \mid D = 1, T = 1] = E[Y^0_1(m) \mid D = 1, T = 1].$$

Second, we impose the assumption of common trends. Thereby, we assume the business cycle effects to be independent of the treatment status, i.e., in the absence of the reform the time trends of the potential outcomes would be similar under treatment and non-treatment in the mandatory system when the characteristics of the participants are held fixed.

Assumption 4 (Common Trend Assumption). $\gamma^1_{bc} = \gamma^0_{bc}$

Under Assumptions 1a, 2a, 3, and 4, we can identify the business cycle effect under mandatory treatment $\gamma^1_{bc}$ from observed data as

$$\gamma^1_{bc} \overset{A4}{=} \gamma^0_{bc} = E[Y^0_1(m) - Y^0_0(m) \mid D = 1, T = 1] \overset{A3}{=} E[Y^0_1(v) - Y^0_0(m) \mid D = 1, T = 1] \overset{A1a,A2a}{=} E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,1,v}(X)}\, G(0, 1, v)\, Y\right] - E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,0,m}(X)}\, G(0, 0, m)\, Y\right].$$

(A possible extension is to focus on bounds instead of point-identification; see the discussion in, e.g., Kikuchi, 2017, Twinam, 2017.)

Now, we focus on the parameter of primary interest in this study. The policy effect is the difference of potential outcomes of the treated due to a change in the policy instrument. Adding and subtracting $E[Y^0_1(v) \mid D = 1, T = 1]$ and using $E[Y^0_1(v) \mid D = 1, T = 1] = E[Y^0_1(m) \mid D = 1, T = 1]$ (A3), we can rewrite the policy effect as

$$\gamma_p = E[Y^1_1(v) - Y^1_1(m) \mid D = 1, T = 1] \overset{A3}{=} E[Y^1_1(v) - Y^0_1(v) \mid D = 1, T = 1] - E[Y^1_1(m) - Y^0_1(m) \mid D = 1, T = 1].$$

The potential outcome $Y^1_1(m)$ is never observed for treated individuals after the reform. However, under the imposed assumptions the policy effect can be decomposed into the different reform parameters by adding and subtracting $E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0]$ and $E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 1]$.
Thus, the policy effect is equal to the overall reform effect minus business cycle effects minus the selection effect, which are all, as shown above, identified from observed data:

$$\begin{aligned}
\gamma_p ={}& E[Y^1_1(v) - Y^0_1(v) \mid D = 1, T = 1] - E[Y^1_1(m) - Y^0_1(m) \mid D = 1, T = 1] \\
&+ E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 1] - E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 1] \\
&+ E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0] - E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0] \\
={}& \underbrace{E[Y^1_1(v) - Y^0_1(v) \mid D = 1, T = 1] - E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0]}_{\gamma_{ba}} \\
&- \underbrace{\Big[ E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 1] - E[Y^1_0(m) - Y^0_0(m) \mid D = 1, T = 0] \Big]}_{\gamma_{sel}} \\
&- \underbrace{E[Y^1_1(m) - Y^1_0(m) \mid D = 1, T = 1]}_{\gamma^1_{bc}} + \underbrace{E[Y^0_1(m) - Y^0_0(m) \mid D = 1, T = 1]}_{\gamma^0_{bc}}.
\end{aligned}$$

Accordingly, $\gamma_p = \gamma_{ba} - \gamma_{sel} - (\gamma^1_{bc} - \gamma^0_{bc})$, which is the additive separability assumption imposed in Rinne, Uhlendorff, and Zhao (2013). We show the sufficient conditions to achieve additive separability. Imposing Assumptions 1a, 2a, 3 and 4, we have shown that the total change in the effectiveness of the policy instrument from before to after the reform can be decomposed into the effect of changing the selection, a time effect and the policy effect, and that these, in turn, are identified from observed data. Because $\gamma^1_{bc} = \gamma^0_{bc}$ under Assumption 4 and the pre-reform ATT cancels in $\gamma_{ba} - \gamma_{sel}$, the policy effect can be estimated from observed data as

$$\gamma_p \overset{A1a,A2a,A3,A4}{=} E\left[\frac{1}{p_{1,1,v}}\, G(1, 1, v)\, Y\right] - E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,1,v}(X)}\, G(0, 1, v)\, Y\right] - E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{1,0,m}(X)}\, G(1, 0, m)\, Y\right] + E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,0,m}(X)}\, G(0, 0, m)\, Y\right].$$
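To make the structure of this estimand concrete, the following Python sketch computes a plug-in version of the policy effect as the post-reform ATT minus the pre-reform ATT reweighted to the covariate distribution of post-reform participants, which is what $\gamma_{ba} - \gamma_{sel}$ reduces to under Assumption 4. The function `policy_effect`, the cell-frequency propensity scores, and the toy data are hypothetical illustrations; in particular, this is not the auxiliary-to-study tilting estimator used in the paper. Because the policy system is fixed by the period ($m$ for $T = 0$, $v$ for $T = 1$), the group indicator is coded from $D$ and $T$ alone.

```python
import numpy as np

def policy_effect(y, d, t, x):
    """Plug-in policy effect with cell-frequency propensity scores (illustrative assumption).

    All four group means are reweighted to the covariate distribution
    of post-reform participants (D = 1, T = 1).
    """
    y, d, t, x = (np.asarray(a) for a in (y, d, t, x))

    def group(dd, tt):
        # Indicator G(d, t, s); s is implied by t in this setting.
        return ((d == dd) & (t == tt)).astype(float)

    def score(ind):
        # Share of the group within each unit's covariate cell.
        return np.array([ind[x == xi].mean() for xi in x])

    target = group(1, 1)
    p_target, p_target_x = target.mean(), score(target)

    def mean_for(dd, tt):
        src = group(dd, tt)
        return np.mean(p_target_x / (p_target * score(src)) * src * y)

    # Post-reform ATT minus the reweighted pre-reform ATT.
    return float((mean_for(1, 1) - mean_for(0, 1)) - (mean_for(1, 0) - mean_for(0, 0)))

# Hypothetical toy data: (x, d, t) -> number of units; every cell is non-empty (overlap).
cells = {
    (0, 1, 1): 2, (0, 0, 1): 3, (0, 1, 0): 1, (0, 0, 0): 4,
    (1, 1, 1): 3, (1, 0, 1): 2, (1, 1, 0): 2, (1, 0, 0): 3,
}
rows = [key for key, n in cells.items() for _ in range(n)]
x, d, t = (np.array(col) for col in zip(*rows))

# Training effect is 1.0 before and 1.5 after the reform, so the policy effect is 0.5.
y = x + 2.0 * t + d * (1.0 + 0.5 * t)

gamma_p_hat = policy_effect(y, d, t, x)
print(gamma_p_hat)
```

The common time shift (`2.0 * t`) and the covariate level (`x`) cancel in the double difference, which mirrors the roles of the common trend assumption and the reweighting in the identification argument.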
We apply a mediation framework (see, for instance, the seminal paper by Baron and Kenny, 1986) to isolate the causal channels through which the policy effect works. In our setting, we aim to separate the effects of voluntary participation (in the following 'assignment effect') from the effect of increased course choice (in the following 'composition effect'). Thereby, we consider the type and duration of training as so-called mediators, i.e., intermediate outcomes on the causal path from the assignment system to the individual labour market outcomes. Let $C$ denote the composition of programmes. To investigate the reform channels, we augment the notation of the potential outcomes with programme composition. This new notation of potential outcomes is directly linked to the former notation by $Y^d_t(s) = Y^d_t(s, C = c) = Y^d_t(s, c)$. We start with the policy effect expressed as the total effect of the change from a mandatory to a voucher system,

$$\gamma_p = E[Y^1_1(v, c_v) - Y^1_1(m, c_m) \mid D = 1, T = 1], \qquad (2)$$

where $c_m$ denotes the realised programme composition under mandatory assignment and $c_v$ the realised programme composition under voucher assignment.

This extended notation allows us to define further parameters of interest. The policy effect may be (partly) due to increased course choice or to a direct effect of voluntary participation. In the following, we show how these two effects can be disentangled. First, we are particularly interested in the so-called controlled direct effect (see, for instance, Pearl, 2001). It can be formalised as

$$\rho = E[Y^1_1(v, c_v) - Y^1_1(m, c_v) \mid D = 1, T = 1].$$

This is the direct effect of the voucher system when the type and duration composition of training is held at its level under the voucher system, i.e., the assignment effect. Second, the effect of increased course choice can be formalised as

$$\delta = E[Y^1_1(m, c_v) - Y^1_1(m, c_m) \mid D = 1, T = 1].$$
This is the indirect effect of increased course choice, i.e., the assignment system is kept constant while the composition of programme types and durations varies. As can be seen from adding and subtracting $Y^1_1(m, c_v)$ in the expectation of expression (2), the direct effect $\rho$ and the indirect effect $\delta$ sum up to the total policy effect $\gamma_p$.

However, causal mechanisms are not easily identified. Even if the policy effect is identified, this would not imply identification of the mediator effects. Addressing the endogeneity of mediators requires that they are independent of the potential outcomes conditional on the policy system and the covariates.

Assumption 1b (Sequential Conditional Mean Independence)

For all $s, s' \in \{m, v\}$ and $x \in \mathcal{X}$,

$$E[Y^1_1(s, c_{s'}) \mid D = 1, T = 1, C = c_{s'}, X = x] = E[Y^1_1(s, c_{s'}) \mid D = 1, T = 1, X = x]$$

and all necessary moments exist.

Assumption 1b implies for the treated in the post-reform period that, given the observed pre-treatment confounders, the expected potential outcomes are independent of the type and duration of training. The selection of the type and duration of training causally succeeds the selection into treatment. Therefore, we call Assumption 1b sequential conditional mean independence. The combination of Assumptions 1a and 1b is analogous to the sequential conditional independence assumption invoked in the non-parametric mediation literature for identifying direct effects (see, e.g., Imai, Keele, Tingley, and Yamamoto, 2011). In contrast, a multiple treatment framework would assume contemporaneous selection into treatment and selection of the type and duration of training (Imbens, 2000, Lechner, 2001). Then Assumptions 1a and 1b would have to hold contemporaneously instead of sequentially.

Assumption 2b (Support). $0 < Pr(G(d, t, s) = 1 \mid C = c, X = x) < 1$ $\forall\, s \in \{m, v\}$, $d, t \in \{0, 1\}$.

Assumption 2b requires overlap in the propensity score distributions of the mediators under both systems and control variables. Finally, under Assumptions 1a,b, 2a,b, 3 and 4, the controlled direct and the indirect effects can be identified as

$$\rho \overset{A1a,b,\,A2a,b,\,A3,\,A4}{=} E\left[\frac{1}{p_{1,1,v}}\, G(1, 1, v)\, Y\right] - E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,1,v}(X)}\, G(0, 1, v)\, Y\right] - E\left[\frac{p_{1,1,v}(X, C)}{p_{1,1,v}(C) \cdot p_{1,0,m}(X, C)}\, G(1, 0, m)\, Y\right] + E\left[\frac{p_{1,1,v}(X)}{p_{1,1,v} \cdot p_{0,0,m}(X)}\, G(0, 0, m)\, Y\right],$$

and

$$\delta \overset{A1a,b,\,A2a,b,\,A3,\,A4}{=} \gamma_p - \rho,$$

with $p_{1,1,v}(x, c) = Pr(G_i(1, 1, v) = 1 \mid C = c, X_i = x)$ and $p_{1,1,v}(c) = Pr(G_i(1, 1, v) = 1 \mid C = c)$ (see, e.g., Huber, 2014).
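The relation $\delta = \gamma_p - \rho$ follows from the add-and-subtract step applied to expression (2). A short derivation, written with the superscript and subscript placement $Y^d_t(s, c)$ that is assumed throughout this reconstruction of the notation, may help to see why the two channels exhaust the policy effect:

```latex
\begin{align*}
\gamma_p
  &= E\big[\,Y^1_1(v, c_v) - Y^1_1(m, c_m) \,\big|\, D = 1, T = 1\big] \\
  &= E\big[\,Y^1_1(v, c_v) - Y^1_1(m, c_v) + Y^1_1(m, c_v) - Y^1_1(m, c_m) \,\big|\, D = 1, T = 1\big] \\
  &= \underbrace{E\big[\,Y^1_1(v, c_v) - Y^1_1(m, c_v) \,\big|\, D = 1, T = 1\big]}_{\rho}
   + \underbrace{E\big[\,Y^1_1(m, c_v) - Y^1_1(m, c_m) \,\big|\, D = 1, T = 1\big]}_{\delta}.
\end{align*}
```

Only $\rho$ needs its own identification result; $\delta$ is then obtained as the residual of the identified policy effect.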
We illustrate our approach using a large-scale reform of the allocation system of unemployed individuals to vocational training in Germany. This reform presents an illustrative example in which policy effects, selection effects, and time effects are part of the overall reform effect. The main objective of vocational training for unemployed persons is the adjustment of their skills to changing requirements in the labour market and/or changed individual conditions (due to health problems, for example). In Germany, vocational training comprises three types of programmes: practice firm training, classical vocational training, and retraining. Classical vocational training courses take place in classrooms or on the job and are categorised by their planned durations. We distinguish between short training (a maximum duration of six months) and long training (a minimum duration of six months). Practice firm training simulates a work environment in a practice firm. Retraining (also called degree course) has long durations of up to three years. It leads to the completion of a (new) vocational degree within the German apprenticeship system. Further descriptions and examples of courses can be found in Table 1.

Table 1 around here

Before 2003, caseworkers’ assignment of unemployed individuals to courses was mandatory and based on subjective criteria. The introduction of a voucher system on January 1, 2003 was intended to increase the responsibility of training participants and to establish market systems for training providers (Bruttel, 2005). Potential training participants receive a training voucher that allows them to select the provider and course. Their choice is subject to the following restrictions: First, the voucher specifies the objective, content, and maximum duration of the course. Second, it can be redeemed within a one-day commuting zone. Third, the validity of training vouchers varies between one week and a maximum of three months. Importantly, caseworkers cannot impose sanctions if a voucher is not redeemed.

Simultaneously with the voucher system, the reform introduced stricter selection criteria for potential training participants. The post-reform paradigm of the German Federal Employment Agency focuses on direct and rapid placement of unemployed individuals, high reintegration rates, and low dropout rates. Caseworkers award vouchers such that at least 70% of all voucher recipients are expected to find jobs within six months of completing training.

Footnote: The enforcement of the 70% criterion was difficult, because satisfying the rule had no consequences. For this reason, the selection rule was abolished after 2004.

.2 Data, treatment and sample

This study is based on administrative data provided by the German Federal Employment Agency. The data set contains information on all individuals in Germany who participated in a training programme between 2001 and 2004. Individual records are collected from the Integrated Employment Biographies (IEB). The sample used as the comparison group originates from the same database. It is constructed as a 3% random sample of individuals who experience at least one transition from employment to non-employment.

Footnote: The IEB is a rich administrative database and the source of the sub-samples of data used in all recent studies that evaluate German ALMP programmes (e.g., Biewen, Fitzenberger, Osikominu, and Paul, 2014, Lechner, Miquel, and Wunsch, 2011, Lechner and Wunsch, 2013). The IEB is a merged data file containing individual records collected from four different administrative processes: the IAB Employment History (Beschäftigten-Historik), the IAB Benefit Recipient History (Leistungsempfänger-Historik), the Data on Job Search originating from the Applicants Pool Database (Bewerberangebot), and the Participants-in-Measures Data (Maßnahme-Teilnehmer-Gesamtdatenbank). IAB (Institut für Arbeitsmarkt- und Berufsforschung) is the abbreviation for the research department of the German Federal Employment Agency.

Footnote: We account for the fact that we have different sampling probabilities in all calculations whenever necessary.

The treatment is defined as the first participation in a vocational training programme during the first year of unemployment. We follow a static evaluation approach and impute (pseudo) participation starts (similar to, e.g., Lechner, 1999, Lechner and Smith, 2007). The evaluation sample is constructed as an inflow sample into unemployment. The baseline sample (Sample A) consists of individuals who became unemployed in 2001 under the mandatory system or in 2003 under the voucher system, after having been continuously employed for at least three months. Additionally, we use an alternative sample definition (Sample B) for which we alter the pre-reform sample restrictions. We consider individuals who enter unemployment in 2002 and start training within the following 12 months but no later than December 2002. Thereby, we approximate the timing of the reform implementation with respect to inflow into unemployment. Sample B is used for robustness tests. A graphical illustration of the samples is presented in Figure 1.

Figure 1 around here

Entering unemployment is defined as the transition from (non-subsidised, non-marginal, non-seasonal) employment to non-employment of at least one month. We focus on individuals who are eligible for unemployment benefits at the time of inflow into unemployment.

The baseline Sample A includes 206,511 unweighted or 1,011,125 reweighted observations. We account for the fact that we use a 100% sample of programme participants and a 3% random sample of non-participants using the inverse inclusion probabilities as weights. We observe 26,341 unemployed individuals who redeem vouchers and 69,216 participants who are directly assigned to a training course. This is the full sample of vocational training participants in Germany that satisfies our sample selection criteria. The sample includes 420,014 reweighted control persons before and 495,554 reweighted control persons after the reform.

Table 2 around here

In Table 2, we report the sample first moments of the observed characteristics with a large standardised difference. Additionally, we present descriptive statistics for observed characteristics with small standardised differences in Table A.1 in Online Appendix A. In the first two columns of Table 2, we report the sample first moments of the control variables for participants and non-participants under the voucher system. The respective sample moments under the mandatory system can be found in the third and fourth columns. The last three columns display the standardised differences between the different sub-samples and the treatment group under the voucher system. Training participants are on average younger, have fewer instances of incapacity, and are better educated. They have more successful employment and welfare histories than unemployed individuals in the comparison group. These patterns are observed under both systems. The primary differences between the two systems are observed in the employment histories of participants and the regional characteristics. Training participants under the voucher system have been employed longer and have higher cumulative earnings than participants under the mandatory system. Furthermore, participants under the voucher system are more likely to reside in local employment agency districts with low employment in the construction sector and a high share of male unemployment.
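As a rough illustration of how the standardised differences reported in the last three columns of Table 2 could be computed, the sketch below uses a common definition: the absolute difference in means divided by the square root of the average of the two group variances, multiplied by 100. This definition is an assumption; the one actually used is given in Online Appendix B.2, which is not reproduced here. The function name and the toy data are hypothetical.

```python
import numpy as np

def abs_standardised_difference(a, b):
    """Absolute standardised difference (in percent) between two samples.

    Assumed definition: 100 * |mean(a) - mean(b)| / sqrt((var(a) + var(b)) / 2),
    using sample variances (ddof=1).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return 100.0 * abs(a.mean() - b.mean()) / pooled_sd

# Hypothetical ages in two small groups (not the study data).
age_treated = [30, 35, 40, 45]
age_control = [34, 39, 44, 49]
sd_age = abs_standardised_difference(age_treated, age_control)

# Threshold of 20 for a "large" difference, as cited in the note to Table 2.
is_large = sd_age > 20
```

Because the measure is scaled by the spread of the variable rather than by the sample size, it stays comparable across the very differently sized participant and comparison samples.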

Assumptions 1a and 1b are strong, but standard in the programme evaluation literature. The plausibility of similar assumptions has been studied by Biewen et al. (2014) and Lechner and Wunsch (2013) for training programme evaluations. Their findings suggest that such assumptions are plausible for training programme evaluations when rich data are available. We use exceptionally rich data, which include the control variables used in the previous literature and additional new variables. In particular, we use baseline personal characteristics, the timing of programme starts, regions, benefit and unemployment insurance claims, pre-programme outcomes, and labour market histories (see Table 2 and Table A.1 in Online Appendix A). In addition to the standard variables, we control for proxy information concerning physical or mental health problems, lack of motivation, and reported sanctions. Furthermore, we control for regional characteristics at the level of local employment agency districts, which are often not available with such precision. Thus, the imposed assumptions appear to be plausible in our setting.

Assumption 2a,b can be tested using the data. In unreported calculations, we perform simple support tests and do not observe any incidence of support problems.

Assumption 3 requires that the reform has no effect on the non-treated. After controlling for the changed selection of treated before and after the reform, which can indeed change the composition of the non-treated, it is plausible that the assignment system is independent of the potential outcomes of non-participants. The main argument for this is that the reform of the assignment mechanism only affects participants and the share of participants is relatively small, such that general equilibrium effects can be neglected.

We show several plausibility tests for Assumption 4, which requires that the potential outcomes of participants and non-participants would follow the same trend in the absence of the reform. We present three different types of supporting evidence for the plausibility of this assumption. First, Figure 2 reports the long-term trends in the outcome variables for different samples for the years between 1990 and 2012. Prior to the treatment start dates in 2001 and 2003, the outcomes of the participants and non-participants samples evolve in parallel over many years. Given these parallel trends, it is likely that we would observe the same respective patterns after 2001 or 2003 in the absence of a treatment.

Figure 2 around here

Second, we experiment with additional information on local employment agency districts (i.e., regional control variables). We observe the monthly regional unemployment rate (by gender and citizen status), the ratio of vacant full-time jobs, employment shares by sector, and population density. We assess the sensitivity of our findings with respect to these factors. If our results are not sensitive to the regional control variables, we expect that possible interactions between the effectiveness of training participation and the unemployment rate (or the business cycle in general) are not important in our application. This would support the plausibility of the common trend assumption.

Third, we use an alternative sample definition (Sample B) for which we alter the pre-reform sample restrictions. We consider individuals who enter unemployment in 2002 and start training within the following twelve months but no later than December 2002. Consequently, not all individuals in Sample B can participate during the first twelve months of their unemployment period (e.g., an individual who enters unemployment in October can only receive treatment under the mandatory system in the following three months). Using Sample B, we approximate the timing of the reform implementation with respect to the inflow into unemployment. We argue that the common trend assumption is more likely to hold if the time difference between the pre- and post-reform periods is smaller. However, in contrast to the baseline sample (Sample A), Sample B is not balanced in the pre- and post-reform periods (cf. Figure 1).

.5 Estimation

We apply a semi-parametric reweighting estimator, Auxiliary-to-Study Tilting (Graham, De Xavier Pinto, and Egel, 2016), in all estimations. This estimator is well suited to our empirical design because it balances the efficient sample first moments exactly. Furthermore, it is $\sqrt{N}$-consistent and asymptotically normal. The estimator is described in Online Appendix B.2.

We start this section by showing the overall reform effect. Figure 3 presents the ATTs for participants in vocational training courses before the reform ($\gamma_{pre}$) and after the reform ($\gamma_{post}$). The outcome of interest is nonsubsidised and nonmarginal employment that is subject to social security contributions (‘employment’ in the following). Results for monthly earnings are available in the online appendix. We report separate effects for every month during the 88 months following the course start. The lines are monthly point estimates and the diamonds indicate significant effects at the 5% level.

Footnote: Subsidised employment is employment in the context of an ALMP. Marginal employment is defined, according to social security regulations in Germany, as employment of a few hours per week only.

Figure 3 around here

Training participants suffer from negative lock-in effects before and after the reform. The lock-in effects are steeper in the pre-reform period but have longer durations after the reform. The long-term effects of participation in vocational training courses on employment probability are positive. Training participation increases long-term employment probability (seven years after the start of training) by five percentage points before the reform and by 7.5 percentage points after the reform.

The raw difference between the post- and pre-reform effectiveness of training identifies the overall difference in effects before and after the reform ($\gamma_{ba}$). In Figure 3, the red solid line shows a positive difference in effects before and after the reform in the short-term [...]

[...] ($\gamma_{sel}$), which are reported in Figure 4. The effects show the differences in the effectiveness of training that can be solely explained by a different participant selection in terms of their characteristics, holding time and policy instrument constant. The results suggest that stricter selection criteria only have a minor influence on the effectiveness of training. If anything, we find negative selection effects over the long term. Given the small differences in most observed characteristics, such small and mostly insignificant selection effects are plausible.

Figure 5 presents the business cycle effects of non-participation ($\gamma_{bc}$) for Samples A and B with and without additional regional control variables. The time effects show an immediate, sharp increase of employment probabilities, which peaks after three years. Thereafter, the effects evolve to a 3-5 percentage points higher employment probability in the post-reform period compared to the pre-reform period.

Figure 5 around here

The general pattern of the time effects is not sensitive to the sample definition or to the inclusion of additional regional labour market characteristics. This supports the plausibility of the common trend assumption. However, with the implementation of the Hartz reforms, the German labour market was intensively reformed during the observation period, particularly in 2005. An improvement of labour market conditions can be observed over the long term. This does not alter the plausibility of our identifying assumptions as long as all groups are equally affected by the Hartz reforms.

Finally, Figure 6 displays the policy effects for Samples A and B, with and without additional regional control variables. They show the difference in the effectiveness of training that can be solely explained by the change of the assignment system from mandatory assignment to vouchers, holding participants’ characteristics and the time period fixed.

The pattern of the policy effects varies in the different periods after course start. In the short term, the policy effects are positive, implying that training is more effective under the voucher system. In the best case, training participants who receive a voucher have employment probabilities that are approximately 2-3 percentage points higher compared to participants in the mandatory system. Over the medium term, the policy effects are negative. The specifications using Sample B present a slightly more negative picture. In the worst-case scenario, the employment probability decreases by 5 percentage points. Three years after the start of training, we observe an increase to slightly positive but mostly insignificant policy effects. After seven years, the effects are positive for all specifications. However, the effects are only significant for Sample A, with a 4-5 percentage point increase in employment probabilities.

Footnote: The results of all specifications are relatively stable between 40 and 80 months after training participation begins. This mitigates concerns that our findings are greatly altered by the financial crisis in 2008.

To interpret the policy effect, it is necessary to investigate the channels through which the reform affects the employment outcome. First, training is voluntary after the reform. Thus, the effectiveness of training might differ between a voucher and a mandatory system because voluntarily assigned participants are more motivated than compulsorily assigned participants. Second, voucher-assigned participants have free course choice conditional on the specification on the voucher. In Table 3, we report descriptive statistics for different types and durations of training programmes before and after the reform. The share of short training programmes increases from 21% to 42% after the reform. Moreover, the share of long training programmes decreases from 41% to 19%. The average planned and actual durations of long programmes decrease by nearly three months after the reform, and those of practice firm programmes by nearly two months. The share of participants in retraining courses increases from 19% to 25%. The average planned duration is extended by more than one month.

Footnote: In 2003, there was also a reduction in the total number of vocational training programmes for political reasons.

Table 3 around here

Accordingly, the composition of programme types and durations changed substantially after the reform. We observe higher shares of participants in programmes with a duration of less than six months and higher shares of participants in very long programmes with durations of more than two years. The first development might reflect increased freedom of choice under the voucher system. Training vouchers are determined with respect to the maximum programme duration. The unemployed individuals are free to choose a training provider and may self-select into shorter courses.

To disentangle the effects of voluntary participation from the effects of increased course choice, we apply a mediation framework (see, e.g., Robins and Greenland, 1992, Baron and Kenny, 1986). We consider the type and duration of training as mediators, i.e., [...] $p_{1,1,v}(x, c_v)$.

Figure 7 shows the policy effects, the course composition effects, and the effects of voluntary participation for Samples A and B with regional control variables. We find positive short-term course composition effects that can be explained by the larger share of short programmes after the reform. After 2-3 years, the effects turn negative, which can be explained by a larger share of retraining programmes in the voucher system. In the long term, the course composition effects become slightly positive but remain close to zero.

Figure 7 around here

The effects of voluntary participation become negative immediately after the start of training. After two years, voluntary participation leads to a 3-5 percentage point decline in the employment probability compared to mandatory participation. The voluntary participation effects remain negative until three years after the start of training. Unemployed individuals might perceive less pressure to find a job under voluntary participation, as they feel more accommodated, have more positive attitudes towards the training course, and have a higher motivation to complete the programme. A descriptive analysis of dropout rates supports this interpretation (see Online Appendix D). We find that the dropout [...]

Footnote: We generate dummies for the planned programme durations (less than 6 months, between 6 and 12 months, between 12 and 24 months, and more than 24 months). These durations correspond to different programme types. Furthermore, we account for interactions between these dummies and the planned programme duration to allow for linear trends within each period.
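The coding of the duration mediators described in the footnote on planned programme durations can be sketched as follows. This is only an illustration: the treatment of the boundary values (whether exactly 6, 12, or 24 months falls into the lower or the upper category) is an assumption, as are the function name `duration_features` and the example durations.

```python
import numpy as np

def duration_features(planned_months):
    """Dummies for planned-duration categories plus dummy-by-duration interactions.

    Assumed categories (left-closed): [0, 6), [6, 12), [12, 24), [24, inf) months.
    Returns an (n, 8) array: four dummies followed by four interaction terms.
    """
    d = np.asarray(planned_months, dtype=float)
    edges = np.array([6.0, 12.0, 24.0])
    category = np.digitize(d, edges)          # 0..3; boundary values go to the upper bin
    dummies = np.eye(4)[category]             # one-hot encoding, shape (n, 4)
    interactions = dummies * d[:, None]       # allows a linear trend within each bin
    return np.hstack([dummies, interactions])

# Hypothetical planned durations in months.
features = duration_features([3, 6, 18, 30])
```

In a model that already contains an intercept, one of the four dummies would typically be dropped to avoid perfect collinearity.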

Our results qualitatively confirm the findings in Rinne, Uhlendorff, and Zhao (2013) for the time horizon of 1.5 years after treatment. We find positive effects of the reform of the allocation system in the short term. Moreover, we find that the reform of the allocation system reduces the re-employment probabilities between the first and second year after the start of training. Our application shows that the consideration of long-term effects is crucial. In the long term, the policy effects turn positive and remain on an approximately stable level until seven years after the training started.

Compared to earlier studies, we show that it is important to consider direct and indirect effects of a policy reform. We provide evidence that the short-term positive policy effects are mainly driven by a changing composition of training course types and durations after the reform. The share of individuals who participate in shorter courses increased in the post-reform period. The selection into shorter courses improves the labour market outcomes in the short term. This is almost a mechanical effect, because participants in shorter courses are distracted from intensive job search for a shorter time period.

If we focus on the direct effect of voluntary participation net of the course composition effect, we observe a reduction of training effectiveness in the short term and a significant increase in the long run. This can be explained by a higher motivation of participants under the voucher system to focus on the course contents and to complete training instead of intensively searching for a new job during course participation.

In this study, we formally show the identification of channels of policy reforms with multiple treatments and different selection into each type of treatment. We discuss the assumptions that are sufficient to identify the different components of the policy reform, which are selection effects, time effects, and the policy effects. Furthermore, we provide a formal framework of the causal channels through which the policy effects may affect the outcome of interest using mediation analysis.

We illustrate the empirical approach using a large reform of the allocation of vocational training programmes in Germany. The pre-reform system granted caseworkers substantial authority through mandatory allocation of unemployed individuals to training courses. The post-reform voucher system introduces voluntary participation and some freedom of course choice. Additionally, the reform changed the criteria for selecting unemployed persons into training programmes. This reform is an illustrative example in which the overall reform effect can be decomposed into selection effects, time effects, and the policy effects of interest. We separate the different reform components from each other and investigate the channels through which the reform of the allocation system operates. We are mainly interested in the direct effect of changing the allocation of vocational training from a mandatory to a voluntary system net of indirect effects that may occur through the increased course choice.

The empirical results show the importance of considering causal channels since they may operate in opposing directions. Here, the policy effect indicates an increased effectiveness of training after the reform in the short run. We show that the positive effect mainly comes from indirect effects of the policy reform whereas the direct effects show a short-term reduction in the effectiveness of training. This is important knowledge for policy makers because it allows them to target policy instruments more precisely. Depending on the short- and long-term objectives of policy makers, it may even reverse the application of policy instruments.

References

Abadie, A. (2005): “Semiparametric Difference-in-Differences Estimators,” Review of Economic Studies, 72(1), 1–19.

Baron, R. M., and D. A. Kenny (1986): “The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations,” Journal of Personality and Social Psychology, 51, 1173–1182.

Biewen, M., B. Fitzenberger, A. Osikominu, and M. Paul (2014): “The Effectiveness of Public Sponsored Training Revisited: The Importance of Data and Methodological Choices,” Journal of Labor Economics, 32(4), 837–897.

Bruttel, O. (2005): “Delivering Active Labour Market Policy Through Vouchers: Experiences with Training Vouchers in Germany,” International Review of Administrative Sciences, 71(3), 391–404.

Card, D., and D. R. Hyslop (2005): “Estimating the Effects of a Time-Limited Earnings Subsidy for Welfare-Leavers,” Econometrica, 73(6), 1723–1770.

Cox, D. R. (1958): “The Regression Analysis of Binary Sequences,” Journal of the Royal Statistical Society: Series B (Methodological), 20, 215–242.

Deuchert, E., M. Huber, and M. Schelker (2019): “Direct and Indirect Effects Based on Difference-in-Differences with an Application to Political Preferences Following the Vietnam Draft Lottery,” Journal of Business and Economic Statistics, 37(4), 710–720.

Doerr, A., B. Fitzenberger, T. Kruppe, M. Paul, and A. Strittmatter (2017): “Employment and Earnings Effects of Awarding Training Vouchers in Germany,” Industrial and Labor Relations Review, 70(3), 767–812.

Felfe, C., N. Nollenberger, and N. Rodriguez-Planas (2014): “Can’t Buy Mommy’s Love? Universal Childcare and Children’s Long-Term Cognitive Development,” Journal of Population Economics, 28(2), 393–422.

Flores, C., and A. Flores-Lagunes (2009): “Identification and Estimation of Causal Mechanisms and Net Effects of a Treatment under Unconfoundedness,” IZA Discussion Paper, 4237.

Fricke, H. (2017): “Identification based on Difference-in-Differences Approaches with Multiple Treatments,” Oxford Bulletin of Economics and Statistics, 79(3), 426–433.

Graham, B. S., C. C. De Xavier Pinto, and D. Egel (2016): “Efficient Estimation of Data Combination Models by the Method of Auxiliary-to-Study Tilting,” Journal of Business & Economic Statistics, 34(2), 288–301.

Gundersen, C., B. Kreider, J. Pepper, and V. Tarasuk (2017): “Food Assistance Programs and Food Insecurity: Implications for Canada in Light of the Mixing Problem,” Empirical Economics, 52(3), 1065–1087.

Havnes, T., and M. Mogstad (2011a): “Money for Nothing? Universal Child Care and Maternal Employment,” Journal of Public Economics, 95(11-12), 1455–1465.

Havnes, T., and M. Mogstad (2011b): “No Child Left Behind: Subsidized Child Care and Children’s Long-Run Outcomes,” American Economic Journal: Economic Policy, 3(2), 97–129.

Heckman, J. J., H. Ichimura, and P. E. Todd (1997): “Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme,” Review of Economic Studies, 64(4), 605–654.

Hirano, K., G. W. Imbens, and G. Ridder (2003): “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71(4), 1161–1189.

Huber, M. (2014): “Identifying Causal Mechanisms (Primarily) Based on Inverse Probability Weighting,” Journal of Applied Econometrics, 29(6), 920–943.

Huber, M., M. Lechner, and G. Mellace (2017): “Why Do Tougher Caseworkers Increase Employment? The Role of Programme Assignment as a Causal Mechanism,” Review of Economics and Statistics, 99(1), 180–183.

Huber, M., M. Lechner, and A. Strittmatter (2018): “Direct and Indirect Effects of Training Vouchers for the Unemployed,” Journal of the Royal Statistical Society, Series A, 181(2), 441–463.

Huber, M., M. Schelker, and A. Strittmatter (2019): “Direct and Indirect Effects based on Changes-in-Changes,” arXiv:1909.04981.

Imai, K., L. Keele, D. Tingley, and T. Yamamoto (2011): “Unpacking the Black Box of Causality: Learning about Causal Mechanisms from Experimental and Observational Studies,” American Political Science Review, 105(4), 765–789.

Imai, K., L. Keele, and T. Yamamoto (2010): “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects,” Statistical Science, 25, 51–71.

Imbens, G. (2000): “The Role of the Propensity Score in Estimating Dose-Response Functions,” Biometrika, 87(3), 706–710.

Kikuchi, N. (2017): “Intergenerational Transmission of Education in Japan: Nonparametric Bounds Analysis with Multiple Treatments,” ISER Discussion Paper No. 1011.

Lechner, M. (1999): “Earnings and Employment Effects of Continuous Off-the-job Training in East Germany after Unification,” Journal of Business and Economic Statistics, 17(1), 74–90.

Lechner, M. (2001): “Identification and Estimation of Causal Effects of Multiple Treatments under the Conditional Independence Assumption,” in Econometric Evaluation of Labour Market Policies, ed. by M. Lechner, and F. Pfeiffer, pp. 43–58. ZEW Economic Studies 13. New York: Springer-Verlag.

Lechner, M. (2010): “The Estimation of Causal Effects by Difference-in-Difference Methods,” Foundations and Trends in Econometrics, 4(3), 165–224.

Lechner, M., R. Miquel, and C. Wunsch (2011): “Long-run Effects of Public Sector Sponsored Training,” The Journal of the European Economic Association, 9(4), 742–784.

Lechner, M., and J. Smith (2007): “What is the Value Added by Caseworkers?,” Labour Economics, 14(2), 135–151.

Lechner, M., and A. Strittmatter (2019): “Practical Procedures to Deal with Common Support Problems in Matching Estimation,” Econometric Reviews, 38(2), 193–207.

Lechner, M., and C. Wunsch (2013): “Sensitivity of Matching-Based Program Evaluations to the Availability of Control Variables,” Labour Economics, 21(C), 111–121.

Manski, C. (1997): “The Mixing Problem in Programme Evaluation,” Review of Economic Studies, 64(4), 537–553.

McCall, B., J. A. Smith, and C. Wunsch (2016): “Government-Sponsored Vocational Education for Adults,” Handbook of the Economics of Education, 5, 479–652.

Paul, M. (2015): “Many Dropouts? Never mind! - Employment Prospects of Dropouts from Training Programs,” Annals of Economics and Statistics, 119-120, 235–267.

Pearl, J. (2001): “Direct and Indirect Effects,” Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 411–420.

Perez-Johnson, I., Q. Moore, and R. Santillano (2011): “Improving the Effectiveness of Individual Training Accounts: Long-Term Findings from an Experimental Evaluation of Three Service Delivery Models,” Final Report, Mathematica Policy Research, Princeton, NJ.

Petersen, M. L., S. E. Sinisi, and M. J. van der Laan (2006): “Estimation of Direct Causal Effects,” Epidemiology, 17, 276–284.

Rinne, U., A. Uhlendorff, and Z. Zhao (2013): “Vouchers and Caseworkers in Training Programs for the Unemployed,” Empirical Economics, 45(3), 1089–1127.

Robins, J., and S. Greenland (1992): “Identifiability and Exchangeability for Direct and Indirect Effects,” Epidemiology, 3, 143–155.

Rosenbaum, P., and D. Rubin (1983): “The Central Role of Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70(1), 41–55.

Rubin, D. B. (1974): “Estimating the Causal Effect of Treatments in Randomized and Non-Randomized Studies,” Journal of Educational Psychology, 66(5), 688–701.

Strittmatter, A. (2016): “What Effect Do Vocational Training Vouchers Have on the Unemployed?,” IZA World of Labor, 316.

Tomini, F., W. Groot, and H. Maassen van den Brink (2016): “The Effectiveness of the Voucher Training Programs: A Systematic Review of the Evidence from Evaluations,” TIER Working Paper Series, 16/08.

Twinam, T. (2017): “Complementarity and Identification,” Econometric Theory, 33(5), 1154–1185.

Van der Weele, T. J. (2009): “Marginal Structural Models for the Estimation of Direct and Indirect Effects,” Epidemiology, 20, 18–26.

Figures

Figure 1: Graphical illustration of Sample A and B
(a) Sample A
(b) Sample B

Figure 2: Time trends of employment probabilities for different subgroups of individuals for the 1991-2012 period

Note: We report time trends for the years between 1990 and 2012. The outcome variables are reweighted as described in Online Appendix B.2. Similar findings are obtained without reweighting.

Note: We estimate separate effects for each of the 88 months following the treatment. Diamonds indicate significant point estimates at the 5% level. Significance levels are bootstrapped with 499 replications. Lines without diamonds indicate point estimates that are not significantly different from zero. We use baseline Sample A and control for local employment agency district characteristics and the full set of observed characteristics (see Table A.2 in Online Appendix A).

Figure 4: Selection and overall reform effects

Figure 6: Policy effects

| Programme type | Description | Examples |
|---|---|---|
| Practice firm training | Courses that took place in practice firms to simulate a work environment. | Training in commercial software, for office clerks, in data processing |
| Short training | Provision of occupation specific skills (duration ≤ > | |

| | Voucher system: Treatment group (1) | Voucher system: Control group (2) | Mandatory system: Treatment group (3) | Mandatory system: Control group (4) | Absolute standardised differences between (1) and (2) (5) | (1) and (3) (6) | (1) and (4) (7) |
|---|---|---|---|---|---|---|---|
| Personal characteristics | | | | | | | |
| Age | 38.8 | 41.3 | 38.7 | 41.5 | 28.5 | 0.9 | 31.4 |
| Older than 50 years | .010 | .111 | .019 | .125 | 43.3 | 7.1 | 47.0 |
| Incapacity (e.g., illness, pregnancy) | .022 | .050 | .032 | .062 | 15.4 | 6.2 | 20.2 |
| Health | .083 | .128 | .093 | .146 | 14.5 | 3.4 | 20.0 |
| Education and occupation | | | | | | | |
| University entry degree (Abitur) | .229 | .170 | .197 | .142 | 14.7 | 7.9 | 22.5 |
| White-collar | .382 | .476 | .440 | .527 | 19.2 | 12.0 | 29.5 |
| Manufacturing | .069 | .101 | .101 | .147 | 11.7 | 11.4 | 25.3 |
| Employment and welfare history | | | | | | | |
| Half months empl. (last 2 years) | 45.6 | 44.9 | 44.5 | 43.7 | 10.1 | 15.4 | 25.7 |
| Half months since last unempl. in last 2 years | 46.8 | 46.2 | 45.6 | 44.4 | 11.6 | 19.7 | 35.0 |
| Half months since last OLF (last 2 years) | 45.8 | 44.6 | 44.9 | 43.3 | 15.5 | 12.5 | 29.9 |
| Eligibility unempl. benefits | 13.5 | 14.7 | 13.2 | 14.8 | 21.1 | 5.9 | 20.7 |
| Remaining unempl. insurance claim | 25.6 | 22.3 | 23.4 | 21.4 | 25.0 | 18.0 | 31.7 |
| Cumulative earnings (last 4 years) | 91,204 | 83,632 | 80,913 | 81,156 | 15.6 | 21.8 | 21.0 |
| Timing of unemployment and programme start | | | | | | | |
| Start unempl. in September | .151 | .079 | .099 | .075 | 22.9 | 15.7 | 24.2 |
| Elapsed unempl. duration | 5.06 | 3.55 | 4.53 | 3.45 | 46.0 | 15.7 | 49.0 |
| Characteristics of local employment agency districts | | | | | | | |
| Share of empl. in construction industry | .064 | .065 | .077 | .077 | 2.3 | 54.3 | 55.5 |
| Share of male unempl. | .564 | .563 | .541 | .541 | 1.1 | 50.8 | 53.5 |

Note: See Table A.1 in Online Appendix A for sample first moments of observed characteristics with small standardised differences. In columns (1)-(4), we report the sample first moments of observed characteristics for the treated and non-treated sub-samples. Information on individual characteristics refers to the time of inflow into unemployment, with the exception of the elapsed unemployment duration and monthly regional labour market characteristics, which refer to the (pseudo) treatment time. In columns (5)-(7), we report the standardised differences between the different sub-samples and the treatment group under the voucher system. A description of how we measure absolute standardised differences is available in Online Appendix B.2. Rosenbaum and Rubin (1983) classify absolute standardised differences of more than 20 as “large”. OLF is the acronym for “out of labour force”.

Online Appendix to “Identifying causal channels of policy reforms with multiple treatments and different types of selection”

Annabelle Doerr and Anthony Strittmatter

Sections:

A. Descriptive statistics
B. Supplements to the empirical approach
C. Matching quality
D. The change in dropout rates
E. Results for monthly earnings
F. Heterogeneous results by programme type

Annabelle Doerr, UC Berkeley, [email protected] and Anthony Strittmatter, Department of Economics, University of St. Gallen, [email protected].

A. Descriptive statistics

Table A.1: Sample first moments of observed characteristics with small standardised differences.

| | Voucher system: Treatment group (1) | Voucher system: Control group (2) | Mandatory system: Treatment group (3) | Mandatory system: Control group (4) | Standardised differences between (1) and (2) (5) | (1) and (3) (6) | (1) and (4) (7) |
|---|---|---|---|---|---|---|---|
| Personal characteristics | | | | | | | |
| Female | .472 | .447 | .477 | .411 | 5.0 | .9 | 12.4 |
| No German citizenship | .054 | .080 | .052 | .071 | 10.5 | 1.0 | 7.2 |
| Children under 3 years | .042 | .035 | .040 | .031 | 3.7 | 1.2 | 6.1 |
| Single | .300 | .285 | .270 | .251 | 3.4 | 6.7 | 11.1 |
| Sanction | .007 | .007 | .009 | .008 | .2 | 2.0 | .5 |
| Lack of motivation | .007 | .007 | .009 | .008 | .2 | 2.0 | .5 |
| Education and occupation | | | | | | | |
| No schooling degree | .036 | .068 | .036 | .056 | 14.3 | .4 | 9.3 |
| Schooling degree without Abitur | .720 | .731 | .750 | .770 | 2.4 | 6.8 | 11.4 |
| Missing | .014 | .031 | .017 | .032 | 10.9 | 2.4 | 11.9 |
| No vocational degree | .203 | .227 | .218 | .219 | 5.9 | 3.6 | 3.8 |
| Academic degree | .112 | .096 | .081 | .063 | 5.4 | 10.6 | 17.3 |
| Agriculture, Fishery | .012 | .020 | .015 | .023 | 6.7 | 3.2 | 8.7 |
| Construction | .054 | .032 | .027 | .022 | 10.9 | 13.3 | 16.6 |
| Trade and Retail | .127 | .169 | .148 | .175 | 11.8 | 6.2 | 13.5 |
| Communication and Information Service | .108 | .137 | .122 | .128 | 8.6 | 4.2 | 6.1 |
| Employment and welfare history | | | | | | | |
| Half months unempl. in last 2 years | .398 | .370 | .578 | .581 | 1.6 | 9.5 | 9.7 |
| No unempl. in last 2 years | .914 | .921 | .877 | .878 | 2.7 | 11.8 | 11.6 |
| Unemployed in last 2 years | .034 | .040 | .046 | .052 | 3.1 | 6.2 | 9.1 |
| Timing of unemployment and programme start | | | | | | | |
| Start unempl. in January | .060 | .101 | .117 | .105 | 15.0 | 19.8 | 16.1 |
| Start unempl. in February | .070 | .089 | .108 | .089 | 7.2 | 13.4 | 7.0 |
| Start unempl. in March | .096 | .083 | .105 | .085 | 4.5 | 3.0 | 3.7 |
| Start unempl. in April | .102 | .088 | .120 | .086 | 4.8 | 5.7 | 5.8 |
| Start unempl. in June | .059 | .078 | .058 | .072 | 7.6 | .6 | 5.3 |
| Start unempl. in July | .052 | .080 | .053 | .078 | 11.1 | .3 | 10.4 |
| Start unempl. in August | .081 | .078 | .080 | .078 | 1.0 | .3 | .9 |
| Start unempl. in October | .127 | .078 | .085 | .082 | 16.4 | 13.8 | 14.9 |
| Start unempl. in November | .086 | .079 | .045 | .082 | 2.6 | 16.6 | 1.7 |
| Start unempl. in December | .045 | .082 | .040 | .089 | 15.0 | 2.8 | 17.6 |
| State of residence | | | | | | | |
| Baden-Württemberg | .087 | .113 | .095 | .090 | 8.6 | 2.9 | 1.2 |
| Bavaria | .159 | .138 | .111 | .115 | 6.1 | 14.1 | 12.8 |
| Berlin, Brandenburg | .093 | .093 | .107 | .111 | .1 | 4.7 | 6.0 |
| Hamburg, Mecklenburg Western Pomerania, Schleswig Holstein | .076 | .088 | .098 | .092 | 4.3 | 7.9 | 5.6 |
| Hesse | .064 | .068 | .063 | .058 | 1.7 | .1 | 2.3 |
| Northrhine-Westphalia | .232 | .206 | .182 | .197 | 6.2 | 12.4 | 8.6 |
| Rhineland Palatinate, Saarland | .056 | .054 | .055 | .049 | .9 | .6 | 3.4 |
| Saxony-Anhalt, Saxony, Thuringia | .123 | .142 | .189 | .190 | 5.5 | 18.4 | 18.5 |
| Characteristics of local employment agency districts | | | | | | | |
| Population per km² | 910 | 889 | 789 | 895 | 1.3 | 7.5 | .9 |
| Unemployment rate (in %) | 12.2 | 12.3 | 12.1 | 12.0 | 1.9 | 1.4 | 3.8 |
| Share of empl. in production industry | .250 | .246 | .246 | .241 | 5.1 | 4.7 | 9.9 |
| Share of empl. in trade industry | .150 | .150 | .150 | .150 | 1.8 | 2.7 | 2.8 |
| Share of non-German unempl. | .139 | .141 | .126 | .128 | 2.5 | 14.3 | 12.1 |
| Share of vacant fulltime jobs | .794 | .794 | .800 | .799 | 0 | 8.4 | 7.6 |

Note: See Table 2 for sample first moments of observed characteristics with large standardised differences. In columns (1)-(4), we report the sample first moments of observed characteristics for the treated and non-treated sub-samples. Information on individual characteristics refers to the time of inflow into unemployment, with the exception of the elapsed unemployment duration and monthly regional labour market characteristics, which refer to the (pseudo) treatment time. In columns (5)-(7), we report the standardised differences between the different sub-samples and the treatment group under the voucher system. A description of how we measure standardised differences is available in Online Appendix B.2. OLF is the acronym for “out of labour force”.

| | Voucher system (1) | Mandatory system (2) | Standardised differences between (1) and (2) (3) |
|---|---|---|---|
| Personal characteristics | | | |
| Female | .472 | .476 | .8 |
| Age | 38.754 | 38.697 | .8 |
| Older than 50 years | .011 | .019 | 7.1 |
| No German citizenship | .054 | .052 | 1.1 |
| Children under 3 years | .042 | .040 | 1.2 |
| Single | .300 | .270 | 6.6 |
| Health problems | .083 | .093 | 3.7 |
| Sanction | .007 | .009 | 2.1 |
| Incapacity (e.g., illness, pregnancy) | .022 | .032 | 6.3 |
| Lack of motivation | .007 | .009 | 2.1 |
| Education and occupation | | | |
| No schooling degree | .036 | .035 | .5 |
| Schooling degree without Abitur | .719 | .762 | 9.8 |
| University entry degree (Abitur) | .230 | .185 | 11.2 |
| No vocational degree | .204 | .217 | 3.3 |
| Academic Degree | .114 | .081 | 11.3 |
| White-collar | .383 | .440 | 11.7 |
| Agriculture, Fishery | .012 | .015 | 3.2 |
| Manufacturing | .069 | .101 | 11.6 |
| Construction | .053 | .027 | 13.1 |
| Trade and Retail | .127 | .148 | 6.1 |
| Communication and Information Service | .109 | .122 | 4.1 |
| Employment and welfare history | | | |
| Half months empl. in last 2 years | 45.5 | 44.5 | 15 |
| Half months unempl. in last 2 years | .401 | .587 | 10 |
| Half months since last unempl. in last 2 years | 46.7 | 45.6 | 20.7 |
| No unempl. in last 2 years | .913 | .876 | 12.1 |
| Unempl. in last 2 years | .034 | .047 | 6.3 |
| Timing of unemployment and programme start | | | |
| Start unempl. in January | .059 | .116 | 19.9 |
| Start unempl. in February | .070 | .108 | 13.4 |
| Start unempl. in March | .095 | .104 | 2.9 |
| Start unempl. in April | .102 | .120 | 5.7 |
| Start unempl. in June | .059 | .058 | .4 |
| Start unempl. in July | .052 | .053 | .7 |
| Start unempl. in August | .082 | .08 | .6 |
| Start unempl. in September | .152 | .099 | 16.2 |
| Start unempl. in October | .127 | .085 | 13.5 |
| Start unempl. in November | .087 | .046 | 16.6 |
| Start unempl. in December | .045 | .040 | 2.6 |
| Elapsed unempl. duration | 5.08 | 4.54 | 16.2 |
| State of residence | | | |
| Baden-Württemberg | .085 | .093 | 2.9 |
| Bavaria | .159 | .113 | 13.4 |
| Berlin, Brandenburg | .090 | .103 | 4.3 |
| Hamburg, Mecklenburg Western Pomerania, Schleswig Holstein | .077 | .099 | 8 |
| Hesse | .064 | .064 | 0 |
| Northrhine-Westphalia | .231 | .180 | 12.7 |
| Rhineland Palatinate, Saarland | .056 | .055 | .7 |
| Saxony-Anhalt, Saxony, Thuringia | .125 | .191 | 18 |
| Characteristics of local employment agency districts | | | |
| Share of empl. in production industry | .250 | .246 | 5.1 |
| Share of empl. in construction industry | .064 | .077 | 52.4 |
| Share of empl. in trade industry | .150 | .150 | 3.1 |
| Share of male unempl. | .564 | .541 | 46.9 |
| Share of non-German unempl. | .138 | .126 | 13.3 |
| Share of vacant fulltime jobs | .793 | .800 | 8.9 |
| Population per km² | 902 | 778 | 7.4 |
| Unemployment rate (in %) | 12.2 | 12.1 | 2.5 |

Note: In columns (1)-(2), we report the efficient first moments of observed characteristics for the treated sub-samples. They are exactly equal in the other re-weighted sub-samples, which are not reported. Information on individual characteristics refers to the time of inflow into unemployment, with the exception of the elapsed unemployment duration and monthly regional labour market characteristics, which refer to the (pseudo) treatment time. In column (3), we report the standardised differences (SD) between the two treatment groups. A description of how we measure standardised differences is available in Online Appendix B.2. OLF is the acronym for “out of labour force”.

B. Supplements to the empirical approach

B.1 Proof of Equation (1)

We show that $E[Y^{d}_{i,t}(s) \mid D_i = g, T_i = q]$ can be identified from the joint distribution of the random variables $(Y, G(d,t,s), G(d',t',s), X)$ under Assumptions 1a and 2a (comp. Hirano, Imbens, and Ridder, 2003; Rosenbaum and Rubin, 1983):

\[
\begin{aligned}
E[Y^{d}_{i,t}(s) \mid D_i = d', T_i = t']
&= \int E[Y^{d}_{i,t}(s) \mid D_i = d', T_i = t', X_i = x] \, f_X(x \mid D_i = d', T_i = t') \, dx \\
&= \int E[Y^{d}_{i,t}(s) \mid D_i = d, T_i = t, X_i = x] \, f_X(x \mid D_i = d', T_i = t') \, dx \\
&= \int E[Y_i \mid D_i = d, T_i = t, X_i = x] \, f_X(x \mid D_i = d', T_i = t') \, dx \\
&= \int E[G_i(d,t,s) Y_i \mid D_i = d, T_i = t, X_i = x] \, f_X(x \mid D_i = d', T_i = t') \, dx \\
&= \int \frac{1}{p_{d,t,s}(x)} \, E[G_i(d,t,s) Y_i \mid X_i = x] \, f_X(x \mid D_i = d', T_i = t') \, dx \\
&= \int \frac{p_{d',t',s}(x)}{p_{d',t',s} \cdot p_{d,t,s}(x)} \, E[G_i(d,t,s) Y_i \mid X_i = x] \, f_X(x) \, dx \\
&= \int \frac{p_{d',t',s}(x)}{p_{d',t',s} \cdot p_{d,t,s}(x)} \, G_i(d,t,s) Y_i \, f_X(x) \, dx \\
&= E\left[ \frac{p_{d',t',s}(x)}{p_{d',t',s} \cdot p_{d,t,s}(x)} \, G_i(d,t,s) Y_i \right].
\end{aligned}
\]

In the first equality we apply the law of iterated expectations. In the second equality we condition on $D_i = d$, which is possible because we assume that the expected potential outcomes are independent of the treatment after controlling for $X_i$ (Assumption 1). In equality three we replace the potential outcome by the observed outcome. In equality four we multiply the outcome $Y_i$ by the group dummy $G_i(d,t,s)$. In equality five we use the fact that $E[DY] = E[DY \mid D = 1] \Pr(D = 1)$. In equality six we apply Bayes' rule. In equality seven we apply the law of iterated expectations backwards. Finally, we replace the integral by an expectation in equality eight.
$\square$

B.2 Estimation strategy

A straightforward estimation strategy is based on the sample analogue of (1),

\[
\hat{E}[Y^{d}_{i,t}(s) \mid D_i = d', T_i = t'] = \frac{1}{N} \sum_{i=1}^{N} \hat{\omega}_i Y_i,
\quad \text{with} \quad
\hat{\omega}_i = \frac{G_i(d,t,s)}{\frac{1}{N} \sum_{j=1}^{N} \hat{p}_{d',t',s}(X_j)} \cdot \frac{\hat{p}_{d',t',s}(X_i)}{\hat{p}_{d,t,s}(X_i)},
\tag{1}
\]

where $\hat{p}_{d',t',s}(X_i)$ and $\hat{p}_{d,t,s}(X_i)$ indicate the estimated conditional treatment probabilities (henceforth, propensity scores). This is an Inverse Probability Weighting (IPW) estimator. Hirano, Imbens, and Ridder (2003) demonstrate that the consistency and efficiency of an IPW estimator critically depend on the estimated propensity scores. Parametric specifications of the propensity score do not necessarily lead to efficient estimates. One reason is that (1) seeks to balance the sample covariate distributions, which equal

\[
\hat{F}_{d',t'} = \frac{1}{\sum_{i=1}^{N} \hat{p}_{d',t',s}(X_i)} \sum_{i=1}^{N} G_i(d',t',s) \, 1\{X_i \leq x\},
\]

when $d = d'$ and $t = t'$. However, $\hat{F}_{d',t'}$ can be estimated more efficiently using information from the entire population rather than from the random sample $d', t'$ alone. The efficient estimators for the covariate distributions of subpopulation $d', t'$ equal

\[
\hat{F}^{eff}_{d',t'} = \frac{1}{\sum_{i=1}^{N} \hat{p}_{d',t',s}(X_i)} \sum_{i=1}^{N} \hat{p}_{d',t',s}(X_i) \, 1\{X_i \leq x\}.
\]

Accordingly, reweighting estimators that recover $\hat{F}^{eff}_{d',t'}$ rather than $\hat{F}_{d',t'}$ may be more efficient.
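The IPW weights and the efficient first moment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the propensity scores are taken as given inputs (in the paper they are estimated), so the sketch only shows how the weights combine the group dummy with the two propensity scores.

```python
def ipw_weights(g, p_target, p_source):
    """IPW weights G_i / mean(p_target) * p_target_i / p_source_i.

    g        : 0/1 group dummies G_i(d, t, s)
    p_target : propensity scores for the target group (d', t'), assumed given
    p_source : propensity scores for the source group (d, t), assumed given
    """
    n = len(g)
    mean_p_target = sum(p_target) / n
    return [gi / mean_p_target * pt / ps
            for gi, pt, ps in zip(g, p_target, p_source)]


def ipw_mean(y, weights):
    """Sample analogue (1/N) * sum_i w_i * Y_i."""
    return sum(w * yi for w, yi in zip(weights, y)) / len(y)


def efficient_first_moment(x, p_target):
    """First moment of X weighted by the target propensity score over all units."""
    return sum(p * xi for p, xi in zip(p_target, x)) / sum(p_target)


# Toy data (hypothetical): with identical constant propensity scores the
# estimator reduces to the plain mean of Y among units with G_i = 1.
g = [1, 0, 1, 0]
y = [2.0, 9.0, 4.0, 9.0]
p = [0.5, 0.5, 0.5, 0.5]
print(ipw_mean(y, ipw_weights(g, p, p)))
print(efficient_first_moment([1.0, 3.0], [0.25, 0.75]))
```

The efficient first moment uses every observation, weighted by its target propensity score, rather than only the observations that belong to the target group.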
We report the efficient first moments for all control variables and both treatment groups in Table A.2 in Online Appendix A.

Graham, De Xavier Pinto, and Egel (2016) recently proposed a doubly robust and locally efficient semiparametric version of IPW, named Auxiliary-to-Study Tilting (AST). This estimator precisely balances the efficient first moments of all control variables in each treatment sample. Using AST, the propensity score is estimated in a conventional parametric way. We use the probit model $\hat{p}_{d',t',s}(X_i) = \Phi(X_i'\hat{\beta})$, where $\Phi(\cdot)$ denotes the cumulative normal distribution function and $X_i'\hat{\beta}$ is the estimated linear index. The estimated propensity score $\hat{p}_{d,t,s}(x)$ is replaced by $\tilde{p}_{d,t,s}(x)$. It is estimated under the moment condition

\[
\frac{1}{N} \sum_{i=1}^{N} \frac{G_i(d,t,s)}{\frac{1}{N} \sum_{j=1}^{N} \hat{p}_{d',t',s}(X_j)} \cdot \frac{\hat{p}_{d',t',s}(X_i)}{\tilde{p}_{d,t,s}(X_i)} \cdot X_i
= \frac{1}{N} \sum_{i=1}^{N} \frac{\hat{p}_{d',t',s}(X_i)}{\frac{1}{N} \sum_{j=1}^{N} \hat{p}_{d',t',s}(X_j)} \cdot X_i,
\tag{2}
\]

where $\tilde{p}_{d,t,s}(X_i) = \Phi(X_i'\tilde{\beta})$ is specified such that the left and right sides of (2) are numerically equivalent for all elements in $X_i$ (including a constant term). The right side is the efficient first moment estimate. As the efficient first moment estimates do not depend on the subpopulation $d, t$, the first moments are exactly balanced in all treatment groups for $d, t \in \{0, 1\}$ using this procedure. The constant guarantees that the weights sum to one. The expected potential outcomes are estimated using

\[
\tilde{E}[Y^{d}_{i,t}(s) \mid D_i = d', T_i = t'] = \frac{1}{N} \sum_{i=1}^{N} \tilde{\omega}_i Y_i,
\quad \text{with} \quad
\tilde{\omega}_i = \frac{G_i(d,t,s)}{\frac{1}{N} \sum_{j=1}^{N} \hat{p}_{d',t',s}(X_j)} \cdot \frac{\hat{p}_{d',t',s}(X_i)}{\tilde{p}_{d,t,s}(X_i)}.
\]

Footnote: Exact balancing is not guaranteed for the sample moments using conventional IPW estimators.
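To make the balancing requirement in (2) concrete, the following sketch evaluates both sides of the condition for given propensity scores and returns their difference for each covariate. It is only a check, not the AST solver: it does not search for the tilted coefficients, and the function name and toy inputs are hypothetical.

```python
def balance_gap(x_rows, g, p_target, p_tilted):
    """Left side minus right side of the balancing condition, per covariate.

    x_rows   : list of covariate rows, each starting with a constant 1
    g        : 0/1 group dummies G_i(d, t, s)
    p_target : estimated propensity scores for the target group (d', t')
    p_tilted : tilted propensity scores for the source group (d, t)
    """
    n = len(g)
    mean_p_target = sum(p_target) / n
    gaps = []
    for col in range(len(x_rows[0])):
        # Reweighted first moment in the source group (left side).
        lhs = sum(gi / mean_p_target * pt / ptl * row[col]
                  for gi, pt, ptl, row in zip(g, p_target, p_tilted, x_rows)) / n
        # Efficient first moment using all observations (right side).
        rhs = sum(pt / mean_p_target * row[col]
                  for pt, row in zip(p_target, x_rows)) / n
        gaps.append(lhs - rhs)
    return gaps


# Toy data (hypothetical): a constant and one covariate, balanced by design.
x_rows = [[1.0, 1.0], [1.0, 1.0], [1.0, 3.0], [1.0, 3.0]]
g = [1, 0, 1, 0]
print(balance_gap(x_rows, g, [0.5] * 4, [0.5] * 4))
```

A gap of zero in every element, including the constant, corresponds to exact balance of the first moments. Nonzero gaps indicate that the tilted scores would still need to be adjusted.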
It can be shown that this estimator is $\sqrt{N}$-consistent and asymptotically normally distributed. Similar to Graham, De Xavier Pinto, and Egel (2016), we compute the significance levels (p-values) of our estimated parameters based on a non-parametric bootstrapping procedure (sampling individual observations with replacement).

Footnote: The large sample properties of AST are subject to assumptions regarding the specification of the propensity score. These assumptions imply that the propensity score is correctly specified, strictly increasing in its arguments, differentiable, and well located within the unit interval.

C. Matching quality

We assess the matching quality by reporting the moments (mean, variance, skewness, kurtosis) and standardised differences for the control variables in all four samples. The standardised differences are defined by

\[
SD = \frac{|\mu_{d,t,s} - \mu_{d',t',s}|}{\sqrt{0.5 \, (\sigma^2_{\mu_{d,t,s}} + \sigma^2_{\mu_{d',t',s}})}} \cdot 100,
\]

where $\mu_{d,t,s}$ is the moment and $\sigma^2_{\mu_{d,t,s}}$ is the variance of the moment in the respective treatment group $G_i(d,t,s)$ with $d, d', t, t' \in \{0, 1\}$ and $s \in \{v, m\}$. The pre-matching standardised differences between the sample first moments are reported in Table 2. The post-matching standardised differences between the efficient first moments are exactly zero, as the first moments are precisely balanced (see the discussion in Online Appendix B.2). Therefore, we do not report the standardised difference of the matched treatment and control samples in Table A.2 (only between the voucher and mandatory system).

In the optimal case, matching estimators balance the complete distributions of all control variables rather than only the first moments. For all binary variables, this requirement is satisfied because the first moments are balanced. In the main specifications, we control for 63 variables, 43 of which are binary. For the other variables, we report the variance, skewness, and kurtosis for the different samples matched to the treatment group under the voucher system in Table C.1. Furthermore, we present the higher moments for the different samples matched to the treatment group under the mandatory system in Table C.2. For most moments, we report small standardised differences. However, particularly for the monthly regional labour market characteristics, we find large differences in the higher moments for the samples that are matched to the treatment group under the mandatory system.

Table C.1: Higher moments of observed characteristics matched to the treatment group under the voucher system.
| | Voucher system: Treatment group (1) | Voucher system: Control group (2) | Mandatory system: Treatment group (3) | Mandatory system: Control group (4) | Standardised differences between (1) and (2) (5) | (1) and (3) (6) | (1) and (4) (7) |
|---|---|---|---|---|---|---|---|
| Variance | | | | | | | |
| Age | 55.48 | 64.64 | 56.33 | 63.1 | 13.62 | 1.35 | 11.68 |
| Half months empl. in the last 24 months | 41.45 | 40.17 | 38.23 | 36.72 | 1.16 | 2.98 | 4.47 |
| Half months unempl. in the last 24 months | 2.98 | 3.12 | 3.12 | 3.14 | .72 | .67 | .77 |
| Time since last unemployment in the last 24 months (half-months) | 20.07 | 20.45 | 20.73 | 20.5 | .42 | .67 | .44 |
| · · · · km | | | | | | | |
| Skewness | | | | | | | |
| Age | 46.23 | 110.42 | 81.03 | 115.84 | 5.20 | 3.18 | 5.87 |
| Half months empl. in the last 24 months | -691.31 | -663.08 | -610.47 | -566.93 | 1.18 | 3.50 | 5.52 |
| Half months unempl. in the last 24 months | 29.37 | 32.61 | 34.63 | 35.01 | .97 | 1.40 | 1.52 |
| Time since last unemployment in the last 24 months (half-months) | -381.07 | -400.22 | -449.84 | -439.91 | .89 | 2.61 | 2.30 |
| · -1.37 · -1.42 · -1.25 · · · · · km · · · · | | | | | | | |
| Kurtosis | | | | | | | |
| Age | 6984 | 9302 | 7132 | 8593 | 13.28 | 1.05 | 9.75 |
| Half months empl. in the last 24 months | 14214 | 13745 | 12377 | 11302 | .88 | 3.64 | 5.95 |
| Half months unempl. in the last 24 months | 375 | 440 | 521 | 520 | .96 | 1.82 | 1.84 |
| Time since last unemployment in the last 24 months (half-months) | 8409 | 9053 | 11762 | 11308 | 1.22 | 4.30 | 3.95 |
| · · · · km · · · · .22 8.46 6.91 | | | | | | | |
| Unemployment rate (in %) | 1740 | 1941 | 1219 | 1239 | 3.82 | 12.50 | 11.49 |

Note: In columns (1)-(4), we report the variance, skewness, and kurtosis of observed characteristics for the treated and non-treated sub-samples. Information on individual characteristics refers to the time of inflow into unemployment, with the exception of the elapsed unemployment duration and monthly regional labour market characteristics, which refer to the (pseudo) treatment time. In columns (5)-(7), we report the standardised differences between the different sub-samples and the treatment group under the voucher system. All control variables that are not reported in this table have binary distributions. The higher moments of these variables are precisely balanced in the matched samples.
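The standardised differences reported in these tables can be reproduced with a short helper. The sketch below assumes the usual Rosenbaum and Rubin form (absolute difference in moments, divided by the square root of the average of the two variances, scaled by 100); the function names are hypothetical and the scaling factors are an assumption of this illustration.

```python
import math
import statistics


def standardised_difference(mean_a, var_a, mean_b, var_b):
    """Absolute standardised difference, scaled by 100."""
    return abs(mean_a - mean_b) / math.sqrt(0.5 * (var_a + var_b)) * 100


def standardised_difference_from_samples(sample_a, sample_b):
    """Same measure for first moments, computed from two raw samples."""
    return standardised_difference(
        statistics.fmean(sample_a), statistics.variance(sample_a),
        statistics.fmean(sample_b), statistics.variance(sample_b),
    )


# For a binary variable with share p, the variance is p * (1 - p).
# Shares .010 and .111 are the "Older than 50 years" entries in
# columns (1) and (2) of Table 2.
p_a, p_b = 0.010, 0.111
sd = standardised_difference(p_a, p_a * (1 - p_a), p_b, p_b * (1 - p_b))
print(round(sd, 1))
```

With these rounded shares the helper returns a value close to the 43.3 reported in column (5) of Table 2, which is well above the threshold of 20 that Rosenbaum and Rubin (1983) classify as large.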

| | Voucher system: Treatment group (1) | Voucher system: Control group (2) | Mandatory system: Treatment group (3) | Mandatory system: Control group (4) | Standardised differences between (1) and (2) (5) | (1) and (3) (6) | (1) and (4) (7) |
|---|---|---|---|---|---|---|---|
| Variance | | | | | | | |
| Age | 59.75 | 60.3 | 72.45 | 66.49 | .82 | 16.27 | 8.64 |
| Half months empl. in the last 24 months | 58.91 | 54.18 | 62.6 | 53.03 | 3.98 | 6.51 | 1 |
| Half months unempl. in the last 24 months | 3.93 | 4.32 | 4.4 | 4.2 | 1.83 | .33 | .51 |
| Time since last unemployment in the last 24 months (half-months) | 41.11 | 45.83 | 44.56 | 44.64 | 3.34 | .84 | .75 |
| · · · · km | | | | | | | |
| Skewness | | | | | | | |
| Age | 85.74 | 110.07 | 153.47 | 163.78 | 2 | 2.96 | 3.9 |
| Half months empl. in the last 24 months | -894.18 | -795.24 | -1055.27 | -772.03 | 3.87 | 8.51 | .9 |
| Half months unempl. in the last 24 months | 31.99 | 43.57 | 39.91 | 40.72 | 3.36 | 1.02 | .69 |
| Time since last unemployment in the last 24 months (half-months) | -732.4 | -1014.77 | -896.02 | -969.59 | 7.23 | 2.76 | .96 |
| | · | · | · | · | .34 | 3.08 | 5.03 |
| Cumulative benefits (last 4 years before unemployment) | 3005.55 | 2948.6 | 2372.48 | 2995.96 | .25 | 2.75 | .18 |
| Elapsed unemployment duration | 9.71 | 12.41 | 11.45 | 11.55 | 3.23 | 1.07 | 1 |
| Share of empl. in production industry | .0001537 | .000418 | .0001786 | .0004147 | 13.62 | 11.99 | .14 |
| Share of empl. in construction industry | .0000071 | .0000112 | .0000112 | .0000135 | 8.1 | .06 | 4.02 |
| Share of empl. in trade industry | .0000054 | .0000044 | .0000049 | .0000045 | 3.59 | 1.7 | .44 |
| Share of male unempl. | -.0000996 | -.0000117 | -.0000718 | -.0000223 | 23.68 | 18.35 | 3.3 |
| Share of non-German unempl. | .0005108 | .0002599 | .0004424 | .0002689 | 11 | 8.29 | .44 |
| Share of vacant fulltime jobs | -.0002477 | -.0004009 | -.0002327 | -.0003027 | 5.25 | 6.01 | 3.66 |
| Population per km | · | · | · | · | | | |
| Kurtosis | | | | | | | |
| Age | 7962.29 | 8231.21 | 11811 | 10104.53 | 1.62 | 15.71 | 8.85 |
| Half months empl. in the last 24 months | 18211.33 | 16415.04 | 23904.57 | 15972.42 | 3.16 | 9.7 | .7 |
| Half months unempl. in the last 24 months | 331.3 | 602.69 | 439.63 | 540.75 | 3.88 | 2.29 | .72 |
| Time since last unemployment in the last 24 months (half-months) | 15975.52 | 27816.77 | 22022.23 | 26122.82 | 10.04 | 4.4 | 1.13 |
| · · · · km · · · · | | | | | | | |

Note: In columns (1)-(4), we report the variance, skewness, and kurtosis of observed characteristics for the treated and non-treated sub-samples. Information on individual characteristics refers to the time of inflow into unemployment, with the exception of the elapsed unemployment duration and monthly regional labour market characteristics, which refer to the (pseudo) treatment time. In columns (5)-(7), we report the standardised differences between the different sub-samples and the treatment group under the voucher system. All control variables that are not reported in this table have binary distributions. The higher moments of these variables are precisely balanced in the matched samples.

D. The change in dropout rates

In our interpretation of the negative effects of voluntary participation over the short and medium term after course start (comp. Section 3.7), we argue that participants might change their attitudes towards training in a positive way and participate with higher motivation. If motivation actually increases, we should see a lower dropout rate under the voucher system than under the mandatory system. Therefore, we implement a simple descriptive analysis to investigate the change in dropout rates under both allocation systems. Course completion or dropout is only observed for treated individuals. Following Paul (2015), we define dropout as completing less than 80% of the planned course duration.

Table D.1: Marginal changes of dropout rate in the mandatory vs. voucher system

| Dep. variable: Dropout yes/no | (1) | (2) | (3) |
|---|---|---|---|
| Post-reform period | -.047 (.002) | -.046 (.002) | -.037 (.002) |
| Personal characteristics | No | Yes | Yes |
| Education and occupation | No | Yes | Yes |
| Employment and welfare history | No | Yes | Yes |
| Timing of unemployment and programme start | No | Yes | Yes |
| State of residence | No | Yes | Yes |
| Programme type and durations | No | No | Yes |

Note: Marginal effects after probit estimations based on the sample of treated individuals in Sample A.

We estimate different specifications in which we successively add control variables. In column (3), we use all available control variables, including dummies for different planned course durations. In all specifications (1)-(3), the marginal effect of the time dummy on the dropout rate is significantly negative, implying that the dropout rate decreases after the reform by about 4-5 percentage points. This supports our argument.
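The dropout definition and the unconditional comparison can be illustrated with a small sketch. It computes the raw difference in dropout shares between the post-reform and pre-reform periods, which corresponds in spirit to a specification without control variables; it is not the probit estimation reported in Table D.1. The function names and the toy records are hypothetical.

```python
def is_dropout(completed_days, planned_days, threshold=0.8):
    """Dropout: less than 80% of the planned course duration was completed."""
    return completed_days / planned_days < threshold


def dropout_rate(records):
    """Share of dropouts among (completed_days, planned_days) records."""
    flags = [is_dropout(completed, planned) for completed, planned in records]
    return sum(flags) / len(flags)


# Toy records (hypothetical): (completed days, planned days) per participant.
pre_reform = [(50, 100), (90, 100), (100, 100), (70, 100)]
post_reform = [(85, 100), (100, 100), (60, 100), (95, 100)]

change = dropout_rate(post_reform) - dropout_rate(pre_reform)
print(change)  # negative values indicate fewer dropouts after the reform
```

Adding control variables, as in columns (2) and (3), would require a regression model rather than a comparison of raw shares.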

E. Results for monthly earnings

Figure E.1: Overall reform, post-reform, and pre-reform treatment effects

Figure E.2: Selection and overall reform effects

Figure E.4: Policy effects