Does external medical review reduce disability insurance inflow?

Abstract

This paper investigates the effects of introducing external medical review for disability insurance (DI) in a system relying on treating physician testimony for eligibility determination. Using a unique policy change and administrative data from Switzerland, I show that medical review reduces DI incidence by 23%. Incidence reductions are closely tied to difficult-to-diagnose conditions, suggesting inaccurate assessments by treating physicians. Due to a partial benefit system, reductions in full benefit awards are partly offset by increases in partial benefits. More intense screening also increases labor market participation. Existing benefit recipients are downgraded and lose part of their benefit income when scheduled medical reviews occur. Back-of-the-envelope calculations indicate that external medical review is highly cost-effective. Under additional assumptions, the results provide a lower bound of the effect on the false positive award error rate.


Helge Liebert ∗ , †

Journal of Health Economics, 2019, Vol. 64, 108–128. https://doi.org/10.1016/j.jhealeco.2018.12.005

∗ Center for Disability and Integration, Department of Economics, University of St. Gallen, Rosenbergstr. 51, 9000 St. Gallen, Switzerland. Email: [email protected].

† I thank the editor and three anonymous referees for their valuable comments. The paper benefited from discussions with Simone Balestra, Eva Deuchert, Beatrix Eugster, Per Johansson, Rafael Lalive, Michael Lechner, Nicole Maestas, Beatrice Mäder and seminar participants at the University of St. Gallen, the University of Uppsala/IFAU, the 2015 SOLE/EALE meeting in Montreal and the 2018 European Workshop on Health Economics and Econometrics in Groningen. All remaining errors are my own. This work was funded by the Swiss National Science Foundation under grant no. 100018_143317/1.

Introduction

Targeted programs constitute the most common form of social protection worldwide. Benefit payments are disbursed to groups identified by a common characteristic: families, the unemployed or persons with a work-limiting disability. Among the different social programs, disability insurance (DI) is by far the most costly. The average OECD country spends about 2.3% of GDP on disability-related benefits (OECD 2010). In both the United States and Europe, the number of DI beneficiaries rose throughout the late 20th and early 21st century and recently stabilized at a high level: on average, about 6% of the working-age population in OECD countries receive disability benefits (OECD 2010). Increases in DI beneficiaries have often been associated with imperfect screening of DI applicants (e.g. Autor and Duggan 2003). One indication of this is that the relative prevalence of difficult-to-diagnose health conditions like musculoskeletal or mental health problems on the DI rolls has increased at a higher rate than prevalence in the general population (Campolieti 2002, OECD 2010). Across OECD countries, 60% of DI inflow can be attributed to musculoskeletal conditions or mental health claims (OECD 2009).

Disability benefit decisions are made based on medical assessments of individuals’ residual functional capacity, i.e., their remaining ability to work. However, the medical assessment process required for eligibility determination differs across countries. In 40% of the OECD countries surveyed in OECD (2003), the first gatekeeper to the DI system is the treating physician. In Norway, Switzerland and the United States, countries which are characterized by high rates of DI prevalence, treating physician testimony has historically often been decisive for claims decisions. Treating physicians also hold an influential role in the DI determination process in Australia, Denmark, Germany, Sweden, and the United Kingdom.
In these DI systems, the treating physician submits the medical documentation of applicants’ diagnosis and treatment history to the DI administration. After submission, the documentation is reviewed by caseworkers, and potentially also by DI physicians.

Whether treating physicians or DI-appointed physicians alone should assess residual functional capacity of DI applicants remains an open question. Treating physicians are considered to have an informational advantage, hence their recommendation is often influential in award decisions. The United States Social Security Administration (SSA) even adopted a ‘treating physician rule’ in 1991, giving ‘controlling weight’ to the treating physician’s opinion. At the same time, treating physicians are known to diagnose clients favorably in the context of sick-listing, possibly to prevent harming a long-standing physician-patient relationship (e.g. Zinn and Furutani 1996, Englund et al. 2000, Kankaanpää et al. 2012). Moreover, treating physicians are often general practitioners and not clinical specialists, and it is unclear whether complex disabling conditions can be accurately diagnosed by treating physicians. For these reasons, treating physicians’ assessments are commonly subjected to medical review by DI physicians, who are often clinical specialists.

This paper evaluates the effectiveness of external medical review and its implications. Identification relies on quasi-experimental policy variation generated by an extensive pilot program that preceded the nationwide introduction of mandatory medical review in Switzerland. For the analysis, I develop a combined difference-in-differences and spatial matching approach, embedded in an age-based duration analysis framework for estimation. The results indicate that introducing medical review reduces DI admissions by 23%. Reductions are closely tied to psychological and musculoskeletal conditions, diseases which are more prone to inaccurate diagnoses.
Medical review also increases labor market participation. In an extension to the main analysis, I provide explicit identifying conditions under which the inflow reduction can be interpreted as a bound on the reduction in DI award errors. Looking at the stock, I find that existing benefit recipients are downgraded and lose part of their benefit income when scheduled medical reviews occur. Finally, I demonstrate that medical review is highly cost-effective.

In 2005, external medical review became mandatory for all DI applications in Switzerland. This reform was preceded by a pilot, which introduced mandatory medical review in several Swiss cantons as early as 2002. Medical review in this context means file-based review, exchange with treating physicians and personal examinations by official DI and other third-party physicians. The reform had three major components. First, it substantially increased the medical staff and funding directed towards reviewing DI applicants’ cases, more than doubling the number of full-time equivalent staff positions. Screening quality was improved by substantially reducing the individual DI physicians’ caseload and by directing cases to physicians specialized in the relevant field. Second, the physicians are mandated to review all DI applications, to conduct medical checks if required and to provide the responsible DI caseworker with better information about applicants’ health. Before the policy change, caseworkers relied on information provided by applicants’ treating physicians for their decision, as the DI offices had insufficient resources to screen individuals. Third, the policy also abolished legal obstacles that prevented DI physicians from examining applicants in person or requesting further documentation.
Meanwhile, the decision structure remains unchanged: the final eligibility decision remains with the responsible DI caseworker.

This paper contributes to the literature on screening in DI by investigating medical review, a form of screening which has so far been largely neglected. DI screening involves two distinct aspects: stringency and quality. Interestingly, while screening has received considerable attention in the literature on DI, studies on screening in DI almost exclusively focus on variations in screening stringency and use them to obtain a control group to identify the disincentive effect of DI on labor supply (e.g. Karlström et al. 2008, Mitra 2009, de Jong et al. 2011, Staubli 2011, Maestas et al. 2013, French and Song 2014). These studies rely on either explicit or implicit changes to eligibility criteria and the admittance threshold for identification and generally find positive labor supply effects of screening. For example, de Jong et al. (2011), Maestas et al. (2013) and French and Song (2014) rely on variations in adjudicator stringency, while Karlström et al. (2008) and Staubli (2011) rely on explicit policy reforms that limited eligibility for certain groups. Looking at DI in Austria, Staubli (2011) shows that stricter eligibility requirements both reduce insurance prevalence and increase labor supply. Naturally, these studies also often find lower take-up rates of DI because individuals become mechanically ineligible for DI due to changes in the admittance criteria.

In this paper, I focus on the implications of medical review, an intervention that influences screening quality by providing more information on individuals’ underlying capacity to work. Looking at medical review allows abstracting from mechanical inflow effects which arise due to implicit eligibility requirement changes. Unlike stringency changes, medical review does not inherently involve a trade-off between false positive and false negative decision errors (e.g. Kleven and Kopczuk 2011, Low and Pistaferri 2015).
Since medical review primarily targets new DI applicants, I focus explicitly on insurance incidence (inflow) in the analysis, as prevalence (stock) is likely to be more inert. In addition, research has shown that inducing work take-up among long-term beneficiaries can be difficult and results regarding the employment capabilities of this group are mixed (e.g. Kornfeld and Rupp 2000, Adam et al. 2010, Borghans et al. 2014, Bütler et al. 2015, Moore 2015, Garcia Mandico et al. 2018).

Moreover, the results in this paper also relate to the findings of health condition-dependent effect heterogeneity in the literature on disincentive effects of DI and the literature on misreporting of health status. In a seminal paper, Bound (1989) finds that up to half of DI recipients in the US would be working in the absence of DI. Newer studies have confirmed Bound’s (1989) main result, but also show that there is considerable effect heterogeneity (e.g. Chen and van der Klaauw 2008, von Wachter et al. 2011, Maestas et al. 2013, French and Song 2014). Results by von Wachter et al. (2011) indicate that especially employment of younger individuals and those who applied based on mental health and musculoskeletal conditions would be non-negligible in the absence of DI. Related to this, Campolieti (2006) notes that stricter DI entry requirements cause fewer reports of these difficult-to-diagnose conditions among older males. Using administrative records, I show that medical screening reduces insurance inflow of difficult-to-diagnose conditions and increases labor market participation. This effectively ties excess inflow of individuals capable of working to certain conditions and suggests that medical review is a cost-effective policy to reduce it.

Footnote: Other studies have observed that self-reports of disability differ from objective measures of functional limitations and that individuals out of the labor market tend to overstate health limitations (Butler et al. 1987, Kreider 1999, Kreider and Pepper 2007, 2008). Exaggeration and malingering of health limitations by patients in anticipation of insurance benefits has also been documented in medical studies (e.g. Frueh et al. 2003) and the literature on worker compensation schemes (e.g. Staten and Umbeck 1982, Bolduc et al. 2002).

Institutional background

The Swiss DI system is characterized by generous benefits. Individuals can receive benefits from three main benefit schemes: mandatory public DI, mandatory employer-provided occupational pensions and optional private DI. Eligibility for benefits is determined by the local public DI office responsible for the main mandatory public scheme and is binding for all other benefit providers. Replacement rates are based on an individual’s previous income, contribution history, whether the individual receives full or partial benefits and the family situation. The full benefit amount from the mandatory public DI scheme lies between 1,175 CHF and 2,350 CHF per month before taxes, depending on prior income, marital status and contribution history. Individuals with children receive an additional 40% of this amount for each dependent child. In addition, there are income-contingent benefits for spouses and means-tested supplementary benefits for recipients who fall below the subsistence earnings threshold. The additional payouts from the mandatory occupational pension scheme vary based on the contribution length and the employer’s contract terms. Focusing only on the two mandatory schemes, a 40-year-old adult with full contribution history and average wage can expect a replacement rate of 70% if single, 80% if married, and 100% if married with two children. At earnings below the average wage, the replacement rate increases sharply up to 120%, exceeding the prior earnings level (OECD 2006, 2010).

Eligibility status and the benefit amount from the main public DI scheme are determined based on an individual’s disability degree, a measure of work incapacity calculated as one minus the ratio of potential labor market income with disability to the potential income without disability (typically prior earnings). The determination of potential income is directly tied to a medical assessment of individuals’ residual work capacity.
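As a minimal illustration of the disability degree definition above, the following Python sketch computes one minus the ratio of potential income with disability to potential income without disability. The function name, the clamping at zero and the income figures are hypothetical and chosen only for illustration; they are not taken from the paper or from Swiss regulations.

```python
def disability_degree(income_with_disability: float,
                      income_without_disability: float) -> float:
    """Return 1 - (potential income with disability / income without disability)."""
    if income_without_disability <= 0:
        raise ValueError("income without disability must be positive")
    if income_with_disability < 0:
        raise ValueError("income with disability cannot be negative")
    ratio = income_with_disability / income_without_disability
    # Assumption of this sketch: the degree is clamped at zero when potential
    # income with disability exceeds income without disability.
    return max(0.0, 1.0 - ratio)


# Hypothetical example: prior earnings of 60,000 CHF and a residual
# earning potential of 21,000 CHF.
degree = disability_degree(21_000, 60_000)
print(f"disability degree: {degree:.0%}")
```

In this made-up case the individual retains 35% of prior earning potential, so the disability degree is 65%.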
If granted, benefits are paid indefinitely, and are only revised if applicants’ health or earnings change substantially, or they become eligible for retirement pay. Unlike unemployment insurance (UI), DI benefits are not attached to return-to-work measures. The Swiss system allows for partial disability benefits in quarterly increments.

The Swiss parliament passed a reform of the DI system in 2003 (4. Revision des Bundesgesetzes über die Invalidenversicherung). Prior to this, medical review occurred infrequently and DI caseworkers made their decisions based on medical assessments submitted by the applicants’ treating physician. The treating physician-based screening procedure had been in place unrevised since 1973. The reform resulted in a large expansion of the medical staff available for review of insurance applications and substantially extended their legal competences. Physicians were tasked to conduct (re-)appraisals of benefit claims and authorized to carry out medical examinations.

To assess the effect of the institutional changes, the Federal Ministry of Social Insurances devised a pilot scheme. Beginning in 2002, 11 out of 26 cantons could already hire new staff and conduct medical review. In the remaining cantons, operation began in 2005 as scheduled by the reform proposal. Following the nationwide implementation in 2005, staff funding was expanded further. The cantons that introduced medical review in 2002 are shown in Figure 1. The cantonal DI offices operate autonomously, but hold a yearly joint conference, during which participation in the early adopter program was decided (endogenous self-selection is addressed in more detail in section 4). The program was fully funded by the federal ministry.

Figure 1: Cantons with medical review during the pilot

Note: Pilot cantons shaded gray. Legend: ZH: Zürich, BE: Bern, LU: Lucerne, UR: Uri, SZ: Schwyz, OW: Obwalden, NW: Nidwalden, GL: Glarus, ZG: Zug, FR: Fribourg, SO: Solothurn, BS: Basel-Stadt, BL: Basel-Landschaft, SH: Schaffhausen, AR: Appenzell A.-Rh., AI: Appenzell I.-Rh., SG: St. Gallen, GR: Graubünden, AG: Aargau, TG: Thurgau, TI: Ticino, VD: Vaud, VS: Valais, NE: Neuchâtel, GE: Geneva, JU: Jura.

To become eligible for DI, individuals have to register with their local DI office. Applicants must register with the DI office corresponding to their place of residence and cannot file for benefits elsewhere. When filing a benefit claim, applicants have their treating physician submit the medical documentation of their condition and their previous earnings records. The earnings loss induced by the condition must span at least twelve months to qualify for benefits. The disability insurance office then assesses the individual earnings loss based on the severity of the condition and its impact on work capability. Based on the assessment, the caseworker decides whether the person qualifies for benefits.

Prior to 2002, the insurance office could only assess eligibility from the medical certificates issued by the applicant’s chosen treating physician, typically the applicant’s general practitioner. DI offices were legally not allowed to examine the applicant, even when in doubt about the credibility or severity of the impediment.
The DI caseworkers deciding on the application have no medical training themselves, but could consult with physicians working at the DI offices if they deemed it necessary. However, the DI offices were notoriously understaffed with physicians. In 2006, the average DI physician reviewed about 612 dossiers per year. Considering the changes in manpower, this figure would have to be 2.25 times as high prior to the reform to ensure the same coverage given that application numbers remained constant (Appendix Figure A4). For this reason, only a subset of selected dossiers were passed to the DI physicians for inspection. Caseworkers were reliant on the medical assessment provided by the treating physician when awarding benefits.

This situation changed with the reform, which essentially strengthened the role of independent DI physicians in the application process. There are three major changes attached to the policy. First, the reform substantially increased the medical staff working for the DI offices. Aggregate figures indicate that the number of full-time equivalent positions increased by 125%. Nationwide, the number of staff positions increased from 105 to 235 due to the reform. Positions are distributed among cantons proportional to the insured population, implying that the relative increase is the same for every region. Pilot cantons experienced this increase three years earlier (see Appendix Figure A2). New physicians are selected to have specialized in fields relevant to diagnosing difficult cases (e.g. rheumatology, orthopedics or psychiatry) and are trained in actuarial regulation. Second, medical review became mandatory for DI claims. Every applicant’s medical history is reviewed and summarized in a non-technical report for the DI caseworker. Third, physicians were given the authority to screen people in person, to consult with treating physicians and to order further examinations with other specialists. Before, reviews were legally restricted to file-based review.
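The staffing figures quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (105 and 235 positions, 612 dossiers per physician, the factor 2.25); the variable names are illustrative, and the implied pre-reform caseload is a back-of-the-envelope illustration rather than a figure reported in the paper.

```python
positions_before = 105            # full-time equivalent positions before the reform
positions_after = 235             # positions after the reform
dossiers_per_physician_2006 = 612
caseload_factor = 2.25            # factor stated in the text

relative_increase = (positions_after - positions_before) / positions_before
staff_ratio = positions_after / positions_before
# Dossiers each physician would have had to review before the reform
# to achieve the same coverage, holding application numbers constant.
implied_pre_reform_caseload = dossiers_per_physician_2006 * caseload_factor

print(f"relative increase in positions: {relative_increase:.0%}")
print(f"staff ratio after/before: {staff_ratio:.2f}")
print(f"implied pre-reform dossiers per physician: {implied_pre_reform_caseload:.0f}")
```

The nationwide position counts imply an increase of roughly 124% and a ratio of about 2.24, close to the 125% and 2.25 quoted from aggregate figures. Full coverage before the reform would have required about 1,377 dossiers per physician per year.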
The staff is instructed to focus on new DI applicants and aid with scheduled revisions of existing beneficiaries’ claim status.

A schematic overview of the application process and the additional processes is depicted in Figure 2. Under the new system, the responsible DI physician always receives a complete copy of an individual’s insurance application, including the medical documentation of potential limitations. The DI physician then provides an evaluation of the applicant’s eligibility for the DI caseworker. If the documentation is considered insufficient, additional information can be requested from treating physicians. Furthermore, if the physicians notice inconsistencies in the application or deem it to be invalid, they have the authority to consult with the treating physician, to conduct further examinations or request visits to other clinical specialists. The DI frequently uses the available channels to gather additional information: aggregate figures suggest that in-house examinations occur in up to 10% of cases, specialist consultations are decreed in up to 12% of cases and special multidisciplinary reports when multiple conditions are present are requested in up to 6% (Wapf and Peters 2007).

Footnote: Since the reform more than doubled the number of physicians working at the DI offices, there is concern about delays in hiring staff and filling positions. However, comparing the average share of vacancies filled in 2006 between offices in pilot and late adopter regions does not indicate that such delays did occur.

Footnote: Examples for inconsistencies are an applicant claiming benefits on grounds of depression without a sufficiently documented history of therapy or medication, or an individual with moderate chronic pain claiming full work incapacity.

Figure 2: The DI application and decision process

[Flowchart of the interactions between the applicant, the treating physician, and the DI caseworker and DI physician at the DI office. Arrow labels: files registration, submits documentation, requests information, consults, reports, medical examination, eligibility decision. Legend: mandatory DI process, mandatory medical review, optional medical review.]

The DI physicians’ eligibility evaluation is not binding. The final decision on whether benefits are granted remains with the responsible insurance caseworker and the actuarial requirements are the same. This implies that the regulatory framework remains unchanged; only the provision of information about the subjects’ eligibility regarding health limitations is affected by the reform.

The main analysis regarding insurance inflow and the analysis of the labor market response are both based on the SESAM (Syntheserhebung soziale Sicherheit und Arbeitsmarkt) dataset provided by the Swiss Federal Statistical Office. The SESAM data link the official Swiss labor force survey (SAKE, Schweizerische Arbeitskräfteerhebung) to administrative records. The sample period ranges from 1999 to 2011. I rely on the SESAM data to analyze the DI hazard because they are the largest representative administrative data source available which combines different social security and labor market registers and has sufficient coverage over time. Given the survey weights, the data is representative of the […]

[…] global sample (containing all individuals in all regions) and a local sample (containing only individuals in municipalities near the border between treated and control regions). Distance information is available as both actual travel distance and travel time by car. I choose a travel distance of 20 kilometers between municipalities as the threshold for the local sample. I then compute nearest-neighbor estimation weights for this sample. The unrestricted global sample comprises 259,323 individuals, the local sample is restricted to 133,549 individuals (descriptive statistics are given in Appendix Table A2, the sample composition is mapped in Appendix Figure A1). In the estimations, I use the survey weights for the global sample and nearest-neighbor weights for the local sample. All results in the paper are robust to the choice of distance measure, variations in the threshold level and whether weights are applied.

As discussed in section 2, medical review also applies to scheduled reassessments of existing beneficiaries’ claim status. In the second part of the analysis, I investigate the effects of medical review on existing beneficiaries. For this analysis, I use a second administrative dataset provided by the Swiss Federal Ministry of Social Insurances. I use the data to estimate the effects of medical review on the disability degree classification and benefit payment in the beneficiary stock.
Moreover, I rely on this data to investigate potential outflow effects in the beneficiary stock which could confound the main results (see section 4).

The stock data track the stock of all existing DI recipients from 2001 onwards. For each individual, I observe the age of entry and the time spent on the DI rolls. In addition, the data register the actual disability degree, the benefit amount paid out by the state insurance and the health limitations the person suffers from, among other socio-economic variables. However, the stock data only register the region of residence, rendering localized analyses impossible. All stock analyses condition on individuals with benefit receipt prior to treatment in 2001, such that results are unconfounded by new entries to the DI payroll.

Footnote: Microcensus data on mobility show that 80% of commuters stay within the 20 kilometer distance limit chosen for the local sample, and it corresponds approximately to the average commuting distance and time in Switzerland (BSV 2012, Eugster and Parchet 2018).

Empirical strategy

In this section, I develop the empirical approach used in the remainder of the paper. Section 4.1 discusses identification and introduces the duration model used in the main analysis. Section 4.2 provides explicit identifying conditions for difference-in-differences in a Cox (1972) proportional hazards model. Section 4.3 discusses potential mechanisms that could violate these conditions and provides evidence to support their validity. Finally, section 4.4 explores and discusses additional identifying conditions which tighten the interpretation of the reduced-form estimate, bounding the effect of medical review on the false positive award error rate.

The main quantity of interest is the change in the population DI hazard induced by external medical review, i.e., the change in the rate of newly awarded benefits among previously non-receiving working-age individuals. However, due to an opaque political decision process and self-selection into the early adopter scheme, treatment assignment cannot be assumed to be fully random. The cantons participating in the pilot program are a mixture of high and low prevalence regions, and regional cooperation considerations were relevant in the assignment process.

A difference-in-differences identification approach is used to evaluate the impact of the medical review institutions. Differencing removes time-invariant influences on potential outcomes. This removes bias due to selection into the program based on fixed or inert aggregate regional differences. However, identification still requires a common development of DI incidence in the absence of the expansion of medical review. This assumption raises concerns related to regional heterogeneity and selection. The remainder of this section introduces the modeling approach; the following sections present the identifying assumptions and discuss potential threats to their validity.

As Autor and Duggan (2003) illustrate, people rarely transition directly from employment into DI, but typically apply conditional on job loss. One concern in the present context is that labor markets may be less resilient in some regions, or that regions with strong industrial and commercial hubs are more affected by common economic shocks. If screening is imperfect and disability insurance is used as an extension to unemployment insurance or an early retirement vehicle in case of job loss, differential labor market trends can confound the results. Since Switzerland is a country with historically tight labor markets, such concerns are alleviated to some degree.
Nevertheless, there may also be other underlying differences between regions based on the self-selection into the pilot program that cause time-variant divergence. Remaining time-variant heterogeneity among Swiss regions may raise concerns about biased treatment effect estimates.

To address this issue, I follow a twofold approach. A first set of results is based on the full sample of individuals across all regions. A more narrow identification approach focuses on individuals in border regions within commuting distance between treated and control areas. Focusing on these regions generates samples that are balanced in observable characteristics ex ante and increases the credibility of the common trend assumption. Similar strategies are used by Frölich and Lechner (2010) and Campolieti and Riddell (2012).

However, local estimation approaches relying on sampling based on the distance to a border can suffer from problems due to spatial clustering on different sides along the border (cf. Keele and Titiunik 2016). To alleviate these concerns, I compute weights corresponding to nearest-neighbor pairwise differences and use them in the estimations. This weighting approach is equivalent to spatial matching. The main advantage of weighting is that it creates a sample that is well-balanced in observables and increases the credibility of the identifying assumptions introduced in the next section. Weighting reduces the bias of the estimator by restricting comparisons to a more similar control group. The bias reduction potentially comes at the cost of an increase in variance, since the estimator may not use all available data. In the context of matching, this bias-variance trade-off is often favorable, as the gain from finding good matches dominates the loss due to higher variance.

For estimation, I exploit the spell format of the data and model insurance take-up as a duration problem.
The main specification uses a stratified Cox (1972) proportional hazard model to estimate the impact of the reform on DI incidence. The hazard rate is modeled as

$$h(t, P, D \mid X < \bar{x}) = h_g(t) \exp\left(\beta_1 P + \beta_2 D + \beta_3 P D\right), \tag{1}$$

where $h_g(t)$ is the non-parametric baseline hazard within birth cohort stratum $g$, $t$ denotes time in years, $D \in \{0, 1\}$ is a binary treatment group indicator and $P \in \{0, 1\}$ is a binary time-varying indicator for the pilot period during $t \in \{2002, 2003, 2004\}$. Samples are restricted to individuals in border municipalities between treated and control regions within an absolute distance threshold $\bar{x}$ (20 km in the main specification), where individuals are similar in observables and remaining differences can credibly be assumed to be time-constant.

Footnote: All estimates are robust across a large set of bandwidths and whether travel distance or travel time is chosen as the distance metric. Moreover, the results are also robust to replacing (1) with a more flexible specification containing cantonal fixed effects.

The model is specified using age as the time scale. This is preferable to using time-on-study as analysis time due to the age-dependent nature of the disability hazard, the rich cohort data available and the interest in the effect of a time-varying covariate (Korn et al. 1997, Thiébaut and Bénichou 2004). All models are stratified by five-year birth cohorts to account for cohort-specific differences in health environments. Individuals become at risk when they are eligible for insurance at age 18. Censoring occurs at the sampling date or when individuals reach the retirement age, whichever occurs first. Disability benefit receipt constitutes failure. Due to data limitations, the analysis is restricted to single spells and disability insurance is assumed to be an absorbing state. However, this is not much of an abstraction. Actual outflow rates due to reasons other than death or moving to the old-age pension system amount to less than 1% of the stock per year (BSV 2012). Previous research for Switzerland has shown that DI recipients are loath to give up safe benefits even when faced with strong financial incentives to do so (Bütler et al. 2015).

A duration approach has a number of advantages compared to a linear difference-in-differences framework in this setting. It corresponds naturally to the spell format of the available cross-sectional data and the fact that DI entry is essentially a survival outcome. Data issues also limit the feasibility of the standard difference-in-differences approach. DI receipt is observed retrospectively as year of entry and only repeated cross-sections of a representative sample of the population are available. Since total DI incidence in the population is low, actual DI entry observed in each sampling year is low and insufficient for the analysis. Note that DI entry year and sampling year can be distinct. As the DI entry year is observed for each recipient, irrespective of the sampling date, pooling all data increases power substantially. This is because all information on DI entry in any given year which is available from subsequent years in which data was sampled can be utilized.

Pooling all cross-sectional data and conducting the analysis by age instead of sampling year (time-on-study) also limits the possibility of implicit sampling bias.
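To make the spell construction described above concrete, the following Python sketch builds an age-scale spell from a single cross-sectional record: entry into the risk set at age 18, failure at the age of DI entry, and censoring at the sampling date or the retirement age, whichever comes first. The record layout, the function name, the handling of edge cases and the retirement age of 65 are assumptions made for illustration only; they are not specified in the text.

```python
from dataclasses import dataclass
from typing import Optional

AT_RISK_AGE = 18  # individuals become at risk at age 18 (from the text)


@dataclass
class Spell:
    entry_age: int   # start of time at risk on the age scale
    exit_age: int    # age at failure or censoring
    failed: bool     # True if DI benefit receipt is observed


def build_spell(birth_year: int, sampling_year: int,
                di_entry_year: Optional[int],
                retirement_age: int = 65) -> Optional[Spell]:
    """Build one age-scale spell; DI is treated as an absorbing state."""
    age_at_sampling = sampling_year - birth_year
    # Censoring at the sampling date or retirement age, whichever occurs first.
    censor_age = min(age_at_sampling, retirement_age)
    if censor_age <= AT_RISK_AGE:
        return None  # never at risk within the observation window
    if di_entry_year is not None:
        di_age = di_entry_year - birth_year
        if di_age <= AT_RISK_AGE:
            return None  # assumption: entries before the risk period are excluded
        if di_age <= censor_age:
            return Spell(AT_RISK_AGE, di_age, True)
    return Spell(AT_RISK_AGE, censor_age, False)


# Hypothetical records: (birth year, sampling year, retrospective DI entry year)
print(build_spell(1950, 2006, 2003))
print(build_spell(1970, 2010, None))
```

Because the DI entry year is reported retrospectively, a person sampled in 2006 still contributes a failure at the age reached in 2003, which is what allows pooling across sampling years.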
With inflow observed retrospectively, relying on absolute sampling time as the time measure for the analysis would require creating a pseudo-panel structure by inferring past incidence figures from a post-treatment cross-section and adjusting for past eligibility. Since the disability risk is concentrated at older ages near the official retirement age, extrapolating past incidence causes bias due to intermittent entry into the retirement scheme. A non-negligible share of those in the old-age pension system at the sampling date may have received DI previously, but are not observed to do so any more when they are sampled. This share will increase the further past incidence figures are inferred retrospectively. Incidence figures inferred this way will be artificially low and the cross-sectional data cease to be representative.

Finally, estimation of effects on incidence rates in a standard difference-in-differences framework would require modifying the standard common trend assumption in a way

Footnote: Comparisons with aggregate data indicate that the reported aggregate rates are underestimated by about 20% going back five years. Inferring incidence further retrospectively, inferred inflow continues to decrease as attrition caused by moving to the old-age pension system and mortality increase. Going back 30 years, inferred incidence converges to zero and is almost exclusively driven by small-sample variation of individuals who were awarded DI when they were very young.
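
To make the age time scale with a time-varying pilot indicator more concrete, the following minimal Python sketch splits one individual's spell into age intervals and labels each interval as pre-pilot, pilot or post-pilot. The function name `split_spell`, the labels and the end-exclusive interval convention are illustrative assumptions and are not taken from the paper. The pilot start in 2002 follows the text; the end year 2004 is an assumption based on the three-year pilot period.

```python
from typing import List, Tuple

PILOT_START = 2002  # pilot began in January 2002 (stated in the text)
PILOT_END = 2004    # assumption: last calendar year of the three-year pilot


def split_spell(birth_year: int, exit_year: int,
                at_risk_age: int = 18) -> List[Tuple[int, int, str]]:
    """Split a spell into age intervals [start_age, stop_age) by period.

    exit_year is the last calendar year in which the person is observed
    (DI entry or censoring). Ages are measured in whole years.
    """
    entry_year = birth_year + at_risk_age
    last_year = exit_year + 1  # end-exclusive calendar bound
    # Calendar periods as end-exclusive (label, first_year, end_year) triples.
    periods = [
        ("pre", entry_year, PILOT_START),
        ("pilot", PILOT_START, PILOT_END + 1),
        ("post", PILOT_END + 1, last_year),
    ]
    episodes = []
    for label, lo, hi in periods:
        start = max(lo, entry_year)
        stop = min(hi, last_year)
        if start < stop:  # keep only periods the spell actually overlaps
            episodes.append((start - birth_year, stop - birth_year, label))
    return episodes


if __name__ == "__main__":
    # A person born in 1960 and observed until 2006 contributes three episodes.
    for episode in split_spell(1960, 2006):
        print(episode)
```

Each episode would enter a counting-process style data set as one row, with the pilot indicator $P$ equal to one only in the row labeled `pilot`. Because the split is made in calendar time and then converted to age, the pilot falls at a different age for every birth cohort.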

The standard assumptions for difference-in-differences estimation have to be restated for proportional hazard models. The exponentiated coefficient on the interaction between treatment time and region represents a ratio of hazard ratios

$$\exp(\beta_3) = \frac{h(t \mid D=1, P=1) \,/\, h(t \mid D=1, P=0)}{h(t \mid D=0, P=1) \,/\, h(t \mid D=0, P=0)}. \tag{2}$$

The distance condition has been dropped to ease notation. The effect of interest is the relative change in the hazard for the treated, a relative average treatment effect on the treated (rATT),

$$\text{rATT} = \frac{h^{1}(t \mid D=1, P=1)}{h^{0}(t \mid D=1, P=1)}, \tag{3}$$

where $h^{D}$ denotes potential hazard rates. I assume SUTVA (Rubin 1977) holds, i.e., either of the two potential treatment states is observed. As disability insurance applicants are a small fraction of the population, it is credible that general equilibrium effects are absent. Identification then requires the two usual conditions in restated form,

$$h^{1}(t \mid D=1, P=0) = h^{0}(t \mid D=1, P=0) \tag{no anticipation, 4}$$

and

$$\frac{h^{0}(t \mid D=1, P=1)}{h^{0}(t \mid D=1, P=0)} = \frac{h^{0}(t \mid D=0, P=1)}{h^{0}(t \mid D=0, P=0)}. \tag{common trend, 5}$$

The main identifying assumption (5) is that in the absence of mandatory medical review, incidence for individuals in both pilot and non-pilot (border) regions would have changed proportionally. The common trend assumption is not invariant to the scaling of the dependent variable (e.g. Lechner 2010) and is modified accordingly. Instead of assuming a common trend between regions over time in differences, I am assuming a constant hazard ratio, i.e., a common relative change or a common absolute change in logs. In addition, I assume that anticipation effects are absent. Given these assumptions, the coefficient of the interaction identifies the hazard ratio of interest, the relative ATT: by (4), the observed pre-period hazard of the treated equals $h^{0}(t \mid D=1, P=0)$, and by (5), the denominator of (2) equals $h^{0}(t \mid D=1, P=1) / h^{0}(t \mid D=1, P=0)$, so that (2) reduces to (3).

The two main threats to identification are a violation of the no anticipation condition and the common trend assumption.
Prospective or ongoing reform changes may induce some individuals to change their behavior in anticipation of future loss or gain. The main confounding mechanisms are mobility (individuals move to untreated regions to apply for DI) and the timing of applications (early application in anticipation of medical review).

The implementation and chronology of the reform alleviate these concerns. The first draft of the reform, which included the institutional changes introducing medical review, was proposed in parliament in February 2001 and underwent some revisions until being approved by popular vote in March 2003. The pilot project began already in January 2002, before the changes were approved. The early adopter scheme was scheduled immediately after the reform proposal was publicised and began only ten months afterwards.

Importantly, the pilot scheme was never publicly announced. Communication only occurred internally between the Federal Ministry of Social Insurances and the DI offices and was never publicised. The person responsible for the yearly committee meeting confirmed that the medical review pilot was never publicly communicated to outsiders. Pilots are published only since 2007, and the medical review pilot was one of the first pilots launched by the ministry. To be certain, I conducted a systematic news search on the newspaper databases Factiva, LexisNexis, Pressreader and Swissdox. These do not list a single record mentioning the early adopter program. Overall, the medical review changes implied by the reform proposal received little public attention and were only scheduled to be implemented in 2005.

Moreover, considering the one-year earnings loss restriction required for DI eligibility, the time frame until implementation leaves limited scope for the strategic timing of applications in both treated and control regions, even if public knowledge of the program were available.
In the treated regions, the project started ten months after the first reform proposal, effectively leaving too little time for the strategic timing of applications. Similarly, there is only a relatively short time period between the reform's definite

Footnote: Other reform measures scheduled to come into effect at a later time included the introduction of a three-quarter benefit and the abolishment of additional benefits for spouses. These measures received the bulk of public attention. The changes were adopted nationwide and only became effective in late 2004. There were no further reforms to DI or other social insurances during the introduction period.

Figure 3: Trends for aggregate disability insurance inflow. Panel (a): Inflow. Panel (b): Inflow (partial weighted). Series: Control, Treated.

Note: Disability insurance inflow for treated and control regions in panel (a). Insurance inflow partial weighted by pension amount shown in panel (b). Data were provided by the Federal Ministry of Social Insurances.

Figure 4: Log cumulative hazard by age and treatment region. Vertical axis: ln[−ln(survival)]. Horizontal axis: Age (30 to 65). Series: Control, Treated.

Note: Log-log plot showing log cumulative hazard estimates by age for individuals in treated and control regions.

[...] and follow the same trend in treated and control regions (Appendix Figure A4). Regarding the treatment, the number of full-time equivalent positions for DI physicians exhibits the expected increase due to the pilot and the nationwide implementation (Appendix Figure A2, Panel a). Correspondingly, the caseload per physician drops substantially (Appendix Figure A2, Panel b). Note that all descriptive graphs are based on data from national statistics or federal social insurance reports and are not conditioned on the local sample.

While they indicate comparability between regions, the trend plots in Figure 3 do not, strictly speaking, correspond to the dependent variable used in the estimations. Generating an equivalent plot in an age-based duration framework is hindered by the fact that the treatment occurs for every individual at a different time in life, i.e., the age at which they experience the reform being implemented. An alternative test of the common trend assumption is provided by the typical placebo specifications for pre-reform effects (discussed among the robustness checks in section 5.5). These do not indicate that the common trend is violated. Another possibility to investigate the assumption is to look at the log cumulative hazard by age as shown in Figure 4 (referred to as a 'log-log plot' in biostatistics). Since individuals are randomly sampled across regions and have the same age distributions, the log cumulative hazard estimates for both groups should be parallel.

In addition, the log-log plot is a common diagnostic to assess the validity of the proportional hazards assumption in the Cox model, with non-parallel or crossing lines seen as an indication that the proportionality assumption is violated (e.g. Vittinghoff et al. 2011). Visually assessing the validity of the assumption from the log-minus-log transformation is preferable to comparing survival curves directly, as it is easier to determine whether two curves are apart by a constant difference than to judge whether they are an exponential transformation of each other. The curves in Figure 4 appear parallel and provide no indication that the proportional hazards assumption is violated. The same applies when the cumulative hazard estimate is stratified by five-year birth cohorts as in the analysis (Figure B1, Online Appendix).

Finally, another potential issue pertains to bias incurred by selective sampling due to DI outflow. Previous benefit receipt is not registered in the data; only current benefit receipt at the time of sampling is observed. If the reform affected DI outflow as well, sampling may be biased, as those who were barred from receiving insurance due to treatment are not observed in later years. This may result in a selected sample with artificially lower inflow in treatment regions. An actual outflow effect would be mistaken for an inflow effect due to unobserved dropout. I test for such outflow effects using the stock data and the results do not indicate that outflow is affected. The results for outflow are discussed together with the robustness checks in section 5.5.
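
The logic behind the log-log diagnostic can be checked numerically. The short Python sketch below assumes a proportional hazards relation $S(t \mid x) = S_0(t)^{\exp(x\beta)}$ and verifies that the log-minus-log transformed curves are a constant distance $x\beta$ apart. The baseline survival values and the hazard ratio of 1.15 are hypothetical numbers chosen for illustration, not estimates from the paper.

```python
import math


def survival_under_ph(baseline_survival: float, xb: float) -> float:
    """Survival probability under proportional hazards: S0(t) ** exp(x*beta)."""
    return baseline_survival ** math.exp(xb)


def log_minus_log(survival: float) -> float:
    """Log cumulative hazard, log(-log(S)), defined for 0 < S < 1."""
    return math.log(-math.log(survival))


# Hypothetical baseline survival at increasing ages (illustration only).
baseline = [0.99, 0.95, 0.90, 0.80]
xb = math.log(1.15)  # hypothetical log hazard ratio between two groups

# Vertical distance between the two transformed curves at each age.
gaps = [log_minus_log(survival_under_ph(s, xb)) - log_minus_log(s)
        for s in baseline]

if __name__ == "__main__":
    for s, gap in zip(baseline, gaps):
        print(f"S0 = {s:.2f}: gap = {gap:.6f}, x*beta = {xb:.6f}")
```

The survival curves themselves differ by a varying amount across ages, whereas the transformed curves differ by the same amount everywhere. This is why parallel lines in a plot such as Figure 4 are read as support for proportionality.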

Footnote: Since $h(t \mid x) = h_0(t) \exp(x\beta)$, the equivalent relation for the survival curve is $S(t \mid x) = S_0(t)^{\exp(x\beta)}$. Visual inspection requires identifying this exponential relationship. The log-minus-log transformation of the last equation gives $\log(-\log(S(t \mid x))) = \log(-\log(S_0(t))) + x\beta$, i.e., if the proportional hazards assumption holds, the curves of the treatment groups should be a constant distance apart.

The assumptions outlined in section 4.2 are sufficient to identify the reduced form effect of introducing mandatory external medical review on the DI inflow rate. This section explores additional conditions under which the interpretation of the reduced form effect can be extended. The main estimation results in the paper indicate a reduction in DI inflow. By layering two additional assumptions, this reduced form effect can be interpreted as a lower bound of the effect on false positive award errors. Conditioning on individuals' latent eligibility status, the total effect can be decomposed into a mixture of effects on the false positive and false negative DI misclassification rates.

It is the main duty of the insurance office to separate meritorious from non-meritorious claims ('tag' the eligible, Akerlof 1978). Given the null hypothesis of 'no disability', two types of classification errors can occur in this situation: (1) award errors (type-I, false positive) and (2) rejection errors (type-II, false negative). If medical review is imperfect, benefits may be awarded to persons who are ineligible, and deserving applicants may be denied benefits.

Hence, medical review may not reduce insurance inflow unambiguously. Suppose that introducing mandatory medical review increases the probability to detect applicants' true type. This implies that medical review can reduce both type-I and type-II misclassification,
$E \in \{0, 1\}$,

$$\text{rATT} = \frac{h^{1}(t \mid D=1, P=1, E=1)\, p^{1}(D=1, P=1, E=1) + h^{1}(t \mid D=1, P=1, E=0) \left[1 - p^{1}(D=1, P=1, E=1)\right]}{h^{0}(t \mid D=1, P=1, E=1)\, p^{0}(D=1, P=1, E=1) + h^{0}(t \mid D=1, P=1, E=0) \left[1 - p^{0}(D=1, P=1, E=1)\right]}. \tag{6}$$

This underscores that the identified effect is a mixture of changes in the hazard for both eligible and ineligible types. Using this expression, it is possible to explore the conditions for a negative treatment effect (an inflow reduction, corresponding to a hazard ratio smaller than one) depending on the effect for each type separately. In the following, I simplify notation by omitting the parameters common to all objects in the conditioning set.

Unlike in other treatment effect settings, population shares in (6) are superscripted by the corresponding counterfactual states. In this setting, distinguishing them is sensible as they can be thought of as shares of applications by eligibility types which might be influenced by the treatment. Ruling this out to ease interpretation, assume that

$$p^{1}(E=1) = p^{0}(E=1). \tag{no self-screening, 7}$$

This assumption implies continuity in the composition of applications, effectively ruling out that the propensity to apply for DI is influenced by the pilot. The most likely mechanism to confound this assumption is self-screening, i.e., individuals are selectively discouraged from applying for benefits (Parsons 1991). For the reasons outlined in the previous section, this behavior is unlikely since information about the pilot program did not reach the public. Parsons's (1991) original paper on the self-screening mechanism is about how changes in screening stringency and administrative hassle which are perfectly observed by applicants influence the application decision. The medical review process is largely hidden from the applicants and there is no information available to them detailing it. This view is also supported by the data.
Looking at the limited aggregate data available, application rates evolve similarly across both groups of cantons, are very stable over time and do not diverge during the pilot (Appendix Figure A4). Due to the non-public introduction of medical review and the common trend in applications, differential variations in application behavior are likely to be negligible.

The estimation results in the next section indicate a reduction in inflow. With this in mind and simplifying notation due to (7), the hazard ratio must be smaller than one,

$$\text{rATT} = \frac{h^{1}(t \mid E=1)\, p(E=1) + h^{1}(t \mid E=0) \left[1 - p(E=1)\right]}{h^{0}(t \mid E=1)\, p(E=1) + h^{0}(t \mid E=0) \left[1 - p(E=1)\right]} \leq 1. \tag{8}$$

Rearranging gives

$$\left[h^{1}(t \mid E=1) - h^{0}(t \mid E=1)\right] p(E=1) \leq -\left[h^{1}(t \mid E=0) - h^{0}(t \mid E=0)\right] \left[1 - p(E=1)\right], \tag{9}$$

i.e., the absolute value of the population-weighted treatment effect for the ineligible must exceed the population-weighted treatment effect for the eligible to observe an aggregate reduction in inflow. This implies the reduction in award errors (type-I, RHS) must exceed the reduction in rejection errors (type-II, LHS) for the effect to be negative. This is consistent with the interpretation of the effect in (3) as a net effect.

Finally, assuming the treatment does not decrease inflow of eligible types,

$$h^{1}(t \mid E=1) - h^{0}(t \mid E=1) \geq 0, \tag{monotone treatment response for eligible types, 10}$$

the left hand side of condition (9) is greater than or equal to zero. If medical review actually decreases the chances of the ineligible to get insurance benefits, the weighted decrease in the hazard for the ineligible must be at least as large in absolute value as the weighted increase in the hazard for the eligible for the condition to be fulfilled. In this case, any observed inflow reduction can be interpreted as a net reduction in DI award errors.

This assumption is not directly testable with the available data.
It relies on the fact that medical review is an intervention to improve screening quality and, unlike variations in screening stringency, does not involve a trade-off between false positives and false negatives (Parsons 1991, Kleven and Kopczuk 2011, Low and Pistaferri 2015). Alternatively, the condition in (9) is trivially fulfilled if (10) is violated and medical review actually has the perverse effect of worsening the chances of the truly eligible to get insurance, reducing their DI inflow hazard.

I will consider the consequences of violations of these assumptions and how they can be relaxed in turn. Assumption (7) posits that medical review does not change the composition of applications. This assumption could be weakened by assuming that medical review decreases the propensity of ineligible types to apply, i.e., $p^{1}(E=1) \geq p^{0}(E=1)$. This coincides with Parsons's (1991) empirical result that self-screening is non-perverse. The finding is also confirmed by Low and Pistaferri (2015), who find that false applications decrease with program stringency. Since medical review extracts information, those at the margin of being discouraged from applying are those that are more likely to be found undeserving. The lower bound interpretation can be retained with non-perverse

As discussed, the institutional

Footnote: I am grateful to an anonymous referee for pointing out this possibility.

Footnote: The leading physician in one office was aware of the fact that more intense medical review could increase DI incidence. She stated that in her experience, rejection errors do occur and are sometimes
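
The mixture argument in conditions (8) to (10) can be illustrated with a small numerical example. The Python sketch below uses hypothetical hazards and a hypothetical eligible share that is held fixed across treatment states, as in assumption (7). None of the numbers come from the paper, and the function names are assumptions made for illustration.

```python
def aggregate_hazard(h_eligible: float, h_ineligible: float,
                     share_eligible: float) -> float:
    """Population hazard as a share-weighted mixture over eligibility types."""
    return h_eligible * share_eligible + h_ineligible * (1.0 - share_eligible)


# Hypothetical values (illustration only, not estimates from the paper).
p = 0.7                       # share of eligible types, fixed by assumption (7)
h0_e, h0_i = 0.010, 0.006     # hazards without medical review
h1_e, h1_i = 0.011, 0.002     # hazards with medical review

agg0 = aggregate_hazard(h0_e, h0_i, p)
agg1 = aggregate_hazard(h1_e, h1_i, p)
ratt = agg1 / agg0            # aggregate hazard ratio as in (8)

# Left and right hand sides of condition (9).
lhs = (h1_e - h0_e) * p                  # weighted change for the eligible
rhs = -(h1_i - h0_i) * (1.0 - p)         # weighted reduction for the ineligible

aggregate_reduction = agg0 - agg1

if __name__ == "__main__":
    print(f"rATT = {ratt:.3f}")
    print(f"LHS of (9) = {lhs:.4f}, RHS of (9) = {rhs:.4f}")
    print(f"aggregate reduction = {aggregate_reduction:.4f}")
```

In this example the eligible hazard rises slightly, so (10) holds, while the ineligible hazard falls by more in weighted terms. The aggregate reduction equals the difference between the right and left hand sides of (9), so it understates the weighted reduction among the ineligible whenever the left hand side is non-negative. This is the sense in which the observed inflow reduction acts as a lower bound.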

The main results are presented in Table 1, separately for the unrestricted and the local sample. The first column for each sample considers only spells which are censored or result in failure before the end of the pilot period in 2005; the remaining columns use all recorded spells and control for the post-treatment period in which the intervention was extended nationwide. The last column adds individual control variables, including gender, education, marital status, number of children and foreign citizenship. All specifications stratify the baseline hazard by five-year birth cohort intervals to account for cohort-specific differences in health environment. Survey weights are applied in the full sample such that estimates are representative of the Swiss population. Observations in the local sample are weighted for pairwise nearest-neighbor estimation. All tables report hazard ratios, i.e., exponentiated coefficients and corresponding standard errors.

Footnote (continued): encountered during revisions, but are much less frequent in relation to the amount of award errors uncovered ex post.

Table 1: Disability incidence

| | (a) Full sample (1) | (a) Full sample (2) | (a) Full sample (3) | (b) Local sample, within 20 km (4) | (b) Local sample, within 20 km (5) | (b) Local sample, within 20 km (6) |
|---|---|---|---|---|---|---|
| Treat | 1.322*** (0.041) | 1.322*** (0.041) | 1.236*** (0.039) | 1.150*** (0.061) | 1.151*** (0.061) | 1.148*** (0.061) |
| Pilot time | 1.083 (0.089) | 1.088 (0.089) | 1.110 (0.090) | 1.257* (0.148) | 1.267** (0.148) | 1.298** (0.152) |
| Treat x pilot | 0.856** (0.067) | 0.856** (0.067) | 0.860* (0.068) | 0.770** (0.087) | 0.771** (0.087) | 0.766** (0.086) |
| Post time | | 0.690*** (0.068) | 0.731*** (0.072) | | 0.867 (0.151) | 0.918 (0.160) |
| Treat x post | | 0.971 (0.078) | 0.970 (0.078) | | 0.841 (0.105) | 0.829 (0.104) |
| Other controls | No | No | Yes | No | No | Yes |
| N municipalities | 2,337 | 2,338 | 2,338 | 1,086 | 1,087 | 1,087 |
| N individuals | 249,750 | 259,323 | 259,323 | 128,536 | 133,549 | 133,549 |
| N failures | 7,877 | 9,204 | 9,204 | 3,985 | 4,693 | 4,693 |
| N failures during pilot | 1,713 | 1,713 | 1,713 | 885 | 885 | 885 |

Note: Cox Proportional Hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimations separately for a complete representative sample of the Swiss population and only for individuals in the vicinity of the border between treated and non-treated regions. Baseline hazard for all regressions stratified by 5-year birth cohorts. Survey weights applied for the full sample. Observations in the local sample are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. The hazard ratio for 'Treat x pilot' corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

All estimates of the effect of the reform are negative (corresponding to a hazard ratio less than one) and significant at conventional levels, indicating that third-party medical review significantly reduced insurance inflow. The estimate for the full sample implies a 14% reduction. The magnitude for the local sample is slightly higher and corresponds to a 23% lower inflow rate. Both estimates are stable in magnitude across specifications. The post coefficient estimates are negative as well, reflecting the fact that the reform was extended to the federal level after 2004 and funding increased even further. However, the post estimates for the local sample are imprecise, as failures in the local sample are too sparse in later years, when many observations are censored at the sampling date.

The preferred specification for the remainder of the paper is given in column (5), since adding covariates does not affect the results in a notable way. The remaining analysis focuses on the local sample. Results for the main sample are qualitatively similar.

External medical review is also likely to affect the classification of the severity of health impediments for new awards. I analyse whether medical review changes the relative incidence of partial and full benefit awards. Results in Table 2 show that incidence reductions occur only for full benefit awards (columns 2 and 3) and those due to limitations classified as very serious (disability degree of 70% or larger, columns 4 and 5). Estimates for partial benefit awards and those classified as less serious are too imprecisely estimated to draw a clear conclusion, but may be unaffected. One possible explanation is that incidence reductions occur mainly for full benefit applicants. However, it is unlikely that only applicants claiming 100% work incapability constitute the affected marginal cases. A more likely scenario is that DI incidence reductions occur at all latent health levels. After introducing medical review, some individuals who would have received the full benefit amount previously are now downgraded, resulting in a zero net effect for partial DI benefits. This finding is also reflected by a moderate decrease in the aggregate share of full benefit awards: in 2005, 58% of new beneficiaries are awarded full benefits compared to 68% in 2002.

Table 2: Disability classification. Columns: All, Partial, Full, DD < 70, DD ≥ 70.

The main analysis indicates that DI awards declined substantially due to external medical review, most likely due to a reduction in false positive benefit awards. If the effect is driven by more accurate health and functional capacity diagnoses, then incidence reductions are more likely to occur for diseases which are difficult to diagnose and verify for treating physicians, the first DI gatekeeper. The reduction will be most pronounced for illnesses which are both difficult to diagnose and whose functional capacity implications are more likely to be misjudged.

Table 3: Disability types

| | All (1) | Illness (2) | Illness: Psych. (3) | Illness: Nerve (4) | Illness: MSC (5) | Accident (6) | Congenital/Other (7) |
|---|---|---|---|---|---|---|---|
| Treatment region | 1.151*** (0.061) | 1.229*** (0.072) | 1.185* (0.106) | 1.100 (0.216) | 1.245** (0.136) | 0.843 (0.148) | 1.293** (0.162) |
| Pilot period | 1.267** (0.148) | 1.384** (0.178) | 1.450* (0.282) | 2.373* (1.185) | 1.412 (0.330) | 0.900 (0.362) | 0.795 (0.201) |
| Treat x pilot | 0.771** (0.087) | 0.683*** (0.084) | 0.699* (0.129) | 0.377** (0.167) | 0.633** (0.145) | 1.729 (0.656) | 1.150 (0.290) |
| Post time | 0.867 (0.151) | 0.974 (0.183) | 0.667 (0.188) | 1.737 (1.211) | 1.285 (0.460) | 0.175*** (0.102) | 1.220 (0.441) |
| Treat x post | 0.841 (0.105) | 0.733** (0.097) | 0.897 (0.176) | 0.607 (0.272) | 0.596** (0.156) | 6.436*** (2.942) | 0.748 (0.197) |
| N municipalities | 1,087 | 1,087 | 1,087 | 1,087 | 1,087 | 1,087 | 1,087 |
| N individuals | 133,549 | 133,549 | 133,549 | 133,549 | 133,549 | 133,549 | 133,549 |
| N failures | 4,693 | 3,827 | 1,685 | 339 | 1,090 | 409 | 835 |
| N failures during pilot | 885 | 753 | 352 | 61 | 210 | 59 | 149 |

Note: Cox Proportional Hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Sample is based on individuals living within 20 km of the border between treated and non-treated regions. Columns distinguish between DI awards due to different health impairments. Baseline hazard for all regressions stratified by 5-year birth cohorts. Observations are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. The hazard ratio for 'Treat x pilot' corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the municipality level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Table 3 investigates this by differentiating between health impairments leading to benefit awards. The results confirm that reductions occur most frequently for difficult-to-diagnose conditions, while conditions which can typically be diagnosed unambiguously are not affected. Looking at columns (3) and (4), the effect is pronounced for psychological diseases and illnesses related to nerve problems. Benefit awards due to mental health problems are reduced by 30%. Nerve-related handicaps are reduced by over 60%, but incidence in this group is generally very low. Column (5) looks at the incidence of musculoskeletal conditions (MSC). This category also includes a variety of conditions which are difficult to verify (e.g. whiplash injuries, back pain). The hazard ratio suggests a substantial reduction in incidence as well. The specification in column (6) looks at disability benefit awards due to handicaps incurred in accidents; the last column considers disabilities due to congenital defects and other diseases. These conditions are unlikely to be subject to award errors, as there is rarely any ambiguity and they are typically well-documented. Indeed, there is no effect on these conditions, which are unlikely to be affected by intensified medical review.
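
The percentage reductions quoted in the text follow directly from the reported hazard ratios. As a minimal sketch, the Python snippet below converts the 'Treat x pilot' hazard ratios from Tables 1 and 3 into percent changes in the hazard. The function name `percent_change` and the dictionary labels are illustrative choices, not part of the paper.

```python
def percent_change(hazard_ratio: float) -> float:
    """Percent change in the hazard implied by a hazard ratio."""
    return (hazard_ratio - 1.0) * 100.0


# 'Treat x pilot' hazard ratios as reported in Tables 1 and 3.
reported = {
    "Table 1, full sample, column (2)": 0.856,
    "Table 1, local sample, column (5)": 0.771,
    "Table 3, psychological, column (3)": 0.699,
    "Table 3, nerve, column (4)": 0.377,
    "Table 3, MSC, column (5)": 0.633,
}

if __name__ == "__main__":
    for label, hr in reported.items():
        print(f"{label}: HR = {hr:.3f}, change = {percent_change(hr):.1f}%")
```

A hazard ratio of 0.771 thus corresponds to the 23% lower inflow rate for the local sample, and 0.856 to the 14% reduction for the full sample.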

This section investigates the labor market reaction in response to external medical review. If reductions in DI incidence are driven by rejections of individuals capable of returning to the labor market, medical review should also have a positive effect on labor market participation. Conversely, if the reduction is largely driven by rejections of individuals incapable of working, medical review should not have an effect on employment, but possibly on the inflow into other social security programs (e.g. Inderbitzin et al. 2016). Table 4 uses the pooled cross-sectional administrative SESAM data to estimate a difference-in-differences specification using a linear model.

The results in Table 4 for the full sample show that the share of individuals in registered employment increases. Similarly, the share of individuals with positive (non-benefit) earnings increases as well. In addition, the share of individuals registered with the employment office as job seekers decreases (columns 1–3). In columns (4) and (5), I consider other pathways from unemployment and reasons for no longer being registered with the employment office. I find no effect on dismissal from the employment office (and the associated return-to-work measures) due to exhausting the maximum duration for unemployment benefits. Similarly, I find no effect on the receipt of social assistance, the minimum social security provision. If rejected DI applicants were incapable of working, we would expect to see an increase in these measures. However, the results do not provide evidence for this channel. The results for the local sample are comparable in sign and magnitude to the estimates for the full sample. However, they are insignificant, most likely due to a lack of power (p = 0.17 for the main employment estimate in the local sample).

An explanation for these results is that DI applications are partly made by people capable of gainful employment and driven by moral hazard. One possible mechanism behind this result is the canonical substitution effect interpretation: applicants seek benefits due to a distortion in the relative price of leisure. This distortion is caused by an implicit tax on work due to DI ('cash cliffs'). An alternative explanation is that applications are (partly) due to income effects, i.e., even if work is not implicitly taxed by the DI program, given the transfer payments, beneficiaries may prefer leisure to labor (e.g. Autor and Duggan 2007, Eugster and Deuchert 2017, Gelber et al. 2017). These effects have different welfare implications. If DI reduces labor supply through the substitution effect, this implies a deadweight loss, which would be reduced by medical review. Alternatively, medical review would not be welfare improving if all of the labor supply increase is due to a reduced income effect. Since DI is provided (partially) contingent on work, I am unable to separate these effects. Taken together, the evidence from the analysis suggests that distorted incentives are likely to matter in this context.
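
For readers less familiar with the linear specification behind Table 4, the following Python sketch computes a basic two-group, two-period difference-in-differences estimate from group means. It is a deliberate simplification: the paper's models additionally include canton and year fixed effects and individual covariates. The employment shares are hypothetical values chosen for illustration and are not the underlying cell means.

```python
def difference_in_differences(treated_pre: float, treated_pilot: float,
                              control_pre: float, control_pilot: float) -> float:
    """Change in the treated group minus change in the control group."""
    return (treated_pilot - treated_pre) - (control_pilot - control_pre)


# Hypothetical employment shares (illustration only, not data from the paper).
treated_pre, treated_pilot = 0.800, 0.815
control_pre, control_pilot = 0.790, 0.796

estimate = difference_in_differences(
    treated_pre, treated_pilot, control_pre, control_pilot
)

if __name__ == "__main__":
    print(f"Difference-in-differences estimate: {estimate:.3f}")
```

In a regression with treatment, period and interaction dummies and no further controls, the coefficient on the interaction term reproduces this difference of differences. With fixed effects and covariates, the interaction coefficient is interpreted in the same way, conditional on those controls.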

Although the primary task of the medical staff is to screen applicants, they also aid with reviews of recipients' disability degree classification. While scheduled by law to occur regularly, revisions seldom resulted in actual disability degree or benefit cuts and typically involved DI caseworkers going over beneficiaries' files without personal contact. Revisions also commonly take place if applicants have submitted new medical information, typically documenting deteriorating health, and often result in benefit increases. With the new regime in place, files that are scheduled for review are now also passed to the DI physicians in charge of medical review.

Table 4: Labor market responses to medical review

(a) Full sample

| | Working (1) | Positive income (2) | Employment office: registration (3) | Employment office: dismissal (4) | Social assistance (5) |
|---|---|---|---|---|---|
| Treat x pilot | 0.009*** (0.004) | 0.008** (0.003) | −0.007*** (0.002) | −0.002 (0.001) | 0.000 (0.001) |
| Individual covariates | Yes | Yes | Yes | Yes | Yes |
| Canton FE | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes |
| N | 556,540 | 557,270 | 411,461 | 411,461 | 411,461 |

(b) Local sample (within 20 km)

| | Working (1) | Positive income (2) | Employment office: registration (3) | Employment office: dismissal (4) | Social assistance (5) |
|---|---|---|---|---|---|
| Treat x pilot | 0.007 (0.005) | 0.006 (0.005) | −0.003 (0.003) | 0.000 (0.002) | 0.001 (0.002) |
| Individual covariates | Yes | Yes | Yes | Yes | Yes |
| Canton FE | Yes | Yes | Yes | Yes | Yes |
| Year FE | Yes | Yes | Yes | Yes | Yes |
| N | 282,858 | 283,111 | 208,340 | 208,340 | 208,340 |

Note: Linear model estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimations separately for a complete representative sample of the Swiss population (panel a) and only for individuals in the vicinity of the border between treated and non-treated regions (panel b). All models include cantonal and year specific effects and control for gender, age and native status. Standard errors clustered at the municipality level given in parentheses. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Table 5: Stock reclassification and pension cuts

(a) Disability degree

| | All | All illnesses | Psychological | MSC | Accident | Congenital |
|---|---|---|---|---|---|---|
| Treated region | 3.04*** (0.08) | 3.41*** (0.09) | 3.66*** (0.13) | 2.78*** (0.18) | 1.67*** (0.26) | 1.19*** (0.18) |
| Pilot | 0.49*** (0.06) | 0.60*** (0.07) | 0.60*** (0.09) | 0.39*** (0.13) | 0.46*** (0.17) | 0.30** (0.13) |
| Treat x pilot | −0.35*** (0.09) | −0.42*** (0.11) | −0.58*** (0.15) | −0.39* (0.20) | −0.24 (0.30) | −0.10 (0.21) |
| Post | 1.63*** (0.05) | 1.80*** (0.06) | 1.44*** (0.09) | 0.87*** (0.12) | 0.89*** (0.16) | 1.39*** (0.12) |
| Treat x post | −0.52*** (0.09) | −0.61*** (0.10) | −0.83*** (0.14) | −0.47** (0.19) | −0.39 (0.28) | −0.28 (0.19) |
| Constant | 78.54*** (0.05) | 77.98*** (0.06) | 82.75*** (0.08) | 72.31*** (0.11) | 74.53*** (0.15) | 86.07*** (0.11) |

(b) Pension amount

| | All | All illnesses | Psychological | MSC | Accident | Congenital |
|---|---|---|---|---|---|---|
| Treated region | 124.68*** (2.20) | 141.87*** (2.61) | 101.22*** (3.67) | 133.26*** (4.94) | 88.41*** (7.25) | 8.36*** (3.14) |
| Pilot | 37.04*** (1.55) | 39.92*** (1.85) | 33.12*** (2.62) | 39.02*** (3.56) | 33.29*** (4.83) | 28.43*** (2.25) |
| Treat x pilot | −17.25*** (2.52) | −21.77*** (2.98) | −17.79*** (4.17) | −21.90*** (5.66) | −8.10 (8.33) | −0.45 (3.64) |
| Post | 143.12*** (1.46) | 148.37*** (1.75) | 126.96*** (2.46) | 139.82*** (3.38) | 122.50*** (4.54) | 126.26*** (2.10) |
| Treat x post | −41.25*** (2.38) | −48.74*** (2.82) | −33.50*** (3.92) | −50.51*** (5.38) | −19.17** (7.84) | −1.61 (3.39) |
| Constant | 1232.08*** (1.35) | 1221.01*** (1.61) | 1311.78*** (2.30) | 1134.61*** (3.10) | 1199.86*** (4.20) | 1343.85*** (1.95) |
| N | 2,489,323 | 1,884,876 | 887,604 | 537,191 | 282,224 | 274,918 |

Note: Estimates from a linear model. Outcomes are the disability degree in percent (panel a) and the effective benefit amount paid to recipients (panel b). The reference group are individuals in the non-treated regions in 2001. Based on administrative panel data provided by the Swiss Federal Ministry of Social Insurances which tracks the complete stock of Swiss DI benefit recipients in 2001 until 2011. Standard errors in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

To assess whether stock reclassifications occur, I estimate a linear difference-in-differences model using data for the stock of all DI beneficiaries in Switzerland in 2001. I condition on benefit receipt prior to treatment and track the changes to the disability degree and the effective benefit payments of existing beneficiaries over time. Results are given in Table 5. The sample is again stratified by disease groups. The outcome in Panel (a) is the individual disability degree; Panel (b) looks at the benefit amount. On average, recipients are classified as less disabled by 0.35 percentage points and lose about 17 CHF in monthly benefits. The effect magnitudes are small since reclassification remains a rare event. Summary statistics indicate that only 9.3% of individuals of the 2001 stock are reclassified during the three years of the pilot period.
Complete denial of benefits after a revision occurs only in exceptional cases. Upward revisions are far more common, downward changes only account for 2.3 percentage points. Still, introducing mandatory medical review appears to cause revisions of the disability status of beneficiaries whose documentation is deemed insufficient, suspicious or whose health has improved. Both the disability classification and payouts are again only adjusted for those beneficiaries with illnesses which are more difficult to screen. Again, cuts are most pronounced for those who receive DI due to mental health problems or musculoskeletal conditions, while beneficiaries with congenital diseases or handicaps incurred in accidents are unaffected. Unlike previously, nerve-related diseases are not declared in this data.
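As a rough illustration of how the Table 5 coefficients map into group means, the following Python sketch assumes the specification implied by the table rows (treated region, pilot and post indicators plus their two interactions, no further controls) and uses the rounded estimates from the "All" column of panel (a). The function names are illustrative and do not come from the paper.

```python
# Coefficients from Table 5, panel (a), column "All"
# (outcome: disability degree in percent). Values are the rounded
# estimates as printed in the table.
COEF = {
    "const": 78.54,
    "treat": 3.04,
    "pilot": 0.49,
    "treat_x_pilot": -0.35,
    "post": 1.63,
    "treat_x_post": -0.52,
}


def predicted_mean(treated: bool, period: str) -> float:
    """Fitted mean disability degree for one region/period cell.

    period is "pre" (reference period, 2001), "pilot" or "post".
    Assumes the model contains only the terms listed in COEF.
    """
    if period not in ("pre", "pilot", "post"):
        raise ValueError(f"unknown period: {period}")
    value = COEF["const"]
    if treated:
        value += COEF["treat"]
    if period != "pre":
        value += COEF[period]
        if treated:
            value += COEF[f"treat_x_{period}"]
    return value


def did(period: str) -> float:
    """Difference-in-differences relative to the pre-period."""
    change_treated = predicted_mean(True, period) - predicted_mean(True, "pre")
    change_control = predicted_mean(False, period) - predicted_mean(False, "pre")
    return change_treated - change_control


if __name__ == "__main__":
    for period in ("pilot", "post"):
        print(period, round(did(period), 2))
```

Because this specification is saturated in region and period, the difference-in-differences of the fitted cell means equals the corresponding interaction coefficient.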

To assess the validity of the main identifying assumption, I test the effect of a placebo reform prior to the treatment period and assume a pseudo-treatment to be effective during 1999–2001. Hazard ratio estimates across all specifications are close to one, precisely estimated and insignificant at conventional levels, supporting the validity of the identification strategy (Appendix Table A4). Placebo results for employment also do not indicate any violation of the common trend assumption (Table B3, Online Appendix).

Another potential concern is that the results are sensitive to the choice of distance window. Figure A7 addresses this issue by plotting treatment effect estimates across a large set of bandwidths, using both actual travel distance and travel time as distance measures. The coefficient of interest remains stable in size and significant across a large set of distances. The estimates consistently suggest at least a 20% reduction in incidence in the treatment group during the pilot program. More detailed estimates over selected distances are provided in Appendix Table A5.

As discussed in section 4.3, potential outflow effects of the reform might confound the main result. Since previous benefit receipt is unobserved, outflow effects would lead to inflow being measured with error in the sample. I use the stock data to test for outflow effects. A duration model similar to the main specification is estimated for those who are beneficiaries prior to treatment in 2001. Exit from the DI rolls is considered failure, individuals are censored at the sampling limit in 2011 or when they exit at the relevant pension age. Variable measurements are less clean-cut in this case. Exit due to work or expulsion cannot be separated. However, there is no explicit reason why trends in work take-up by insurees (a similarly rare event) should differ between regions. Results are given in Appendix Table A6, separately for all individuals and those below age 50 in 2001, an age requirement which prohibits early retirement within the analysis horizon and selects a

Footnote: Complete benefit denial is legally difficult, unless fraud or malingering are proven beyond reasonable doubt. These cases also require high up-front investment from DI offices and are initiated only in extreme cases.

This paper provides a comprehensive evaluation of the introduction of medical review for DI applications in a setting in which treating physician testimony is decisive. The results indicate that external medical review can reduce insurance inflow substantially. The main estimate suggests that medical review reduces DI uptake by 23%. Reductions are closely tied to difficult-to-diagnose conditions, suggesting a more accurate assessment of complex or multidisciplinary diseases. This is corroborated by the fact that disability status and benefit revisions in the stock of recipients occur only for individuals with the same types of conditions and the fact that medical review also increases labor market participation. Under additional assumptions, the results suggest that medical review is likely to reduce the amount of false positive award errors and that these errors occur frequently in the absence of medical review.

Results from the local approach (sample restricted to commuting distance around borders) have the same sign and are comparable in magnitude to the global approach (using the full sample). The distance variations in Figure A7 consistently suggest a reduction in the hazard of about 20%. Considering the sizeable effect of medical review on the DI hazard, it is illustrative to assess how large the absolute effects induced by introduction of the medical reviews are. Looking at the main specification, without treatment, the baseline DI hazard in the treated regions is about 0.38%, i.e., on average 3.8 persons per thousand enter DI. The medical review process reduces this by about 23% to 0.29%, implying that approximately one person less per thousand enters DI due to a second medical assessment.

Given the substantial present-discounted value of DI benefits, it is interesting to examine whether external medical review is a cost-effective policy. Simple back-of-the-envelope calculations indicate that outlays for hiring physicians are more than offset by reductions in the beneficiary payload. The calculations are based on the observed increase in the number of physicians, a conservative effect estimate and the average benefit amount and remaining spell duration until retirement, assuming rejections are permanent. Based on these parameters, the yearly savings only in the treated regions during the pilot are likely to be above 650 million Swiss Francs (approximately 650 million US$ in 2018). Extending medical review nationwide in 2005 may have saved in excess of 1.2 billion Swiss Francs in that year alone. Even if all rejected applicants never reenter the labor market and immediately receive social assistance, estimated yearly savings for 2005 are upwards of 500 million Swiss Francs. These calculations disregard the fact that benefit decisions are tied to additional occupational benefits and private pension schemes, which are substantially more generous than the main state DI benefits and would result in further savings. Nevertheless, the yearly savings far exceed potential outlays for the medical personnel that was hired. Introducing external medical review is a highly cost-effective tool to reduce insurance inflow.

Taken together, the results cast doubt on the practice of assigning a large weight to the treating physician's opinion in DI decisions. Considering that inflow reductions are restricted to difficult-to-diagnose conditions and that work take-up increases when medical review is done by clinical specialists, treating physicians may not be well-suited to serve as the main gatekeeper to DI. This result corroborates medical studies which posit that specialists may be better suited to judge social insurance eligibility than personal physicians (e.g. Novack et al. 1989, Zinn and Furutani 1996, Freeman et al. 1999, Wynia et al. 2000, Everett et al. 2011). In addition, treating physicians have often voiced discomfort with being both caretakers of patients and gatekeepers to public insurance systems. In surveys, physicians are overwhelmingly in favor of designating independent third-party physicians to determine disability status to prevent damaging physician-patient relations (e.g. Zinn and Furutani 1996).

Since external medical review by DI physicians appears to be effective in the Swiss setting, it might provide a viable policy option for other countries which are burdened by high disability insurance costs and rely on treating physician assessments for DI. However, it is important to bear in mind that prior to the reform, medical review was conducted almost exclusively by treating physicians and DI physicians could not examine patients. Both the policy impact and the size of award errors are likely to depend on the initial level of screening intensity. Still, treating physician testimony is influential for DI determinations in many OECD countries. The results suggest that subjecting treating physicians' opinions to medical review by a third party is a cost-effective policy to regulate inflow and award errors. Since the policy also lifted bans on personal medical examinations, the changes in Switzerland can potentially also provide some insight into extending medical review in systems which exclusively rely on file-based review.

It is important to note that screening during the pilot does not necessarily come at the cost of increased program complexity (e.g. as modeled by Kleven and Kopczuk 2011). The additional administrative hassle is low, and there are few visible additional up-front costs borne by the applicant. As such, external medical review is unlikely to discourage take-up strongly in the long-term. This situation might differ if medical review is announced publicly. Since medical review extracts information, it may also discourage ineligible applicants from applying for benefits, as they have higher chances to be ultimately denied. This deterrence effect is found to be pronounced by Low and Pistaferri (2015).

The mechanisms behind the results in this paper merit further investigation. One possible channel behind the incidence reductions is inaccurate diagnoses by treating physicians, the first gatekeeper to the DI system. However, whether and how much application behavior suffers from moral hazard remains ultimately unclear. Applicants could be largely myopic or actively engage in malingering. Still, the overall reduction in inflow provides a tentative suggestion that award errors exceed rejection errors in award decisions. This result diverges from previous analyses for the US. However, given that benefits are substantially more generous in Switzerland, this finding is in line with Low and Pistaferri's (2015) result that false applications are strongly increasing with benefit generosity. Hence, the result is also a first indication that the relative prevalence of errors may be different in European DI systems which offer higher replacement rates. Separating type-I and type-II classification errors more cleanly and examining the mechanisms through which they occur remains a promising pursuit for further research.

References

Adam, S., Bozio, A. and Emmerson, C. (2010). Reforming disability insurance in the UK: Evaluation of the pathways to work programme. Working paper, Institute for Fiscal Studies, London.

Akerlof, G. A. (1978). The economics of “tagging” as applied to the optimal income tax, welfare programs, and manpower planning. The American Economic Review

The Quarterly Journal of Economics

American Economic Review

NBER Working Papers 10219, National Bureau of Economic Research, Inc.

Bolduc, D., Fortin, B., Labrecque, F. and Lanoie, P. (2002). Workers’ compensation, moral hazard and the composition of workplace injuries. The Journal of Human Resources

American Economic Journal: Economic Policy

Working Paper 2816, National Bureau of Economic Research.

BSV (2012). Statistiken zur sozialen Sicherheit – IV-Statistik 2011. Bundesamt für Sozialversicherungen.

Butler, J. S., Burkhauser, R. V., Mitchell, J. M. and Pincus, T. P. (1987). Measurement error in self-reported health variables. The Review of Economics and Statistics

IZA Journal of Labor Policy

Canadian Public Policy / Analyse de Politiques

Contributions to Economic Analysis & Policy

Journal of Public Economics

Journal of Econometrics

Journal of the Royal Statistical Society. Series B (Methodological)

Scandinavian Journal of Primary Health Care

Economics Working Paper Series 1709, University of St. Gallen, School of Economics and Political Science.

Eugster, B. and Parchet, R. (2018). Culture and taxes. Journal of Political Economy (forthcoming).

Everett, J. P., Walters, C. A., Stottlemyer, D. L., Knight, C. A., Oppenberg, A. A. and Orr, R. D. (2011). To lie or not to lie: Resident physician attitudes about the use of deception in clinical practice. Journal of Medical Ethics

Archives of Internal Medicine

American Economic Journal: Economic Policy

Journal of the American Statistical Association

Psychiatric Services

Working paper, Tinbergen Institute.

Gelber, A., Moore, T. J. and Strand, A. (2017). The effect of disability insurance payments on beneficiaries’ earnings. American Economic Journal: Economic Policy

American Economic Journal: Economic Policy

Journal of the European Economic Association

The European Journal of Public Health

Journal of Public Economics

Political Science Research and Methods

American Economic Journal: Economic Policy

American Journal of Epidemiology

Social Security Bulletin

Journal of Human Resources

Journal of the American Statistical Association

Journal of Applied Econometrics

Foundations and Trends in Econometrics

American Economic Review

Journal of Public Economics

Disability and rehabilitation: Legal, clinical, and self-concepts and measurement. Columbus, Ohio State University Press.

Novack, D., Detering, B., Arnold, R., Forrow, L., Ladinsky, M. and Pezzullo, J. (1989). Physicians’ attitudes toward using deception to resolve difficult ethical problems. JAMA

Transforming Disability into Ability. Paris, OECD Publishing.

OECD (2006). Sickness, Disability and Work: Breaking the Barriers – Norway, Poland and Switzerland, Vol. 1. Paris, OECD Publishing.

OECD (2009). Sickness, disability and work: Keeping on track in the economic downturn. Working paper, High-Level Forum, Stockholm.

OECD (2010). Sickness, Disability and Work: Breaking the Barriers – A Synthesis of Findings across OECD countries. Paris, OECD Publishing.

Parsons, D. O. (1991). Self-screening in targeted public transfer programs. Journal of Political Economy

Journal of Public Economics

Journal of Educational and Behavioral Statistics

Journal of Public Economics

The Social Security Disability program: An evaluation study. 39, US Social Security Administration, Office of Research and Statistics.

Staten, M. E. and Umbeck, J. (1982). Information costs and incentives to shirk: Disability compensation of air traffic controllers. The American Economic Review

Journal of Public Economics

Statistics in Medicine

Regression methods in biostatistics: linear, logistic, survival, and repeated measures models. Springer Science & Business Media.

von Wachter, T., Song, J. and Manchester, J. (2011). Trends in employment and earnings of allowed and rejected applicants to the social security disability insurance program. American Economic Review

Beiträge zur Sozialen Sicherheit, Bericht im Rahmen des mehrjährigen Forschungsprogramms zu Invalidität und Behinderung, Forschungsbericht Nr. 13/07.

Wynia, M., Cummins, D., VanGeest, J. and Wilson, I. (2000). Physician manipulation of reimbursement rules for patients: Between a rock and a hard place. JAMA

Journal of General Internal Medicine

Appendix A: Additional tables and figures

Table A1: DI recipients before and after filing for benefits

| | −2 | −1 | DI filing year | +1 | +2 |
|---|---|---|---|---|---|
| Worked last week | 0.761 (197) | 0.622 (410) | 0.368 (810) | 0.320 (1268) | 0.296 (1457) |
| Looking for work last month | 0.484 (31) | 0.333 (84) | 0.120 (357) | 0.096 (748) | 0.079 (953) |
| Work contract but absent at work last week | 0.326 (46) | 0.455 (154) | 0.294 (506) | 0.120 (845) | 0.057 (1006) |
| Yearly income (1k CHF) | 53.113 (197) | 47.785 (410) | 33.262 (810) | 17.284 (1268) | 11.574 (1457) |
| Dismissed from unemployment office | 0.025 (79) | 0.053 (206) | 0.061 (445) | 0.056 (784) | 0.066 (948) |
| Social assistance | 0.051 (79) | 0.058 (206) | 0.038 (445) | 0.051 (784) | 0.044 (948) |
| Age | 47.365 (197) | 48.480 (410) | 50.022 (810) | 50.445 (1268) | 50.648 (1457) |
| Mental or physical problem | 0.234 (124) | 0.393 (262) | 0.688 (523) | 0.844 (841) | 0.836 (980) |
| Accident within the last 12 months | 0.208 (48) | 0.176 (85) | 0.118 (136) | 0.057 (174) | 0.082 (184) |

Note: This table shows the mean values of selected variables for DI recipients from two years prior to filing the application until two years afterwards. The table utilizes the limited longitudinal information that is available in the SESAM data. The number of observations in a cell is given in parentheses. Note that sample sizes vary because not all recipients have the same historic coverage and not all survey modules are administered every year.

Table A2: Descriptive statistics

(a) Full sample

| | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| All individuals | | | | | |
| Age | 50.316 | 18.033 | 18.0 | 104.0 | 259,323 |
| Female | 0.539 | 0.498 | 0.0 | 1.0 | 259,323 |
| Married | 0.552 | 0.497 | 0.0 | 1.0 | 259,323 |
| Foreign | 0.322 | 0.467 | 0.0 | 1.0 | 259,323 |
| Nr. of children | 0.582 | 0.973 | 0.0 | 7.0 | 259,323 |
| Education: Primary | 0.234 | 0.423 | 0.0 | 1.0 | 259,323 |
| Education: Secondary | 0.510 | 0.500 | 0.0 | 1.0 | 259,323 |
| Education: Tertiary | 0.255 | 0.436 | 0.0 | 1.0 | 259,323 |
| Gross annual earnings | 41.450 | 107.251 | 0.0 | 42,317.4 | 259,323 |
| Travel distance (km) | 34.297 | 31.825 | 0.2 | 194.1 | 259,323 |
| Travel time (min) | 31.411 | 23.167 | 0.6 | 169.5 | 259,323 |
| Unemployed | 0.027 | 0.163 | 0.0 | 1.0 | 259,323 |
| Receives DI | 0.035 | 0.185 | 0.0 | 1.0 | 259,323 |
| Region | | | | | |
| Léman | 0.191 | 0.393 | 0.0 | 1.0 | 259,323 |
| Mittelland | 0.194 | 0.396 | 0.0 | 1.0 | 259,323 |
| Nordwestschweiz | 0.136 | 0.343 | 0.0 | 1.0 | 259,323 |
| Zürich | 0.166 | 0.372 | 0.0 | 1.0 | 259,323 |
| Ostschweiz | 0.122 | 0.328 | 0.0 | 1.0 | 259,323 |
| Zentralschweiz | 0.107 | 0.310 | 0.0 | 1.0 | 259,323 |
| Tessin | 0.083 | 0.275 | 0.0 | 1.0 | 259,323 |
| DI recipients | | | | | |
| Years in DI | 9.415 | 6.847 | 0.0 | 48.0 | 9,204 |
| Disability: Psych. problems | 0.341 | 0.474 | 0.0 | 1.0 | 9,204 |
| Disability: Nerve | 0.072 | 0.259 | 0.0 | 1.0 | 9,204 |
| Disability: Musculoskeletal cond. | 0.235 | 0.424 | 0.0 | 1.0 | 9,204 |
| Disability: Accident | 0.092 | 0.289 | 0.0 | 1.0 | 9,204 |
| Disability: Congenital disease/other | 0.185 | 0.388 | 0.0 | 1.0 | 9,204 |

(b) Local sample (within 20 km)

| | Mean | SD | Min | Max | N |
|---|---|---|---|---|---|
| All individuals | | | | | |
| Age | 49.950 | 18.019 | 18.0 | 104.0 | 133,549 |
| Female | 0.538 | 0.499 | 0.0 | 1.0 | 133,549 |
| Married | 0.546 | 0.498 | 0.0 | 1.0 | 133,549 |
| Foreign | 0.329 | 0.470 | 0.0 | 1.0 | 133,549 |
| Nr. of children | 0.580 | 0.972 | 0.0 | 7.0 | 133,549 |
| Education: Primary | 0.226 | 0.418 | 0.0 | 1.0 | 133,549 |
| Education: Secondary | 0.510 | 0.500 | 0.0 | 1.0 | 133,549 |
| Education: Tertiary | 0.265 | 0.441 | 0.0 | 1.0 | 133,549 |
| Gross annual earnings | 43.252 | 134.295 | 0.0 | 42,317.4 | 133,549 |
| Travel distance (km) | 11.871 | 4.753 | 0.2 | 20.0 | 133,549 |
| Travel time (min) | 14.981 | 5.170 | 0.6 | 30.1 | 133,549 |
| Unemployed | 0.027 | 0.163 | 0.0 | 1.0 | 133,549 |
| Receives DI | 0.035 | 0.184 | 0.0 | 1.0 | 133,549 |
| Region | | | | | |
| Léman | 0.119 | 0.324 | 0.0 | 1.0 | 133,549 |
| Mittelland | 0.156 | 0.363 | 0.0 | 1.0 | 133,549 |
| Nordwestschweiz | 0.260 | 0.439 | 0.0 | 1.0 | 133,549 |
| Zürich | 0.256 | 0.436 | 0.0 | 1.0 | 133,549 |
| Ostschweiz | 0.068 | 0.252 | 0.0 | 1.0 | 133,549 |
| Zentralschweiz | 0.140 | 0.347 | 0.0 | 1.0 | 133,549 |
| Tessin | 0.000 | 0.003 | 0.0 | 1.0 | 133,549 |
| DI recipients | | | | | |
| Years in DI | 9.294 | 6.779 | 0.0 | 47.0 | 4,693 |
| Disability: Psych. problems | 0.359 | 0.480 | 0.0 | 1.0 | 4,693 |
| Disability: Nerve | 0.072 | 0.259 | 0.0 | 1.0 | 4,693 |
| Disability: Musculoskeletal cond. | 0.232 | 0.422 | 0.0 | 1.0 | 4,693 |
| Disability: Accident | 0.087 | 0.282 | 0.0 | 1.0 | 4,693 |
| Disability: Congenital disease/other | 0.178 | 0.382 | 0.0 | 1.0 | 4,693 |

Note: Descriptive statistics for the unrestricted and the local estimation sample. Based on the 1999–2011 SESAM data.

Table A3: Pre-treatment covariate balance

Columns under (a) refer to the full sample, columns under (b) to the local sample (within 20 km).

| | (a) Total | (a) Treated | (a) Control | (a) Difference | (b) Total | (b) Treated | (b) Control | (b) Difference |
|---|---|---|---|---|---|---|---|---|
| All individuals | | | | | | | | |
| Age | 48.34 (18.28) | 47.74 (18.83) | 48.66 (17.95) | −0.926*** (0.309) | 48.55 (18.56) | 48.53 (10.61) | 48.68 (40.06) | −0.153 (0.605) |
| Female | 0.54 (0.50) | 0.55 (0.52) | 0.54 (0.49) | 0.009 (0.009) | 0.55 (0.50) | 0.55 (0.29) | 0.54 (1.08) | 0.009 (0.016) |
| Married | 0.52 (0.50) | 0.58 (0.52) | 0.50 (0.49) | 0.078*** (0.009) | 0.52 (0.50) | 0.53 (0.29) | 0.51 (1.08) | 0.021 (0.016) |
| Foreign | 0.09 (0.29) | 0.12 (0.34) | 0.08 (0.26) | 0.043*** (0.005) | 0.13 (0.34) | 0.14 (0.20) | 0.11 (0.67) | 0.027*** (0.010) |
| Nr. of children | 0.56 (0.98) | 0.66 (1.08) | 0.51 (0.91) | 0.142*** (0.018) | 0.57 (0.98) | 0.57 (0.56) | 0.59 (2.15) | −0.023 (0.035) |
| Education: Primary | 0.21 (0.41) | 0.23 (0.44) | 0.20 (0.39) | 0.028*** (0.007) | 0.24 (0.43) | 0.24 (0.25) | 0.22 (0.89) | 0.024* (0.014) |
| Education: Secondary | 0.59 (0.49) | 0.59 (0.52) | 0.60 (0.48) | −0.010 (0.009) | 0.58 (0.49) | 0.58 (0.28) | 0.60 (1.06) | −0.021 (0.016) |
| Education: Tertiary | 0.20 (0.40) | 0.19 (0.41) | 0.21 (0.40) | −0.019*** (0.007) | 0.18 (0.39) | 0.18 (0.22) | 0.18 (0.84) | −0.004 (0.012) |
| Gross annual earnings | 36.09 (48.35) | 35.36 (50.57) | 36.49 (47.10) | −1.135 (0.877) | 34.19 (45.81) | 33.93 (26.26) | 35.59 (97.48) | −1.658 (1.444) |
| Travel distance (km) | 28.69 (27.22) | 43.02 (37.62) | 20.90 (15.95) | 22.125*** (0.506) | 10.28 (4.80) | 10.26 (2.74) | 10.42 (10.35) | −0.158 (0.150) |
| Travel time (min) | 27.80 (20.27) | 37.15 (27.79) | 22.72 (12.98) | 14.434*** (0.378) | 13.25 (5.24) | 13.22 (3.00) | 13.46 (11.23) | −0.240 (0.165) |
| Unemployed | 0.02 (0.12) | 0.02 (0.14) | 0.01 (0.11) | 0.005 (0.002) | 0.02 (0.14) | 0.02 (0.08) | 0.01 (0.25) | 0.008** (0.004) |
| Receives DI in 2001 | 0.04 (0.20) | 0.04 (0.20) | 0.04 (0.20) | −0.005 (0.004) | 0.04 (0.19) | 0.04 (0.11) | 0.04 (0.43) | −0.004 (0.008) |
| DI recipients | | | | | | | | |
| Years in DI | 7.90 (6.94) | 7.64 (7.48) | 8.03 (6.62) | −0.391 (0.646) | 7.41 (6.68) | 7.67 (3.71) | 6.09 (12.70) | 1.582 (0.967) |
| Entry age | 43.11 (11.69) | 44.20 (13.25) | 42.55 (10.84) | 1.654 (1.142) | 45.05 (11.51) | 45.26 (6.23) | 43.99 (24.99) | 1.270 (2.271) |
| DI: Psych. problems | 0.29 (0.46) | 0.27 (0.49) | 0.30 (0.43) | −0.028 (0.043) | 0.29 (0.45) | 0.27 (0.24) | 0.35 (1.01) | −0.081 (0.091) |
| DI: Nerve | 0.11 (0.31) | 0.09 (0.31) | 0.12 (0.31) | −0.033 (0.029) | 0.11 (0.32) | 0.11 (0.18) | 0.11 (0.66) | 0.004 (0.051) |
| DI: MSK | 0.21 (0.41) | 0.27 (0.49) | 0.18 (0.37) | 0.089** (0.041) | 0.23 (0.42) | 0.26 (0.24) | 0.12 (0.69) | 0.136** (0.064) |
| DI: Other illness | 0.21 (0.41) | 0.21 (0.45) | 0.21 (0.38) | −0.002 (0.039) | 0.19 (0.40) | 0.20 (0.22) | 0.17 (0.79) | 0.034 (0.063) |
| DI: Accident | 0.10 (0.30) | 0.09 (0.31) | 0.11 (0.30) | −0.025 (0.029) | 0.08 (0.27) | 0.06 (0.13) | 0.19 (0.83) | −0.129 (0.090) |
| N: All individuals | 15,522 | 5,983 | 9,539 | | 8,570 | 2,367 | 6,203 | |
| N: DI recipients | 506 | 207 | 299 | | 280 | 70 | 210 | |

Note: Means of selected covariates for individuals in treated and control regions sampled between 1999–2001, prior to the pilot period. Separate statistics for all individuals and those within a distance of 20 kilometers in border regions. Standard deviation in parentheses. The last column in each block shows the difference between treated and control individuals for each variable, standard error in parentheses. Survey weights applied for the full sample. Observations weighted for pairwise differences in the local sample. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.
Table A4: Placebo reform

Columns (1)–(4): (a) Full sample. Columns (5)–(8): (b) Local sample (within 20 km).

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
|---|---|---|---|---|---|---|---|---|
| Treatment region | 1.337*** (0.051) | 1.337*** (0.051) | 1.337*** (0.051) | 1.248*** (0.048) | 1.150** (0.076) | 1.150** (0.076) | 1.150** (0.076) | 1.148** (0.076) |
| Pre-pilot time | 1.235*** (0.082) | 1.241*** (0.082) | 1.241*** (0.082) | 1.274*** (0.084) | 1.204 (0.146) | 1.213 (0.146) | 1.213 (0.146) | 1.253* (0.150) |
| Treat x pre | 0.970 (0.064) | 0.970 (0.064) | 0.970 (0.064) | 0.975 (0.064) | 0.999 (0.111) | 0.999 (0.111) | 0.999 (0.111) | 0.996 (0.111) |
| Pilot time | | 1.320*** (0.129) | 1.326*** (0.129) | 1.390*** (0.135) | | 1.514*** (0.228) | 1.525*** (0.229) | 1.612*** (0.241) |
| Treat x pilot | | 0.847** (0.069) | 0.846** (0.069) | 0.852** (0.069) | | 0.770** (0.092) | 0.771** (0.092) | 0.765** (0.091) |
| Post time | | | 0.842 (0.094) | 0.917 (0.103) | | | 1.046 (0.207) | 1.142 (0.226) |
| Treat x post | | | 0.960 (0.080) | 0.961 (0.080) | | | 0.841 (0.110) | 0.829 (0.109) |
| Other controls | - | - | - | ✓ | - | - | - | ✓ |
| N municipalities | 2,336 | 2,337 | 2,338 | 2,338 | 1,086 | 1,086 | 1,087 | 1,087 |
| N individuals | 242,531 | 249,750 | 259,323 | 259,323 | 124,747 | 128,633 | 133,648 | 133,648 |
| N failures | 6,164 | 7,877 | 9,204 | 9,204 | 3,100 | 3,985 | 4,693 | 4,693 |
| N fail during pilot | 0 | 1,713 | 1,713 | 1,713 | 0 | 885 | 885 | 885 |
| N fail during prepilot | 1,950 | 1,950 | 1,950 | 1,950 | 989 | 989 | 989 | 989 |
| N | 439,761 | 631,782 | 787,954 | 787,954 | 226,345 | 325,321 | 406,221 | 406,221 |

Note: Cox Proportional Hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Baseline hazard for all regressions stratified by 5-year birth cohorts. Survey weights applied for the full sample. Observations in the local sample are weighted for pairwise estimation. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.
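The hazard ratios in Table A4 can be read as relative effects, and their significance can be checked approximately. The Python sketch below is illustrative only: it assumes that the standard errors in parentheses are delta-method standard errors of the exponentiated coefficients (a common software convention that the table note does not state) and uses a standard normal reference distribution. Function names are hypothetical.

```python
import math


def relative_effect(hazard_ratio: float) -> float:
    """Percent change in the hazard implied by a hazard ratio."""
    return (hazard_ratio - 1.0) * 100.0


def z_statistic(hazard_ratio: float, se_hr: float) -> float:
    """z statistic for the null hypothesis that the hazard ratio equals one.

    Assumption: se_hr is the delta-method standard error of the
    exponentiated coefficient, so SE(log HR) = se_hr / hazard_ratio.
    """
    if hazard_ratio <= 0 or se_hr <= 0:
        raise ValueError("hazard ratio and standard error must be positive")
    return math.log(hazard_ratio) / (se_hr / hazard_ratio)


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (label, hazard ratio, standard error) from Table A4, columns (3) and (7)
ESTIMATES = [
    ("full sample, Treat x pre", 0.970, 0.064),
    ("full sample, Treat x pilot", 0.846, 0.069),
    ("local sample, Treat x pre", 0.999, 0.111),
    ("local sample, Treat x pilot", 0.771, 0.092),
]

if __name__ == "__main__":
    for label, hr, se in ESTIMATES:
        z = z_statistic(hr, se)
        print(f"{label}: {relative_effect(hr):+.1f}% "
              f"(z = {z:.2f}, p = {two_sided_p(z):.3f})")
```

Under these assumptions the placebo interaction is far from conventional significance levels, while the pilot interaction falls between the 1% and 5% thresholds, in line with the stars reported in the table.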

Table A5: Distance windows

(a) Travel distance (km)

| | 10 km | 15 km | 20 km | 25 km | 30 km |
|---|---|---|---|---|---|
| Treatment region | 1.13 (0.10) | 1.20*** (0.08) | 1.15*** (0.06) | 1.16*** (0.06) | 1.20*** (0.06) |
| Pilot time | 1.29 (0.23) | 1.38** (0.19) | 1.27** (0.15) | 1.25** (0.14) | 1.25** (0.13) |
| Treat x pilot | 0.75* (0.13) | 0.71** (0.10) | 0.77** (0.09) | 0.78** (0.08) | 0.78** (0.08) |
| Post time | 0.92 (0.24) | 0.91 (0.18) | 0.87 (0.15) | 0.82 (0.14) | 0.84 (0.13) |
| Treat x post | 0.79 (0.16) | 0.83 (0.13) | 0.84 (0.11) | 0.85 (0.10) | 0.85 (0.10) |
| N municipalities | 549 | 825 | 1,087 | 1,286 | 1,414 |
| N individuals | 47,403 | 88,990 | 133,549 | 151,215 | 163,852 |
| N failures | 1,626 | 3,230 | 4,693 | 5,223 | 5,690 |
| N failures during pilot | 332 | 612 | 885 | 980 | 1,063 |
| N | 107,479 | 200,431 | 300,432 | 340,370 | 369,235 |

(b) Travel time (min)

| | 10 min | 15 min | 20 min | 25 min | 30 min |
|---|---|---|---|---|---|
| Treatment region | 1.040 (0.115) | 1.13 (0.09) | 1.18*** (0.07) | 1.16*** (0.06) | 1.09 (0.06) |
| Pilot time | 1.469* (0.333) | 1.43** (0.25) | 1.30** (0.17) | 1.32** (0.16) | 1.20 (0.14) |
| Treat x pilot | 0.740 (0.164) | 0.66** (0.11) | 0.73** (0.09) | 0.76** (0.09) | 0.81* (0.09) |
| Post time | 1.086 (0.337) | 0.87 (0.21) | 0.78 (0.15) | 0.80 (0.14) | 0.80 (0.13) |
| Treat x post | 0.995 (0.241) | 0.85 (0.16) | 0.86 (0.12) | 0.90 (0.12) | 0.94 (0.12) |
| N municipalities | 372 | 649 | 922 | 1,159 | 1,371 |
| N individuals | 26,956 | 56,609 | 119,572 | 143,504 | 166,486 |
| N failures | 942 | 1,948 | 4,253 | 5,031 | 5,752 |
| N failures during pilot | 180 | 379 | 811 | 961 | 1,087 |
| N | 61,269 | 128,479 | 269,155 | 323,290 | 375,210 |

Note: Cox Proportional Hazard estimates for individuals in treated and control regions across various distance windows from the border. Based on SESAM individual-level survey and administrative data sampled during 1999–2011. Observations are weighted for pairwise estimation. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Table A6: Stock outflow

Columns (1)–(3): (a) All individuals. Columns (4)–(6): (b) Age ≤ 50 in 2001.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Treat | 0.925*** (0.027) | 0.923*** (0.027) | 0.911*** (0.027) | 0.871*** (0.041) | 0.871*** (0.041) | 0.872*** (0.041) |
| Pilot time | 7.698*** (0.157) | 7.677*** (0.156) | 7.825*** (0.160) | 7.515*** (0.243) | 7.479*** (0.240) | 7.652*** (0.247) |
| Treat x pilot | 0.985 (0.033) | 0.986 (0.033) | 0.992 (0.033) | 0.995 (0.053) | 0.997 (0.053) | 0.997 (0.053) |
| Post time | | 7.518*** (0.152) | 7.728*** (0.157) | | 7.676*** (0.236) | 7.931*** (0.246) |
| Treat x post | | 1.008 (0.032) | 1.014 (0.033) | | 1.036 (0.052) | 1.035 (0.051) |
| Other controls | - | - | ✓ | - | - | ✓ |
| N individuals | 314,249 | 327,580 | 327,580 | 145,018 | 154,020 | 154,020 |
| N failures | 20,481 | 44,529 | 44,529 | 8,904 | 23,547 | 23,547 |
| N failures during pilot | 15,389 | 15,389 | 15,389 | 6,957 | 6,957 | 6,957 |
| N | 1,032,666 | 2,489,323 | 2,489,323 | 504,801 | 1,470,137 | 1,470,137 |

Note: Cox Proportional Hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Baseline hazard for all regressions stratified by 5-year birth cohorts. Survey weights applied for the full sample. Observations in the local sample are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. Standard errors clustered at the municipality level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Table A7: Determinants of local sample

| | Full sample (1) | Treated (2) | Control (3) |
|---|---|---|---|
| Age | −0.0004* (0.0002) | −0.0008*** (0.0002) | 0.0002 (0.0003) |
| Female | 0.0040 (0.0054) | −0.0080 (0.0063) | 0.0159*** (0.0057) |
| Married | −0.0115 (0.0165) | 0.0093 (0.0181) | −0.0320* (0.0181) |
| Foreign | 0.0175 (0.0258) | −0.0360 (0.0270) | 0.1117*** (0.0210) |
| Nr. of children | −0.0030 (0.0041) | 0.0037 (0.0033) | −0.0050 (0.0049) |
| Education: Secondary | 0.0195*** (0.0068) | 0.0041 (0.0066) | 0.0189** (0.0083) |
| Education: Tertiary | 0.0373 (0.0228) | 0.0008 (0.0240) | 0.0490** (0.0239) |
| N | 259,323 | 117,701 | 141,622 |

Note: Probit estimates for the probability to be included in the local sample separately for treated and control regions. Marginal effects at the mean reported. Standard errors clustered at the municipality level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Figure A1: Sample composition

[Map. Legend: Non-treated, Non-treated local, Treated local, Treated, Lake.]

Note: Pilot cantons are shaded in dark and medium grey, control cantons are shaded in light grey and white. Intermediate shades indicate the municipalities that are included in the local sample. Lakes shown in black.

Figure A2: Trends in DI physicians and caseload

[Two panels: (a) DI physicians, y-axis: Physicians; (b) Caseload, y-axis: Caseload, series: Applications/physician and Dossiers/physician (2006). Legend: Control, Treated.]

Note: Panel (a) shows the number of full-time equivalent medical staff positions before and after the reform changes, panel (b) approximates the application caseload per physician. Cantons in western Switzerland for which the electronic reporting system is known to have been faulty are omitted from the sample for the statistics in panel (b) (Fribourg, Genève, Jura, Neuchâtel, Vaud). Applications are only available from 2002.

Figure A3: Trends for aggregate disability insurance stock

[Plot. y-axis: Stock.]

Note: Disability insurance stock for treated and control regions.

Figure A4: Trends for disability insurance applications

[Plot. y-axis: DI applications.]

Note: Disability insurance applications for the years 2002–2012. Cantons in western Switzerland for which the electronic reporting system is known to have been faulty are omitted from the sample (Fribourg, Genève, Jura, Neuchâtel, Vaud).

Figure A5: Possible effects of medical review on latent types

Figure A6: Disability insurance court cases

[Two panels: (a) All claims; (b) Rejected claims.]

Note: Mean cantonal total and rejected disability insurance legal claims for the years 2002–2012.

Figure A7: Distance windows

[Two panels: (a) Travel distance (km), x-axis: Distance (km), 10 to 120; (b) Travel time (min), x-axis: Distance (min), 10 to 120. y-axis in both panels: Hazard ratio. Series: Estimate, 90% confidence bounds.]

Note: Treatment effect estimates and 90% confidence bounds from the main specification for different distance windows measured using actual travel distance and travel time.

Supplementary material for online publication only

Appendix B: Further tables and figures

Title: Does external medical review reduce disability insurance inflow?

Author: Helge Liebert

Table B1: Main disability incidence results, linear probability model

Columns (1)–(2): (a) Full sample. Columns (3)–(4): (b) Local sample (within 20 km).

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Treat x pilot | −0.000265 (0.000519) | −0.000272 (0.000518) | −0.000106 (0.001146) | −0.000107 (0.001144) |
| relative ATT (implied) | −0.1698 | −0.1742 | −0.0696 | −0.0701 |
| ȳ | | ✓ | - | ✓ |
| Canton fixed effects | ✓ | ✓ | ✓ | ✓ |
| Time fixed effects | ✓ | ✓ | ✓ | ✓ |
| N | 592491 | 592491 | 299545 | 299545 |

Note: Linear probability model estimates of DI receipt for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimations separately for a complete representative sample of the Swiss population and only for individuals in the vicinity of the border between treated and non-treated regions. Standard errors clustered at the cantonal level in parentheses.
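The "relative ATT (implied)" row of Table B1 appears to rescale the linear probability coefficient by a baseline mean. The Python sketch below backs out that baseline under the assumption, suggested by the row labels but not confirmed in the table note, that the relative ATT equals the 'Treat x pilot' coefficient divided by a baseline mean ȳ. The function name is hypothetical.

```python
# Table B1 values: (column, 'Treat x pilot' coefficient, implied relative ATT)
TABLE_B1 = [
    ("(1) full sample", -0.000265, -0.1698),
    ("(2) full sample", -0.000272, -0.1742),
    ("(3) local sample", -0.000106, -0.0696),
    ("(4) local sample", -0.000107, -0.0701),
]


def implied_baseline(coefficient: float, relative_att: float) -> float:
    """Baseline mean implied by relative ATT = coefficient / baseline.

    This relation is an assumption made for illustration; the paper's
    exact definition of the implied relative ATT is given in its section 4.
    """
    if relative_att == 0:
        raise ValueError("relative ATT must be non-zero")
    return coefficient / relative_att


if __name__ == "__main__":
    for column, coefficient, relative_att in TABLE_B1:
        baseline = implied_baseline(coefficient, relative_att)
        print(f"{column}: implied baseline of about {baseline:.5f}")
```

Under this assumption, the two columns within each sample imply nearly the same baseline, which is what one would expect if the rescaling uses a sample-specific mean.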

Table B2: Main disability incidence results with canton fixed effects

Columns (1)–(3): (a) Full sample. Columns (4)–(6): (b) Local sample (within 20 km).

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Treat x pilot | 0.857** (0.067) | 0.855** (0.067) | 0.859* (0.068) | 0.765** (0.086) | 0.765** (0.087) | 0.760** (0.086) |
| Canton fixed effects | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Time fixed effects | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Other controls | - | - | ✓ | - | - | ✓ |
| N municipalities | 2,337 | 2,338 | 2,338 | 1,086 | 1,087 | 1,087 |
| N individuals | 249,750 | 259,323 | 259,323 | 128,536 | 133,549 | 133,549 |
| N failures | 7,877 | 9,204 | 9,204 | 3,985 | 4,693 | 4,693 |
| N failures during pilot | 1,713 | 1,713 | 1,713 | 885 | 885 | 885 |

Note: Cox Proportional Hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimations separately for a complete representative sample of the Swiss population and only for individuals in the vicinity of the border between treated and non-treated regions. Baseline hazard for all regressions stratified by 5-year birth cohorts. Survey weights applied for the full sample. Observations in the local sample are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Figure B1: Log cumulative hazard by age, treatment region and birth cohort strata

[Plot. y-axis: ln[−ln(survival)]; x-axis: Age, 35 to 65.]