Does external medical review reduce disability insurance inflow?
Helge Liebert∗,†
Journal of Health Economics, 2019, Vol. 64, 108–128, https://doi.org/10.1016/j.jhealeco.2018.12.005
Abstract
This paper investigates the effects of introducing external medical review for disability insurance (DI) in a system relying on treating physician testimony for eligibility determination. Using a unique policy change and administrative data from Switzerland, I show that medical review reduces DI incidence by 23%. Incidence reductions are closely tied to difficult-to-diagnose conditions, suggesting inaccurate assessments by treating physicians. Due to a partial benefit system, reductions in full benefit awards are partly offset by increases in partial benefits. More intense screening also increases labor market participation. Existing benefit recipients are downgraded and lose part of their benefit income when scheduled medical reviews occur. Back-of-the-envelope calculations indicate that external medical review is highly cost-effective. Under additional assumptions, the results provide a lower bound of the effect on the false positive award error rate.

∗ Center for Disability and Integration, Department of Economics, University of St. Gallen, Rosenbergstr. 51, 9000 St. Gallen, Switzerland. Email: [email protected].

† I thank the editor and three anonymous referees for their valuable comments. The paper benefited from discussions with Simone Balestra, Eva Deuchert, Beatrix Eugster, Per Johansson, Rafael Lalive, Michael Lechner, Nicole Maestas, Beatrice Mäder and seminar participants at the University of St. Gallen, the University of Uppsala/IFAU, the 2015 SOLE/EALE meeting in Montreal and the 2018 European Workshop on Health Economics and Econometrics in Groningen. All remaining errors are my own. This work was funded by the Swiss National Science Foundation under grant no. 100018_143317/1.

Introduction
Targeted programs constitute the most common form of social protection worldwide. Benefit payments are disbursed to groups identified by a common characteristic: families, the unemployed or persons with a work-limiting disability. Among the different social programs, disability insurance (DI) is by far the most costly. The average OECD country spends about 2.3% of GDP on disability-related benefits (OECD 2010). In both the United States and Europe, the number of DI beneficiaries rose throughout the late 20th and early 21st century and has recently stabilized at a high level: on average, about 6% of the working-age population in OECD countries receive disability benefits (OECD 2010). Increases in DI beneficiaries have often been associated with imperfect screening of DI applicants (e.g. Autor and Duggan 2003). One indication of this is that the relative prevalence of difficult-to-diagnose health conditions such as musculoskeletal or mental health problems on the DI rolls has increased at a higher rate than their prevalence in the general population (Campolieti 2002, OECD 2010). Across OECD countries, 60% of DI inflow can be attributed to musculoskeletal conditions or mental health claims (OECD 2009).

Disability benefit decisions are made based on medical assessments of individuals’ residual functional capacity, i.e., their remaining ability to work. However, the medical assessment process required for eligibility determination differs across countries. In 40% of the OECD countries surveyed in OECD (2003), the first gatekeeper to the DI system is the treating physician. In Norway, Switzerland and the United States, countries characterized by high rates of DI prevalence, treating physician testimony has historically often been decisive for claims decisions. Treating physicians also hold an influential role in the DI determination process in Australia, Denmark, Germany, Sweden, and the United Kingdom.
In these DI systems, the treating physician submits the medical documentation of applicants’ diagnosis and treatment history to the DI administration. After submission, the documentation is reviewed by caseworkers, and potentially also by DI physicians.

Whether treating physicians or DI-appointed physicians alone should assess the residual functional capacity of DI applicants remains an open question. Treating physicians are considered to have an informational advantage; hence, their recommendation is often influential in award decisions. The United States Social Security Administration (SSA) even adopted a ‘treating physician rule’ in 1991, giving ‘controlling weight’ to the treating physician’s opinion. At the same time, treating physicians are known to diagnose clients favorably in the context of sick-listing, possibly to avoid harming a long-standing physician-patient relationship (e.g. Zinn and Furutani 1996, Englund et al. 2000, Kankaanpää et al. 2012). Moreover, treating physicians are often general practitioners rather than clinical specialists, and it is unclear whether complex disabling conditions can be accurately diagnosed by treating physicians. For these reasons, treating physicians’ assessments are commonly subjected to medical review by DI physicians, who are often clinical specialists.

This paper evaluates the effectiveness of external medical review and its implications. Identification relies on quasi-experimental policy variation generated by an extensive pilot program that preceded the nationwide introduction of mandatory medical review in Switzerland. For the analysis, I develop a combined difference-in-differences and spatial matching approach, embedded in an age-based duration analysis framework for estimation. The results indicate that introducing medical review reduces DI admissions by 23%. Reductions are closely tied to psychological and musculoskeletal conditions, diseases which are more prone to inaccurate diagnoses.
Medical review also increases labor market participation. In an extension to the main analysis, I provide explicit identifying conditions under which the inflow reduction can be interpreted as a bound on the reduction in DI award errors. Looking at the stock, I find that existing benefit recipients are downgraded and lose part of their benefit income when scheduled medical reviews occur. Finally, I demonstrate that medical review is highly cost-effective.

In 2005, external medical review became mandatory for all DI applications in Switzerland. This reform was preceded by a pilot, which introduced mandatory medical review in several Swiss cantons as early as 2002. Medical review in this context means file-based review, exchange with treating physicians and personal examinations by official DI and other third-party physicians. The reform had three major components. First, it substantially increased the medical staff and funding directed towards reviewing DI applicants’ cases, more than doubling the number of full-time equivalent staff positions. Screening quality was improved by substantially reducing individual DI physicians’ caseloads and by directing cases to physicians specialized in the relevant field. Second, the physicians are mandated to review all DI applications, to conduct medical checks if required and to provide the responsible DI caseworker with better information about applicants’ health. Before the policy change, caseworkers relied on information provided by applicants’ treating physicians for their decision, as the DI offices had insufficient resources to screen individuals. Third, the policy also abolished legal obstacles that prevented DI physicians from examining applicants in person or requesting further documentation.
Meanwhile, the decision structure remains unchanged: the final eligibility decision remains with the responsible DI caseworker.

This paper contributes to the literature on screening in DI by investigating medical review, a form of screening which has so far been largely neglected. DI screening involves two distinct aspects: stringency and quality. Interestingly, while screening has received considerable attention in the literature on DI, studies on screening in DI almost exclusively focus on variations in screening stringency and use them to obtain a control group to identify the disincentive effect of DI on labor supply (e.g. Karlström et al. 2008, Mitra 2009, de Jong et al. 2011, Staubli 2011, Maestas et al. 2013, French and Song 2014). These studies rely on either explicit or implicit changes to eligibility criteria and the admittance threshold for identification and generally find positive labor supply effects of screening. For example, de Jong et al. (2011), Maestas et al. (2013) and French and Song (2014) rely on variations in adjudicator stringency, while Karlström et al. (2008) and Staubli (2011) rely on explicit policy reforms that limited eligibility for certain groups. Looking at DI in Austria, Staubli (2011) shows that stricter eligibility requirements both reduce insurance prevalence and increase labor supply. Naturally, these studies also often find lower take-up rates of DI because individuals become mechanically ineligible for DI due to changes in the admittance criteria.

In this paper, I focus on the implications of medical review, an intervention that influences screening quality by providing more information on individuals’ underlying capacity to work. Looking at medical review allows abstracting from mechanical inflow effects which arise due to implicit eligibility requirement changes. Unlike stringency changes, medical review does not inherently involve a trade-off between false positive and false negative decision errors (e.g. Kleven and Kopczuk 2011, Low and Pistaferri 2015). Because medical review primarily targets new DI applicants, I focus explicitly on insurance incidence (inflow) in the analysis, as prevalence (stock) is likely to be more inert. In addition, research has shown that inducing work take-up among long-term beneficiaries can be difficult, and results regarding the employment capabilities of this group are mixed (e.g. Kornfeld and Rupp 2000, Adam et al. 2010, Borghans et al. 2014, Bütler et al. 2015, Moore 2015, Garcia Mandico et al. 2018).

Moreover, the results in this paper also relate to the findings of health condition-dependent effect heterogeneity in the literature on disincentive effects of DI and the literature on misreporting of health status. In a seminal paper, Bound (1989) finds that up to half of DI recipients in the US would be working in the absence of DI. Newer studies have confirmed Bound’s (1989) main result, but also show that there is considerable effect heterogeneity (e.g. Chen and van der Klaauw 2008, von Wachter et al. 2011, Maestas et al. 2013, French and Song 2014). Results by von Wachter et al. (2011) indicate that employment in the absence of DI would be non-negligible, especially for younger individuals and those who applied based on mental health and musculoskeletal conditions. Related to this, Campolieti (2006) notes that stricter DI entry requirements cause fewer reports of these difficult-to-diagnose conditions among older males. Using administrative records, I show that medical screening reduces insurance inflow of difficult-to-diagnose conditions and increases labor market participation. This effectively ties excess inflow of individuals capable of working to certain conditions and suggests that medical review is a cost-effective policy to reduce it.
Other studies have observed that self-reports of disability differ from objective measures of functional limitations and that individuals out of the labor market tend to overstate health limitations (Butler et al. 1987, Kreider 1999, Kreider and Pepper 2007, 2008). Exaggeration and malingering of health limitations by patients in anticipation of insurance benefits have also been documented in medical studies (e.g. Frueh et al. 2003) and in the literature on workers’ compensation schemes (e.g. Staten and Umbeck 1982, Bolduc et al. 2002).
Institutional background
The Swiss DI system is characterized by generous benefits. Individuals can receive benefits from three main benefit schemes: mandatory public DI, mandatory employer-provided occupational pensions and optional private DI. Eligibility for benefits is determined by the local public DI office responsible for the main mandatory public scheme and is binding for all other benefit providers. Replacement rates are based on an individual’s previous income, contribution history, family situation, and whether the individual receives full or partial benefits. The full benefit amount from the mandatory public DI scheme lies between 1,175 CHF and 2,350 CHF per month before taxes, depending on prior income, marriage and contribution history. Individuals with children receive an additional 40% of this amount for each dependent child. In addition, there are income-contingent benefits for spouses and means-tested supplementary benefits for recipients who fall below the subsistence earnings threshold. The additional payouts from the mandatory occupational pension scheme vary with the contribution length and the employer’s contract terms. Focusing only on the two mandatory schemes, a 40-year-old adult with a full contribution history and an average wage can expect a replacement rate of 70% if single, 80% if married, and 100% if married with two children. At earnings below the average wage, the replacement rate increases sharply, up to 120%, exceeding the prior earnings level (OECD 2006, 2010).

Eligibility status and the benefit amount from the main public DI scheme are determined based on an individual’s disability degree, a measure of work incapacity calculated as one minus the ratio of potential labor market income with disability to potential income without disability (typically prior earnings). The determination of potential income is directly tied to a medical assessment of individuals’ residual work capacity.
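The disability degree and the per-child supplement described above reduce to simple arithmetic. The Python sketch below illustrates both under stated assumptions: the function and constant names and the example incomes are hypothetical, the bounds check uses the 1,175 to 2,350 CHF range quoted above, and the sketch deliberately ignores occupational pensions, spousal and means-tested benefits, and the rule that maps the degree into partial benefit levels.

```python
# Hypothetical illustration of the definitions in the text; not the official
# Swiss benefit formula (occupational pensions, spousal and means-tested
# benefits, and the partial-benefit schedule are all omitted).

MIN_FULL_BENEFIT_CHF = 1175.0  # lower end of the monthly full public DI benefit
MAX_FULL_BENEFIT_CHF = 2350.0  # upper end of the monthly full public DI benefit
CHILD_SUPPLEMENT_SHARE = 0.40  # +40% of the full benefit per dependent child


def disability_degree(income_with_disability: float, income_without_disability: float) -> float:
    """One minus the ratio of potential income with disability to potential income without it."""
    if income_without_disability <= 0:
        raise ValueError("potential income without disability must be positive")
    degree = 1.0 - income_with_disability / income_without_disability
    # Clip to [0, 1]: a residual income at or above prior earnings means no incapacity.
    return min(max(degree, 0.0), 1.0)


def monthly_public_benefit(full_benefit: float, n_children: int) -> float:
    """Full public DI benefit plus the per-child supplement described in the text."""
    if not MIN_FULL_BENEFIT_CHF <= full_benefit <= MAX_FULL_BENEFIT_CHF:
        raise ValueError("full benefit outside the quoted 1,175-2,350 CHF range")
    if n_children < 0:
        raise ValueError("number of children must be non-negative")
    return full_benefit * (1.0 + CHILD_SUPPLEMENT_SHARE * n_children)


if __name__ == "__main__":
    # Hypothetical applicant: prior earnings 6,000 CHF/month, residual potential 1,800 CHF/month.
    print(round(disability_degree(1800.0, 6000.0), 2))  # 0.7
    # Maximum full benefit with two dependent children.
    print(round(monthly_public_benefit(2350.0, 2), 2))  # 4230.0
```

Clipping the degree at zero is a design choice of the sketch; the text only defines the ratio.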
If granted, benefits are paid indefinitely and are only revised if the recipient’s health or earnings change substantially, or when the recipient becomes eligible for retirement pay. Unlike unemployment insurance (UI), DI benefits are not attached to return-to-work measures. The Swiss system allows for partial disability benefits in quarterly increments.

The Swiss parliament passed a reform of the DI system in 2003 (4. Revision des Bundesgesetzes über die Invalidenversicherung). Prior to this, medical review occurred infrequently, and DI caseworkers made their decisions based on medical assessments submitted by the applicants’ treating physicians. The treating physician-based screening procedure had been in place unrevised since 1973. The reform resulted in a large expansion of the medical staff available for review of insurance applications and substantially extended their legal competences. Physicians were tasked to conduct (re-)appraisals of benefit claims and authorized to carry out medical examinations.

To assess the effect of the institutional changes, the Federal Ministry of Social Insurances devised a pilot scheme. Beginning in 2002, 11 out of 26 cantons could already hire new staff and conduct medical review. In the remaining cantons, operation began in 2005 as scheduled by the reform proposal. Following the nationwide implementation in 2005, staff funding was expanded further. The cantons that introduced medical review in 2002 are shown in Figure 1. The cantonal DI offices operate autonomously but hold a yearly joint conference, during which participation in the early adopter program was decided (endogenous self-selection is addressed in more detail in section 4). The program was fully funded by the federal ministry.

Figure 1: Cantons with medical review during the pilot
Note: Pilot cantons shaded gray. Legend: ZH: Zürich, BE: Bern, LU: Lucerne, UR: Uri, SZ: Schwyz, OW: Obwalden, NW: Nidwalden, GL: Glarus, ZG: Zug, FR: Fribourg, SO: Solothurn, BS: Basel-Stadt, BL: Basel-Landschaft, SH: Schaffhausen, AR: Appenzell A.-Rh., AI: Appenzell I.-Rh., SG: St. Gallen, GR: Graubünden, AG: Aargau, TG: Thurgau, TI: Ticino, VD: Vaud, VS: Valais, NE: Neuchâtel, GE: Geneva, JU: Jura.

To become eligible for DI, individuals have to register with their local DI office. Applicants must register with the DI office corresponding to their place of residence and cannot file for benefits elsewhere. When filing a benefit claim, applicants have their treating physician submit the medical documentation of their condition and their previous earnings records. The earnings loss induced by the condition must span at least twelve months to qualify for benefits. The disability insurance office then assesses the individual earnings loss based on the severity of the condition and its impact on work capability. Based on the assessment, the caseworker decides whether the person qualifies for benefits.

Prior to 2002, the insurance office could only assess eligibility from the medical certificates issued by the applicant’s chosen treating physician, typically the applicant’s general practitioner. DI offices were legally not allowed to examine the applicant, even when in doubt about the credibility or severity of the impediment.
The DI caseworkers deciding on the application have no medical training themselves but could consult with physicians working at the DI offices if they deemed it necessary. However, the DI offices were notoriously understaffed with physicians. In 2006, the average DI physician reviewed about 612 dossiers per year. Given the change in staffing, and since application numbers remained constant (Appendix Figure A4), this figure would have had to be 2.25 times as high prior to the reform to ensure the same coverage. For this reason, only a subset of selected dossiers was passed to the DI physicians for inspection, and caseworkers relied on the medical assessment provided by the treating physician when awarding benefits.

This situation changed with the reform, which essentially strengthened the role of independent DI physicians in the application process. There are three major changes attached to the policy. First, the reform substantially increased the medical staff working for the DI offices. Aggregate figures indicate that the number of full-time equivalent positions increased by 125%. Nationwide, the number of staff positions increased from 105 to 235 due to the reform. Positions are distributed among cantons proportionally to the insured population, implying that the relative increase is the same for every region. Pilot cantons experienced this increase three years earlier (see Appendix Figure A2). New physicians are selected for specialization in fields relevant to diagnosing difficult cases (e.g. rheumatology, orthopedics or psychiatry) and are trained in actuarial regulation. Second, medical review became mandatory for DI claims. Every applicant’s medical history is reviewed and summarized in a non-technical report for the DI caseworker. Third, physicians were given the authority to screen people in person, to consult with treating physicians and to order further examinations by other specialists. Before, reviews were legally restricted to file-based review.
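The coverage argument above is back-of-the-envelope arithmetic, which the Python sketch below reproduces from the quoted figures; the helper name is hypothetical. The rounded position counts (105 to 235) give a staff ratio of about 2.24, i.e. an increase of roughly 124%, close to the reported 125% and factor 2.25; the sketch uses the reported factor for the caseload figure.

```python
def implied_caseload(caseload_after: float, staff_ratio: float) -> float:
    """Dossiers per physician needed before a staffing expansion to keep full coverage.

    Assumes the number of applications stays constant, so the same workload is
    spread over `staff_ratio` times fewer physicians before the expansion.
    """
    if staff_ratio <= 0:
        raise ValueError("staff ratio must be positive")
    return caseload_after * staff_ratio


POSITIONS_BEFORE = 105   # full-time equivalent DI physician positions before the reform
POSITIONS_AFTER = 235    # positions after the reform
CASELOAD_2006 = 612      # dossiers per DI physician and year in 2006
REPORTED_FACTOR = 2.25   # factor quoted in the text

ratio = POSITIONS_AFTER / POSITIONS_BEFORE
print(f"staff ratio from rounded counts: {ratio:.3f}")        # 2.238
print(f"implied increase: {100 * (ratio - 1):.1f}%")           # 123.8%
print(f"pre-reform caseload: {implied_caseload(CASELOAD_2006, REPORTED_FACTOR):.0f}")  # 1377
```

At roughly 1,377 dossiers per physician and year, full review of every application before the reform would have been implausible, which is the point the text makes.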
The staff is instructed to focus on new DI applicants and to aid with scheduled revisions of existing beneficiaries’ claim status.

Since the reform more than doubled the number of physicians working at the DI offices, there is concern about delays in hiring staff and filling positions. However, comparing the average share of vacancies filled in 2006 between offices in pilot and late adopter regions does not indicate that such delays occurred.

A schematic overview of the application process and the additional processes is depicted in Figure 2. Under the new system, the responsible DI physician always receives a complete copy of an individual’s insurance application, including the medical documentation of potential limitations. The DI physician then provides an evaluation of the applicant’s eligibility for the DI caseworker. If the documentation is considered insufficient, additional information can be requested from treating physicians. Furthermore, if the physicians notice inconsistencies in the application or deem it to be invalid, they have the authority to consult with the treating physician, to conduct further examinations or to request visits to other clinical specialists. DI offices frequently use the available channels to gather additional information: aggregate figures suggest that in-house examinations occur in up to 10% of cases, specialist consultations are ordered in up to 12% of cases, and special multidisciplinary reports, required when multiple conditions are present, are requested in up to 6% of cases (Wapf and Peters 2007).

Examples of inconsistencies are an applicant claiming benefits on grounds of depression without a sufficiently documented history of therapy or medication, or an individual with moderate chronic pain claiming full work incapacity.

Figure 2: The DI application and decision process
[Diagram of the DI office (DI caseworker and DI physician), the applicant and the treating physician, with flows labeled "files registration", "eligibility decision", "requests information", "consults", "reports", "submits documentation" and "medical examination"; the legend distinguishes the mandatory DI process, mandatory medical review and optional medical review.]

The DI physicians’ eligibility evaluation is not binding. The final decision on whether benefits are granted remains with the responsible insurance caseworker, and the actuarial requirements are the same. This implies that the regulatory framework remains unchanged; only the provision of information about the applicants’ eligibility regarding health limitations is affected by the reform.

The main analysis regarding insurance inflow and the analysis of the labor market response are both based on the SESAM (Syntheserhebung soziale Sicherheit und Arbeitsmarkt) dataset provided by the Swiss Federal Statistical Office. The SESAM data link the official Swiss labor force survey (SAKE, Schweizerische Arbeitskräfteerhebung) to administrative records. The sample period spans 1999–2011. I rely on the SESAM data to analyze the DI hazard because they are the largest representative administrative data source available that combines different social security and labor market registers and has sufficient coverage over time. Given the survey weights, the data is representative of the […] global sample (containing all individuals in all regions) and a local sample (containing only individuals in municipalities near the border between treated and control regions). Distance information is available as both actual travel distance and travel time by car. I choose a travel distance of 20 kilometers between municipalities as the threshold for the local sample. I then compute nearest-neighbor estimation weights for this sample. The unrestricted global sample comprises 259,323 individuals; the local sample is restricted to 133,549 individuals (descriptive statistics are given in Appendix Table A2; the sample composition is mapped in Appendix Figure A1). In the estimations, I use the survey weights for the global sample and nearest-neighbor weights for the local sample. All results in the paper are robust to the choice of distance measure, variations in the threshold level and whether weights are applied.

As discussed in section 2, medical review also applies to scheduled reassessments of existing beneficiaries’ claim status. In the second part of the analysis, I investigate the effects of medical review on existing beneficiaries. For this analysis, I use a second administrative dataset provided by the Swiss Federal Ministry of Social Insurances. I use the data to estimate the effects of medical review on the disability degree classification and benefit payment in the beneficiary stock.
Moreover, I rely on this data to investigate potential outflow effects in the beneficiary stock which could confound the main results (see section 4).

The stock data track all existing DI recipients from 2001 onwards. For each individual, I observe the age of entry and the time spent on the DI rolls. In addition, the data record the actual disability degree, the benefit amount paid out by the state insurance and the health limitations the person suffers from, among other socio-economic variables. However, the stock data only record the region of residence, rendering localized analyses impossible. All stock analyses condition on individuals with benefit receipt prior to treatment in 2001, such that results are unconfounded by new entries to the DI payroll.

Microcensus data on mobility show that 80% of commuters stay within the 20 km distance limit, which corresponds approximately to the average commuting distance and time in Switzerland (BSV 2012, Eugster and Parchet 2018).

Empirical strategy
In this section, I develop the empirical approach used in the remainder of the paper. Section 4.1 discusses identification and introduces the duration model used in the main analysis. Section 4.2 provides explicit identifying conditions for difference-in-differences in a Cox (1972) proportional hazards model. Section 4.3 discusses potential mechanisms that could violate these conditions and provides evidence to support their validity. Finally, section 4.4 explores and discusses additional identifying conditions which tighten the interpretation of the reduced-form estimate, bounding the effect of medical review on the false positive award error rate.
The main quantity of interest is the change in the population DI hazard induced by external medical review, i.e., the change in the rate of newly awarded benefits among previously non-receiving working-age individuals. However, due to an opaque political decision process and self-selection into the early adopter scheme, treatment assignment cannot be assumed to be fully random. The cantons participating in the pilot program are a mixture of high and low prevalence regions, and regional cooperation considerations were relevant in the assignment process.

A difference-in-differences identification approach is used to evaluate the impact of the medical review institutions. Differencing removes time-invariant influences on potential outcomes. This removes bias due to selection into the program based on fixed or inert aggregate regional differences. However, identification still requires a common development of DI incidence in the absence of the expansion of medical review. This assumption raises concerns related to regional heterogeneity and selection. The remainder of this section introduces the modeling approach; the following sections present the identifying assumptions and discuss potential threats to their validity.

As Autor and Duggan (2003) illustrate, people rarely transition directly from employment into DI but typically apply conditional on job loss. One concern in the present context is that labor markets may be less resilient in some regions, or that regions with strong industrial and commercial hubs are more affected by common economic shocks. If screening is imperfect and disability insurance is used as an extension to unemployment insurance or an early retirement vehicle in case of job loss, differential labor market trends can confound the results. Since Switzerland is a country with historically tight labor markets, such concerns are alleviated to some degree.
Nevertheless, there may also be other underlying differences between regions, related to the self-selection into the pilot program, that cause time-variant divergence. Remaining time-variant heterogeneity among Swiss regions may raise concerns about biased treatment effect estimates.

To address this issue, I follow a twofold approach. A first set of results is based on the full sample of individuals across all regions. A narrower identification approach focuses on individuals in border regions within commuting distance between treated and control areas. Focusing on these regions generates samples that are balanced in observable characteristics ex ante and increases the credibility of the common trend assumption. Similar strategies are used by Frölich and Lechner (2010) and Campolieti and Riddell (2012).

However, local estimation approaches relying on sampling based on the distance to a border can suffer from problems due to spatial clustering on different sides of the border (cf. Keele and Titiunik 2016). To alleviate these concerns, I compute weights corresponding to nearest-neighbor pairwise differences and use them in the estimations. This weighting approach is equivalent to spatial matching. The main advantage of weighting is that it creates a sample that is well-balanced in observables and increases the credibility of the identifying assumptions introduced in the next section. Weighting reduces the bias of the estimator by restricting comparisons to a more similar control group. The bias reduction potentially comes at the cost of an increase in variance, since the estimator may not use all available data. In the context of matching, this bias-variance trade-off is often favorable, as the gain from finding good matches dominates the loss due to higher variance.

For estimation, I exploit the spell format of the data and model insurance take-up as a duration problem.
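The text does not spell out the weighting formula, so the Python sketch below shows one common way to build nearest-neighbor spatial matching weights, as an assumption rather than the author's implementation: each treated border municipality is matched, with replacement, to its closest control municipality within the 20 km travel-distance threshold used for the local sample; treated municipalities receive weight one and each control receives the number of times it is used as a match. Municipality identifiers and distances are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Distances = Dict[Tuple[str, str], float]  # (treated_id, control_id) -> travel distance in km


def nn_match_weights(dist: Distances, treated: Iterable[str], controls: Iterable[str],
                     max_km: float = 20.0) -> Dict[str, float]:
    """Nearest-neighbour matching with replacement of treated to control municipalities.

    Treated units with a control within `max_km` get weight 1; each control gets
    the number of treated units matched to it; all other units get weight 0.
    Ties in distance are broken by the control identifier.
    """
    treated, controls = list(treated), list(controls)
    weights = {m: 0.0 for m in treated + controls}
    usage: Counter = Counter()
    for t in treated:
        candidates = [(dist[(t, c)], c) for c in controls
                      if (t, c) in dist and dist[(t, c)] <= max_km]
        if not candidates:
            continue  # no comparable control nearby: unit drops out of the local sample
        _, nearest = min(candidates)
        weights[t] = 1.0
        usage[nearest] += 1
    for c, n_matches in usage.items():
        weights[c] = float(n_matches)
    return weights


if __name__ == "__main__":
    # Hypothetical travel distances (km) between three treated and three control municipalities.
    d = {("A", "X"): 5.0, ("A", "Y"): 12.0, ("B", "X"): 8.0,
         ("B", "Y"): 25.0, ("C", "Z"): 30.0}
    print(nn_match_weights(d, ["A", "B", "C"], ["X", "Y", "Z"]))
    # {'A': 1.0, 'B': 1.0, 'C': 0.0, 'X': 2.0, 'Y': 0.0, 'Z': 0.0}
```

Individuals would then inherit the weight of their municipality. Matching without replacement or weighting by pairwise distance differences are equally plausible readings of the text; the sketch only makes the bias-variance logic concrete, since reused controls (here X) raise the effective variance while keeping comparisons local.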
The main specification uses a stratified Cox (1972) proportional hazards model to estimate the impact of the reform on DI incidence. The hazard rate is modeled as

$$h(t, P, D \mid X < \bar{x}) = h_g(t)\,\exp\left(\beta_1 P + \beta_2 D + \beta_3 P D\right), \tag{1}$$

where $h_g(t)$ is the non-parametric baseline hazard within birth cohort stratum $g$, $t$ denotes time in years, $D \in \{0, 1\}$ is a binary treatment group indicator and $P \in \{0, 1\}$ is a binary time-varying indicator for the pilot period (the calendar years 2002, 2003 and 2004). Samples are restricted to individuals in border municipalities between treated and control regions whose distance $X$ lies within an absolute distance threshold $\bar{x}$ (20 km in the main specification), where individuals are similar in observables and remaining differences can credibly be assumed to be time-constant.

All estimates are robust across a large set of bandwidths and whether travel distance or travel time is chosen as the distance metric. Moreover, the results are also robust to replacing (1) with a more flexible specification containing cantonal fixed effects.

The model is specified using age as the time scale. This is preferable to using time-on-study as analysis time due to the age-dependent nature of the disability hazard, the rich cohort data available and the interest in the effect of a time-varying covariate (Korn et al. 1997, Thiébaut and Bénichou 2004). All models are stratified by five-year birth cohorts to account for cohort-specific differences in health environments. Individuals become at risk when they are eligible for insurance at age 18. Censoring occurs at the sampling date or when individuals reach the retirement age, whichever occurs first. Disability benefit receipt constitutes failure. Due to data limitations, the analysis is restricted to single spells, and disability insurance is assumed to be an absorbing state. However, this is not much of an abstraction: actual outflow rates due to reasons other than death or moving to the old-age pension system amount to less than 1% of the stock per year (BSV 2012). Previous research for Switzerland has shown that DI recipients are loath to give up safe benefits even when faced with strong financial incentives to do so (Bütler et al. 2015).

A duration approach has a number of advantages compared to a linear difference-in-differences framework in this setting. It corresponds naturally to the spell format of the available cross-sectional data and to the fact that DI entry is essentially a survival outcome. Data issues also limit the feasibility of the standard difference-in-differences approach. DI receipt is observed retrospectively as year of entry, and only repeated cross-sections of a representative sample of the population are available. Since total DI incidence in the population is low, the number of DI entries observed in each sampling year is small and insufficient for the analysis. Note that DI entry year and sampling year can be distinct. As the DI entry year is observed for each recipient, irrespective of the sampling date, pooling all data increases power substantially, because all information on DI entry in a given year that is available from subsequent sampling years can be utilized.

Pooling all cross-sectional data and conducting the analysis by age instead of sampling year (time-on-study) also limits the possibility of implicit sampling bias.
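Because the pilot indicator P switches on and off as individuals age through calendar time, a Cox model with age as the time scale typically needs the data in counting-process (start, stop] form. The Python sketch below illustrates one way to split a single spell, under assumptions not stated in the paper: ages are continuous, the pilot period is treated as the calendar interval [2002, 2005), the exit age passed in is already the minimum of the DI entry age, the age at the sampling date and the retirement age, and five-year cohorts start at years divisible by five. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

ENTRY_AGE = 18.0                         # individuals become at risk at age 18
PILOT_START, PILOT_END = 2002.0, 2005.0  # assumed pilot window: calendar years 2002-2004


@dataclass
class Episode:
    start: float   # age at which the episode starts (exclusive)
    stop: float    # age at which the episode ends (inclusive)
    event: int     # 1 if DI receipt occurs at `stop`
    pilot: int     # P: 1 while the calendar year lies in the pilot window
    treated: int   # D: 1 for residents of pilot (treated) regions
    cohort: int    # five-year birth cohort used as stratum g


def split_spell(birth_year: int, exit_age: float, event: int, treated: int) -> List[Episode]:
    """Split the at-risk spell (ENTRY_AGE, exit_age] at the ages where P changes."""
    if exit_age <= ENTRY_AGE:
        return []  # never at risk in the analysis window
    switch_ages = (PILOT_START - birth_year, PILOT_END - birth_year)
    cuts = sorted({ENTRY_AGE, exit_age, *(a for a in switch_ages if ENTRY_AGE < a < exit_age)})
    cohort = birth_year - birth_year % 5
    episodes = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        mid_year = birth_year + (lo + hi) / 2  # P is constant within each episode
        pilot = int(PILOT_START <= mid_year < PILOT_END)
        episodes.append(Episode(lo, hi, int(bool(event) and hi == exit_age), pilot, treated, cohort))
    return episodes


if __name__ == "__main__":
    # Hypothetical treated individual born 1960 who enters DI at age 50.
    for ep in split_spell(1960, 50.0, event=1, treated=1):
        print(ep.start, ep.stop, ep.event, ep.pilot)
    # 18.0 42.0 0 0
    # 42.0 45.0 0 1
    # 45.0 50.0 1 0
```

How exposure after the nationwide rollout in 2005 enters the model is not specified in this section; in the sketch P simply returns to zero, and one could instead censor spells at the end of 2004.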
With inflow observed retrospectively, relying on absolute sampling time as the time measure for the analysis would require creating a pseudo-panel structure by inferring past incidence figures from a post-treatment cross-section and adjusting for past eligibility. Since the disability risk is concentrated at older ages near the official retirement age, extrapolating past incidence causes bias due to intermittent entry into the retirement scheme. A non-negligible share of those in the old-age pension system at the sampling date may have received DI previously, but are no longer observed to do so when they are sampled. This share increases the further back past incidence figures are inferred. Incidence figures inferred this way will be artificially low, and the cross-sectional data cease to be representative.

Finally, estimation of effects on incidence rates in a standard difference-in-differences framework would require modifying the standard common trend assumption in a way

Footnote: Comparisons with aggregate data indicate that the reported aggregate rates are underestimated by about 20% going back five years. When incidence is inferred further back, inferred inflow continues to decrease as attrition caused by moving to the old-age pension system and by mortality increases. Going back 30 years, inferred incidence converges to zero and is almost exclusively driven by small-sample variation of individuals who were awarded DI when they were very young.
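The spell structure described above (risk from age 18, exit at DI entry or censoring, a calendar-time pilot indicator on an age time scale, strata by birth cohort) can be sketched in plain Python as a counting-process data set plus a stratified partial log-likelihood for model (1). This is a minimal sketch under stated assumptions, not the paper's code: the record layout `(birth_year, exit_age, failed, treated)`, the helper names, the conversion of age to calendar time as `birth_year + age`, the grouping of birth years into five-year strata by integer division and the Breslow handling of tied failure ages are all hypothetical choices. In practice the likelihood would be maximized with a survival package rather than evaluated by hand.

```python
import math
from collections import defaultdict

ENTRY_AGE = 18.0  # everyone enters the risk set at the DI eligibility age


def split_by_period(birth_year, exit_age, failed, start_year, end_year):
    """Split one age spell into (entry, exit, failed, in_period) episodes.

    The period indicator is 1 for calendar time in [start_year, end_year + 1),
    converted to age as calendar_year - birth_year (simplification: birth is
    assumed at the start of the year). Only the last episode carries the failure.
    """
    cut_low = start_year - birth_year        # age at which the period begins
    cut_high = end_year + 1 - birth_year     # age at which the period ends
    cuts = sorted(c for c in (cut_low, cut_high) if ENTRY_AGE < c < exit_age)
    bounds = [ENTRY_AGE] + cuts + [exit_age]
    episodes = []
    for i in range(len(bounds) - 1):
        lo, hi = bounds[i], bounds[i + 1]
        in_period = 1 if cut_low <= lo < cut_high else 0
        is_last = i == len(bounds) - 2
        episodes.append((lo, hi, failed if is_last else 0, in_period))
    return episodes


def build_episodes(people, start_year=2002, end_year=2004, cohort_width=5):
    """people: iterable of hypothetical (birth_year, exit_age, failed, treated).

    Returns counting-process rows (stratum, entry, exit, failed, P, D).
    """
    rows = []
    for birth_year, exit_age, failed, treated in people:
        stratum = birth_year // cohort_width
        for lo, hi, f, p in split_by_period(birth_year, exit_age, failed,
                                            start_year, end_year):
            rows.append((stratum, lo, hi, f, p, treated))
    return rows


def partial_loglik(beta, rows):
    """Stratified Cox partial log-likelihood for eta = b1*P + b2*D + b3*P*D.

    Risk sets are formed within strata on the age scale: a row is at risk at
    failure age t if entry < t <= exit. Tied failure ages use the Breslow form.
    """
    b1, b2, b3 = beta

    def eta(p, d):
        return b1 * p + b2 * d + b3 * p * d

    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[0]].append(row)

    loglik = 0.0
    for stratum_rows in by_stratum.values():
        for _, _, t, failed, p, d in stratum_rows:
            if not failed:
                continue
            denom = sum(math.exp(eta(pj, dj))
                        for _, ej, xj, _, pj, dj in stratum_rows if ej < t <= xj)
            loglik += eta(p, d) - math.log(denom)
    return loglik
```

Calling `build_episodes(people, start_year=1999, end_year=2001)` instead yields the 1999–2001 pseudo-treatment indicator used in the placebo tests (section 5.5); the likelihood itself is unchanged.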
The standard assumptions for difference-in-differences estimation have to be restated for proportional hazard models. The exponentiated coefficient on the interaction between treatment time and region represents a ratio of hazard ratios

$$\exp(\beta_3) = \frac{h(t \mid D=1, P=1) \,/\, h(t \mid D=1, P=0)}{h(t \mid D=0, P=1) \,/\, h(t \mid D=0, P=0)}. \qquad (2)$$

The distance condition has been dropped to ease notation. The effect of interest is the relative change in the hazard for the treated, a relative average treatment effect on the treated (rATT),

$$\mathrm{rATT} = \frac{h^1(t \mid D=1, P=1)}{h^0(t \mid D=1, P=1)}, \qquad (3)$$

where $h^1$ and $h^0$ denote the potential hazard rates with and without treatment. I assume that SUTVA (Rubin 1977) holds, i.e., the observed hazard equals the potential hazard under the realized treatment state. As disability insurance applicants are a small fraction of the population, it is credible that general equilibrium effects are absent. Identification then requires the two usual conditions in restated form,

$$h^1(t \mid D=1, P=0) = h^0(t \mid D=1, P=0), \qquad \text{(no anticipation, 4)}$$

and

$$\frac{h^0(t \mid D=1, P=1)}{h^0(t \mid D=1, P=0)} = \frac{h^0(t \mid D=0, P=1)}{h^0(t \mid D=0, P=0)}. \qquad \text{(common trend, 5)}$$

The main identifying assumption (5) is that in the absence of mandatory medical review, incidence for individuals in both pilot and non-pilot (border) regions would have changed proportionally. The common trend assumption is not invariant to the scaling of the dependent variable (e.g. Lechner 2010) and is modified accordingly. Instead of assuming a common trend between regions over time in differences, I assume a constant hazard ratio, i.e., a common relative change or a common absolute change in logs. In addition, I assume that anticipation effects are absent. Given these assumptions, the coefficient of the interaction identifies the hazard ratio of interest, the relative ATT.

The two main threats to identification are a violation of the no anticipation condition and of the common trend assumption.
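Substituting model (1) into the ratio in (2) makes explicit why the exponentiated interaction coefficient equals this ratio of hazard ratios: the stratum-specific baseline hazard $h_g(t)$ and the main effects cancel, so no assumption on the shape of $h_g(t)$ is needed. The short derivation below writes the coefficients on $P$, $D$ and $PD$ in (1) as $\beta_1$, $\beta_2$ and $\beta_3$ and compares hazards within the same stratum $g$ at the same age $t$.

```latex
\begin{aligned}
\frac{h(t \mid D=1, P=1)}{h(t \mid D=1, P=0)}
  &= \frac{h_g(t)\, e^{\beta_1 + \beta_2 + \beta_3}}{h_g(t)\, e^{\beta_2}} = e^{\beta_1 + \beta_3}, \\
\frac{h(t \mid D=0, P=1)}{h(t \mid D=0, P=0)}
  &= \frac{h_g(t)\, e^{\beta_1}}{h_g(t)} = e^{\beta_1}, \\
\text{ratio of hazard ratios in (2)}
  &= \frac{e^{\beta_1 + \beta_3}}{e^{\beta_1}} = e^{\beta_3}.
\end{aligned}
```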
Prospective or ongoing reform changes may induce some individuals to change their behavior in anticipation of future loss or gain. The main confounding mechanisms are mobility (individuals move to untreated regions to apply for DI) and the timing of applications (early application in anticipation of medical review). The implementation and chronology of the reform alleviate these concerns. The first draft of the reform, which included the institutional changes introducing medical review, was proposed in parliament in February 2001 and underwent some revisions until being approved by popular vote in March 2003. The pilot project began as early as January 2002, before the changes were approved. The early adopter scheme was scheduled immediately after the reform proposal was publicised and began only ten months afterwards.

Importantly, the pilot scheme was never publicly announced. Communication occurred only internally, between the Federal Ministry of Social Insurances and the DI offices. The person responsible for the yearly committee meeting confirmed that the medical review pilot was never communicated to outsiders. Pilots have only been published since 2007, and the medical review pilot was one of the first pilots launched by the ministry. To be certain, I conducted a systematic news search in the newspaper databases Factiva, LexisNexis, Pressreader and Swissdox. These do not list a single record mentioning the early adopter program. Overall, the medical review changes implied by the reform proposal received little public attention and were only scheduled to be implemented in 2005.

Moreover, considering the one-year earnings loss restriction required for DI eligibility, the time frame until implementation leaves limited scope for the strategic timing of applications in both treated and control regions, even if public knowledge of the program were available.
In the treated regions, the project started ten months after the first reform proposal, effectively leaving too little time for the strategic timing of applications. Similarly, there is only a relatively short time period between the reform's definite

Footnote: Other reform measures scheduled to come into effect at a later time included the introduction of a three-quarter benefit and the abolishment of additional benefits for spouses. These measures received the bulk of public attention. The changes were adopted nationwide and only became effective in late 2004. There were no further reforms to DI or other social insurances during the introduction period.

Figure 3: Trends for aggregate disability insurance inflow
[Panels: (a) Inflow; (b) Inflow (partial weighted). Series: Control, Treated.]
Note: Disability insurance inflow for treated and control regions in panel (a). Insurance inflow, partial-weighted by pension amount, in panel (b). Data were provided by the Federal Ministry of Social Insurances.

Figure 4: Log cumulative hazard by age and treatment region
[Plot: ln[−ln(survival)] against age (30 to 65), with separate curves for control and treated regions.]
Note: Log-log plot showing log cumulative hazard estimates by age for individuals in treated and control regions.

and follow the same trend in treated and control regions (Appendix Figure A4). Regarding the treatment, the number of full-time equivalent positions for DI physicians exhibits the expected increase due to the pilot and the nationwide implementation (Appendix Figure A2, Panel a). Correspondingly, the caseload per physician drops substantially (Appendix Figure A2, Panel b). Note that all descriptive graphs are based on data from national statistics or federal social insurance reports and are not conditioned on the local sample.

Although they indicate comparability between regions, strictly speaking the trend plots in Figure 3 do not correspond to the dependent variable used in the estimations. Generating an equivalent plot in an age-based duration framework is hindered by the fact that the treatment occurs at a different point in life for every individual, namely the age at which they experience the implementation of the reform. An alternative test of the common trend assumption is provided by the usual placebo specifications for pre-reform effects (discussed among the robustness checks in section 5.5). These do not indicate that the common trend is violated. Another way to investigate the assumption is to look at the log cumulative hazard by age, as shown in Figure 4 (referred to as a ‘log-log plot’ in biostatistics). Since individuals are randomly sampled across regions and have the same age distributions, the log cumulative hazard estimates for both groups should be parallel.

In addition, the log-log plot is a common diagnostic to assess the validity of the proportional hazards assumption in the Cox model, with non-parallel or crossing lines seen as an indication that the proportionality assumption is violated (e.g. Vittinghoff et al.
2011). Visually assessing the validity of the assumption from the log-minus-log transformation is preferable to comparing survival curves directly, as it is easier to determine whether two curves are a constant distance apart than to judge whether one is an exponential transformation of the other. The curves in Figure 4 appear parallel and provide no indication that the proportional hazards assumption is violated. The same applies when the cumulative hazard estimate is stratified by five-year birth cohorts as in the analysis (Figure B1, Online Appendix).

Finally, another potential issue pertains to bias from selective sampling due to DI outflow. Previous benefit receipt is not registered in the data; only current benefit receipt at the time of sampling is observed. If the reform affected DI outflow as well, sampling may be biased, as those who were barred from receiving insurance due to treatment are not observed in later years. This may result in a selected sample with artificially lower inflow in treatment regions, so that an actual outflow effect would be mistaken for an inflow effect due to unobserved dropout. I test for such outflow effects using the stock data, and the results do not indicate that outflow is affected. The results for outflow are discussed together with the robustness checks in section 5.5.
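One common way to build the data for a log-log plot such as Figure 4 is to estimate survival by group with the Kaplan–Meier estimator, transform it to $\log(-\log S)$, and check whether the group curves differ by a roughly constant vertical gap. The sketch below is a minimal plain-Python illustration with hypothetical function names; it ignores survey weights and birth-cohort stratification, which the paper's plots may handle differently. The last lines verify the footnoted property that, under proportional hazards, the transformed curves are exactly $\beta$ apart.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival at each distinct failure time.

    times: exit ages; events: 1 for failure (DI entry), 0 for censoring.
    All spells are assumed to start at the same origin (no delayed entry).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied exit times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def log_minus_log(curve):
    """Map S(t) to log(-log S(t)), dropping points where S is 0 or 1."""
    return [(t, math.log(-math.log(s))) for t, s in curve if 0.0 < s < 1.0]


# Under proportional hazards S1(t) = S0(t) ** exp(b), so the transformed
# curves differ by exactly b at every t (illustrative numbers only).
b = math.log(0.77)
s0 = [0.99, 0.95, 0.90]
gaps = [math.log(-math.log(s ** math.exp(b))) - math.log(-math.log(s)) for s in s0]
```

Plotting `log_minus_log` for each region against age and checking for parallel curves is the visual test described in the text; a formal alternative would be a test based on Schoenfeld residuals.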
Footnote: Since $h(t \mid x) = h(t)\exp(x\beta)$, the equivalent relation for the survival curve is $S(t \mid x) = S(t)^{\exp(x\beta)}$. Visual inspection requires identifying this exponential relationship. The log-minus-log transformation of the last equation gives $\log(-\log S(t \mid x)) = \log(-\log S(t)) + x\beta$, i.e., if the proportional hazards assumption holds, the curves of the treatment groups should be a constant distance apart.

The assumptions outlined in section 4.2 are sufficient to identify the reduced form effect of introducing mandatory external medical review on the DI inflow rate. This section explores additional conditions under which the interpretation of the reduced form effect can be extended. The main estimation results in the paper indicate a reduction in DI inflow. By layering two additional assumptions, this reduced form effect can be interpreted as a lower bound of the effect on false positive award errors. Conditioning on individuals’ latent eligibility status, the total effect can be decomposed into a mixture of effects on the false positive and false negative DI misclassification rates.

It is the main duty of the insurance office to separate meritorious from non-meritorious claims (‘tag’ the eligible, Akerlof 1978). Given the null hypothesis of ‘no disability’, two types of classification errors can occur in this situation: (1) award errors (type-I, false positive) and (2) rejection errors (type-II, false negative). If medical review is imperfect, benefits may be awarded to persons who are ineligible, and deserving applicants may be denied benefits.

Hence, medical review need not reduce insurance inflow unambiguously. Suppose that introducing mandatory medical review increases the probability of detecting applicants’ true type. This implies that medical review can reduce both type-I and type-II misclassification,
$E \in \{0, 1\}$,

$$\mathrm{rATT} = \frac{h^1(t \mid D=1, P=1, E=1)\, p^1(D=1, P=1, E=1) + h^1(t \mid D=1, P=1, E=0)\,\big[1 - p^1(D=1, P=1, E=1)\big]}{h^0(t \mid D=1, P=1, E=1)\, p^0(D=1, P=1, E=1) + h^0(t \mid D=1, P=1, E=0)\,\big[1 - p^0(D=1, P=1, E=1)\big]}. \qquad (6)$$

This underscores that the identified effect is a mixture of changes in the hazard for both eligible and ineligible types. Using this expression, it is possible to explore the conditions for a negative treatment effect (an inflow reduction, corresponding to a hazard ratio smaller than one) depending on the effect for each type separately. In the following, I simplify notation by omitting the parameters common to all objects in the conditioning set.

Unlike in other treatment effect settings, the population shares in (6) are superscripted by the corresponding counterfactual states. In this setting, distinguishing them is sensible, as they can be thought of as shares of applications by eligibility type, which might be influenced by the treatment. Ruling this out to ease interpretation, assume that

$$p^1(E=1) = p^0(E=1). \qquad \text{(no self-screening, 7)}$$

This assumption implies continuity in the composition of applications, effectively ruling out that the propensity to apply for DI is influenced by the pilot. The most likely mechanism to confound this assumption is self-screening, i.e., individuals are selectively discouraged from applying for benefits (Parsons 1991). For the reasons outlined in the previous section, this behavior is unlikely, since information about the pilot program did not reach the public. Parsons’s (1991) original paper on the self-screening mechanism is about how changes in screening stringency and administrative hassle which are perfectly observed by applicants influence the application decision. The medical review process is largely hidden from applicants, and no information detailing it is available to them. This view is also supported by the data.
Looking at the limited aggregate data available, application rates evolve similarly across both groups of cantons, are very stable over time and do not diverge during the pilot (Appendix Figure A4). Due to the non-public introduction of medical review and the common trend in applications, differential variations in application behavior are likely to be negligible.

The estimation results in the next section indicate a reduction in inflow. With this in mind, and simplifying notation due to (7), the hazard ratio must be smaller than one,

$$\mathrm{rATT} = \frac{h^1(t \mid E=1)\, p(E=1) + h^1(t \mid E=0)\,\big[1 - p(E=1)\big]}{h^0(t \mid E=1)\, p(E=1) + h^0(t \mid E=0)\,\big[1 - p(E=1)\big]} \le 1. \qquad (8)$$

Rearranging gives

$$\big[h^1(t \mid E=1) - h^0(t \mid E=1)\big]\, p(E=1) \le -\big[h^1(t \mid E=0) - h^0(t \mid E=0)\big]\,\big[1 - p(E=1)\big], \qquad (9)$$

i.e., the absolute value of the population-weighted treatment effect for the ineligible must exceed the population-weighted treatment effect for the eligible to observe an aggregate reduction in inflow. This implies that the reduction in award errors (type-I, RHS) must exceed the reduction in rejection errors (type-II, LHS) for the effect to be negative. This is consistent with the interpretation of the effect in (3) as a net effect.

Finally, assuming the treatment does not decrease inflow of eligible types,

$$h^1(t \mid E=1) - h^0(t \mid E=1) \ge 0, \qquad \text{(monotone treatment response for eligible types, 10)}$$

the left-hand side of condition (9) is greater than or equal to zero. If medical review actually decreases the chances of the ineligible to obtain insurance benefits, the weighted decrease in the hazard for the ineligible must then exceed, in absolute value, the weighted increase in the hazard for the eligible for the condition to be fulfilled. In this case, any observed inflow reduction can be interpreted as a net reduction in DI award errors, and hence as a lower bound on the reduction in award errors.

This assumption is not directly testable with the available data.
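To make conditions (8) to (10) concrete, the sketch below evaluates the mixture for hypothetical type-specific hazards and a hypothetical eligible share (all numbers are illustrative, not estimates from the paper; the function names are made up for this example). It shows that a hazard ratio below one coincides with condition (9), and that under monotone treatment response for eligible types the net reduction in the population hazard understates the reduction attributable to fewer award errors, which is the lower-bound logic.

```python
def mixture_hazard(h_elig, h_inel, p_elig):
    """Population hazard as a share-weighted mix of eligible and ineligible types."""
    return h_elig * p_elig + h_inel * (1.0 - p_elig)


def ratt(h1_elig, h1_inel, h0_elig, h0_inel, p_elig):
    """Relative ATT as in (8), imposing the no self-screening assumption (7)."""
    return mixture_hazard(h1_elig, h1_inel, p_elig) / mixture_hazard(h0_elig, h0_inel, p_elig)


def condition_9(h1_elig, h1_inel, h0_elig, h0_inel, p_elig):
    """Weighted change for the eligible <= weighted decrease for the ineligible."""
    lhs = (h1_elig - h0_elig) * p_elig
    rhs = -(h1_inel - h0_inel) * (1.0 - p_elig)
    return lhs <= rhs


def decompose(h1_elig, h1_inel, h0_elig, h0_inel, p_elig):
    """Return (net reduction in the population hazard, reduction from fewer award errors)."""
    net = mixture_hazard(h0_elig, h0_inel, p_elig) - mixture_hazard(h1_elig, h1_inel, p_elig)
    award_errors = (h0_inel - h1_inel) * (1.0 - p_elig)
    return net, award_errors


# Hypothetical hazards: review raises the eligible hazard slightly (assumption 10
# holds) and cuts the ineligible hazard more strongly.
example = dict(h1_elig=0.011, h1_inel=0.002, h0_elig=0.010, h0_inel=0.005, p_elig=0.5)
```

Here `ratt(**example)` is about 0.87, and `decompose(**example)` gives a net reduction of 0.0010 against an award-error reduction of 0.0015, so the observed net effect is the smaller of the two.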
It relies on the fact that medical review is an intervention to improve screening quality and, unlike variations in screening stringency, does not involve a trade-off between false positives and false negatives (Parsons 1991, Kleven and Kopczuk 2011, Low and Pistaferri 2015). Alternatively, the condition in (9) is trivially fulfilled if (10) is violated and medical review actually has the perverse effect of worsening the chances of the truly eligible to obtain insurance, reducing their DI inflow hazard.

I will consider the consequences of violations of these assumptions and how they can be relaxed in turn. Assumption (7) posits that medical review does not change the composition of applications. This assumption could be weakened by assuming that medical review decreases the propensity of ineligible types to apply, i.e., $p^1(E=1) \ge p^0(E=1)$. This coincides with Parsons’s (1991) empirical result that self-screening is non-perverse. The finding is also confirmed by Low and Pistaferri (2015), who find that false applications decrease with program stringency. Since medical review extracts information, those at the margin of being discouraged from applying are those who are more likely to be found undeserving. The lower bound interpretation can be retained with non-perverse

Footnote: I am grateful to an anonymous referee for pointing out this possibility. As discussed, the institutional

Footnote: The leading physician in one office was aware of the fact that more intense medical review could increase DI incidence. She stated that in her experience, rejection errors do occur and are sometimes
encountered during revisions, but are much less frequent than the award errors uncovered ex post.

The main results are presented in Table 1, separately for the unrestricted and the local sample. For each sample, the first column considers only spells which are censored or result in failure before the end of the pilot period in 2005; the remaining columns use all recorded spells and control for the post-treatment period in which the intervention was extended nationwide. The last column adds individual control variables, including gender, education, marital status, number of children and foreign citizenship. All specifications stratify the baseline hazard by five-year birth cohort intervals to account for cohort-specific differences in health environment. Survey weights are applied in the full sample such that estimates are representative of the Swiss population. Observations in the local sample are weighted for pairwise nearest-neighbor estimation. All tables report hazard ratios, i.e., exponentiated coefficients, and the corresponding standard errors.

Table 1: Disability incidence
                         (a) Full sample                     (b) Local sample (within 20 km)
                         (1)        (2)        (3)           (4)        (5)        (6)
Treat                    1.322***   1.322***   1.236***      1.150***   1.151***   1.148***
                         (0.041)    (0.041)    (0.039)       (0.061)    (0.061)    (0.061)
Pilot time               1.083      1.088      1.110         1.257*     1.267**    1.298**
                         (0.089)    (0.089)    (0.090)       (0.148)    (0.148)    (0.152)
Treat x pilot            0.856**    0.856**    0.860*        0.770**    0.771**    0.766**
                         (0.067)    (0.067)    (0.068)       (0.087)    (0.087)    (0.086)
Post time                           0.690***   0.731***                 0.867      0.918
                                    (0.068)    (0.072)                  (0.151)    (0.160)
Treat x post                        0.971      0.970                    0.841      0.829
                                    (0.078)    (0.078)                  (0.105)    (0.104)
Other controls           -          -          Yes           -          -          Yes
N municipalities         2,337      2,338      2,338         1,086      1,087      1,087
N individuals            249,750    259,323    259,323       128,536    133,549    133,549
N failures               7,877      9,204      9,204         3,985      4,693      4,693
N failures during pilot  1,713      1,713      1,713         885        885        885

Note: Cox proportional hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimates are shown separately for a complete representative sample of the Swiss population and only for individuals in the vicinity of the border between treated and non-treated regions. The baseline hazard for all regressions is stratified by 5-year birth cohorts. Survey weights are applied for the full sample. Observations in the local sample are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

All estimates of the effect of the reform are negative (corresponding to a hazard ratio less than one) and significant at conventional levels, indicating that third-party medical review significantly reduced insurance inflow. The estimate for the full sample implies a 14% reduction. The magnitude for the local sample is larger and corresponds to a 23% lower inflow rate. Both estimates are stable in magnitude across specifications. The post estimates also indicate reductions, reflecting the fact that the reform was extended nationwide after 2004 and funding increased even further. However, the post estimates for the local sample are imprecise, as failures in the local sample are too sparse in later years, when many observations are censored at the sampling date.

The preferred specification for the remainder of the paper is given in column (5), since adding covariates does not affect the results in a notable way. The remaining analysis focuses on the local sample.
Results for the main sample are qualitatively similar.

External medical review is also likely to affect the classification of the severity of health impediments for new awards. I analyse whether medical review changes the relative incidence of partial and full benefit awards. Results in Table 2 show that incidence reductions occur only for full benefit awards (columns 2 and 3) and for those due to limitations classified as very serious (disability degree of 70% or larger, columns 4 and 5). Estimates for partial benefit awards and for those classified as less serious are too imprecise to draw a clear conclusion, but these awards may be unaffected. One possible explanation is that incidence reductions occur mainly among full benefit applicants. However, it is unlikely that only applicants claiming 100% work incapability constitute the affected marginal cases. A more likely scenario is that DI incidence reductions occur at all latent health levels. After introducing medical review, some individuals who would previously have received the full benefit amount are now downgraded; this inflow into partial benefits offsets the reduction in partial awards, resulting in a zero net effect for partial DI benefits. This finding is also reflected by a moderate decrease in the aggregate share of full benefit awards: in 2005, 58% of new beneficiaries are awarded full benefits, compared to 68% in 2002.

Table 2: Disability classification
Columns: (1) All, (2) Partial, (3) Full, (4) DD < 70, (5) DD ≥ 70, where DD is the disability degree in percent.

Table 3: Disability types
Columns: (1) All, (2) Illness, (3) Illness: psychological, (4) Illness: nerve, (5) Illness: musculoskeletal (MSC), (6) Accident, (7) Congenital/other.
                         (1)        (2)        (3)        (4)        (5)        (6)        (7)
Treatment region         1.151***   1.229***   1.185*     1.100      1.245**    0.843      1.293**
                         (0.061)    (0.072)    (0.106)    (0.216)    (0.136)    (0.148)    (0.162)
Pilot period             1.267**    1.384**    1.450*     2.373*     1.412      0.900      0.795
                         (0.148)    (0.178)    (0.282)    (1.185)    (0.330)    (0.362)    (0.201)
Treat x pilot            0.771**    0.683***   0.699*     0.377**    0.633**    1.729      1.150
                         (0.087)    (0.084)    (0.129)    (0.167)    (0.145)    (0.656)    (0.290)
Post time                0.867      0.974      0.667      1.737      1.285      0.175***   1.220
                         (0.151)    (0.183)    (0.188)    (1.211)    (0.460)    (0.102)    (0.441)
Treat x post             0.841      0.733**    0.897      0.607      0.596**    6.436***   0.748
                         (0.105)    (0.097)    (0.176)    (0.272)    (0.156)    (2.942)    (0.197)
N municipalities         1,087      1,087      1,087      1,087      1,087      1,087      1,087
N individuals            133,549    133,549    133,549    133,549    133,549    133,549    133,549
N failures               4,693      3,827      1,685      339        1,090      409        835
N failures during pilot  885        753        352        61         210        59         149

Note: Cox proportional hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. The sample is based on individuals living within 20 km of the border between treated and non-treated regions. Columns distinguish between DI awards due to different health impairments. The baseline hazard for all regressions is stratified by 5-year birth cohorts. Observations are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the municipality level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

The main analysis indicates that DI awards declined substantially due to external medical review, most likely due to a reduction in false positive benefit awards. If the effect is driven by more accurate health and functional capacity diagnoses, then incidence reductions are more likely to occur for diseases which are difficult to diagnose and verify for treating physicians, the first DI gatekeeper. The reduction will be most pronounced for illnesses
The reduction will be most pronounced for illnesseswhich are both difficult to diagnose and whose functional capacity implications are morelikely to be misjudged.Table 3 investigates this by differentiating between health impairments leading to benefitawards. The results confirm that reductions occur most frequently for difficult-to-diagnoseconditions, while conditions which can typically be diagnosed unambiguously are notaffected. Looking at column (3) and (4), the effect is pronounced for psychological diseasesand illnesses related to nerve problems. Benefit awards due to mental health problemsare reduced by 30%. Nerve-related handicaps are reduced by over 60%, but incidence inthis group is generally very low. Column (5) looks at the incidence of musculoskeletalconditions (MSC). This category also includes a variety of conditions which are difficult toverify (e.g. whiplash injuries, back pain). The hazard ratio suggest a substantial reductionin incidence as well. The specification in column (6) looks at disability benefit awards dueto handicaps incurred in accidents; the last column considers disabilities due to congenitaldefects and other diseases. These conditions are unlikely to be subject to award errors, asthere is rarely any ambiguity and they are typically well-documented. Indeed, there is noeffect on conditions which are unaffected by intensified medical review.
This section investigates the labor market reaction to external medical review. If reductions in DI incidence are driven by rejections of individuals capable of returning to the labor market, medical review should also have a positive effect on labor market participation. Conversely, if the reduction is largely driven by rejections of individuals incapable of working, medical review should not have an effect on employment, but possibly on the inflow into other social security programs (e.g. Inderbitzin et al. 2016). Table 4 uses the pooled cross-sectional administrative SESAM data to estimate a difference-in-differences specification using a linear model.

The results in Table 4 for the full sample show that the share of individuals in registered employment increases. Similarly, the share of individuals with positive (non-benefit) earnings increases as well. In addition, the share of individuals registered with the employment office as job seekers decreases (columns 1–3). In columns (4) and (5), I consider other pathways out of unemployment and reasons for no longer being registered with the employment office. I find no effect on dismissal from the employment office (and the associated return-to-work measures) due to exhausting the maximum duration of unemployment benefits. Similarly, I find no effect on the receipt of social assistance, the minimum social security provision. If rejected DI applicants were incapable of working, we would expect to see an increase in these measures. However, the results do not provide evidence for this channel. The results for the local sample are comparable in sign and magnitude to the estimates for the full sample. However, they are insignificant, most likely due to a lack of power (p = 0.17 for the main employment estimate in the local sample).

An explanation for these results is that DI applications are partly made by people capable of gainful employment and driven by moral hazard. One possible mechanism behind this result is the canonical substitution effect interpretation: applicants seek benefits due to a distortion in the relative price of leisure. This distortion is caused by an implicit tax on work due to DI (‘cash cliffs’). An alternative explanation is that applications are (partly) due to income effects, i.e., even if work is not implicitly taxed by the DI program, given the transfer payments, beneficiaries may prefer leisure to labor (e.g. Autor and Duggan 2007, Eugster and Deuchert 2017, Gelber et al. 2017). These effects have different welfare implications. If DI reduces labor supply through the substitution effect, this implies a deadweight loss, which would be reduced by medical review. Alternatively, medical review would not be welfare improving if all of the labor supply increase is due to a reduced income effect. Since DI is provided (partially) contingent on work, I am unable to separate these effects. Taken together, the evidence from the analysis suggests that distorted incentives are likely to matter in this context.
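Without covariates, the linear difference-in-differences logic behind Table 4 reduces to a comparison of four cell means. The sketch below is a deliberately stripped-down version with a hypothetical function name and data layout: it omits the individual covariates, canton and year fixed effects and municipality-clustered standard errors used in the paper. For a binary outcome such as being in registered employment, the double difference it returns equals the coefficient on the treatment-by-pilot interaction in a saturated OLS regression on a constant, the treatment dummy, the pilot dummy and their product.

```python
from statistics import mean


def did_2x2(rows):
    """Difference-in-differences of cell means.

    rows: iterable of (treated, pilot, y) with treated and pilot in {0, 1}.
    Returns (treated pilot - treated pre) - (control pilot - control pre).
    """
    cells = {}
    for treated, pilot, y in rows:
        cells.setdefault((treated, pilot), []).append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical binary outcome (1 = in registered employment).
rows = [
    (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),   # control: 0.5 before, 0.5 during
    (1, 0, 0), (1, 0, 0), (1, 1, 1), (1, 1, 0),   # treated: 0.0 before, 0.5 during
]
estimate = did_2x2(rows)
```

In this toy example the control mean is flat while the treated mean rises by 0.5, so the estimate is 0.5; adding covariates and fixed effects, as in Table 4, requires a full regression.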
Table 4: Labor market responses to medical review
Columns: (1) Working, (2) Positive income, (3) Employment office registration, (4) Employment office dismissal, (5) Social assistance.

(a) Full sample
                         (1)        (2)        (3)         (4)        (5)
Treat x pilot            0.009***   0.008**    −0.007***   −0.002     0.000
                         (0.004)    (0.003)    (0.002)     (0.001)    (0.001)
Individual covariates    Yes        Yes        Yes         Yes        Yes
Canton FE                Yes        Yes        Yes         Yes        Yes
Year FE                  Yes        Yes        Yes         Yes        Yes
N                        556,540    557,270    411,461     411,461    411,461

(b) Local sample (within 20 km)
                         (1)        (2)        (3)         (4)        (5)
Treat x pilot            0.007      0.006      −0.003      0.000      0.001
                         (0.005)    (0.005)    (0.003)     (0.002)    (0.002)
Individual covariates    Yes        Yes        Yes         Yes        Yes
Canton FE                Yes        Yes        Yes         Yes        Yes
Year FE                  Yes        Yes        Yes         Yes        Yes
N                        282,858    283,111    208,340     208,340    208,340

Note: Linear model estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimates are shown separately for a complete representative sample of the Swiss population (panel a) and only for individuals in the vicinity of the border between treated and non-treated regions (panel b). All models include cantonal and year-specific effects and control for gender, age and native status. Standard errors clustered at the municipality level in parentheses. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Although the primary task of the medical staff is to screen applicants, they also aid with reviews of recipients’ disability degree classification. While scheduled by law to occur regularly, revisions seldom resulted in actual disability degree or benefit cuts and typically involved DI caseworkers going over beneficiaries’ files without personal contact. Revisions also commonly take place if beneficiaries have submitted new medical information, typically documenting deteriorating health, and often result in benefit increases. With the new regime in place, files that are scheduled for review are now also passed to the DI physicians in charge of medical review.

To assess whether stock reclassifications occur, I estimate a linear difference-in-differences model using data for the stock of all DI beneficiaries in Switzerland in 2001. I condition on benefit receipt prior to treatment and track the changes to the disability degree and the effective benefit payments of existing beneficiaries over time. Results are given in Table 5. The sample is again stratified by disease groups. The outcome in Panel (a) is the individual disability degree; Panel (b) looks at the benefit amount. On average, recipients are classified as less disabled by 0.35 percentage points and lose about 17 CHF in monthly benefits. The effect magnitudes are small since reclassification remains a rare event.

Table 5: Stock reclassification and pension cuts
Columns: (1) All, (2) All illnesses, (3) Psychological, (4) MSC, (5) Accident, (6) Congenital.

(a) Disability degree (percent)
                         (1)        (2)        (3)        (4)        (5)        (6)
Treated region           3.04***    3.41***    3.66***    2.78***    1.67***    1.19***
                         (0.08)     (0.09)     (0.13)     (0.18)     (0.26)     (0.18)
Pilot                    0.49***    0.60***    0.60***    0.39***    0.46***    0.30**
                         (0.06)     (0.07)     (0.09)     (0.13)     (0.17)     (0.13)
Treat x pilot            −0.35***   −0.42***   −0.58***   −0.39*     −0.24      −0.10
                         (0.09)     (0.11)     (0.15)     (0.20)     (0.30)     (0.21)
Post                     1.63***    1.80***    1.44***    0.87***    0.89***    1.39***
                         (0.05)     (0.06)     (0.09)     (0.12)     (0.16)     (0.12)
Treat x post             −0.52***   −0.61***   −0.83***   −0.47**    −0.39      −0.28
                         (0.09)     (0.10)     (0.14)     (0.19)     (0.28)     (0.19)
Constant                 78.54***   77.98***   82.75***   72.31***   74.53***   86.07***
                         (0.05)     (0.06)     (0.08)     (0.11)     (0.15)     (0.11)

(b) Pension amount (CHF per month)
                         (1)        (2)        (3)        (4)        (5)        (6)
Treated region           124.68***  141.87***  101.22***  133.26***  88.41***   8.36***
                         (2.20)     (2.61)     (3.67)     (4.94)     (7.25)     (3.14)
Pilot                    37.04***   39.92***   33.12***   39.02***   33.29***   28.43***
                         (1.55)     (1.85)     (2.62)     (3.56)     (4.83)     (2.25)
Treat x pilot            −17.25***  −21.77***  −17.79***  −21.90***  −8.10      −0.45
                         (2.52)     (2.98)     (4.17)     (5.66)     (8.33)     (3.64)
Post                     143.12***  148.37***  126.96***  139.82***  122.50***  126.26***
                         (1.46)     (1.75)     (2.46)     (3.38)     (4.54)     (2.10)
Treat x post             −41.25***  −48.74***  −33.50***  −50.51***  −19.17**   −1.61
                         (2.38)     (2.82)     (3.92)     (5.38)     (7.84)     (3.39)
Constant                 1232.08*** 1221.01*** 1311.78*** 1134.61*** 1199.86*** 1343.85***
                         (1.35)     (1.61)     (2.30)     (3.10)     (4.20)     (1.95)
N                        2,489,323  1,884,876  887,604    537,191    282,224    274,918

Note: Estimates from a linear model. Outcomes are the disability degree in percent (panel a) and the effective benefit amount paid to recipients (panel b). The reference group are individuals in the non-treated regions in 2001. Based on administrative panel data provided by the Swiss Federal Ministry of Social Insurances, which track the complete stock of Swiss DI benefit recipients in 2001 until 2011. Standard errors in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.
Summary statistics indicate that only 9.3% of individuals in the 2001 stock are reclassified during the three years of the pilot period. Complete denial of benefits after a revision occurs only in exceptional cases. Upward revisions are far more common; downward changes account for only 2.3 percentage points. Still, introducing mandatory medical review appears to cause revisions of the disability status of beneficiaries whose documentation is deemed insufficient or suspicious, or whose health has improved. Both the disability classification and payouts are again adjusted only for beneficiaries with illnesses that are more difficult to screen. Again, cuts are most pronounced for those who receive DI due to mental health problems or musculoskeletal conditions, while beneficiaries with congenital diseases or handicaps incurred in accidents are unaffected. Unlike in the previous analysis, nerve-related diseases are not reported as a separate category in these data.
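The rows of Table 5 suggest a specification with only a treated-region dummy, pilot and post period dummies, and their interactions. Assuming no further covariates enter, such a model is saturated in the six region-by-period cells, and its OLS coefficients reduce to contrasts of cell means. The minimal Python sketch below illustrates this. The function name and input layout are hypothetical, and the illustrative data are built from the "All" column of Table 5(a) rather than from the actual stock data.

```python
from collections import defaultdict


def saturated_did(rows):
    """Cell-mean difference-in-differences for a saturated region x period model.

    rows: iterable of (treated, period, outcome) with period in
    {"pre", "pilot", "post"}. The control region in "pre" (2001) is the
    reference cell, which corresponds to the constant in Table 5.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for treated, period, y in rows:
        sums[(treated, period)] += y
        counts[(treated, period)] += 1
    mean = {cell: sums[cell] / counts[cell] for cell in counts}

    c_pre, t_pre = mean[(False, "pre")], mean[(True, "pre")]
    coefs = {"constant": c_pre, "treated_region": t_pre - c_pre}
    for period in ("pilot", "post"):
        c_p, t_p = mean[(False, period)], mean[(True, period)]
        coefs[period] = c_p - c_pre                                   # common time effect
        coefs[f"treat_x_{period}"] = (t_p - t_pre) - (c_p - c_pre)   # DiD contrast
    return coefs


# Illustrative cell means built from the "All" column of Table 5(a);
# each cell gets two observations placed symmetrically around its mean.
cells = {
    (False, "pre"): 78.54,
    (True, "pre"): 78.54 + 3.04,
    (False, "pilot"): 78.54 + 0.49,
    (True, "pilot"): 78.54 + 3.04 + 0.49 - 0.35,
    (False, "post"): 78.54 + 1.63,
    (True, "post"): 78.54 + 3.04 + 1.63 - 0.52,
}
rows = [(t, p, m + d) for (t, p), m in cells.items() for d in (-1.0, 1.0)]
coefs = saturated_did(rows)
print({k: round(v, 2) for k, v in coefs.items()})
```

If individual covariates or unequal weights were added, the cell-mean shortcut would no longer hold and a regression routine would be needed.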
To assess the validity of the main identifying assumption, I test the effect of a placebo reform prior to the treatment period, assuming a pseudo-treatment to be effective during 1999–2001. Hazard ratio estimates across all specifications are close to one, precisely estimated and insignificant at conventional levels, supporting the validity of the identification strategy (Appendix Table A4). Placebo results for employment also do not indicate any violation of the common trend assumption (Table B3, Online Appendix).

Another potential concern is that the results are sensitive to the choice of distance window. Figure A7 addresses this issue by plotting treatment effect estimates across a large set of bandwidths, using both actual travel distance and travel time as distance measures. The coefficient of interest remains stable in size and significant across a large set of distances. The estimates consistently suggest at least a 20% reduction in incidence in the treatment group during the pilot program. More detailed estimates for selected distances are provided in Appendix Table A5.

As discussed in section 4.3, potential outflow effects of the reform might confound the main result. Since previous benefit receipt is unobserved, outflow effects would lead to inflow being measured with error in the sample. I use the stock data to test for outflow effects. A duration model similar to the main specification is estimated for those who are beneficiaries prior to treatment in 2001. Exit from the DI rolls is considered failure; individuals are censored at the sampling limit in 2011 or when they exit at the relevant pension age. Variable measurements are less clean-cut in this case: exit due to work and exit due to expulsion cannot be separated. However, there is no explicit reason why trends in work take-up by insurees (a similarly rare event) should differ between regions.
Results are given in Appendix Table A6, separately for all individuals and for those below age 50 in 2001, an age requirement which prohibits early retirement within the analysis horizon and selects a

Complete benefit denial is legally difficult unless fraud or malingering is proven beyond reasonable doubt. These cases also require high up-front investment from DI offices and are initiated only in extreme cases.
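The censoring rule of the stock outflow model described above can be made concrete with a small helper that turns one 2001 beneficiary's record into a (duration, failure) pair. This is a hypothetical sketch, not the paper's code. The function name, its arguments, and the choice to treat an exit in the pension year itself as censoring are assumptions for illustration. The pension age is passed in rather than hard-coded.

```python
def outflow_spell(birth_year, pension_age, exit_year=None, start=2001, end=2011):
    """Return (years_at_risk, failed) for a beneficiary on the rolls in `start`.

    Assumed rule: leaving DI before the pension year counts as a failure.
    Spells are otherwise censored at the sampling limit `end` or in the
    year the person reaches pension age, whichever comes first.
    """
    pension_year = birth_year + pension_age
    censor_year = min(end, pension_year)
    if exit_year is not None and exit_year < pension_year and exit_year <= end:
        return exit_year - start, True
    return censor_year - start, False


# Hypothetical beneficiaries
print(outflow_spell(1950, 65, exit_year=2004))  # leaves DI during the pilot
print(outflow_spell(1940, 65))                  # reaches pension age in 2005
print(outflow_spell(1960, 65))                  # still on the rolls in 2011
```

Under this rule, anyone aged 50 or younger in 2001 reaches any pension age of 61 or more only after 2011. Their spells are therefore never cut short by the pension-age censoring within the horizon, which is the point of the age restriction in Table A6(b).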
This paper provides a comprehensive evaluation of the introduction of medical review for DI applications in a setting in which treating physician testimony is decisive. The results indicate that external medical review can reduce insurance inflow substantially. The main estimate suggests that medical review reduces DI uptake by 23%. Reductions are closely tied to difficult-to-diagnose conditions, suggesting a more accurate assessment of complex or multidisciplinary diseases. This is corroborated by two further findings: disability status and benefit revisions in the stock of recipients occur only for individuals with the same types of conditions, and medical review also increases labor market participation. Under additional assumptions, the results suggest that medical review is likely to reduce the number of false positive award errors and that these errors occur frequently in the absence of medical review.

Results from the local approach (sample restricted to commuting distance around borders) have the same sign and are comparable in magnitude to those from the global approach (using the full sample). The distance variations in Figure A7 consistently suggest a reduction in the hazard of about 20%. In view of the sizeable effect of medical review on the DI hazard, it is illustrative to assess how large the absolute effects of introducing medical review are. In the main specification, the baseline DI hazard in the treated regions without treatment is about 0.38%, i.e., on average 3.8 persons per thousand enter DI. Medical review reduces this by about 23% to 0.29%. This implies that approximately one person fewer per thousand (about 0.9 per thousand) enters DI due to a second medical assessment.

Given the substantial present-discounted value of DI benefits, it is interesting to examine whether external medical review is a cost-effective policy.
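The incidence arithmetic in the preceding paragraph, and the way such a reduction can feed into a cost comparison, can be sketched in a few lines of Python. The hazard figures (0.38% and a 23% reduction) come from the text. Every cost input below (population size, benefit amount, years to retirement, discount rate, physician numbers and costs, social assistance offset) is a hypothetical placeholder, not a parameter used in the paper. Rejections are assumed to be permanent.

```python
def annuity_factor(years, rate):
    """Present value of 1 paid at the end of each year for `years` years."""
    if rate == 0:
        return float(years)
    return (1 - (1 + rate) ** -years) / rate


def net_savings(averted_awards, annual_benefit, years_to_retirement, rate,
                n_physicians, cost_per_physician, annual_offset=0.0):
    """Discounted benefits not paid to one cohort of averted awards, minus
    one year of physician costs. `annual_offset` nets out transfers paid
    instead of DI (e.g. social assistance). All inputs are hypothetical."""
    pv_per_award = ((annual_benefit - annual_offset)
                    * annuity_factor(years_to_retirement, rate))
    return averted_awards * pv_per_award - n_physicians * cost_per_physician


baseline_hazard = 0.0038          # about 0.38% in treated regions without treatment
hazard_ratio = 1 - 0.23           # a 23% reduction
treated_hazard = baseline_hazard * hazard_ratio
averted_per_thousand = (baseline_hazard - treated_hazard) * 1000
print(f"hazard with review: {treated_hazard:.2%}")              # 0.29%
print(f"averted entries per 1000: {averted_per_thousand:.2f}")  # 0.87

# Hypothetical cost inputs, for illustration only.
population = 1_000_000
averted = population * (baseline_hazard - treated_hazard)
print(net_savings(averted, annual_benefit=20_000, years_to_retirement=15,
                  rate=0.02, n_physicians=50, cost_per_physician=250_000))
```

Discounting is kept explicit so that the sensitivity of the comparison to the discount rate, to the assumed permanence of rejections, and to offsetting transfers can be checked directly.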
Simple back-of-the-envelope calculations indicate that outlays for hiring physicians are more than offset by reductions in benefit payouts. The calculations are based on the observed increase in the number of physicians, a conservative effect estimate, the average benefit amount and the remaining spell duration until retirement, assuming rejections are permanent. Based on these parameters, the yearly savings in the treated regions alone during the pilot are likely to be above 650 million Swiss Francs (approximately 650 million US$ in 2018). Extending medical review nationwide in 2005 may have saved in excess of 1.2 billion Swiss Francs in that year alone. Even if no rejected applicant ever reenters the labor market and all immediately receive social assistance, estimated yearly savings for 2005 are upwards of 500 million Swiss Francs. These calculations disregard the fact that benefit decisions are tied to additional occupational benefits and private pension schemes, which are substantially more generous than the main state DI benefits and would result in further savings. Nevertheless, the yearly savings far exceed potential outlays for the medical personnel who were hired. Introducing external medical review is a highly cost-effective tool to reduce insurance inflow.

Taken together, the results cast doubt on the practice of assigning a large weight to the treating physician's opinion in DI decisions. Inflow reductions are restricted to difficult-to-diagnose conditions, and work take-up increases when medical review is done by clinical specialists. Treating physicians may therefore not be well suited to serve as the main gatekeepers to DI. This result corroborates medical studies which posit that specialists may be better suited to judge social insurance eligibility than personal physicians (e.g. Novack et al. 1989, Zinn and Furutani 1996, Freeman et al. 1999, Wynia et al. 2000, Everett et al. 2011).
In addition, treating physicians have often voiced discomfort with being both caretakers of patients and gatekeepers to public insurance systems. In surveys, physicians are overwhelmingly in favor of designating independent third-party physicians to determine disability status, in order to avoid damaging physician-patient relations (e.g. Zinn and Furutani 1996).

Since external medical review by DI physicians appears to be effective in the Swiss setting, it might provide a viable policy option for other countries that are burdened by high disability insurance costs and rely on treating physician assessments for DI. However, it is important to bear in mind that prior to the reform, medical review was conducted almost exclusively by treating physicians, and DI physicians could not examine patients. Both the policy impact and the size of award errors are likely to depend on the initial level of screening intensity. Still, treating physician testimony is influential for DI determinations in many OECD countries. The results suggest that subjecting treating physicians' opinions to medical review by a third party is a cost-effective policy to regulate inflow and award errors. Since the policy also lifted bans on personal medical examinations, the changes in Switzerland can potentially also provide some insight into extending medical review in systems which rely exclusively on file-based review.

It is important to note that screening during the pilot does not necessarily come at the cost of increased program complexity (e.g. as modeled by Kleven and Kopczuk 2011). The additional administrative hassle is low, and there are few visible additional up-front costs borne by the applicant. As such, external medical review is unlikely to discourage take-up strongly in the long term. This situation might differ if medical review is announced publicly.
Since medical review extracts information, it may also discourage ineligible applicants from applying for benefits, as they face higher chances of ultimately being denied. Low and Pistaferri (2015) find this deterrence effect to be pronounced.

The mechanisms behind the results in this paper merit further investigation. One possible channel behind the incidence reductions is inaccurate diagnoses by treating physicians, the first gatekeepers to the DI system. However, whether and how much application behavior suffers from moral hazard remains ultimately unclear. Applicants could be largely myopic or actively engage in malingering. Still, the overall reduction in inflow tentatively suggests that award errors exceed rejection errors in award decisions. This result diverges from previous analyses for the US. However, given that benefits are substantially more generous in Switzerland, this finding is in line with Low and Pistaferri's (2015) result that false applications increase strongly with benefit generosity. Hence, the result is also a first indication that the relative prevalence of errors may be different in European DI systems which offer higher replacement rates. Separating type-I and type-II classification errors more cleanly and examining the mechanisms through which they occur remains a promising avenue for further research.

References
Adam, S., Bozio, A. and Emmerson, C. (2010). Reforming disability insurance in the UK: Evaluation of the Pathways to Work programme.
Working paper, Institute for Fiscal Studies, London.
Akerlof, G. A. (1978). The economics of “tagging” as applied to the optimal income tax, welfare programs, and manpower planning.
The American Economic Review
The Quarterly Journal of Economics
American Economic Review
NBER Working Papers 10219, National Bureau of Economic Research, Inc.
Bolduc, D., Fortin, B., Labrecque, F. and Lanoie, P. (2002). Workers’ compensation, moral hazard and the composition of workplace injuries.
The Journal of Human Resources
American Economic Journal: Economic Policy
Working Paper 2816, National Bureau of Economic Research.
BSV (2012). Statistiken zur sozialen Sicherheit – IV-Statistik 2011. Bundesamt für Sozialversicherungen.
Butler, J. S., Burkhauser, R. V., Mitchell, J. M. and Pincus, T. P. (1987). Measurement error in self-reported health variables.
The Review of Economics and Statistics
IZA Journal of Labor Policy
Canadian Public Policy / Analyse de Politiques
Contributions to Economic Analysis & Policy
Journal of Public Economics
Journal of Econometrics
Journal of the Royal Statistical Society. Series B (Methodological)
Scandinavian Journal of Primary Health Care
Economics Working Paper Series 1709, University of St. Gallen, School of Economics and Political Science.
Eugster, B. and Parchet, R. (2018). Culture and taxes.
Journal of Political Economy (forthcoming).
Everett, J. P., Walters, C. A., Stottlemyer, D. L., Knight, C. A., Oppenberg, A. A. and Orr, R. D. (2011). To lie or not to lie: Resident physician attitudes about the use of deception in clinical practice.
Journal of Medical Ethics
Archives of Internal Medicine
American Economic Journal: Economic Policy
Journal of the American Statistical Association
Psychiatric Services
Working paper, Tinbergen Institute.
Gelber, A., Moore, T. J. and Strand, A. (2017). The effect of disability insurance payments on beneficiaries’ earnings.
American Economic Journal: Economic Policy
American Economic Journal: Economic Policy
Journal of the European Economic Association
The European Journal of Public Health
Journal of Public Economics
Political Science Research and Methods
American Economic Journal: Economic Policy
American Journal of Epidemiology
Social Security Bulletin
Journal of Human Resources
Journal of the American Statistical Association
Journal of Applied Econometrics
Foundations and Trends in Econometrics
American Economic Review
American Economic Review
American Economic Review
Journal of Public Economics
Disability and rehabilitation: Legal, clinical, and self-concepts and measurement.
Columbus, Ohio State University Press.
Novack, D., Detering, B., Arnold, R., Forrow, L., Ladinsky, M. and Pezzullo, J. (1989). Physicians’ attitudes toward using deception to resolve difficult ethical problems.
JAMA
Transforming Disability into Ability. Paris, OECD Publishing.
OECD (2006). Sickness, Disability and Work: Breaking the Barriers – Norway, Poland and Switzerland, Vol. 1. Paris, OECD Publishing.
OECD (2009). Sickness, disability and work: Keeping on track in the economic downturn.
Working paper, High-Level Forum, Stockholm.
OECD (2010). Sickness, Disability and Work: Breaking the Barriers – A Synthesis of Findings across OECD Countries. Paris, OECD Publishing.
Parsons, D. O. (1991). Self-screening in targeted public transfer programs.
Journal of Political Economy
Journal of Public Economics
Journal of Educational and Behavioral Statistics
Journal of Public Economics
The Social Security Disability program: An evaluation study. 39, US Social Security Administration, Office of Research and Statistics.
Staten, M. E. and Umbeck, J. (1982). Information costs and incentives to shirk: Disability compensation of air traffic controllers.
The American Economic Review
Journal of Public Economics
Statistics in Medicine
Regression methods in biostatistics: linear, logistic, survival, and repeated measures models. Springer Science & Business Media.
von Wachter, T., Song, J. and Manchester, J. (2011). Trends in employment and earnings of allowed and rejected applicants to the social security disability insurance program.
American Economic Review
Beiträge zur Sozialen Sicherheit,
Bericht im Rahmen des mehrjährigen Forschungsprogramms zu Invalidität und Behinderung, Forschungsbericht Nr. 13/07.
Wynia, M., Cummins, D., VanGeest, J. and Wilson, I. (2000). Physician manipulation of reimbursement rules for patients: Between a rock and a hard place.
JAMA
Journal of General Internal Medicine

Appendix A: Additional tables and figures
Table A1: DI recipients before and after filing for benefits

Year relative to DI filing                  −2       −1       Filing   +1       +2
Worked last week                            0.761    0.622    0.368    0.320    0.296
                                            (197)    (410)    (810)    (1268)   (1457)
Looking for work last month                 0.484    0.333    0.120    0.096    0.079
                                            (31)     (84)     (357)    (748)    (953)
Work contract but absent at work last week  0.326    0.455    0.294    0.120    0.057
                                            (46)     (154)    (506)    (845)    (1006)
Yearly income (1k CHF)                      53.113   47.785   33.262   17.284   11.574
                                            (197)    (410)    (810)    (1268)   (1457)
Dismissed from unemployment office          0.025    0.053    0.061    0.056    0.066
                                            (79)     (206)    (445)    (784)    (948)
Social assistance                           0.051    0.058    0.038    0.051    0.044
                                            (79)     (206)    (445)    (784)    (948)
Age                                         47.365   48.480   50.022   50.445   50.648
                                            (197)    (410)    (810)    (1268)   (1457)
Mental or physical problem                  0.234    0.393    0.688    0.844    0.836
                                            (124)    (262)    (523)    (841)    (980)
Accident within the last 12 months          0.208    0.176    0.118    0.057    0.082
                                            (48)     (85)     (136)    (174)    (184)
Note: This table shows the mean values of selected variables for DI recipients from two years prior to filing the application until two years afterwards. The table utilizes the limited longitudinal information that is available in the SESAM data. The number of observations in a cell is given in parentheses. Note that sample sizes vary because not all recipients have the same historic coverage and not all survey modules are administered every year.

Table A2: Descriptive statistics

(a) Full sample
                                      Mean     SD       Min   Max       N
All individuals
Age                                   50.316   18.033   18.0  104.0     259,323
Female                                0.539    0.498    0.0   1.0       259,323
Married                               0.552    0.497    0.0   1.0       259,323
Foreign                               0.322    0.467    0.0   1.0       259,323
Nr. of children                       0.582    0.973    0.0   7.0       259,323
Education: Primary                    0.234    0.423    0.0   1.0       259,323
Education: Secondary                  0.510    0.500    0.0   1.0       259,323
Education: Tertiary                   0.255    0.436    0.0   1.0       259,323
Gross annual earnings                 41.450   107.251  0.0   42,317.4  259,323
Travel distance (km)                  34.297   31.825   0.2   194.1     259,323
Travel time (min)                     31.411   23.167   0.6   169.5     259,323
Unemployed                            0.027    0.163    0.0   1.0       259,323
Receives DI                           0.035    0.185    0.0   1.0       259,323
Region
Léman                                 0.191    0.393    0.0   1.0       259,323
Mittelland                            0.194    0.396    0.0   1.0       259,323
Nordwestschweiz                       0.136    0.343    0.0   1.0       259,323
Zürich                                0.166    0.372    0.0   1.0       259,323
Ostschweiz                            0.122    0.328    0.0   1.0       259,323
Zentralschweiz                        0.107    0.310    0.0   1.0       259,323
Tessin                                0.083    0.275    0.0   1.0       259,323
DI recipients
Years in DI                           9.415    6.847    0.0   48.0      9,204
Disability: Psych. problems           0.341    0.474    0.0   1.0       9,204
Disability: Nerve                     0.072    0.259    0.0   1.0       9,204
Disability: Musculoskeletal cond.     0.235    0.424    0.0   1.0       9,204
Disability: Accident                  0.092    0.289    0.0   1.0       9,204
Disability: Congenital disease/other  0.185    0.388    0.0   1.0       9,204

(b) Local sample (within 20 km)
                                      Mean     SD       Min   Max       N
All individuals
Age                                   49.950   18.019   18.0  104.0     133,549
Female                                0.538    0.499    0.0   1.0       133,549
Married                               0.546    0.498    0.0   1.0       133,549
Foreign                               0.329    0.470    0.0   1.0       133,549
Nr. of children                       0.580    0.972    0.0   7.0       133,549
Education: Primary                    0.226    0.418    0.0   1.0       133,549
Education: Secondary                  0.510    0.500    0.0   1.0       133,549
Education: Tertiary                   0.265    0.441    0.0   1.0       133,549
Gross annual earnings                 43.252   134.295  0.0   42,317.4  133,549
Travel distance (km)                  11.871   4.753    0.2   20.0      133,549
Travel time (min)                     14.981   5.170    0.6   30.1      133,549
Unemployed                            0.027    0.163    0.0   1.0       133,549
Receives DI                           0.035    0.184    0.0   1.0       133,549
Region
Léman                                 0.119    0.324    0.0   1.0       133,549
Mittelland                            0.156    0.363    0.0   1.0       133,549
Nordwestschweiz                       0.260    0.439    0.0   1.0       133,549
Zürich                                0.256    0.436    0.0   1.0       133,549
Ostschweiz                            0.068    0.252    0.0   1.0       133,549
Zentralschweiz                        0.140    0.347    0.0   1.0       133,549
Tessin                                0.000    0.003    0.0   1.0       133,549
DI recipients
Years in DI                           9.294    6.779    0.0   47.0      4,693
Disability: Psych. problems           0.359    0.480    0.0   1.0       4,693
Disability: Nerve                     0.072    0.259    0.0   1.0       4,693
Disability: Musculoskeletal cond.     0.232    0.422    0.0   1.0       4,693
Disability: Accident                  0.087    0.282    0.0   1.0       4,693
Disability: Congenital disease/other  0.178    0.382    0.0   1.0       4,693
Note: Descriptive statistics for the unrestricted and the local estimation sample. Based on the 1999–2011 SESAM data.

Table A3: Pre-treatment covariate balance

                        (a) Full sample                             (b) Local sample (within 20 km)
                        Total      Treated    Control    Difference Total      Treated    Control    Difference
All individuals
Age                     48.34      47.74      48.66      −0.926***  48.55      48.53      48.68      −0.153
                        (18.28)    (18.83)    (17.95)    (0.309)    (18.56)    (10.61)    (40.06)    (0.605)
Female                  0.54       0.55       0.54       0.009      0.55       0.55       0.54       0.009
                        (0.50)     (0.52)     (0.49)     (0.009)    (0.50)     (0.29)     (1.08)     (0.016)
Married                 0.52       0.58       0.50       0.078***   0.52       0.53       0.51       0.021
                        (0.50)     (0.52)     (0.49)     (0.009)    (0.50)     (0.29)     (1.08)     (0.016)
Foreign                 0.09       0.12       0.08       0.043***   0.13       0.14       0.11       0.027***
                        (0.29)     (0.34)     (0.26)     (0.005)    (0.34)     (0.20)     (0.67)     (0.010)
Nr. of children         0.56       0.66       0.51       0.142***   0.57       0.57       0.59       −0.023
                        (0.98)     (1.08)     (0.91)     (0.018)    (0.98)     (0.56)     (2.15)     (0.035)
Education: Primary      0.21       0.23       0.20       0.028***   0.24       0.24       0.22       0.024*
                        (0.41)     (0.44)     (0.39)     (0.007)    (0.43)     (0.25)     (0.89)     (0.014)
Education: Secondary    0.59       0.59       0.60       −0.010     0.58       0.58       0.60       −0.021
                        (0.49)     (0.52)     (0.48)     (0.009)    (0.49)     (0.28)     (1.06)     (0.016)
Education: Tertiary     0.20       0.19       0.21       −0.019***  0.18       0.18       0.18       −0.004
                        (0.40)     (0.41)     (0.40)     (0.007)    (0.39)     (0.22)     (0.84)     (0.012)
Gross annual earnings   36.09      35.36      36.49      −1.135     34.19      33.93      35.59      −1.658
                        (48.35)    (50.57)    (47.10)    (0.877)    (45.81)    (26.26)    (97.48)    (1.444)
Travel distance (km)    28.69      43.02      20.90      22.125***  10.28      10.26      10.42      −0.158
                        (27.22)    (37.62)    (15.95)    (0.506)    (4.80)     (2.74)     (10.35)    (0.150)
Travel time (min)       27.80      37.15      22.72      14.434***  13.25      13.22      13.46      −0.240
                        (20.27)    (27.79)    (12.98)    (0.378)    (5.24)     (3.00)     (11.23)    (0.165)
Unemployed              0.02       0.02       0.01       0.005      0.02       0.02       0.01       0.008**
                        (0.12)     (0.14)     (0.11)     (0.002)    (0.14)     (0.08)     (0.25)     (0.004)
Receives DI in 2001     0.04       0.04       0.04       −0.005     0.04       0.04       0.04       −0.004
                        (0.20)     (0.20)     (0.20)     (0.004)    (0.19)     (0.11)     (0.43)     (0.008)
DI recipients
Years in DI             7.90       7.64       8.03       −0.391     7.41       7.67       6.09       1.582
                        (6.94)     (7.48)     (6.62)     (0.646)    (6.68)     (3.71)     (12.70)    (0.967)
Entry age               43.11      44.20      42.55      1.654      45.05      45.26      43.99      1.270
                        (11.69)    (13.25)    (10.84)    (1.142)    (11.51)    (6.23)     (24.99)    (2.271)
DI: Psych. problems     0.29       0.27       0.30       −0.028     0.29       0.27       0.35       −0.081
                        (0.46)     (0.49)     (0.43)     (0.043)    (0.45)     (0.24)     (1.01)     (0.091)
DI: Nerve               0.11       0.09       0.12       −0.033     0.11       0.11       0.11       0.004
                        (0.31)     (0.31)     (0.31)     (0.029)    (0.32)     (0.18)     (0.66)     (0.051)
DI: MSK                 0.21       0.27       0.18       0.089**    0.23       0.26       0.12       0.136**
                        (0.41)     (0.49)     (0.37)     (0.041)    (0.42)     (0.24)     (0.69)     (0.064)
DI: Other illness       0.21       0.21       0.21       −0.002     0.19       0.20       0.17       0.034
                        (0.41)     (0.45)     (0.38)     (0.039)    (0.40)     (0.22)     (0.79)     (0.063)
DI: Accident            0.10       0.09       0.11       −0.025     0.08       0.06       0.19       −0.129
                        (0.30)     (0.31)     (0.30)     (0.029)    (0.27)     (0.13)     (0.83)     (0.090)
N all individuals       15,522     5,983      9,539                 8,570      2,367      6,203
N DI recipients         506        207        299                   280        70         210
Note: Means of selected covariates for individuals in treated and control regions sampled during 1999–2001, prior to the pilot period. Separate statistics for all individuals and those within a distance of 20 kilometers in border regions. Standard deviations in parentheses. The last column in each block shows the difference between treated and control individuals for each variable, standard error in parentheses. Survey weights applied for the full sample. Observations weighted for pairwise differences in the local sample. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.
Table A4: Placebo reform

                          (a) Full sample                         (b) Local sample (within 20 km)
                          (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
Treatment region          1.337***  1.337***  1.337***  1.248***  1.150**   1.150**   1.150**   1.148**
                          (0.051)   (0.051)   (0.051)   (0.048)   (0.076)   (0.076)   (0.076)   (0.076)
Pre-pilot time            1.235***  1.241***  1.241***  1.274***  1.204     1.213     1.213     1.253*
                          (0.082)   (0.082)   (0.082)   (0.084)   (0.146)   (0.146)   (0.146)   (0.150)
Treat x pre               0.970     0.970     0.970     0.975     0.999     0.999     0.999     0.996
                          (0.064)   (0.064)   (0.064)   (0.064)   (0.111)   (0.111)   (0.111)   (0.111)
Pilot time                          1.320***  1.326***  1.390***            1.514***  1.525***  1.612***
                                    (0.129)   (0.129)   (0.135)             (0.228)   (0.229)   (0.241)
Treat x pilot                       0.847**   0.846**   0.852**             0.770**   0.771**   0.765**
                                    (0.069)   (0.069)   (0.069)             (0.092)   (0.092)   (0.091)
Post time                                     0.842     0.917                         1.046     1.142
                                              (0.094)   (0.103)                       (0.207)   (0.226)
Treat x post                                  0.960     0.961                         0.841     0.829
                                              (0.080)   (0.080)                       (0.110)   (0.109)
Other controls            -         -         -         ✓         -         -         -         ✓
N municipalities          2,336     2,337     2,338     2,338     1,086     1,086     1,087     1,087
N individuals             242,531   249,750   259,323   259,323   124,747   128,633   133,648   133,648
N failures                6,164     7,877     9,204     9,204     3,100     3,985     4,693     4,693
N fail during pilot       0         1,713     1,713     1,713     0         885       885       885
N fail during prepilot    1,950     1,950     1,950     1,950     989       989       989       989
N                         439,761   631,782   787,954   787,954   226,345   325,321   406,221   406,221
Note: Cox proportional hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Baseline hazard for all regressions stratified by 5-year birth cohorts. Survey weights applied for the full sample. Observations in the local sample are weighted for pairwise estimation. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.
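Table A4 and the main results rely on Cox models with baseline hazards stratified by birth cohort. A simpler analogue conveys the same "ratio of hazard ratios" logic: a piecewise-constant (Poisson) hazard model that is saturated in region and period. In such a model, the exponentiated maximum-likelihood interaction term equals exactly the treated rate ratio divided by the control rate ratio. The sketch below illustrates this simplification only; it is not the estimator used in the paper, and the record layout and toy counts are hypothetical. The optional distance cut-off mimics the local-sample restriction and the windows in Table A5. Comparing two pre-reform periods instead of pilot versus pre gives a placebo contrast of the kind reported in the "Treat x pre" row.

```python
from collections import defaultdict


def did_rate_ratio(records, period, ref_period, max_distance=None):
    """Ratio of incidence-rate ratios (treated vs. control, `period` vs.
    `ref_period`). Equals exp(interaction) in a saturated Poisson
    (piecewise-exponential) model of the four region x period cells.

    records: iterable of dicts with keys "treated" (bool), "period" (str),
    "distance" (km), "events" (int) and "exposure" (person-years).
    Every cell must have positive exposure and ref-period events.
    """
    events, exposure = defaultdict(int), defaultdict(float)
    for r in records:
        if max_distance is not None and r["distance"] > max_distance:
            continue  # keep only observations inside the distance window
        key = (r["treated"], r["period"])
        events[key] += r["events"]
        exposure[key] += r["exposure"]

    def rate(treated, p):
        return events[(treated, p)] / exposure[(treated, p)]

    treated_ratio = rate(True, period) / rate(True, ref_period)
    control_ratio = rate(False, period) / rate(False, ref_period)
    return treated_ratio / control_ratio


# Toy data: DI entries and person-years by region, period and distance.
records = [
    {"treated": False, "period": "pre", "distance": 10, "events": 4, "exposure": 1000},
    {"treated": False, "period": "pilot", "distance": 10, "events": 5, "exposure": 1000},
    {"treated": True, "period": "pre", "distance": 10, "events": 5, "exposure": 1000},
    {"treated": True, "period": "pilot", "distance": 10, "events": 5, "exposure": 1000},
    {"treated": True, "period": "pilot", "distance": 50, "events": 0, "exposure": 1000},
]
for window in (20, None):
    print(window, round(did_rate_ratio(records, "pilot", "pre", window), 3))
```

Unlike the Cox specification, this analogue imposes a constant hazard within each period and does not stratify by birth cohort. It is meant only to make the interpretation of the interaction hazard ratio transparent.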
Table A5: Distance windows

                        (a) Travel distance (km)                     (b) Travel time (min)
                        10 km    15 km    20 km    25 km    30 km    10 min   15 min   20 min   25 min   30 min
Treatment region        1.13     1.20***  1.15***  1.16***  1.20***  1.040    1.13     1.18***  1.16***  1.09
                        (0.10)   (0.08)   (0.06)   (0.06)   (0.06)   (0.115)  (0.09)   (0.07)   (0.06)   (0.06)
Pilot time              1.29     1.38**   1.27**   1.25**   1.25**   1.469*   1.43**   1.30**   1.32**   1.20
                        (0.23)   (0.19)   (0.15)   (0.14)   (0.13)   (0.333)  (0.25)   (0.17)   (0.16)   (0.14)
Treat x pilot           0.75*    0.71**   0.77**   0.78**   0.78**   0.740    0.66**   0.73**   0.76**   0.81*
                        (0.13)   (0.10)   (0.09)   (0.08)   (0.08)   (0.164)  (0.11)   (0.09)   (0.09)   (0.09)
Post time               0.92     0.91     0.87     0.82     0.84     1.086    0.87     0.78     0.80     0.80
                        (0.24)   (0.18)   (0.15)   (0.14)   (0.13)   (0.337)  (0.21)   (0.15)   (0.14)   (0.13)
Treat x post            0.79     0.83     0.84     0.85     0.85     0.995    0.85     0.86     0.90     0.94
                        (0.16)   (0.13)   (0.11)   (0.10)   (0.10)   (0.241)  (0.16)   (0.12)   (0.12)   (0.12)
N municipalities        549      825      1,087    1,286    1,414    372      649      922      1,159    1,371
N individuals           47,403   88,990   133,549  151,215  163,852  26,956   56,609   119,572  143,504  166,486
N failures              1,626    3,230    4,693    5,223    5,690    942      1,948    4,253    5,031    5,752
N failures during pilot 332      612      885      980      1,063    180      379      811      961      1,087
N                       107,479  200,431  300,432  340,370  369,235  61,269   128,479  269,155  323,290  375,210
Note: Cox proportional hazard estimates for individuals in treated and control regions across various distance windows from the border. Based on SESAM individual-level survey and administrative data sampled during 1999–2011. Observations are weighted for pairwise estimation. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Table A6: Stock outflow

                          (a) All individuals           (b) Age ≤ 50 in 2001
                          (1)       (2)       (3)       (4)       (5)       (6)
Treat                     0.925***  0.923***  0.911***  0.871***  0.871***  0.872***
                          (0.027)   (0.027)   (0.027)   (0.041)   (0.041)   (0.041)
Pilot time                7.698***  7.677***  7.825***  7.515***  7.479***  7.652***
                          (0.157)   (0.156)   (0.160)   (0.243)   (0.240)   (0.247)
Treat x pilot             0.985     0.986     0.992     0.995     0.997     0.997
                          (0.033)   (0.033)   (0.033)   (0.053)   (0.053)   (0.053)
Post time                           7.518***  7.728***            7.676***  7.931***
                                    (0.152)   (0.157)             (0.236)   (0.246)
Treat x post                        1.008     1.014               1.036     1.035
                                    (0.032)   (0.033)             (0.052)   (0.051)
Other controls            -         -         ✓         -         -         ✓
N individuals             314,249   327,580   327,580   145,018   154,020   154,020
N failures                20,481    44,529    44,529    8,904     23,547    23,547
N failures during pilot   15,389    15,389    15,389    6,957     6,957     6,957
N                         1,032,666 2,489,323 2,489,323 504,801   1,470,137 1,470,137
Note: Cox proportional hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Baseline hazard for all regressions stratified by 5-year birth cohorts. Survey weights applied for the full sample. Observations in the local sample are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. Standard errors clustered at the municipality level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.
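The significance stars in Tables A4 to A6 can be cross-checked from the reported hazard ratios and standard errors. This requires one assumption, common for exponentiated output: that the standard errors are delta-method standard errors on the hazard-ratio scale, SE(HR) = HR · SE(β). Under that assumption, the sketch below recovers the log-scale z statistic, a two-sided p-value, and a confidence interval. At the 90% level, this yields bounds of the kind plotted in Figure A7. The function name is hypothetical, and the check is only approximate because the table entries are rounded.

```python
import math
from statistics import NormalDist


def hr_wald(hr, se_hr, level=0.90):
    """z statistic, two-sided p-value and CI for a hazard ratio, assuming
    se_hr is a delta-method standard error (se_hr = hr * se_beta)."""
    beta = math.log(hr)
    se_beta = se_hr / hr
    z = beta / se_beta
    p = math.erfc(abs(z) / math.sqrt(2))          # equals 2 * (1 - Phi(|z|))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    ci = (math.exp(beta - crit * se_beta), math.exp(beta + crit * se_beta))
    return z, p, ci


# 'Treat x pilot' entries from Table A5
for label, hr, se in [("20 km", 0.77, 0.09), ("10 km", 0.75, 0.13),
                      ("10 min", 0.740, 0.164)]:
    z, p, (lo, hi) = hr_wald(hr, se)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}, 90% CI = [{lo:.2f}, {hi:.2f}]")
```

For these three entries, the implied p-values fall in the 5%, 10%, and "not significant" ranges respectively, matching the stars shown in the table.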
Table A7: Determinants of local sample

                        Full sample  Treated      Control
                        (1)          (2)          (3)
Age                     −0.0004*     −0.0008***   0.0002
                        (0.0002)     (0.0002)     (0.0003)
Female                  0.0040       −0.0080      0.0159***
                        (0.0054)     (0.0063)     (0.0057)
Married                 −0.0115      0.0093       −0.0320*
                        (0.0165)     (0.0181)     (0.0181)
Foreign                 0.0175       −0.0360      0.1117***
                        (0.0258)     (0.0270)     (0.0210)
Nr. of children         −0.0030      0.0037       −0.0050
                        (0.0041)     (0.0033)     (0.0049)
Education: Secondary    0.0195***    0.0041       0.0189**
                        (0.0068)     (0.0066)     (0.0083)
Education: Tertiary     0.0373       0.0008       0.0490**
                        (0.0228)     (0.0240)     (0.0239)
N                       259,323      117,701      141,622
Note: Probit estimates for the probability of being included in the local sample, separately for treated and control regions. Marginal effects at the mean reported. Standard errors clustered at the municipality level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Figure A1: Sample composition
[Map legend: Non-treated; Non-treated local; Treated local; Treated; Lake]
Note: Pilot cantons are shaded in dark and medium grey, control cantons in light grey and white. Intermediate shades indicate the municipalities that are included in the local sample. Lakes are shown in black.

Figure A2: Trends in DI physicians and caseload
[Panel (a): DI physicians. Panel (b): Caseload (applications per physician; dossiers per physician, 2006). Series: control and treated regions.]
Note: Panel (a) shows the number of full-time equivalent medical staff positions before and after the reform changes; panel (b) approximates the application caseload per physician. Cantons in western Switzerland for which the electronic reporting system is known to have been faulty are omitted from the sample for the statistics in panel (b) (Fribourg, Genève, Jura, Neuchâtel, Vaud). Applications are only available from 2002.

Figure A3: Trends for aggregate disability insurance stock
[Plot: DI stock over time, treated and control regions]
Note: Disability insurance stock for treated and control regions.

Figure A4: Trends for disability insurance applications
[Plot: DI applications over time]
Note: Disability insurance applications for the years 2002–2012. Cantons in western Switzerland for which the electronic reporting system is known to have been faulty are omitted from the sample (Fribourg, Genève, Jura, Neuchâtel, Vaud).

Figure A5: Possible effects of medical review on latent types

Figure A6: Disability insurance court cases
[Panel (a): All claims. Panel (b): Rejected claims.]
Note: Mean cantonal total and rejected disability insurance legal claims for the years 2002–2012.

Figure A7: Distance windows
[Panel (a): hazard ratio by travel distance window (km, 10 to 120). Panel (b): hazard ratio by travel time window (min, 10 to 120). Each panel shows the estimate and 90% confidence bounds.]
Note: Treatment effect estimates and 90% confidence bounds from the main specification for different distance windows measured using actual travel distance and travel time.

Supplementary material for online publication only
Appendix B: Further tables and figures
Title: Does external medical review reduce disability insurance inflow?
Author: Helge Liebert
Table B1: Main disability incidence results, linear probability model

                        (a) Full sample            (b) Local sample (within 20 km)
                        (1)          (2)          (3)          (4)
Treat x pilot           −0.000265    −0.000272    −0.000106    −0.000107
                        (0.000519)   (0.000518)   (0.001146)   (0.001144)
Relative ATT (implied)  −0.1698      −0.1742      −0.0696      −0.0701
ȳ ✓ - ✓
Canton fixed effects    ✓            ✓            ✓            ✓
Time fixed effects      ✓            ✓            ✓            ✓
N                       592,491      592,491      299,545      299,545
Note: Linear probability model estimates of DI receipt for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimations separately for a complete representative sample of the Swiss population and only for individuals in the vicinity of the border between treated and non-treated regions. Standard errors clustered at the cantonal level in parentheses.
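The "Relative ATT (implied)" row of Table B1 expresses the linear-probability coefficient relative to a baseline incidence. This puts it on a scale roughly comparable with the hazard-ratio estimates; for example, a hazard ratio of 0.85 corresponds to a relative change of about −0.15. The baseline incidence itself is not reported in the table. The sketch below therefore assumes that the relative ATT is simply the coefficient divided by a baseline mean, and it backs that baseline out from the reported numbers. It also computes the t statistic, which shows how imprecise the linear probability estimates are. The function names are hypothetical.

```python
def implied_baseline(coef, relative_att):
    """Baseline mean implied if relative_att = coef / baseline (assumed)."""
    return coef / relative_att


def t_stat(coef, se):
    """Ratio of a coefficient to its standard error."""
    return coef / se


# Column (1) of Table B1
coef, se, rel = -0.000265, 0.000519, -0.1698
baseline = implied_baseline(coef, rel)
print(f"implied baseline incidence: {baseline:.5f}")
print(f"t statistic: {t_stat(coef, se):.2f}")
```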
Table B2: Main disability incidence results with canton fixed effects

                        (a) Full sample               (b) Local sample (within 20 km)
                        (1)       (2)       (3)       (4)       (5)       (6)
Treat x pilot           0.857**   0.855**   0.859*    0.765**   0.765**   0.760**
                        (0.067)   (0.067)   (0.068)   (0.086)   (0.087)   (0.086)
Canton fixed effects    ✓         ✓         ✓         ✓         ✓         ✓
Time fixed effects      ✓         ✓         ✓         ✓         ✓         ✓
Other controls          -         -         ✓         -         -         ✓
N municipalities        2,337     2,338     2,338     1,086     1,087     1,087
N individuals           249,750   259,323   259,323   128,536   133,549   133,549
N failures              7,877     9,204     9,204     3,985     4,693     4,693
N failures during pilot 1,713     1,713     1,713     885       885       885
Note: Cox proportional hazard estimates for individuals in treated and control regions based on SESAM individual-level survey and administrative data sampled during 1999–2011. Estimations separately for a complete representative sample of the Swiss population and only for individuals in the vicinity of the border between treated and non-treated regions. Baseline hazard for all regressions stratified by 5-year birth cohorts. Survey weights applied for the full sample. Observations in the local sample are weighted for nearest-neighbor pairwise differences. Results are reported in exponentiated form as hazard ratios. The hazard ratio for ‘Treat x pilot’ corresponds to the relative average treatment effect on the treated as defined in section 4. Standard errors clustered at the individual level in parentheses, number of observations given below. *, ** and *** denote significance at the 10%, 5% and 1% level respectively.

Figure B1: Log cumulative hazard by age, treatment region and birth cohort strata
[Six panels, one per major birth cohort stratum: ln[−ln(survival)] plotted against age (35 to 65) for control and treated regions]
Note: Log-log plot showing log cumulative hazard estimates by age and birth cohort for individuals in treated and control regions, separately for major birth cohort strata.

Table B3: Robustness: Placebo test labor market participation
(a) Full sample (b) Local sample (within 20 km)
(1) (2) (3) (4) (5) (6) (7) (8)
Treat x pre (2001) 0.0107 0.0004 0.0127 0.0004
(0.0076) (0.0063) (0.0109) (0.0090)
Treat x pre (2000, 2001) 0.0045 −0.0055 0.0055 −0.0064
(0.0077) (0.0046) (0.0112) (0.0066)
Treat x pilot 0.0091*** 0.0087*** 0.0068* 0.0063*
(0.0026) (0.0026) (0.0038) (0.0038)
Individual covariates ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Canton FE ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Year FE ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Only years before 2002 ✓ ✓ ✓ ✓
All years ✓ ✓ ✓ ✓