Quasi-Experimental Shift-Share Research Designs
Kirill Borusyak (UCL), Peter Hull (U Chicago and NBER), Xavier Jaravel (LSE) ∗
December 2019
Abstract
Many studies use shift-share (or “Bartik”) instruments, which average a set of shocks with exposure share weights. We provide a new econometric framework for such designs in which identification follows from the quasi-random assignment of shocks, allowing exposure shares to be endogenous. This framework is centered around a numerical equivalence: conventional shift-share instrumental variable (SSIV) regression coefficients are equivalently obtained from a transformed regression where the shocks are used directly as an instrument. This equivalence implies a shock-level translation of the SSIV exclusion restriction, which holds when shocks are as-good-as-randomly assigned and large in number, with sufficient dispersion in their average exposure. We discuss and illustrate several practical insights delivered by this framework.

∗ Contact: [email protected], [email protected], and [email protected]. We are grateful to Rodrigo Adão, Joshua Angrist, David Autor, Moya Chin, Andy Garin, Ed Glaeser, Paul Goldsmith-Pinkham, Larry Katz, Michal Kolesár, Gabriel Kreindler, Jack Liebersohn, Eduardo Morales, Jack Mountjoy, Jörn-Steffen Pischke, Brendan Price, Isaac Sorkin, Jann Spiess, Itzchak Tzachi Raz, various seminar participants, and four anonymous referees for helpful comments. We thank David Autor, David Dorn, and Gordon Hanson, as well as Paul Goldsmith-Pinkham, Isaac Sorkin, and Henry Swift, for providing replication code and data.

arXiv [econ.EM]

1 Introduction
A large and growing number of empirical studies use shift-share instruments: weighted averages of a common set of shocks, with weights reflecting heterogeneous shock exposure. In many settings, such as those of Bartik (1991), Blanchard et al. (1992) and Autor et al. (2013), a regional instrument is constructed from shocks to industries with local industry employment shares measuring the shock exposure. In other settings, researchers may combine shocks across countries, income groups, or foreign markets to instrument for treatments at the regional, individual, or firm level.

The claim for instrument validity in shift-share instrumental variable (SSIV) regressions must rely on some assumptions about the shocks, exposure shares, or both. This paper develops a novel framework for understanding SSIV regressions as leveraging exogenous variation in shocks, even when variation in exposure shares is endogenous. Our approach is centered around a simple numerical equivalence: we show that SSIV regression coefficients are identically obtained from a transformed IV regression, estimated at the level of shocks. In this equivalent regression the outcome and treatment variables are first averaged, using exposure shares as weights, to obtain shock-level aggregates. The shocks then directly instrument for the aggregated treatment. Importantly, this equivalence only relies on the structure of the shift-share instrument and thus applies to outcomes and treatments that are not typically computed at the level of shocks.
It follows that the SSIV exclusion restriction holds if and only if shocks are uncorrelated, in large samples, with a particular residual: the average unobserved determinants of the original outcome among observations most exposed to a given shock.

We use this equivalence result to derive two conditions sufficient for such “shock orthogonality.” First, we assume shocks are as-good-as-randomly assigned, as if arising from a natural experiment. This is enough for the SSIV exclusion restriction to hold on average across shock realizations. Second, we assume that a shock-level law of large numbers applies: that the instrument incorporates many sufficiently independent shocks, each with sufficiently small average exposure. Instrument relevance further holds when individual units are mostly exposed to only a small number of shocks, provided those shocks affect treatment. While novel for SSIV, our two quasi-experimental conditions are similar to ones imposed in other settings where the underlying shocks are directly used as instruments, bringing SSIV to familiar econometric territory. We illustrate these ideas in an idealized example, in which a local labor supply elasticity is estimated with a shift-share instrument constructed from quasi-random output subsidy shocks to different industries. We highlight that the SSIV coefficient retains [...]

Footnote: Observations in shift-share designs may, for example, represent regions impacted by immigration shocks from different countries (Card 2001), firms differentially exposed to foreign market shocks (Hummels et al. 2014), product groups demanded by different types of consumers (Jaravel 2019), or groups of individuals facing different income growth rates (Boustan et al. 2013). Other influential and recent examples of shift-share IVs include Luttmer (2005), Saiz (2010), Kovak (2013), Nakamura and Steinsson (2014), Oberfield and Raval (2014), Greenstone et al. (2014), Diamond (2016), Suárez and Zidar (2016), and Hornbeck and Moretti (2019).
Footnote: For example, Acemoglu et al. (2016) study the impact of import competition from China on U.S. industry employment using industry (i.e. shock-level) regressions with shocks constructed similarly to those underlying the regional shift-share instrument used in Autor et al. (2013). Our framework shows that both studies can rely on similar econometric assumptions, even though the economic interpretations of the estimates may be different.

[...] ssaggregate, which we have developed to help practitioners implement the appropriate shock-level analyses.

Footnote: This Stata package creates the shock-level aggregates used in the equivalent regression. Users can install this package with the command ssc install ssaggregate. See the associated help file and this paper’s replication archive at https://github.com/borusyak/shift-share for more details.

Our quasi-experimental approach is not the only way to satisfy the SSIV exclusion restriction. In related work, Goldsmith-Pinkham et al. (2019) formalize a different framework based on the exogeneity of the exposure shares, imposing no assumption of shock exogeneity. This approach is motivated by a different numerical equivalence: the SSIV coefficient also coincides with a generalized method of moments estimator, with exposure shares as multiple excluded instruments. Though exposure exogeneity is a sufficient condition for identification (and, as such, implies our shock-level orthogonality condition), we focus on plausible conditions under which it is not necessary.

Identification via exogenous shocks seems attractive in many SSIV settings. Consider the Autor et al. (2013; hereafter ADH) shift-share instrument, which combines industry-specific changes in Chinese import competition (the shocks) with local exposure given by the lagged industrial composition of U.S. regions (the exposure shares). In such a setting, exogeneity of industry employment shares may be difficult to justify a priori. Indeed, we show that the “shares” view of identification generally fails when there are any unobserved industry shocks that affect regional outcomes through the shares (e.g., unobserved automation trends). Our approach, in contrast, allows researchers to specify a set of shocks that are plausibly uncorrelated with such unobserved factors. Consistent with this general principle, ADH attempt to purge their industry shocks of U.S.-specific confounders by measuring Chinese import growth outside of the United States. Similarly, Hummels et al. (2014) combine country-by-product changes in transportation costs to Denmark (as shocks) with lagged firm-specific composition of intermediate inputs and their sources (as shares). They argue these shocks are “idiosyncratic,” which our approach formalizes as “independent from relevant country-by-product unobservables.” Other recent examples where our approach may naturally apply include the exchange rate shocks of Hummels et al. (2011), the education policy shocks of Stuen et al. (2012), the demographic shocks of Jaravel (2019), and the bank health shocks of Xu (2019).

In other shift-share designs, the shocks are equilibrium objects that can be difficult to view as being quasi-experimentally assigned.
In the canonical estimation of regional labor supply elasticities by Bartik (1991), for example, the shocks are measured as national industry growth rates. Such growth captures national industry labor demand shocks, which one may be willing to assume are as-good-as-randomly assigned across industries; however, industry growth rates also depend on unobserved regional labor supply shocks. We show that our framework can still apply to such settings by casting the industry employment growth rates as noisy estimates of latent quasi-experimental demand shocks and establishing conditions to ensure the supply-driven estimation error is asymptotically ignorable. These conditions are weaker if the latent shocks are estimated as leave-one-out averages. Although leave-one-out shift-share IV estimates do not have a convenient shock-level representation, we provide evidence that in the Bartik (1991) setting this leave-out adjustment is unimportant.

Formally, our approach to SSIV relates to the analysis of IV estimators with many invalid instruments by Kolesár et al. (2015). Consistency in that setting follows when violations of individual instrument exclusion restrictions are uncorrelated with their first-stage effects. For quasi-experimental SSIV, the exposure shares can be thought of as a set of invalid instruments (per the Goldsmith-Pinkham et al. (2019) interpretation), and our orthogonality condition requires their exclusion restriction violations to be uncorrelated with the shocks. Despite this formal similarity, we argue that shift-share identification is better understood through the quasi-random assignment of a single instrument (shocks), rather than through a large set of invalid instruments (exposure shares) that nevertheless produce a consistent estimate.
This view is reinforced by our numerical equivalence, yields a natural shock-level identification condition, and suggests new validations and extensions of SSIV.

Our analysis also relates to other recent methodological studies of shift-share designs, including those of Jaeger et al. (2018) and Broxterman and Larson (2018). The former highlights biases of SSIV due to endogenous local labor market dynamics, and we show how their solution can be implemented in our setting, while the latter studies the empirical performance of different shift-share instrument constructions. As discussed above we also draw on the inferential framework of Adão et al. (2019), who derive valid standard errors in shift-share designs with a large number of idiosyncratic shocks. More broadly, our paper adds to a growing literature studying the causal interpretation of common research designs, including work by Borusyak and Jaravel (2017) and Goodman-Bacon (2018) for event study designs; Hudson et al. (2017) and Chaisemartin and D’Haultfoeuille (2019) for instrumented difference-in-difference designs; and Hull (2018) for mover designs.

The remainder of this paper is organized as follows. Section 2 introduces the environment, derives our numerical equivalence for shift-share IV, and discusses the key shock orthogonality condition. Section 3 then establishes the quasi-experimental assumptions under which this condition is satisfied, and derives various extensions. Section 4 discusses shock-level procedures for valid SSIV inference and testing, while Section 5 illustrates the methodology in the ADH setting. Section 6 concludes. Additional results and proofs are included in the paper’s appendix.
We begin by defining the SSIV estimator and showing that it coincides with a new IV procedure, estimated at the level of shocks. Motivated by this equivalence result, we derive a necessary and sufficient shock-level orthogonality condition for shift-share instrument validity.
We consider a sequence of data generating processes, indexed by the number of observations $L$. The data include an outcome variable $y_\ell$, an endogenous (or “treatment”) variable $x_\ell$, a vector of controls $w_\ell$ (which includes a constant), and an observation importance weight $e_\ell > 0$ (with $\sum_{\ell=1}^{L} e_\ell = 1$; $e_\ell = 1/L$ covers the unweighted case). For reasons we will discuss in detail in Section 3, we do not assume independent or identically-distributed draws of these variables across $\ell$.

We are interested in estimation of the causal effect or structural parameter $\beta$ in a linear model relating outcomes to treatment: $y_\ell = \beta x_\ell + \epsilon_\ell$. To accommodate the control vector we assume that the $e_\ell$-weighted projection of unobserved untreated potential outcomes, $\epsilon_\ell$, on $w_\ell$ is well-defined; that is, $\hat{\gamma} = \left(\sum_{\ell=1}^{L} e_\ell w_\ell w_\ell'\right)^{-1} \left(\sum_{\ell=1}^{L} e_\ell w_\ell \epsilon_\ell\right) \xrightarrow{p} \gamma$ for some $\gamma$ as $L \to \infty$. We then study the expanded model

$$y_\ell = \beta x_\ell + w_\ell' \gamma + \varepsilon_\ell, \tag{1}$$

where $\varepsilon_\ell = \epsilon_\ell - w_\ell' \gamma$ is a structural residual.

Footnote: We consider models with heterogeneous treatment effects in Appendix A.1; see footnote 8 for a summary.

For example, we might be interested in estimating the inverse labor supply elasticity $\beta$ from observations of log wage growth $y_\ell$ and log employment growth $x_\ell$ across local labor markets $\ell$. The residual $\varepsilon_\ell$ in (1) would then contain all labor supply shocks, such as those arising from demographic, human capital, or migration changes, that are not asymptotically correlated with the control vector $w_\ell$. In estimating labor supply we may weight observations by the overall lagged regional employment, $e_\ell$.
We return to this labor supply example throughout the following theoretical discussion.

To estimate $\beta$ we construct a shift-share instrument from a set of shocks $g_n$, for $n = 1, \ldots, N$, and shares $s_{\ell n} \geq 0$ which define the relative exposure of each observation $\ell$ to each shock $n$. Specifically, the instrument is given by an exposure-weighted average of the shocks:

$$z_\ell = \sum_{n=1}^{N} s_{\ell n} g_n. \tag{2}$$

In the labor supply example, where a local labor demand instrument is called for, $g_n$ may for instance denote new government subsidies to the output of different industries $n$ and $s_{\ell n}$ may be location $\ell$’s lagged shares of industry employment. For now we require that the sum of exposure weights is constant across observations, i.e. that $\sum_{n=1}^{N} s_{\ell n} = 1$; we relax this in Section 3.4. Although our focus is on shift-share IV, we also note that this setup nests shift-share reduced-form regressions of $y_\ell$ on $z_\ell$, when $x_\ell = z_\ell$.

The SSIV estimator $\hat{\beta}$ uses $z_\ell$ to instrument for $x_\ell$ in equation (1), weighting by $e_\ell$. By the Frisch-Waugh-Lovell Theorem this estimator can be represented as a bivariate IV regression of outcome and treatment residuals, i.e. as the ratio of $e_\ell$-weighted sample covariances between the instrument and the residualized outcome and treatment:

$$\hat{\beta} = \frac{\sum_{\ell=1}^{L} e_\ell z_\ell y_\ell^{\perp}}{\sum_{\ell=1}^{L} e_\ell z_\ell x_\ell^{\perp}}, \tag{3}$$

where $v_\ell^{\perp}$ denotes the residual from an $e_\ell$-weighted projection of variable $v_\ell$ on the control vector $w_\ell$. Note that by the properties of such residualization, it is enough to residualize $y_\ell$ and $x_\ell$ without also residualizing the instrument $z_\ell$.

Footnote: For simplicity we suppose here that $w_\ell$ is of fixed length, such that $\hat{\gamma}$ consistently estimates a fixed number of coefficients. We weaken this assumption in the appendix proofs to the following results, allowing for an increasing number of group fixed effects or other controls subject to regularity conditions.

2.2 An Equivalent Shock-Level IV Estimator

Cross-sectional variation in the shift-share instrument only arises from differences in the exposure shares $s_{\ell n}$, since the set of shocks $g_n$ is the same in the construction of each $z_\ell$. It may be natural to conclude that such share variation is central for the exogeneity of $z_\ell$. The following result suggests that this view is incomplete:

Prop. 1
The SSIV estimator $\hat{\beta}$ equals the second-stage coefficient from a shock-level IV regression that uses the shocks $g_n$ as the instrument in estimating

$$\bar{y}_n^{\perp} = \alpha + \beta \bar{x}_n^{\perp} + \bar{\varepsilon}_n^{\perp}, \tag{4}$$

where $\bar{v}_n = \frac{\sum_{\ell=1}^{L} e_\ell s_{\ell n} v_\ell}{\sum_{\ell=1}^{L} e_\ell s_{\ell n}}$ denotes an exposure-weighted average of variable $v_\ell$ and the IV estimation is weighted by average shock exposure $s_n = \sum_{\ell=1}^{L} e_\ell s_{\ell n}$.

Proof: By definition of $z_\ell$, and by exchanging the order of summation in (3),

$$\hat{\beta} = \frac{\sum_{\ell=1}^{L} e_\ell \left(\sum_{n=1}^{N} s_{\ell n} g_n\right) y_\ell^{\perp}}{\sum_{\ell=1}^{L} e_\ell \left(\sum_{n=1}^{N} s_{\ell n} g_n\right) x_\ell^{\perp}} = \frac{\sum_{n=1}^{N} g_n \left(\sum_{\ell=1}^{L} e_\ell s_{\ell n} y_\ell^{\perp}\right)}{\sum_{n=1}^{N} g_n \left(\sum_{\ell=1}^{L} e_\ell s_{\ell n} x_\ell^{\perp}\right)} = \frac{\sum_{n=1}^{N} s_n g_n \bar{y}_n^{\perp}}{\sum_{n=1}^{N} s_n g_n \bar{x}_n^{\perp}}. \tag{5}$$

Furthermore $\sum_{n=1}^{N} s_n \bar{y}_n^{\perp} = \sum_{\ell=1}^{L} e_\ell \left(\sum_{n=1}^{N} s_{\ell n}\right) y_\ell^{\perp} = \sum_{\ell=1}^{L} e_\ell y_\ell^{\perp} = 0$, since $y_\ell^{\perp}$ is an $e_\ell$-weighted regression residual and $\sum_{n=1}^{N} s_{\ell n} = 1$. This and an analogous equality for $\bar{x}_n^{\perp}$ imply that $\hat{\beta}$ in (5) is a ratio of $s_n$-weighted covariances of $\bar{y}_n^{\perp}$ and $\bar{x}_n^{\perp}$ with $g_n$; hence it is obtained from the specified shock-level IV regression. $\square$

Proposition 1 is our main equivalence result, which shows that SSIV estimates can also be thought to arise from variation across shocks, rather than across observations. The IV regression that leverages this variation uses shock-level aggregates of the original (residualized) outcome and treatment, $\bar{y}_n^{\perp}$ and $\bar{x}_n^{\perp}$.
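The equivalence in Proposition 1 is purely algebraic, so it can be checked on arbitrary data. The following is a minimal sketch, assuming simulated data and hypothetical helper names (`weighted_residual`, `ssiv_observation_level`, `ssiv_shock_level`) that are not part of the paper or of the `ssaggregate` package. It computes the observation-level estimator in (3) and the $s_n$-weighted shock-level IV estimator based on (4), which should agree up to floating-point error.

```python
import numpy as np


def weighted_residual(v, W, e):
    """Residual from an e-weighted projection of v on the columns of W."""
    root_e = np.sqrt(e)
    coef, *_ = np.linalg.lstsq(W * root_e[:, None], v * root_e, rcond=None)
    return v - W @ coef


def ssiv_observation_level(y, x, W, S, g, e):
    """Equation (3): e-weighted IV of y on x, instrumented by z = S g."""
    z = S @ g
    y_perp = weighted_residual(y, W, e)
    x_perp = weighted_residual(x, W, e)
    return np.sum(e * z * y_perp) / np.sum(e * z * x_perp)


def ssiv_shock_level(y, x, W, S, g, e):
    """Equation (4): s_n-weighted IV of the shock-level aggregates, instrumented by g."""
    y_perp = weighted_residual(y, W, e)
    x_perp = weighted_residual(x, W, e)
    s = S.T @ e                          # average exposure s_n
    y_bar = S.T @ (e * y_perp) / s       # exposure-weighted average outcome
    x_bar = S.T @ (e * x_perp) / s       # exposure-weighted average treatment
    g_dev = g - np.sum(s * g) / np.sum(s)  # the constant in (4) demeans the instrument
    return np.sum(s * g_dev * y_bar) / np.sum(s * g_dev * x_bar)


# Simulated data (illustrative assumption): shares sum to one within each row.
rng = np.random.default_rng(0)
L, N = 200, 30
S = rng.dirichlet(np.ones(N), size=L)    # L x N exposure shares
e = rng.dirichlet(np.ones(L))            # importance weights summing to one
g = rng.normal(size=N)                   # shocks
W = np.column_stack([np.ones(L), rng.normal(size=L)])  # controls incl. constant
x = 0.8 * (S @ g) + 0.3 * rng.normal(size=L)
y = 2.0 * x + W @ np.array([1.0, 0.5]) + rng.normal(size=L)

beta_obs = ssiv_observation_level(y, x, W, S, g, e)
beta_shock = ssiv_shock_level(y, x, W, S, g, e)
print(beta_obs, beta_shock)
```

Note that the sketch relies on the two ingredients used in the proof: the controls include a constant, and the shares sum to one for every observation. Dropping either would break the agreement between the two estimates.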
Specifically, $\bar{y}_n^{\perp}$ reflects the average residualized outcome of the observations most exposed to the $n$th shock, while $\bar{x}_n^{\perp}$ is the same weighted average of residualized treatment. Each shock in this regression is weighted by $s_n$, representing its average ($e_\ell$-weighted) exposure across observations.

The labor supply example is useful for unpacking this general result. One expects industries $n$ which receive a higher output subsidy $g_n$ to increase employment in the regions $\ell$ that they are most active in, such that the shock-level first-stage regression of $\bar{x}_n^{\perp}$ on $g_n$ is positive. The shock-level reduced-form regression of $\bar{y}_n^{\perp}$ on $g_n$ further reflects the extent to which industries with higher subsidies also tend to be concentrated in regions with higher wage growth. Proposition 1 shows that the resulting shock-level IV regression estimate is numerically the same as the original SSIV estimate. The weights $s_n$ are similarly intuitive in this setting: if, as is common in practice, the industry employment shares $s_{\ell n}$ are measured in the same pre-period as total regional employment $e_\ell$, then $s_n$ will be proportional to total lagged industry employment. Without location importance weights (i.e. $e_\ell = 1/L$), $s_n$ is the average employment share of industry $n$ across locations.

Footnote: Note that in the special case where $\hat{\beta}$ comes from a reduced-form shift-share regression, Proposition 1 shows that the equivalent shock-level procedure is still an IV regression, of $\bar{y}_n^{\perp}$ on the transformed shift-share instrument $\bar{z}_n^{\perp}$, again instrumented by $g_n$ and weighted by $s_n$.

It is worth emphasizing that $\bar{y}_n$ and $\bar{x}_n$ (and their residualized versions in Proposition 1) are unconventional shock-level objects. They can, for example, be computed for outcomes and treatments that are not typically observed at the level of the shocks, such as when $n$ indexes industries and $y_\ell$ measures regional marriage rates (Autor et al. 2019).
Furthermore, even if the outcome and treatment have natural measures at the shock level, $\bar{y}_n$ and $\bar{x}_n$ will generally not coincide with them. For example, in the labor supply setting $\bar{y}_n$ is not industry $n$’s wage growth; rather, it measures the average wage growth in regions where industry $n$ employs the most workers. Accordingly, the shift-share regression estimates the elasticity of regional, rather than industry, labor supply.

Proposition 1 is an algebraic result which, by itself, does not speak to the consistency of $\hat{\beta}$. At the same time it suggests a shock-level orthogonality condition that is both necessary and sufficient for SSIV consistency, given the standard assumption of first-stage relevance. We derive and interpret this condition next, before developing a quasi-experimental framework that implies it.

As usual, SSIV consistency (that $\hat{\beta} \xrightarrow{p} \beta$ as $L \to \infty$) requires and implies instrument exclusion (that $z_\ell$ and $\varepsilon_\ell^{\perp}$ are asymptotically uncorrelated), given instrument relevance (that $z_\ell$ and $x_\ell^{\perp}$ are asymptotically correlated). We discuss relevance in the following section and focus here on exclusion. The following result shows how applying the equivalency logic of Proposition 1 translates the cross-sectional exclusion condition to a novel orthogonality condition at the shock level:

Prop. 2
Suppose $\sum_{\ell=1}^{L} e_\ell w_\ell z_\ell \xrightarrow{p} \Omega_{zw}$, $\sum_{\ell=1}^{L} e_\ell w_\ell \epsilon_\ell \xrightarrow{p} \Omega_{w\epsilon}$, and $\sum_{\ell=1}^{L} e_\ell w_\ell w_\ell' \xrightarrow{p} \Omega_{ww}$ with $\Omega_{ww}$ full rank. Suppose further that instrument relevance holds: $\sum_{\ell=1}^{L} e_\ell z_\ell x_\ell^{\perp} \xrightarrow{p} \pi$ with $\pi \neq 0$. Then $\hat{\beta}$ is consistent if and only if

$$\sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n \xrightarrow{p} 0. \tag{6}$$

Proof: By Proposition 1 and instrument relevance,

$$\hat{\beta} - \beta = \frac{\sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n^{\perp}}{\sum_{n=1}^{N} s_n g_n \bar{x}_n^{\perp}} = \frac{\sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n^{\perp}}{\sum_{\ell=1}^{L} e_\ell z_\ell x_\ell^{\perp}} = \frac{1}{\pi} \sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n^{\perp} + o_p(1),$$

so the result is trivial absent a control vector (i.e. when $\bar{\varepsilon}_n = \bar{\varepsilon}_n^{\perp}$). Otherwise by the second two regularity conditions $\hat{\gamma} \xrightarrow{p} \Omega_{ww}^{-1} \Omega_{w\epsilon} = \gamma$ and

$$\sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n - \sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n^{\perp} = \sum_{\ell=1}^{L} e_\ell z_\ell \left(\varepsilon_\ell - \varepsilon_\ell^{\perp}\right) = \left(\sum_{\ell=1}^{L} e_\ell z_\ell w_\ell'\right) \left(\hat{\gamma} - \gamma\right) \xrightarrow{p} 0$$

by the first regularity condition, so $\hat{\beta} - \beta = \frac{1}{\pi} \sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n + o_p(1)$. $\square$

Footnote: In Appendix A.2 we develop a stylized model to illustrate how the SSIV coefficient can differ from a “native” shock-level IV coefficient in the presence of local spillovers or treatment effect heterogeneity, though both parameters may be of interest. Intuitively, in the labor supply case one may estimate a low regional elasticity but a high elasticity of industry labor supply if, for example, migration is constrained but workers are mobile across industries within a region.

Equation (6) is our shock-level orthogonality condition for SSIV, restricting the $s_n$-weighted covariance of the shocks $g_n$ and the aggregated structural residuals $\bar{\varepsilon}_n = \frac{\sum_{\ell=1}^{L} e_\ell s_{\ell n} \varepsilon_\ell}{\sum_{\ell=1}^{L} e_\ell s_{\ell n}}$.
Proposition 2 shows that for SSIV to be consistent, given first-stage relevance and three mild regularity conditions, this covariance must be asymptotically zero. In our running labor supply example, such shock orthogonality holds when output subsidies $g_n$ are not systematically higher or lower for industries with higher $\bar{\varepsilon}_n$; i.e. those that are most active (in terms of lagged employment $s_{\ell n} e_\ell$) in the regions that face high unobserved labor supply conditions $\varepsilon_\ell$.

As a necessary condition, shock orthogonality is satisfied when the exposure shares are exogenous, as in the preferred interpretation of SSIV in Goldsmith-Pinkham et al. (2019). That is, if each $s_{\ell n}$ is uncorrelated with a mean-zero structural error $\varepsilon_\ell$ (given the importance weights $e_\ell$), $N$ is fixed, and a conventional law of large numbers applies to $\sum_{\ell=1}^{L} e_\ell s_{\ell n} \varepsilon_\ell$ for each $n$, then $(\bar{\varepsilon}_1, \ldots, \bar{\varepsilon}_N) \xrightarrow{p} 0$ and equation (6) is satisfied for any fixed set of shocks $g_n$. This share exogeneity generally rules out any unobserved shocks $\nu_n$ that affect the outcome via the exposure shares (i.e. when $\sum_{n=1}^{N} s_{\ell n} \nu_n$ is included in $\varepsilon_\ell$), even if such $\nu_n$ are uncorrelated with the observed shocks $g_n$, the exposure shares are randomly assigned to observations, and $N$ is allowed to grow (see Appendix A.3 for a formal argument). In the labor supply example this would mean there are no unobserved industry shocks impacting the local labor market besides government output subsidies – a strong restriction in practice.

When shares are endogenous, shock orthogonality (6) may instead be satisfied by certain properties of the SSIV shocks. We now develop a quasi-experimental framework that yields this result.
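To make the orthogonality condition concrete, the sketch below evaluates the left-hand side of (6) in a simulated setting where the residual contains unobserved shocks $\nu_n$ entering through the shares, so share exogeneity fails, while the observed shocks $g_n$ are drawn independently of everything else. The data-generating process, the function names, and the `concentration` parameter are illustrative assumptions, not taken from the paper. The sketch also confirms the identity $\sum_n s_n g_n \bar{\varepsilon}_n = \sum_\ell e_\ell z_\ell \varepsilon_\ell$ that links the shock-level and observation-level forms of the moment.

```python
import numpy as np


def orthogonality_moment(S, e, g, eps):
    """Left-hand side of (6), computed at the shock level and at the observation level."""
    s = S.T @ e                               # average exposure s_n
    eps_bar = S.T @ (e * eps) / s             # aggregated residual for each shock
    shock_level = np.sum(s * g * eps_bar)
    observation_level = np.sum(e * (S @ g) * eps)
    return shock_level, observation_level, eps_bar


def simulate(N, L, rng, concentration=0.5):
    """One draw with endogenous shares: unobserved shocks nu enter eps via S."""
    S = rng.dirichlet(np.full(N, concentration), size=L)
    e = np.full(L, 1.0 / L)
    nu = rng.normal(size=N)                   # unobserved shocks (assumption)
    eps = S @ nu + 0.5 * rng.normal(size=L)   # residual depends on the shares
    g = rng.normal(size=N)                    # observed shocks, independent of nu
    return orthogonality_moment(S, e, g, eps)


rng = np.random.default_rng(1)
shock_level, observation_level, eps_bar = simulate(N=50, L=500, rng=rng)

# Average absolute moment across redraws, for a growing number of shocks.
for n_shocks in (10, 50, 250):
    draws = [abs(simulate(n_shocks, 500, rng)[0]) for _ in range(50)]
    print(n_shocks, np.mean(draws))
```

Under these assumptions one would expect the printed averages to shrink as the number of shocks grows, even though the aggregated residuals $\bar{\varepsilon}_n$ themselves need not vanish; the exact values depend on the seed and are not asserted here.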
Our approach to satisfying the shock orthogonality condition specifies a quasi-experiment in which shocks are as-good-as-randomly assigned, mutually uncorrelated, large in number, and sufficiently dispersed in terms of their average exposure. Instrument relevance generally holds in such settings when the exposure of individual observations tends to be concentrated in a small number of shocks, and those shocks affect treatment. We then show how this framework is naturally generalized to settings in which shocks are only conditionally quasi-randomly assigned or exhibit some forms of mutual dependence, such as clustering, and to settings with panel data. We also consider further extensions, to SSIV regressions with varying exposure share sums, estimated shocks, and multiple endogenous variables or instruments.

3.1 Quasi-Randomly Assigned and Mutually Uncorrelated Shocks
To establish SSIV consistency, we consider a two-step data-generating process in which the shocks $g_n$ are drawn conditional on the shock-level unobservables $\bar{\varepsilon}_n$ and exposure weights $s_n$. Placing assumptions on this process, rather than on the sampling properties of observations, is akin to a standard analysis of randomized treatment assignment in experimental settings (Abadie et al. 2019) and has two key advantages. First, as explained below, conventional independent or clustered sampling processes are generally inconsistent with the shift-share data structure when the shocks are considered random variables. Second, in conditioning on $\bar{\varepsilon}_n$ and $s_n$ we place no restrictions on the dependence between the $s_{\ell n}$ and $\varepsilon_\ell$, allowing shock exposure to be endogenous (i.e. $\bar{\varepsilon}_n \not\xrightarrow{p} 0$). The following result shows that such endogeneity does not pose problems for SSIV exclusion in our framework:

Prop. 3 If $\mathrm{Var}\left[g_n \mid \bar{\varepsilon}_n, s_n\right]$ and $E\left[\bar{\varepsilon}_n^2 \mid s_n\right]$ are uniformly bounded and the three regularity conditions in Proposition 2 hold, then shock orthogonality is satisfied by the following conditions:

Assumption 1 (Quasi-random shock assignment): $E\left[g_n \mid \bar{\varepsilon}_n, s_n\right] = \mu$, for all $n$;

Assumption 2 (Many uncorrelated shocks): $E\left[\sum_{n=1}^{N} s_n^2\right] \to 0$ and for all $n$ and $n' \neq n$, $\mathrm{Cov}\left[g_n, g_{n'} \mid \bar{\varepsilon}_n, \bar{\varepsilon}_{n'}, s_n, s_{n'}\right] = 0$.

Proof: See Appendix B.1.

Proposition 3 shows that the SSIV exclusion restriction holds under two substantive assumptions and two additional regularity conditions. In words, Assumption 1 states that the shocks $g_n$ are as-good-as-randomly assigned, in that the same mean shock $\mu$ is expected across $n$ regardless of the realization of average exposure $s_n$ or the relevant unobservable $\bar{\varepsilon}_n$. As shown in Appendix B.1 this is enough to make the left-hand side of the orthogonality condition (6) zero in large samples, in expectation over different shock draws.
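The role of Assumption 1 can be seen in a short heuristic calculation. This is only a sketch: it is not the formal argument of Appendix B.1, and it assumes enough regularity for the expectations to exist and for the limit to pass through the expectation. By the law of iterated expectations and the definitions of $s_n$ and $\bar{\varepsilon}_n$,

```latex
\begin{align*}
E\left[\sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n\right]
  &= E\left[\sum_{n=1}^{N} s_n \bar{\varepsilon}_n \,
       E\left[g_n \mid \bar{\varepsilon}_n, s_n\right]\right]
   = \mu \, E\left[\sum_{n=1}^{N} s_n \bar{\varepsilon}_n\right] \\
  &= \mu \, E\left[\sum_{\ell=1}^{L} e_\ell
       \left(\sum_{n=1}^{N} s_{\ell n}\right) \varepsilon_\ell\right]
   = \mu \, E\left[\sum_{\ell=1}^{L} e_\ell \varepsilon_\ell\right].
\end{align*}
```

The last step uses $s_n \bar{\varepsilon}_n = \sum_{\ell=1}^{L} e_\ell s_{\ell n} \varepsilon_\ell$ and $\sum_{n=1}^{N} s_{\ell n} = 1$. The remaining term is the $e_\ell$-weighted mean of the structural residual, which is asymptotically zero because $w_\ell$ includes a constant and $\sum_{\ell=1}^{L} e_\ell w_\ell \varepsilon_\ell = \left(\sum_{\ell=1}^{L} e_\ell w_\ell w_\ell'\right)(\hat{\gamma} - \gamma) \xrightarrow{p} 0$.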
For it to converge to this mean in probability, a shock-level law of large numbers must hold. This is ensured by Assumption 2, which states that shocks are mutually uncorrelated given the unobservables and that the expected Herfindahl index of average exposure, $E\left[\sum_{n=1}^{N} s_n^2\right]$, converges to zero. The latter condition implies that the number of observed shocks grows with the sample (since $\sum_{n=1}^{N} s_n^2 \geq 1/N$). An equivalent condition is that the largest $s_n$ becomes vanishingly small: that is, that the largest shock importance weight vanishes asymptotically. Both of these conditions, while novel for SSIV, would be standard requirements for the consistency of a conventional shock-level IV estimator with $s_n$ weights.

Footnote: Appendix A.1 shows how SSIV identifies a convex average of heterogeneous treatment effects (varying potentially across both $\ell$ and $n$) under a stronger notion of as-good-as-random shock assignment and a first-stage monotonicity condition. This can be seen as generalizing both the IV identification result of Angrist et al. (2000) to shift-share instruments, as well as the reduced-form shift-share identification result in Adão et al. (2019).

Footnote: Goldsmith-Pinkham et al. (2019) propose a different measure of the importance of a given $n$, termed “Rotemberg weights.” In Appendix A.4 we show the formal connection between $s_n$ and Rotemberg weights, and that the latter do not carry the sensitivity-to-misspecification interpretation as they do in the exogenous shares view of Goldsmith-Pinkham et al. (2019). Instead, the Rotemberg weight of shock $n$ measures the leverage of $n$ in the equivalent shock-level IV regression from Proposition 1. Shocks may have large leverage either because of large $s_n$, as would be captured by the Herfindahl index, or because the shocks have a heavy-tailed distribution, which is allowed by Proposition 3.

[...] $\ell$ which are commonly used in non-experimental settings.
Independent sampling of the shift-share instrument is impossible when shocks are stochastic, since the shift-share instrument $z_\ell = \sum_{n=1}^{N} s_{\ell n} g_n$ is inherently correlated across observations. Unobserved shocks $\nu_n$ may further induce complex exposure-driven dependencies in the residual $\varepsilon_\ell$, precluding independence of $(z_\ell, \varepsilon_\ell)$ even conditionally on the observed $g_n$. Independent sampling across $n$ in the equivalent shock-level IV regression of Proposition 1 is also untenable, since the aggregated outcome and treatment residuals $\bar{y}_n^{\perp}$ and $\bar{x}_n^{\perp}$ are constructed from a common set of observations. These dependency issues also complicate SSIV inference, as we discuss in Section 4.

Second, and relatedly, the large-$L$ and large-$N$ asymptotic sequence of Proposition 3 is intended to approximate the finite-sample distribution of the SSIV estimator, rather than to define an actual process for realizations of shocks or observations. In practice shift-share instruments are typically constructed from a fixed population of relevant shocks (e.g. subsidies across all industries) and often from the full population of observations (e.g. all local labor markets). Our approach takes this data structure seriously, relying on the quasi-random assignment of shocks to a given set of industries.

Per Proposition 2, Assumptions 1 and 2 and the additional regularity conditions imply consistency of $\hat{\beta}$ under the usual IV relevance condition, that $\sum_{\ell=1}^{L} e_\ell z_\ell x_\ell^{\perp} \xrightarrow{p} \pi \neq 0$. In practice, the existence of such a first stage can be inferred from the data.
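The Herfindahl condition in Assumption 2 is likewise a feature of the observed shares and weights, so it can be inspected descriptively. The sketch below is a minimal illustration with hypothetical function names and constructed share matrices, none of which come from the paper. It computes the average exposure $s_n$, the Herfindahl index $\sum_n s_n^2$, and its inverse, which can be read as an effective number of shocks.

```python
import numpy as np


def average_exposure(S, e):
    """s_n = sum over observations of e_l * s_ln."""
    return S.T @ e


def exposure_herfindahl(S, e):
    """Herfindahl index of average exposure, sum_n s_n^2 (bounded below by 1/N)."""
    s = average_exposure(S, e)
    return float(np.sum(s ** 2))


# Dispersed case: each region is fully specialized in one of N industries,
# with the same number of equally weighted regions per industry.
N, regions_per_industry = 50, 4
S_dispersed = np.repeat(np.eye(N), regions_per_industry, axis=0)
L = S_dispersed.shape[0]
e = np.full(L, 1.0 / L)

hhi_dispersed = exposure_herfindahl(S_dispersed, e)   # equals 1/N by construction
effective_shocks = 1.0 / hhi_dispersed

# Concentrated case: every region additionally puts half its exposure on industry 0.
S_concentrated = 0.5 * S_dispersed
S_concentrated[:, 0] += 0.5
hhi_concentrated = exposure_herfindahl(S_concentrated, e)
largest_weight = average_exposure(S_concentrated, e).max()

print(hhi_dispersed, effective_shocks, hhi_concentrated, largest_weight)
```

In the dispersed case the index attains its lower bound $1/N$, while a single dominant industry keeps it far from zero however many other industries there are, which is the situation Assumption 2 rules out asymptotically.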
To see when instrument relevance might hold with quasi-experimental shocks, consider a stylized setting without controls $w_\ell$ and where treatment is a share-weighted average of shock-specific components: $x_\ell = \sum_{n=1}^{N} s_{\ell n} x_{\ell n}$, where $x_{\ell n} = \pi_{\ell n} g_n + \eta_{\ell n}$ with $\pi_{\ell n} \geq \bar{\pi}$ almost surely for some fixed $\bar{\pi} > 0$. In line with Assumptions 1 and 2, suppose further that the shocks are mean-zero and mutually independent given the exposure share and weight matrices $s$ and $e$ and the full set of $\pi_{\ell n}$ and $\eta_{\ell n}$, with variances $\sigma_n^2 \geq \bar{\sigma}_g^2$ for some fixed $\bar{\sigma}_g^2 > 0$. Then

$$E\left[\sum_{\ell=1}^{L} e_\ell z_\ell x_\ell\right] = E\left[\sum_{\ell=1}^{L} e_\ell \left(\sum_{n=1}^{N} s_{\ell n} g_n\right) \left(\sum_{n=1}^{N} s_{\ell n} \left(\pi_{\ell n} g_n + \eta_{\ell n}\right)\right)\right] \geq \bar{\pi} \bar{\sigma}_g^2 \, E\left[\sum_{\ell=1}^{L} e_\ell \sum_{n=1}^{N} s_{\ell n}^2\right], \tag{7}$$

and under appropriate regularity conditions $\sum_{\ell=1}^{L} e_\ell z_\ell^{\perp} x_\ell$ converges in probability to this mean. In this case, SSIV relevance holds when the $e_\ell$-weighted average of local exposure Herfindahl indices $\sum_{n=1}^{N} s_{\ell n}^2$ across observations does not vanish in expectation. In our running labor supply example, where $x_{\ell n}$ is industry-by-region employment growth, SSIV relevance thus arises from individual regions $\ell$ tending to specialize in a small number of industries $n$, provided subsidies have a non-vanishing effect on local industry employment. Compare this condition to the Herfindahl index condition in Assumption 2, which instead states that the average shares of industries across locations $s_n$ grow small. Both conditions may simultaneously hold when most regions specialize in a small number of industries, differentially across a large number of industries.

Footnote: This is similar to the way Bekker (1994) uses a non-standard asymptotic sequence to analyze IV estimators with many weak instruments: “The [asymptotic] sequence is designed to make the asymptotic distribution fit the finite sample distribution better. It is completely irrelevant whether or not further sampling will lead to samples conforming to this sequence” (p. 658).

While Proposition 3 establishes SSIV consistency when shocks have the same expectation across $n$ and are mutually uncorrelated, both requirements are straightforward to relax. Here we provide extensions of Assumptions 1 and 2 that allow the shock expectation to depend on observables and for weak mutual dependence (such as clustering or serial correlation) of the residual shock variation.

As with other research designs, one may wish to assume that Assumptions 1 and 2 only hold conditionally on a vector of shock-level observables $q_n$ (that includes a constant). For example, it may be more plausible that shocks are as-good-as-randomly assigned within a set of observed clusters $c(n) \in \{1, \ldots, C\}$ with non-random cluster-average shocks, in which case $q_n$ collects $C - 1$ cluster dummies and a constant. In general we consider the following weakened version of Assumption 1:

Assumption 3 (Conditional quasi-random shock assignment): $E\left[g_n \mid \bar{\varepsilon}_n, q_n, s_n\right] = q_n' \mu$, for all $n$.

Similarly, one may prefer to impose the mutual uncorrelatedness condition of Assumption 2 on the residual $g_n^* = g_n - q_n' \mu$, in place of $g_n$:

Assumption 4 (Many uncorrelated shock residuals): $E\left[\sum_{n=1}^{N} s_n^2\right] \to 0$ and for all $n$ and $n' \neq n$, $\mathrm{Cov}\left[g_n^*, g_{n'}^* \mid \bar{\varepsilon}_n, \bar{\varepsilon}_{n'}, s_n, s_{n'}\right] = 0$.
In the shock cluster example, Assumption 4 would allow the number of clusters to remain small, each with its own random effect, as in that case a law of large numbers may apply to the within-cluster residuals $g_n^*$ but not the original shocks $g_n$.

By a simple extension of the proof to Proposition 3, the orthogonality condition (6) is satisfied when these conditions replace Assumptions 1 and 2 and the residual shift-share instrument $z_\ell^* = \sum_{n=1}^{N} s_{\ell n} g_n^*$ replaces $z_\ell$. While this instrument is infeasible, the following result shows it implicitly drives variation in SSIV regressions that control for the exposure-weighted vector of shock-level controls, $w_\ell^* = \sum_{n=1}^{N} s_{\ell n} q_n$:

Proposition 4: Suppose Assumptions 3 and 4 hold, $\mathrm{Var}[g_n^* \mid \bar{\varepsilon}_n, q_n, s_n]$ and $E[\bar{\varepsilon}_n^2 \mid q_n, s_n]$ are uniformly bounded, and the regularity conditions in Proposition 2 hold. Then shock orthogonality (6) is satisfied provided $w_\ell^*$ is included in $w_\ell$.

Proof: See Appendix B.2.

Footnote: Note that this precludes consideration of an asymptotic sequence where $L$ remains finite as $N$ grows. With $L$ (and also $e_1, \dots, e_L$) fixed, Assumption 2 implies $\sum_{\ell=1}^{L} e_\ell E\left[\sum_{n=1}^{N} s_{\ell n}^2\right] \to 0$ and thus $\mathrm{Var}[z_\ell] = \mathrm{Var}\left[\sum_{n=1}^{N} s_{\ell n} g_n\right] \to 0$ for each $\ell$ if $\mathrm{Var}[g_n]$ is bounded. If the instrument has asymptotically no variation it cannot have a first stage, unless the $\pi_{\ell n}$ grow without bound.

In particular, Proposition 4 shows that controlling for each observation’s individual exposure to each cluster, $\sum_{n=1}^{N} s_{\ell n} \mathbf{1}[c(n) = c]$, isolates the within-cluster variation in shocks.

Even conditional on observables, mutual shock uncorrelatedness may be undesirably strong. It is, however, straightforward to further relax this assumption to allow for shock assignment processes with weak mutual dependence, such as further clustering or autocorrelation.
In Appendix B.1 we prove generalizations of Proposition 3 which replace Assumption 2 with one of the following alternatives, with generalizations of Proposition 4 following analogously:

Assumption 5 (Many uncorrelated shock clusters): There exists a partition of industries into clusters $c(n)$ such that $E\left[\sum_{c=1}^{C} s_c^2\right] \to 0$ for $s_c = \sum_{n:\, c(n) = c} s_n$ and for all $n$ and $n'$ such that $c(n) \neq c(n')$, $\mathrm{Cov}[g_n, g_{n'} \mid \bar{\varepsilon}_n, \bar{\varepsilon}_{n'}, s_{c(n)}, s_{c(n')}] = 0$;

Assumption 6 (Many weakly-correlated shocks): For some sequence of numbers $B_L \geq 0$ and a fixed function $f(\cdot) \geq 0$ with $\sum_{r=1}^{\infty} f(r) < \infty$, $B_L E\left[\sum_{n=1}^{N} s_n^2\right] \to 0$ and for all $n$ and $n'$, $\left|\mathrm{Cov}[g_n, g_{n'} \mid \bar{\varepsilon}_n, \bar{\varepsilon}_{n'}, s_n, s_{n'}]\right| \leq B_L \cdot f(|n' - n|)$.

Here Assumption 5 relaxes Assumption 2 by allowing shocks to be grouped within mutually mean-independent clusters $c(n)$, while placing no restriction on within-cluster shock correlation. At the same time, the Herfindahl index assumption of Assumption 2 is strengthened to hold for industry clusters, with $s_c$ denoting the average exposure of cluster $c$. Assumption 6 takes a different approach, allowing all nearby shocks to be mutually correlated provided their covariance is bounded by a function $B_L \cdot f(|n' - n|)$. This accommodates, for example, the case of first-order autoregressive time series with the covariance bound declining at a geometric rate, i.e. $f(r) = \delta^r$ for $\delta \in [0, 1)$ and constant $B_L$. With $B_L$ growing, stronger dependence of nearby shocks is also allowed (see Appendix B.1).

In practice, SSIV regressions are often estimated with panel data, where the outcome $y_{\ell t}$, treatment $x_{\ell t}$, controls $w_{\ell t}$, importance weights $e_{\ell t}$, exposure shares $s_{\ell n t}$, and shocks $g_{nt}$ are additionally indexed by time periods $t = 1, \dots, T$. In such settings a time-varying instrument,

$$z_{\ell t} = \sum_{n=1}^{N} s_{\ell n t}\, g_{nt}, \tag{8}$$

is used, and the controls $w_{\ell t}$ may include unit- or period-specific fixed effects.

Footnote: Exposure shares are typically lagged and sometimes fixed in a pre-period. Our subscript $t$ notation indicates that these shares are used to construct the instrument for period $t$, not that they are measured in that period. We also note that, as in our running labor supply example, the SSIV outcome, treatment, and shocks may be already measured as changes or growth rates over time in the previous “cross-sectional” discussion.
It is straightforward to map this panel case to the previous cross-sectional setting by a simple relabeling. Let $\tilde{\ell} \in \{(\ell, t) : \ell = 1, \dots, L;\ t = 1, \dots, T\}$ and $\tilde{n} \in \{(n, t) : n = 1, \dots, N;\ t = 1, \dots, T\}$, with the time-varying outcomes now indexed as $y_{\tilde{\ell}}$ and similarly for $x_{\tilde{\ell}}$, $w_{\tilde{\ell}}$, $e_{\tilde{\ell}}$, and $g_{\tilde{n}}$. Further let $\tilde{s}_{\tilde{\ell}\tilde{n}} = \tilde{s}_{(\ell,t),(n,p)} = s_{\ell n t}\mathbf{1}[t = p]$ denote the exposure of observation $\ell$ in period $t$ to shock $n$ in period $p$, which is by definition zero for $t \neq p$. The time-varying instrument (8) can then be rewritten $z_{\tilde{\ell}} = \sum_{\tilde{n}} \tilde{s}_{\tilde{\ell}\tilde{n}}\, g_{\tilde{n}}$, as in the cross-sectional case. Generalizations of the equivalency result (Proposition 1), orthogonality condition (Proposition 2), and quasi-experimental framework (Propositions 3 and 4) immediately follow.

Unpacking these results in the panel case delivers three new insights. First, consistency of the panel SSIV estimator is established as $LT \to \infty$, with the Herfindahl condition in Proposition 3 (that $E\left[\sum_{\tilde{n}} s_{\tilde{n}}^2\right] = E\left[\sum_{n=1}^{N}\sum_{t=1}^{T} s_{nt}^2\right] \to 0$) requiring $NT \to \infty$. This means that our quasi-experimental framework can be applied to settings, such as that of Nunn and Qian (2014), with relatively few shocks $N$ (and perhaps few observations $L$) in the cross-section, but many time periods $T$. The asymptotic sequence may also well-approximate settings like those of Berman et al. (2017) and Imbert et al. (2019) where $NT$ is large despite moderate $N$ and $T$. Second, while shocks must be mutually uncorrelated across periods under the baseline Assumption 2, arbitrary clustering across periods can be accommodated by Assumption 5, provided $N \to \infty$. If $N$ is finite but $T \to \infty$, weak serial dependence is accommodated by Assumption 6.
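The relabeling described above can be sketched numerically. The following is only an illustration under assumed conventions, not the authors' code: the array layout `S_panel[l, n, t]`, the function name `relabel_panel_shares`, and the simulated data are all hypothetical. It builds the relabeled share matrix, which is zero whenever the observation period and shock period differ, and checks that the relabeled cross-sectional instrument reproduces the time-varying instrument (8).

```python
import numpy as np

def relabel_panel_shares(S_panel):
    """Map panel shares S_panel[l, n, t] to a (L*T) x (N*T) matrix S_tilde.

    Row (l, t) is stored at position l*T + t and column (n, p) at n*T + p,
    so that S_tilde[(l, t), (n, p)] = S_panel[l, n, t] * 1[t == p].
    """
    L, N, T = S_panel.shape
    S_tilde = np.zeros((L * T, N * T))
    for t in range(T):
        rows = np.arange(L) * T + t  # positions of observations (l, t)
        cols = np.arange(N) * T + t  # positions of shocks (n, t)
        S_tilde[np.ix_(rows, cols)] = S_panel[:, :, t]
    return S_tilde

# Hypothetical simulated panel: shares sum to one over n for each (l, t).
rng = np.random.default_rng(0)
L, N, T = 4, 3, 2
S_panel = rng.dirichlet(np.ones(N), size=(L, T)).transpose(0, 2, 1)
G_panel = rng.normal(size=(N, T))

# Time-varying instrument (8), computed period by period.
z_panel = np.einsum("lnt,nt->lt", S_panel, G_panel)

# Same instrument from the relabeled "cross-sectional" representation.
S_tilde = relabel_panel_shares(S_panel)
z_tilde = S_tilde @ G_panel.reshape(-1)
```

Flattening with `reshape(-1)` uses the same `(index * T + t)` ordering as the rows and columns of `S_tilde`, which is what makes the two instruments line up element by element.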
A third set of insights concerns the role of fixed effect (FE) controls in panel SSIV regressions. As in any panel regression, unit fixed effects purge time-invariant unobservables from the residual $\varepsilon_\ell$. Assumption 1 would thus hold if shocks are as-good-as-randomly assigned with respect to aggregated time-varying unobservables. However, when exposure shares are fixed across periods, i.e. $s_{\ell n t} \equiv s_{\ell n}$, unit FEs can also be understood as isolating the time-varying variation in shocks. This follows from Proposition 4: exposure-weighted averages of the shock FEs in $q_{nt}$, $w^*_{\ell t \bar{n}} = \sum_{n} s_{\ell n}\mathbf{1}[n = \bar{n}]$, are time-invariant and thus subsumed by the unit FEs. A similar argument applies to period FEs, which isolate within-period shock variation. This only relies on exposure shares adding up to one, since then one can represent period FEs in the original data as exposure-weighted averages of shock-level period FEs in the relabeled data, i.e. $\mathbf{1}[t = \bar{t}] = \sum_{(n,p)} \tilde{s}_{\tilde{\ell}(n,p)}\mathbf{1}[p = \bar{t}]$.

To make these insights concrete, consider our labor supply example in a panel setting with subsidies allocated to industries in each period. Imagine that certain industries get permanent subsidy shocks that are not as-good-as-randomly assigned across industries. In that case, one may prefer to only use the changes in industry subsidies over time as identifying variation. With fixed exposure shares, one way to proceed is to include region FEs in the SSIV specification, which implicitly control for industry FEs. [...]

$$\Delta y_{\ell t} = \beta \Delta x_{\ell t} + \gamma' \Delta w_{\ell t} + \Delta \varepsilon_{\ell t}, \tag{9}$$

instrumenting $\Delta x_{\ell t}$ with $z_{\ell t, FD} = \sum_{n} s_{\ell n, t-1} \Delta g_{nt}$, where $\Delta$ is the first-differencing operator for both observations and shocks.

Footnote: Proposition 4 also suggests an alternative way to handle serial correlation when the time series properties of shocks are known. For example given a first-order autoregressive process $g_{nt} = \rho_0 + \rho_1 g_{n,t-1} + g^*_{nt}$, controlling for the exposure-weighted average of past shocks $\sum_{n=1}^{N} s_{\ell n t}\, g_{n,t-1}$ extracts the idiosyncratic shock component $g^*_{nt}$.

Footnote: Technically these observations hold under the regularity conditions used to prove Propositions 2 to 4, which require the control coefficient vector $\gamma$ to be consistently estimated as $LT \to \infty$, even if $w_{\ell t}$ contains an increasing number of FEs (see footnote 5). In Appendix A.5 we show how stronger notions of shock exogeneity address this incidental parameters problem.

We conclude this section by presenting several further extensions of the quasi-experimental SSIV framework, accommodating features of shocks and shares that are often encountered in practice.
The incomplete shares problem
While we have previously assumed the sum of exposure shares is constant, in practice this sum $S_\ell = \sum_{n=1}^{N} s_{\ell n}$ may vary across $\ell$. For example, in the labor supply setting, government output subsidies may be only introduced to manufacturing industries while the lagged manufacturing employment shares of $s_{\ell n}$ may be measured relative to total employment in region $\ell$. In this case $S_\ell$ corresponds to the lagged total share of manufacturing employment in region $\ell$.

Our framework highlights a problem with such “incomplete share” settings: even if Assumptions 1 and 2 hold, the SSIV estimator will in general leverage non-experimental variation in $S_\ell$, in addition to quasi-experimental variation in shocks. To see this formally, note that one can always rewrite the shift-share instrument with the “missing” (e.g., non-manufacturing) shock included to return to the complete shares setting:

$$z_\ell = s_{\ell 0}\, g_0 + \sum_{n=1}^{N} s_{\ell n}\, g_n, \tag{10}$$

where $g_0 = 0$ and $s_{\ell 0} = 1 - S_\ell$, such that $\sum_{n=0}^{N} s_{\ell n} = 1$ for all $\ell$. The previous quasi-experimental framework then applies to this expanded set of shocks $g_0, \dots, g_N$. Since $g_0 = 0$, Proposition 3 requires in this case that $E[g_n \mid s_n, \bar{\varepsilon}_n] = 0$ for $n > 0$ as well; that is, that the expected shock to each manufacturing industry is the same as the “missing” non-manufacturing shock of zero. Otherwise, even if the manufacturing shocks are random, regions with higher manufacturing shares $S_\ell$ will tend to have systematically different values of the instrument $z_\ell$, leading to bias when these regions also have different unobservables.

Footnote: There is another argument for fixing the shares in a pre-period that applies when the current shares are affected by lagged shocks in a way that is correlated with unobservables $\varepsilon_{\ell t}$. In the labor supply example suppose local labor markets vary in flexibility, with stronger reallocation of employment to industries with bigger subsidies in flexible markets. If subsidies are random but persistent, more subsidized industries will be increasingly concentrated in regions with flexible labor markets and Assumption 1 will be violated if such flexibility is correlated with $\varepsilon_{\ell t}$. This concern is distinct from that in Jaeger et al. (2018) who focus on the endogeneity of shares to the lagged residuals, rather than shocks, in a setting closer to Goldsmith-Pinkham et al. (2019). Jaeger et al. also point out another issue relevant to panel SSIV, that the outcome may respond to both current and lagged shocks; we return to this issue in Section 3.4.

Cast in this way, the incomplete shares problem has a natural solution via Assumption 3. Namely, one can allow the missing and non-missing shocks to have endogenously different means by conditioning on the indicator $\mathbf{1}[n > 0]$ in the $q_n$ vector. By Proposition 4, the SSIV estimator allows for such conditional quasi-random assignment when the control vector $w_\ell$ contains the exposure-weighted average of $\mathbf{1}[n > 0]$, which is here $\sum_{n=0}^{N} s_{\ell n}\mathbf{1}[n > 0] = S_\ell$. Thus, in the labor supply example, quasi-experimental variation in manufacturing shocks is isolated in regressions with incomplete shares provided one controls for a region’s lagged manufacturing share $S_\ell$.

Two further points on incomplete shares are worth highlighting. First, allowing the observed shock mean to depend on other observables will tend to involve controlling for share-weighted averages of these controls interacted with the indicator $\mathbf{1}[n > 0]$. For example with period indicators in $q_{nt}$, within-period shock variation is isolated by controlling for sums of period-specific exposure $S_{\ell t}$, interacted with period indicators (i.e. $\sum_{n=0}^{N} s_{\ell n t}\mathbf{1}[n > 0]\mathbf{1}[t = \bar{t}] = S_{\ell t}\mathbf{1}[t = \bar{t}]$). Second, by effectively “dummying out” the missing industry, SSIV regressions that control for $S_\ell$ require a weaker Herfindahl condition: $E\left[\sum_{n=1}^{N} s_n^2\right] \to 0$, allowing the non-manufacturing industry share $s_0$ to stay large.

Shift-share designs with estimated shocks
In some shift-share designs, the shocks are equilibrium objects that can be difficult to view as being quasi-randomly assigned. For example, in the canonical Bartik (1991) estimation of the regional labor supply elasticity, the shocks are national industry employment growth rates. Such growth reflects labor demand shifters, which one may be willing to assume are as-good-as-randomly assigned across industries. However industry growth also aggregates regional labor supply shocks that directly enter the residual $\varepsilon_\ell$. Here we show how the quasi-experimental SSIV framework can still apply in such cases, by viewing the $g_n$ as noisy estimates of some latent true shocks $g_n^*$ (labor demand shifters, in the Bartik (1991) example) which satisfy Assumption 1. We establish the conditions on estimation noise (aggregated labor supply shocks, in Bartik (1991)) such that a feasible shift-share instrument estimator, perhaps involving a leave-one-out correction as in Autor and Duggan (2003), is [...].

Footnote: Formally, if Assumptions 1 and 2 hold for all $n > 0$ we have from the proof to Proposition 3 that $\sum_{n=1}^{N} s_n g_n \bar{\varepsilon}_n = \sum_{n=0}^{N} s_n g_n \bar{\varepsilon}_n = E\left[\sum_{n=0}^{N} s_n (g_n - \mu)\bar{\varepsilon}_n\right] + o_p(1) = -\mu E[s_0 \bar{\varepsilon}_0] + o_p(1)$. If $\mu \neq 0$ and the missing industry share is large ($s_0 \not\xrightarrow{p} 0$) this can only converge to zero when $E[s_0 \bar{\varepsilon}_0] = E\left[\sum_{\ell=1}^{L} e_\ell s_{\ell 0}\varepsilon_\ell\right]$ does, i.e. when $S_\ell$ is exogenous.

The shocks $g_n$ can be written as weighted averages of the growth of each industry in each region: $g_n = \sum_{\ell=1}^{L} \omega_{\ell n}\, g_{\ell n}$, where the weights $\omega_{\ell n}$ are the lagged share of industry employment located in region $\ell$, with $\sum_{\ell=1}^{L} \omega_{\ell n} = 1$ for each $n$. In a standard model of regional labor markets, $g_{\ell n}$ includes (to first-order approximation) an industry labor demand shock $g_n^*$ and a term that is proportional to the regional supply shock $\varepsilon_\ell$. We suppose that the demand shocks are as-good-as-randomly assigned across industries, such that the infeasible SSIV estimator which uses $z_\ell^* = \sum_{n=1}^{N} s_{\ell n}\, g_n^*$ as an instrument satisfies our quasi-experimental framework. The asymptotic bias of the feasible SSIV estimator which uses $z_\ell = \sum_{n=1}^{N} s_{\ell n}\, g_n$ then depends on the large-sample covariance between the labor supply shocks $\varepsilon_\ell$ and an aggregate of the supply shock “estimation error,”

$$\psi_\ell = z_\ell - z_\ell^* \propto \sum_{n=1}^{N} s_{\ell n} \sum_{\ell'=1}^{L} \omega_{\ell' n}\, \varepsilon_{\ell'}. \tag{11}$$

Two insights follow from considering the bias term $\sum_{\ell=1}^{L} e_\ell \psi_\ell \varepsilon_\ell$. First, part of the covariance between $\psi_\ell$ and $\varepsilon_\ell$ is mechanical, since $\varepsilon_\ell$ enters $\psi_\ell$. In fact, if supply shocks are spatially uncorrelated this is the only source of bias from using $z_\ell$ rather than $z_\ell^*$ as an instrument. This motivates the use of a leave-one-out (LOO) shock estimator, $g_{n,-\ell} = \sum_{\ell' \neq \ell} \omega_{\ell' n}\, g_{\ell' n} \big/ \sum_{\ell' \neq \ell} \omega_{\ell' n}$, and the feasible instrument $z_\ell^{LOO} = \sum_{n=1}^{N} s_{\ell n}\, g_{n,-\ell}$ to remove this mechanical covariance. Conversely, if the regional supply shocks $\varepsilon_\ell$ are spatially correlated a LOO adjustment may not be sufficient to eliminate mechanical bias in the feasible SSIV instrument, though more restrictive split-sample methods (e.g. those estimating shocks from distant regions) may suffice.

Second, in settings where there are many regions contributing to each shock estimate even the mechanical part of $\sum_{\ell=1}^{L} e_\ell \psi_\ell \varepsilon_\ell$ may be ignorable, such that the conventional non-LOO shift-share instrument $z_\ell$ (which, unlike $z_\ell^{LOO}$, has a convenient shock-level representation per Proposition 1) is asymptotically valid when $z_\ell^{LOO}$ is. In Appendix A.7, we derive a heuristic for this case, under the assumption of spatially-independent supply shocks. In a special case when each region is specialized in a single industry and there are no importance weights, the key condition is $L/N \to \infty$, or that the average number of regions specializing in the typical industry is large. With incomplete specialization or weights, the corresponding condition requires the typical industry to be located in a much larger number of regions than the number of industries that a typical region specializes in.

Footnote: Appendix A.8 presents such a model, showing that $g_{\ell n}$ also depends on the regional average of $g_n^*$ (via local general equilibrium effects) and on idiosyncratic region-specific demand shocks. Both of these are uncorrelated with the error term in the model and thus do not lead to violations of Assumption 1; we abstract away from this detail here.

Footnote: This problem of mechanical bias is similar to that of two-stage least squares with many instruments (Bound et al. 1995), and the solution is similar to the jackknife instrumental variable estimate approach of Angrist et al. (1999).

Footnote: Adão et al. (2019) derive the corrected standard errors for LOO SSIV and find that they are in practice very close to the non-LOO ones, in which case the SSIV standard errors we derive in the next section are approximately valid even when the LOO correction is used.
To illustrate the preceding points in the data, Appendix A.7 replicates the setting of Bartik (1991) with and without a LOO estimator, using data from Goldsmith-Pinkham et al. (2019). We find that in practice the LOO correction does not matter for the SSIV estimate, consistent with the findings of Goldsmith-Pinkham et al. (2019) and Adão et al. (2019), and especially so when the regression is estimated without regional employment weights. Our framework provides an explanation for this: the heuristic statistic we derive is much larger without importance weights. These findings imply that if, in the canonical Bartik (1991) setting, one is willing to assume quasi-random assignment of the underlying industry demand shocks and that the regional supply shocks are spatially-uncorrelated, one can interpret the uncorrected SSIV estimator as leveraging demand variation in large samples, as some of the literature has done (e.g. Suárez and Zidar (2016)).
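The leave-one-out construction discussed in this subsection can be sketched as follows. This is a minimal illustration, not the replication code of Appendix A.7: the function name `leave_one_out_shocks`, the array layout (`G_local[l, n]` for regional industry growth, `omega[l, n]` for regional weights), and the simulated employment data are assumptions made for the example. It also assumes every industry is present in at least two regions, so that the leave-one-out denominator is positive.

```python
import numpy as np

def leave_one_out_shocks(G_local, omega):
    """Leave-one-out shock estimates g_{n,-l} for every region l and industry n.

    G_local[l, n]: growth of industry n in region l.
    omega[l, n]:   weight of region l in industry n (columns sum to one).
    Entry [l, n] of the result averages G_local[:, n] over all regions
    except l, using the weights omega[:, n] renormalized to exclude l.
    """
    total_num = (omega * G_local).sum(axis=0, keepdims=True)
    total_den = omega.sum(axis=0, keepdims=True)
    return (total_num - omega * G_local) / (total_den - omega)

# Hypothetical lagged employment by region (rows) and industry (columns).
rng = np.random.default_rng(1)
L, N = 6, 4
employment = rng.uniform(1.0, 10.0, size=(L, N))
omega = employment / employment.sum(axis=0, keepdims=True)  # region shares within industry
S = employment / employment.sum(axis=1, keepdims=True)      # industry shares within region
G_local = rng.normal(size=(L, N))

# Conventional (non-LOO) shocks and instrument.
g = (omega * G_local).sum(axis=0)
z = S @ g

# Leave-one-out shocks and instrument: region l never contributes to its own shocks.
G_loo = leave_one_out_shocks(G_local, omega)
z_loo = (S * G_loo).sum(axis=1)
```

Subtracting each region's own contribution from the industry totals avoids an explicit loop over regions; the result should agree with recomputing each weighted average on the remaining regions.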
Multiple shocks and treatments
In some shift-share designs one may have access to multiple sets of shocks satisfying Assumptions 1 and 2 or their extensions. For example while Autor et al. (2013) construct an instrument from average Chinese import growth across eight non-U.S. countries, in principle the industry shocks from each individual country may be each thought to be as-good-as-randomly assigned. In other settings multiple endogenous variables may be required for the SSIV exclusion restriction to plausibly hold. For example Jaeger et al. (2018) show that when local labor markets respond dynamically to immigrant inflows, it may be necessary to instrument both the current and lagged immigrant growth rate with current and lagged shift-share instruments. Another example is provided by Bombardini and Li (2019), who estimate the reduced-form effects of two shocks: the regional growth of all exports and the regional growth of exports in pollution-intensive sectors. Both are shift-share variables based on the same regional employment shares across industries but different shocks: an overall industry export shock and the overall shock interacted with industry pollution intensity.

Footnote: The instruments here are $\sum_{n=1}^{N} s_{\ell n}\, g_n$ and $\sum_{n=1}^{N} s_{\ell n}\, g_n q_n$ where $q_n$ denotes industry $n$’s pollution intensity. Our framework applies in this case even if $q_n$ is not randomly assigned: as long as the export shock $g_n$ satisfies an appropriate version of Assumption 1, $E[g_n \mid q_n, \bar{\varepsilon}_n, s_n] = \mu$, the interacted shock satisfies $E[g_n q_n \mid q_n, \bar{\varepsilon}_n, s_n] = \mu q_n$, i.e. Assumption 3. The natural extension of Proposition 4 to multiple instruments applies as long as $\sum_{n=1}^{N} s_{\ell n}\, q_n$, a measure of pollution intensity of the region, is controlled for, as Bombardini and Li (2019) do in some specifications.

In Appendix A.9 we show how these extensions fit into our quasi-experimental framework. The key insight is that SSIV regressions with multiple instruments, with and without multiple endogenous variables, again have an equivalent representation as particular shock-level IV estimators provided the exposure shares used to construct the instruments are the same. This immediately implies extensions of the foregoing results that establish consistency for just-identified SSIV regressions with multiple instruments, such as the dynamic adjustment case of Jaeger et al. (2018). In overidentified settings, the appendix derives new shock-level IV estimators that optimally combine the quasi-experimental variation and permit omnibus tests of the identifying assumptions, via the generalized method of moments theory of Hansen (1982) and inference results discussed in the next section. For example, a regression of $\bar{y}_n^\perp$ on $\bar{x}_n^\perp$, weighted by $s_n$ and instrumented by multiple shocks $g_{1n}, \dots, g_{Jn}$, yields an efficient estimate of $\beta$.

A shock-level view of SSIV also brings new insights to inference and testing. In this section we first show how a problem with conventional inference in SSIV settings, first studied by Adão et al. (2019), has a convenient solution based on the equivalence result in Proposition 1. In particular, we show that conventional standard error calculations are asymptotically valid when quasi-experimental SSIV coefficients are estimated at the level of identifying variation (shocks). We then discuss how other novel shock-level procedures can be used to assess first-stage relevance and to implement valid falsification tests of the SSIV exclusion restriction. Lastly, we summarize a variety of Monte-Carlo simulations illustrating the finite-sample properties of SSIV and relating them to conventional shock-level analyses.
As with consistency, SSIV inference is complicated by the fact that the observed shocks $g_n$ and any unobserved shocks $\nu_n$ induce dependencies in the instrument $z_\ell$ and residual $\varepsilon_\ell$ across observations with similar exposure shares, making it implausible to treat $(z_\ell, \varepsilon_\ell)$ as independent draws. This problem can be understood as an extension of the standard clustering concern (Moulton 1986), in which the instrument and structural residual are correlated across observations within predetermined clusters, with the additional complication that in SSIV every pair of observations with overlapping shares may have correlated $(z_\ell, \varepsilon_\ell)$. Adão et al. (2019) study and develop solutions to this problem in a setting that builds on our own quasi-experimental framework. Their analysis shows both that such “exposure clustering” is likely to arise in economic models motivating SSIV regressions and that it can lead to misleading conventional inference procedures in practice. In Monte-Carlo simulations, for example, Adão et al. (2019) show that tests based on conventional standard errors with nominal 5% significance can reject a true null in 55% of placebo shock realizations.

Our equivalence result motivates a novel and convenient solution to SSIV exposure-based clustering. Namely, we next show that by estimating SSIV coefficients with an equivalent shock-level IV regression one obtains standard errors that are exposure-robust, i.e. asymptotically valid in the primary framework of Adão et al. (2019). This framework assumes that (as in Proposition 4) the control vector can be partitioned as $w_\ell = [w_\ell^{*\prime}, u_\ell']'$, where $w_\ell^* = \sum_{n=1}^{N} s_{\ell n}\, q_n$ for some $q_n$ capturing all sources of shock-level confounding, while the other controls in $u_\ell$ are included only to potentially increase the efficiency of the estimator and are not asymptotically correlated with $z_\ell$.
To formalize this we follow Adão et al. (2019) in writing the first stage as $x_\ell = \sum_{n} s_{\ell n}\pi_{\ell n} g_n + \eta_\ell$ and considering the following strengthened version of Assumption 3:

Assumption 7 (Strong conditional quasi-random shock assignment): For all $n$,
$$E\left[g_n \mid \{q_{n'}\}_{n'}, \{u_\ell, \varepsilon_\ell, \eta_\ell, \{s_{\ell n'}, \pi_{\ell n'}\}_{n'}, e_\ell\}_{\ell}\right] = q_n'\mu.$$

We further adopt a strengthened version of Assumption 4 (Assumption A1 in Appendix B.3), which assumes fully-independent shock residuals and a stronger condition on concentration of average exposure $s_n$, as well as additional regularity conditions (Assumption A2 in Appendix B.3) consistent with Adão et al. (2019). We then have the following result:

Proposition 5: Consider $s_n$-weighted IV estimation of the second stage equation

$$\bar{y}_n^\perp = \alpha + \beta \bar{x}_n^\perp + q_n'\tau + \bar{\varepsilon}_n^\perp, \tag{12}$$

where $w_\ell^* = \sum_{n=1}^{N} s_{\ell n}\, q_n$ is included in the control vector $w_\ell$ used to compute $\bar{y}_n^\perp$ and $\bar{x}_n^\perp$, and $\bar{x}_n^\perp$ is instrumented by $g_n$. The IV estimate of $\beta$ is numerically equivalent to the SSIV estimate $\hat{\beta}$. Furthermore when Assumptions 7, A1, and A2 hold, and $\sum_{\ell=1}^{L} e_\ell x_\ell^\perp z_\ell \xrightarrow{p} \pi$ for $\pi \neq 0$, the conventional heteroskedasticity-robust standard error for $\hat{\beta}$ yields asymptotically-valid confidence intervals for $\beta$.

Proof: See Appendix B.3.

Proposition 5 provides a straightforward way to compute standard errors that are valid regardless of the correlation structure of the error term $\varepsilon_\ell$; rather, its validity derives from the properties of the shocks, in line with our quasi-experimental framework. Equation (12) extends the previous shock-level estimating equation (4) to include a vector of controls $q_n$ which, as in Proposition 4, are included in the SSIV control vector $w_\ell$ as exposure-weighted averages. The first result in Proposition 5 is that the addition of these controls does not alter the equivalence of $\hat{\beta}$. The second result establishes conditions under which conventional shock-level standard error calculations from estimation of (12) yield valid asymptotic inference. Appendix B.3 further shows that absent any controls, i.e. with $w_\ell = q_n = 1$, the shock-level robust standard error for $\beta$ is numerically equivalent to the standard error formula that Adão et al. (2019) propose. Outside this case, the appendix shows that standard errors estimated using our procedure are likely to be smaller than those of Adão et al. (2019) in finite samples.

To understand Proposition 5, it is useful to relate it to a conventional solution to the standard clustering problem.
The clustering environment can be viewed as a special case of shift-share IV in which the exposure shares are binary: $s_{\ell, n(\ell)} = 1$ for some $n(\ell) \in \{1, \dots, N\}$, with $s_{\ell n} = 0$ otherwise, for each $\ell$. In this case $z_\ell = g_{n(\ell)}$, such that the shift-share instrument is constant among observations with the same $n(\ell)$. The usual clustering concern would then be that $\varepsilon_\ell$ is also correlated within the $n(\ell)$ groupings. One solution is to estimate a grouped-data regression at the level of identifying variation (Angrist and Pischke 2008, p. 313). Proposition 5 generalizes this solution by running a regression at the level of quasi-experimental shocks. Indeed, in the binary shares case all variables $\bar{v}_n$ in equation (12) correspond to importance-weighted group averages of the corresponding $v_\ell$.

This shock-level approach to obtaining valid SSIV standard errors has three practical advantages. First, it can be performed with standard statistical software packages given a simple initial transformation of the data (i.e. to obtain $\bar{y}_n^\perp$, $\bar{x}_n^\perp$, and $s_n$), for which we have released a Stata package ssaggregate (see footnote 3). Second, it is readily extended to settings where shocks are clustered (a case also considered by Adão et al. (2019)) or autoregressive, implying Assumptions 5 and 6, respectively; conventional cluster-robust or heteroskedasticity-and-autocorrelation-consistent (HAC) standard error calculations applied to equation (12) are then valid. Third, Appendix B.3 shows that the shock-level inference approach continues to work in some cases where the original standard error calculation procedure of Adão et al. (2019) (which involves regressing $z_\ell^\perp$ on the vector of shares) fails: when $N > L$ or when some exposure shares are collinear.
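The initial data transformation and the numerical equivalence it relies on can be illustrated with a short sketch. This is not the ssaggregate implementation: the helper names, the simulated data, and the use of NumPy are all assumptions made for the example, which covers the simplest case of complete shares, a constant in the controls, and no shock-level controls $q_n$ beyond the intercept. It residualizes $y_\ell$ and $x_\ell$ on the controls with importance weights, aggregates the residuals to the shock level using $s_n = \sum_\ell e_\ell s_{\ell n}$, and compares the observation-level SSIV coefficient with the $s_n$-weighted shock-level IV coefficient.

```python
import numpy as np

def weighted_residual(v, W, e):
    """Residual of v from an e-weighted least-squares projection on the columns of W."""
    sw = np.sqrt(e)
    coef, *_ = np.linalg.lstsq(W * sw[:, None], v * sw, rcond=None)
    return v - W @ coef

def ssiv_observation_level(y, x, S, g, W, e):
    """Conventional SSIV coefficient using z_l = sum_n S[l, n] * g[n] as the instrument."""
    z = S @ g
    y_perp = weighted_residual(y, W, e)
    x_perp = weighted_residual(x, W, e)
    return np.sum(e * z * y_perp) / np.sum(e * z * x_perp)

def aggregate_to_shocks(v_perp, S, e):
    """Exposure-weighted shock-level average of v_perp and the weights s_n."""
    s_n = S.T @ e
    return (S.T @ (e * v_perp)) / s_n, s_n

def ssiv_shock_level(y, x, S, g, W, e):
    """s_n-weighted IV of y_bar on x_bar with a constant, instrumenting by g."""
    y_bar, s_n = aggregate_to_shocks(weighted_residual(y, W, e), S, e)
    x_bar, _ = aggregate_to_shocks(weighted_residual(x, W, e), S, e)
    g_dm = g - np.sum(s_n * g) / np.sum(s_n)  # partial out the intercept
    return np.sum(s_n * g_dm * y_bar) / np.sum(s_n * g_dm * x_bar)

# Hypothetical simulated data with complete shares (rows of S sum to one).
rng = np.random.default_rng(0)
L, N = 200, 30
S = rng.dirichlet(np.ones(N), size=L)
e = rng.uniform(0.5, 1.5, size=L)
e /= e.sum()
g = rng.normal(size=N)
W = np.column_stack([np.ones(L), rng.normal(size=L)])
x = 2.0 * (S @ g) + rng.normal(size=L)
y = 1.5 * x + rng.normal(size=L)

beta_obs = ssiv_observation_level(y, x, S, g, W, e)
beta_shock = ssiv_shock_level(y, x, S, g, W, e)
```

The two coefficients should coincide up to floating-point error because $\sum_\ell e_\ell z_\ell v_\ell^\perp = \sum_n s_n g_n \bar{v}_n^\perp$ for any residualized variable $v$. Standard errors are deliberately left out of the sketch; in practice they would come from running the shock-level regression in standard software.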
Footnote: Collinear exposure shares are an empirically relevant issue: for instance, employment shares of some industries are collinear in the Autor et al. (2013) setting. To give one example, SIC code 2068 “Salted and roasted nuts and seeds” was part of code 2065 “Candy and other confectionery products” until the 1987 revision of the classification; the rest of code 2065 was reassigned to code 2064. Therefore, when using 1980 employment shares to construct the shift-share instrument for the 1990s, Autor et al. (2013) allocate employment between 2064 and 2068 codes proportionately.

Our shock-level equivalence also provides a convenient implementation for alternative inference procedures which may have superior finite-sample performance. Adão et al. (2019) show, in particular, how standard errors that impose a given null hypothesis $\beta = \beta_0$ in estimating the residual $\varepsilon_\ell$ can generate confidence intervals with better coverage in situations with few shocks (and a similar argument can be made in the case of shocks with a heavy-tailed distribution). Building on Proposition 5, such confidence intervals can be constructed in the same way as in any regular shock-level IV regression. To test $\beta = \beta_0$, one regresses $\bar{y}^\perp_n - \beta_0 \bar{x}^\perp_n$ on the shocks $g_n$ (weighting by $s_n$ and including any relevant shock-level controls $q_n$) and uses a null-imposed residual variance estimate. This procedure corresponds to the standard shock-level Lagrange multiplier test for $\beta = \beta_0$ that can be implemented by standard statistical software. The confidence interval for $\beta$ is constructed by collecting all candidate $\beta_0$ that are not rejected.

Footnote: As explained by Adão et al. (2019), the problem that this “AKM0” confidence interval addresses generalizes the standard finite-sample bias of cluster-robust standard errors with few clusters (Cameron and Miller 2015). With few or heavy-tailed shocks, estimates of the residual variance will tend to be biased downwards, leading to undercoverage of confidence intervals based on standard errors that do not impose the null.

Footnote: For example in Stata one can use the ivreg2 overidentification test statistic from regressing $\bar{y}^\perp_n - \beta_0 \bar{x}^\perp_n$ on $q_n$ with no endogenous variables and with $g_n$ specified as the instrument (again with $s_n$ weights).

Our Proposition 5 also provides a practical way to perform valid regression-based tests of the SSIV exclusion restriction (i.e. falsification tests) and first-stage relevance. In the quasi-experimental SSIV framework, such tests also require computing exposure-robust standard errors, and shock-level inference procedures allow for convenient implementation via equivalent shock-level IV regressions.

As usual, the SSIV exclusion restriction cannot be tested directly. However, indirect falsification tests may be conducted given an observed variable $r_\ell$ thought to proxy for the structural error $\varepsilon_\ell$. Namely one may test whether $r_\ell$ is uncorrelated with the shift-share instrument $z_\ell$, while controlling for $w_\ell$. Examples of such an $r_\ell$ may include a baseline characteristic realized prior to the shocks or a lagged observation of the outcome $y_\ell$ (resulting in what is often called a “pre-trend” test). To interpret the magnitude of the reduced form falsification regression coefficient, researchers may also scale it by the first stage regression of $x_\ell$, yielding a placebo SSIV coefficient.

As with any SSIV regression, inference for such falsification tests must account for the exposure-induced correlation of $z_\ell$ across observations with similar exposure profiles; the insights of Adão et al. (2019) and the previous section apply directly to this case.
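Returning to the null-imposed procedure described earlier in this section, the test inversion can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: shocks are treated as independent (no clustering), the only shock-level control $q_n$ is a constant, the data are simulated, and the names `lm_stat` and `null_imposed_ci` are hypothetical.

```python
import numpy as np

CHI2_1_95 = 3.841458820694124   # 95th percentile of the chi-square(1) distribution


def weighted_demean(v, s):
    """Residualize v on a constant using weights s."""
    return v - np.average(v, weights=s)


def lm_stat(beta0, y_bar, x_bar, g, s):
    """Heteroskedasticity-robust LM statistic for H0: beta = beta0,
    with the residual variance estimated under the null."""
    g_t = weighted_demean(g, s)
    u0 = weighted_demean(y_bar - beta0 * x_bar, s)   # null-imposed residual
    score = np.sum(s * g_t * u0)
    var = np.sum((s * g_t * u0) ** 2)
    return score ** 2 / var


def null_imposed_ci(y_bar, x_bar, g, s, grid):
    """Collect grid values of beta0 that are not rejected at the 5% level.
    The accepted set need not be an interval; its range is returned."""
    keep = [b for b in grid if lm_stat(b, y_bar, x_bar, g, s) <= CHI2_1_95]
    return (min(keep), max(keep)) if keep else None


# Simulated shock-level data (purely illustrative).
rng = np.random.default_rng(1)
N = 300
s = rng.dirichlet(np.full(N, 5.0))                   # importance weights s_n
g = rng.normal(size=N)                               # shocks g_n
x_bar = g + rng.normal(size=N)                       # aggregated treatment
y_bar = -0.5 * x_bar + rng.normal(scale=0.5, size=N) # aggregated outcome

g_t = weighted_demean(g, s)
beta_hat = np.sum(s * g_t * y_bar) / np.sum(s * g_t * x_bar)
grid = np.linspace(beta_hat - 2.0, beta_hat + 2.0, 401)
ci = null_imposed_ci(y_bar, x_bar, g, s, grid)
print(beta_hat, ci)
```

The statistic is zero at the IV point estimate by construction, so the estimate always lies inside the resulting set. With clustered shocks, the variance term would instead sum squared within-cluster scores.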
For exposure-robust inference on a placebo $e_\ell$-weighted SSIV regression of $r_\ell$ on $x_\ell$, instrumented by $z_\ell$, one may use the conventional standard errors from an $s_n$-weighted regression of $\bar{r}^\perp_n$ on $\bar{x}^\perp_n$, instrumenting by $g_n$ and controlling for any shock-level covariates $q_n$. Similarly, for valid inference from a reduced form regression of $r_\ell$ on $z_\ell$, one may use the conventional standard errors from an IV regression of $\bar{r}^\perp_n$ on $\bar{z}^\perp_n$, with the same instrument, weights, and controls (see footnote 6). If a researcher observes a shock-level confounder $r_n$, they can construct its observation-level average $r_\ell = \sum_{n=1}^{N} s_{\ell n} r_n$ and perform a similar test.

Unlike exclusion, the SSIV relevance condition can be evaluated directly, via OLS regressions of $x_\ell$ on $z_\ell$ that control for $w_\ell$. For exposure-robust inference on this OLS coefficient, one may again use an equivalent shock-level IV regression: of $\bar{x}^\perp_n$ on $\bar{z}^\perp_n$, instrumenting by $g_n$, weighting by $s_n$, and controlling for $q_n$. The $F$-statistic, which is a common heuristic for the strength of the first-stage relationship, is then obtained as a squared coefficient $t$-statistic. We generalize this result to the case of multiple shift-share instruments in Appendix A.9 by detailing the appropriate construction of the “effective” first-stage $F$-statistic of Montiel Olea and Pflueger (2013), again based on the equivalent shock-level IV regression.

Though the exposure-robust standard errors obtained from estimating equation (12) are asymptotically valid, it is useful to verify that they offer appropriate coverage with a finite number of observations and shocks.
Of interest especially is whether the finite-sample performance of the equivalent regression (12) is comparable to that of more conventional shock-level IV regressions, in which the outcome and treatment are not aggregated from a common set of $y_\ell$ and $x_\ell$. In Appendix A.10 we provide Monte-Carlo evidence suggesting that the finite-sample properties of [...]

Footnote: One might also consider a simpler shock-level OLS regression of $\bar{r}^\perp_n$ on $g_n$ weighted by $s_n$ and controlling for $q_n$ (i.e. the reduced form of the proposed IV regression). This produces a coefficient that typically cannot be generated from the original observations of $(r_\ell, z_\ell, w_\ell)$. Moreover, the power of the shock-level OLS balance test is likely to be lower than the proposed IV: the robust Wald statistic for both tests has the same form, $\left(\sum_{n=1}^{N} s_n \bar{r}^\perp_n \hat{g}_n\right)^2 / \sum_{n=1}^{N} s_n^2 \hat{\kappa}_n^2 \hat{g}_n^2$, where $\hat{g}_n$ is the residual from an auxiliary $s_n$-weighted projection of $g_n$ on $q_n$ and the only difference is in the $\bar{r}^\perp_n$ residuals, $\hat{\kappa}_n$. Under the alternative model of $r_\ell = \alpha_0 + \alpha_1 z_\ell + \kappa_\ell$, with $\alpha_1 \neq 0$, the variance of these $\hat{\kappa}_n$ is likely to be smaller in the correctly specified IV balance test than the OLS regression, leading to a likely higher value of the test statistic.

[...] $F$-statistic, applies equally well to both the SSIV and the traditional IV estimators, when computed for SSIV as described in Appendix A.9. Finally, our results also show that with the Herfindahl concentration index $\sum_{n=1}^{N} s_n^2$ as high as $1/20$ (i.e. with the “effective sample size” of the shock-level analyses as low as 20), the asymptotic approximation for the SSIV estimator is still reasonable, with rejection rates in the vicinity of 7% for tests with a nominal size of 5%, despite heavy tails of the shock distribution.
Together, these results indicate that a researcher who is comfortable with the finite-sample performance of a shock-level analysis with some set of $g_n$ should also be comfortable using such shocks in SSIV, provided there is sufficient variation in exposure shares to yield a strong SSIV first stage.

We next apply our theoretical framework to the setting of Autor et al. (2013; hereafter abbreviated as ADH). We first describe this setting and then use it to illustrate the tools and lessons that emerge from viewing it as leveraging a quasi-experimental shift-share research design.
ADH use a shift-share IV to estimate the causal effect of rising import penetration from China on U.S. local labor markets. They do so with a repeated cross section of 722 commuting zones $\ell$ and 397 four-digit SIC manufacturing industries $n$ over two periods $t$, 1990-2000 and 2000-2007. In these years U.S. commuting zones were exposed to a dramatic rise in import penetration from China, a historic change in trade patterns commonly referred to as the “China shock.” Variation in exposure to this change across commuting zones results from the fact that different areas were initially specialized in different industries which saw different changes in the aggregate U.S. growth of Chinese imports.

Footnote: Naturally, these simulation results may be specific to the data-generating process we consider here, modeled after the “China shock” setting of Autor et al. (2013). In practice, we recommend that researchers perform similar simulations based on their data if they are concerned with the quality of asymptotic approximation, a suggestion that of course applies to conventional shock-level IV analyses as well.

[...] $t$ in location $\ell$, which we write as $y_{\ell t}$. The treatment variable $x_{\ell t}$ measures local exposure to the growth of imports from China in \$1,000 per worker. The vector of controls $w_{\ell t}$, which comes from the preferred specification of ADH (Column 6 of their Table 3), contains start-of-period measures of labor force demographics, period fixed effects, Census region fixed effects, and the start-of-period total manufacturing share to which we return below. The shift-share instrument is $z_{\ell t} = \sum_{n=1}^{N} s_{\ell n t} g_{nt}$, where $s_{\ell n t}$ is the share of manufacturing industry $n$ in total employment in location $\ell$, measured a decade before each period $t$ begins, and $g_{nt}$ is industry $n$'s growth of imports from China in the eight comparable economies over period $t$ (also expressed in \$1,000 per U.S. worker).
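The construction of the instrument from manufacturing shares can be sketched in a few lines of Python. The numbers below are made up for illustration and are not the ADH data; the variable names are hypothetical. The sketch also checks that treating non-manufacturing as an extra industry with a shock of zero leaves the instrument unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N = 5, 4                                   # locations and manufacturing industries (toy sizes)

# Full employment shares per location; the last column is non-manufacturing.
full_shares = rng.dirichlet(np.ones(N + 1), size=L)
shares = full_shares[:, :N]                   # manufacturing shares s_{ln}
g = np.array([1.2, 0.0, 7.5, 3.1])            # hypothetical industry shocks g_n

z = shares @ g                                # z_l = sum_n s_{ln} g_n
S = shares.sum(axis=1)                        # sum of manufacturing shares in each location

# Completing the shares with a zero shock for non-manufacturing gives the same z.
z_completed = full_shares @ np.append(g, 0.0)

print(z)
print(S)                                      # below one and varying across locations
```

Because the manufacturing shares do not sum to one, the row sums `S` differ across locations, so the instrument mixes variation in shocks with variation in the overall size of manufacturing.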
Importantly, the sum of lagged manufacturing shares across industries ($S_{\ell t} = \sum_{n=1}^{N} s_{\ell n t}$) is not constant across locations and periods, placing the ADH instrument in the “incomplete shares” class discussed in Section 3.4. All regressions are weighted by $e_{\ell t}$, which measures the start-of-period population of the commuting zone, and all variables are measured in ten-year equivalents.

To see how the ADH instrument can be viewed as leveraging quasi-experimental shocks, it is useful to imagine an idealized experiment generating random variation in the growth of imports from China across industries. One could imagine, for example, random variation in industry-specific productivities in China affecting import growth in both the U.S. and in comparable economies. If we observed and used these productivity changes directly as $g_{nt}$, the resulting SSIV exclusion restriction may be satisfied by our Assumptions 1 and 2. This would require idiosyncratic productivity shocks across many industries, with small average exposure to each shock across commuting zones. Weaker versions of this experimental ideal, in which productivity shocks can be partly predicted by industry observables and are only weakly dependent across industries, are accommodated by the extensions discussed in Section 3.

In practice, industry-specific productivity shocks in China are not directly observed, and the observed changes in trade patterns between the U.S. and China also depend on changes in U.S. supply and demand conditions across industries. This raises potential concerns over omitted variables bias, since U.S. supply and demand shocks may have direct effects on employment dynamics across [...]

Footnote: To be precise, local exposure to the growth of imports from China is constructed for period $t$ as $x_{\ell t} = \sum_{n=1}^{N} s^{\text{current}}_{\ell n t} g^{US}_{nt}$. Here $g^{US}_{nt} = \Delta M^{US}_{nt} / E^{\text{current}}_{nt}$ is the growth of U.S. imports from China in thousands of dollars ($\Delta M^{US}_{nt}$) divided by the industry employment in the U.S. at the beginning of the current period ($E^{\text{current}}_{nt}$) and $s^{\text{current}}_{\ell n t}$ are local employment shares, also measured at the beginning of the period. The instrument, in contrast, is constructed as $z_{\ell t} = \sum_{n=1}^{N} s_{\ell n t} g_{nt}$ with $g_{nt} = \Delta M_{nt} / E_{nt}$, where $\Delta M_{nt}$ measures the growth of imports from China in eight comparable economies (in thousands of U.S. dollars) and both local employment shares $s_{\ell n t}$ and U.S. employment $E_{nt}$ are lagged by 10 years. The eight countries are Australia, Denmark, Finland, Germany, Japan, New Zealand, Spain, and Switzerland. Note that Autor et al. (2013) express the same instrument differently, based on employment shares relative to the industry total, rather than the regional total. Our way of writing $z_{\ell t}$ aims to clearly separate the exposure shares from the industry shocks, highlighting the shift-share structure of the instrument.

[...] $g_{nt}$ as the observed changes in trade patterns between China and a group of developed countries excluding the United States. Such variation reflects Chinese productivity shocks as well as supply and demand shocks in these other countries; in this way, the ADH strategy eliminates bias from shocks that are specific to the United States.

Our quasi-experimental view of the ADH research design places particular emphasis on the variation in Chinese import growth rates $g_{nt}$ and their average exposure $s_{nt}$ across U.S. commuting zones. With few or insufficiently variable shocks, or highly concentrated shock exposure, the large-$N$ asymptotic approximation developed in Section 3.1 is unlikely to be a useful tool for characterizing the finite-sample behavior of the SSIV estimator. Moreover, if the mean of shocks clearly varies by observables, such as time periods or industry groups, controlling for these variables may be useful to avoid omitted variables bias.
We thus first summarize the distribution of $g_{nt}$, as well as the industry-level weights from our equivalence result, $s_{nt} \propto \sum_{\ell=1}^{L} e_{\ell t} s_{\ell n t}$ (normalized to add up to one in the entire sample), to gauge the plausibility of this framework. We additionally summarize the shift-share instrument $z_{\ell t}$ itself, to show how variation at the industry level translates to variation in predicted Chinese import growth across commuting zones, a necessary requirement of SSIV relevance.

In summarizing industry-level variation it is instructive to recall that the ADH instrument is constructed with “incomplete” manufacturing shares. Per the discussion in Section 3.4, this means that absent any regression controls the SSIV estimator uses variation not only in manufacturing industry shocks but also implicitly the variation in the 10-year lagged total manufacturing share $S_{\ell t}$ across commuting zones and periods. In practice, ADH control for the start-of-period manufacturing share, which is highly (though not perfectly) correlated with $S_{\ell t}$. We thus summarize the ADH shocks both with and without the “missing industry” shock $g_{0t} = 0$, which here represents the lack of a “China shock” in service (i.e. non-manufacturing) industries.

Table 1 reports summary statistics for the ADH shocks $g_{nt}$ computed with importance weights $s_{nt}$ and characterizes these weights. Column 1 includes the “missing” service industry shock of zero in each period. It is evident that with this shock the distribution of $g_{nt}$ is unconventional: for example, its interquartile range is zero. This is because the service industry accounts for a large fraction of total employment ($s_{0t}$ is 71.9% of the period total in the 1990s and 79.5% in the 2000s).
As a result we see a high concentration of industry exposure, as measured by the inverse of its Herfindahl index (HHI), $1 / \sum_{n,t} s_{nt}^2$, which intuitively corresponds to the “effective sample size” and plays a key role in our Assumption 2. With the “missing” service industry included, the effective sample size is only 3.5. For an HHI computed at the level of three-digit industry codes $\sum_c s_c^2$, where $s_c$ aggregates [...]

Footnote: Note that $s_{nt}$ would be proportional to lagged industry employment if the ADH regression weights $e_{\ell t}$ were lagged regional employment. ADH however use a slightly different $e_{\ell t}$: the start-of-period commuting zone population.

[...] $c$, it is even lower, at 1.7. This suggests even less industry-level variation is available when shocks are allowed to be serially correlated or clustered by groups. Furthermore, the mean of manufacturing shocks is significantly different from the zero shock of the missing service industry. Together, these analyses suggest that the service industry should be excluded from the identifying variation, because it is likely to violate both Assumption 1 ($E[g_{nt} \mid \bar{\varepsilon}_{nt}, s_{nt}] \neq g_{0t} = 0$) and Assumption 2 ($\sum_{n,t} s_{nt}^2$ is not close to zero).

Column 2 of Table 1 therefore summarizes the sample with the service industry excluded. Within manufacturing, the average shock is 7.37, with a standard deviation of 20.93 and an interquartile range of 6.61. The inverse HHI of exposure shares is now relatively high, 191.6 across industry-by-period cells and 58.4 when exposure is aggregated by SIC3 group. The largest shock weights in this column are only 3.4% across industry-by-periods and 6.5% across SIC3 groups.
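The effective sample size calculation can be sketched as follows in Python. The weights here are invented to show the mechanics and are not the ADH exposure weights; the function name `effective_sample_size` and its `groups` argument are hypothetical.

```python
import numpy as np


def effective_sample_size(weights, groups=None):
    """Inverse Herfindahl index of renormalized weights.
    If groups is given (non-negative integer labels), weights are first
    summed within each group, as when aggregating industries to clusters."""
    w = np.asarray(weights, dtype=float)
    if groups is not None:
        w = np.bincount(np.asarray(groups), weights=w)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)


# 100 equally weighted shocks: the effective sample size equals the count.
equal = np.full(100, 1 / 100)
ess_equal = effective_sample_size(equal)

# Same shocks aggregated into 50 two-industry groups.
ess_grouped = effective_sample_size(equal, groups=np.arange(100) // 2)

# One dominant shock with 75% of the weight, as with a large "missing" industry.
dominated = np.concatenate(([0.75], np.full(100, 0.25 / 100)))
ess_dominated = effective_sample_size(dominated)

print(ess_equal, ess_grouped, ess_dominated)
```

A single dominant weight collapses the effective sample size to below two even though 101 shocks are nominally present, which is the pattern that motivates excluding the service industry from the identifying variation.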
These effective sample sizes and largest weights suggest a sizable degree of variation at the industry level, consistent with Assumption 2.

With Chinese import growth shocks measured in two sequential periods, differences in expected shocks across periods present an obvious threat to the simplest assumption of quasi-experimental shock assignment (Assumption 1) that would typically be addressed by including period fixed effects in the industry-level analysis. Column 3 of Table 1 confirms that even within periods there is sizable residual shock variation to implement the SSIV version of this strategy. The standard deviation and interquartile range of shock residuals (obtained from regressing shocks on period fixed effects with $s_{nt}$ weights) are only somewhat smaller than in Column 2, despite the higher mean shock in the later period, at 12.6 versus 3.6. Motivated by the controls used in an industry-level analysis of similar shocks in Acemoglu et al. (2016), in Column 4 we further explore residual variation from regressing shocks on 10 broad industrial sector fixed effects interacted with period fixed effects. These more stringent controls again reduce the variation in shocks only slightly, indicating that there is substantial residual shock variation within periods and industrial sectors.

To see how this variation translates to the commuting zone level, Table 2 summarizes the shift-share instrument $z_{\ell t}$, weighting by $e_{\ell t}$. Column 1 presents the raw variation in the data, with a standard deviation of 1.55 and an interquartile range of 1.74. Columns 2–4 add controls that mirror the specifications from Table 1. Specifically, Column 2 residualizes the instrument on the lagged manufacturing share, effectively excluding the service industry. Column 3 similarly controls for the lagged manufacturing share interacted with period dummies, while Column 4 controls for lagged shares of each of the 10 manufacturing sectors, again interacted with period indicators.
These controls isolate variation in $z_{\ell t}$ that is due to variation in shocks within manufacturing industries within periods (Column 3) and within sector (Column 4), per the discussion in Section 3.4. The residual variation in the instrument falls with richer controls but remains substantial, with a standard deviation of [...]

Footnote: The weighted mean of manufacturing shocks is 7.4, with a standard error clustered at the 3-digit SIC level (as in our analysis below) of 1.3.

Footnote: See Autor et al. (2014), Figure II, for the definitions of these sectors, which aggregate two-digit SIC codes.

[...] These ICCs come from a random effects model, providing a hierarchical decomposition of residual within-period shock variation:

$$g_{nt} = \mu_t + a_{\text{ten}(n),t} + b_{\text{sic2}(n),t} + c_{\text{sic3}(n),t} + d_n + e_{nt}, \qquad (13)$$

where $\mu_t$ are period fixed effects; $a_{\text{ten}(n),t}$, $b_{\text{sic2}(n),t}$, and $c_{\text{sic3}(n),t}$ denote time-varying (and possibly auto-correlated) random effects generated by the ten industry groups in Acemoglu et al. (2016), 20 groups identified by SIC2 codes, and 136 groups corresponding to SIC3 codes, respectively; and $d_n$ is a time-invariant industry random effect (across our 397 four-digit SIC industries). Following convention, we estimate equation (13) as a hierarchical linear model by maximum likelihood, assuming Gaussian residual components.

Table 3 reports estimated ICCs from equation (13), summarizing the share of the overall shock residual variance due to each random effect. These reveal moderate clustering of shock residuals at the industry and SIC3 level (with ICCs of 0.169 and 0.073, respectively). At the same time, there is less evidence for clustering of shocks at a higher SIC2 level and particularly by ten cluster groups (ICCs of 0.047 and 0.016, respectively, with standard errors of comparable magnitude). This supports the assumption that shocks are mean-independent across SIC3 clusters, so it will be sufficient to cluster standard errors at the level of SIC3 groups, as Acemoglu et al. (2016) do in their conventional industry-level IV regressions. The inverse HHI estimates in Table 1 indicate that at this level of shock clustering there is still an adequate effective sample size.

Footnote: Note that similar ICC calculations could be implemented in a setting that directly regresses industry outcomes on industry shocks, such as Acemoglu et al. (2016). Mutual correlation in the instrument is a generic concern that is not specific to shift-share designs, although one that is rarely tested for. Getting the correlation structure in shocks right is especially important for inference in our framework, since the outcome and treatment in the industry-level regression ($\bar{y}^\perp_{nt}$ and $\bar{x}^\perp_{nt}$) are by construction correlated across industries.

Footnote: In particular we estimate an unweighted mixed-effects regression using Stata’s mixed command, imposing an exchangeable variance matrix for $(a_{\text{ten}(n),1}, a_{\text{ten}(n),2})$, $(b_{\text{sic2}(n),1}, b_{\text{sic2}(n),2})$, and $(c_{\text{sic3}(n),1}, c_{\text{sic3}(n),2})$.

Estimates from Shock-Level Regressions

Table 4 reports SSIV coefficients from regressing regional manufacturing employment growth in the U.S. on the growth of import competition from China, instrumented by predicted Chinese import growth. Per the results in Section 4.1, we estimate these coefficients with equivalent industry-level regressions in order to obtain valid exposure-robust standard errors. Consistent with the above analysis of shock ICCs, we cluster standard errors at the SIC3 level when estimating these regressions. We also report first-stage $F$-statistics with corresponding exposure-robust inference. As discussed in Section 4.2, these come from industry-level IV regressions of the aggregated treatment and instrument (i.e. $\bar{x}^\perp_{nt}$ on $\bar{z}^\perp_{nt}$), instrumented with shocks and weighting by $s_{nt}$.
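The industry-level IV regression with clustered standard errors, and the first-stage $F$-statistic obtained from it, can be sketched in Python as below. This is a simplified illustration, not the code behind Table 4: the data are simulated rather than aggregated from commuting zones, the only control is a constant, no small-sample correction is applied, and `shock_level_iv` is a hypothetical helper.

```python
import numpy as np


def weighted_demean(v, s):
    """Residualize v on a constant using weights s."""
    return v - np.average(v, weights=s)


def shock_level_iv(dep, endog, g, s, clusters):
    """s-weighted IV regression of dep on endog, instrumented by shocks g,
    with a constant as the only control. Returns the coefficient and a
    cluster-robust standard error (clusters: non-negative integer labels)."""
    dep_t, endog_t, g_t = (weighted_demean(v, s) for v in (dep, endog, g))
    denom = np.sum(s * g_t * endog_t)
    coef = np.sum(s * g_t * dep_t) / denom
    resid = dep_t - coef * endog_t
    # Sum the weighted scores within each cluster before squaring.
    cluster_scores = np.bincount(clusters, weights=s * g_t * resid)
    se = np.sqrt(np.sum(cluster_scores ** 2)) / abs(denom)
    return coef, se


# Simulated shock-level data (purely illustrative).
rng = np.random.default_rng(3)
N = 200
s = rng.dirichlet(np.full(N, 5.0))            # importance weights s_n
clusters = np.arange(N) // 4                  # 50 hypothetical SIC3-style groups
g = rng.normal(size=N)                        # shocks
z_bar = g + 0.5 * rng.normal(size=N)          # stand-in for the aggregated instrument
x_bar = 0.8 * z_bar + 0.3 * rng.normal(size=N)  # stand-in for the aggregated treatment

# First stage: IV regression of x_bar on z_bar, instrumenting with g.
pi_hat, se = shock_level_iv(x_bar, z_bar, g, s, clusters)
F_stat = (pi_hat / se) ** 2                   # squared t-statistic
print(pi_hat, se, F_stat)
```

The same helper could be called with an aggregated outcome or placebo variable as `dep` and the aggregated treatment as `endog` to obtain the main or falsification estimates with the same clustering.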
The first-stage $F$-statistics in Table 4 are well above the conventional threshold of ten in all columns.

Columns 1 through 4 of Table 4 document the sensitivity of SSIV estimates to the inclusion of various controls, designed to isolate the conditional random assignment of shocks under alternative versions of Assumption 3. Column 1 first replicates column 6 of Table 3 in Autor et al. (2013) by including in $w_{\ell t}$ period fixed effects, Census division fixed effects, start-of-period conditions (% college educated, % foreign-born, % employment among women, % employment in routine occupations, and the average offshorability index), and the start-of-period manufacturing share. The point estimate is -0.596, with a corrected standard error of 0.114.

As noted, the ADH specification in column 1 does not include the lagged manufacturing share control $S_{\ell t}$, which is necessary to solve the “incomplete shares” problem in Section 3.4, though it does include a highly correlated control (start-of-period manufacturing share). In column 2 of Table 4 we isolate within-manufacturing variation in shocks by replacing the latter sum-of-share control with the former. The SSIV point estimate remains almost unchanged, at -0.489 (with a standard error of 0.100). Here exposure-robust standard errors are obtained from an industry-level regression that drops the implicit service sector shock of $g_{0t} = 0$.

Isolating the within-period variation in manufacturing shocks requires further controls in the incomplete shares case, as discussed in Section 3.4. Specifically, Column 3 controls for lagged manufacturing shares interacted with period indicators, which are the share-weighted sums of period effects in $q_{nt}$. With these controls the SSIV point estimate is -0.267 with an exposure-robust standard error of 0.099. While the coefficient remains statistically and economically significant, it is smaller in magnitude than the estimates in Columns 1 and 2. The difference stems from the fact that 2000–07 saw both a faster growth in imports from China (e.g., due to its entry to the WTO) and a faster decline in U.S. manufacturing. The earlier columns attribute the faster manufacturing decline to increased trade with China, while the specification in Column 3 controls for any unobserved shocks specific to the manufacturing sector overall in the 2000s (e.g., faster automation), as would a conventional industry-level IV regression with period fixed effects.

Footnote: Appendix Table C1 reports estimates for other outcomes in ADH: growth rates of unemployment, labor force non-participation, and average wages, corresponding to columns 3 and 4 of Table 5 and column 1 of Table 6 in ADH.

Footnote: Appendix Table C2 implements three alternative methods for conducting inference in Table 4, reporting conventional state-clustered standard errors as in ADH (which are not exposure-robust), the Adão et al. (2019) standard errors (which are asymptotically equivalent to ours but differ in finite samples), and null-imposed confidence intervals obtained from shock-level Lagrange multiplier tests (which may have better finite-sample properties). Consistent with the theoretical discussion in Appendix B.3, the conventional standard errors are generally too low, while the Adão et al. (2019) standard errors are slightly larger than those from Table 4, by 10–15% for the main specifications. Imposing the null widens the confidence interval more substantially, by 30–50%, although more so on the left end, suggesting that much larger effects are not rejected by the data. This last finding is consistent with Adão et al. (2019), except that we use the equivalent industry-level regression to compute the null-imposed confidence interval.

Footnote: Appendix Figure C1 reports binned scatter plots that illustrate the first-stage and reduced-form relationships corresponding to the Column 3 IV specification. Also note that the column 3 estimate can be interpreted as a weighted average of two period-specific shift-share IV coefficients. Column 1 of Appendix Table C3 shows the underlying estimates, from a just-identified IV regression where both treatment and the instrument are interacted with period indicators (as well as the manufacturing share control, as in column 3), with exposure-robust standard errors obtained by the equivalent industry-level regression discussed in Section 4. The estimated effect of increased Chinese import competition is negative in both periods (-0.491 and -0.225). Other columns repeat the analysis for other outcomes.

Column 4 illustrates how our framework makes it straightforward to introduce more detailed industry-level controls in SSIV, which are commonly used in industry-level studies of the China shock. In particular, Acemoglu et al. (2016) examine the impact of trade with China on U.S. employment across industries and control for fixed effects of 10 broad industry groups. By Proposition 4, we can also exploit shock variation within these industry groups (and periods) in the SSIV design, weakening the assumption of quasi-random shock assignment. This entails controlling for the shares of exposure to the 10 industry groups defined by Acemoglu et al. (2016), interacted with period dummies. The resulting point estimate in column 4 of Table 4 remains very similar to that of column 3, at -0.252 with a standard error of 0.136. Here exposure-robust standard errors are obtained by including the 10-groups-by-period fixed effects in the equivalent industry-level regression, per Section 4.1.

Finally, columns 5 and 6 of Table 4 report falsification tests of the SSIV specifications in columns 3 and 4 by lagging the manufacturing employment growth outcome by two decades. These specifications thus provide “pre-trend” tests which are, as usual, informative about the plausibility of Assumption 1 if employment pre-trends correlate with the contemporaneous trend residual $\varepsilon_{\ell t}$. Column 5 shows no significant relationship with lagged manufacturing employment when within-period manufacturing shocks are leveraged. The point estimate of -0.028 is a full order of magnitude smaller than the comparable effect estimate in Column 3, with an exposure-robust standard error of 0.099. When adding the more stringent set of 10-sectoral controls in column 6, the coefficient flips signs while remaining statistically insignificant at conventional levels. These tests lend support to the plausibility of the identification assumption. Further support is found in Appendix Table C4, which shows we obtain similar estimates from different overidentified SSIV procedures (leveraging variation in country-specific Chinese import growth, instead of the ADH total), and a $p$-value for the shock-level overidentification test derived in Appendix A.9 of 0.142.

Taken together, these sensitivity, falsification, and overidentification exercises suggest that the ADH approach can be reasonably viewed as leveraging exogenous shock variation via our framework. This is notably in contrast to the analysis of Goldsmith-Pinkham et al. (2019), who find the ADH exposure shares to be implausible instruments via different balance and overidentification tests.

Conclusion
Shift-share instruments combine a common set of observed shocks with variation in shock exposure. In this paper, we provide a quasi-experimental framework for the validity of such instruments based on identifying variation in the shocks, allowing the exposure shares to be endogenous. Our framework revolves around a novel equivalence result: shift-share IV estimates can be reframed as coefficients from weighted shock-level IV regressions, in which the shocks instrument directly for an exposure-weighted average of the original endogenous variable. Shift-share instruments are therefore valid when shocks are idiosyncratic with respect to an exposure-weighted average of the unobserved factors determining the outcome variable, and yield consistent IV estimates when the number of shocks is large and they are sufficiently dispersed in terms of their average exposure.

Through various extensions and illustrations, we show how our quasi-experimental SSIV framework can guide empirical work in practice. By controlling for exposure-weighted averages of shock-level confounders, researchers can isolate more plausibly exogenous variation in shocks, such as over time or within narrow industry groups. By estimating SSIV coefficients, placebo regressions, and first-stage $F$-statistics at the level of shocks, researchers can easily perform exposure-robust inference that accounts for the inherent non-standard clustering of observations with common shock exposure. Our shock-level analysis also raises new concerns: SSIV designs with few or insufficiently dispersed shocks may have effectively small samples, despite there being many underlying observations, and instruments constructed from exposure shares that do not add up to a constant require appropriate controls in order to isolate quasi-random shock variation.

Ultimately, the plausibility of our exogenous shocks framework, as with the alternative framework of Goldsmith-Pinkham et al. (2019) based on exogenous shares, depends on the SSIV application.
We encourage practitioners to use shift-share instruments based on an a priori argument supporting the plausibility of either one of these approaches; various diagnostics and tests of the framework that is most suitable for the setting may then be applied. While our paper develops such procedures for the “shocks” view, Goldsmith-Pinkham et al. (2019) provide different tools for the “shares” view.

Examples from the vast and expanding set of SSIV applications help to illustrate a priori arguments for or against the two SSIV frameworks. In some settings, the exposure shares underlying the instrument are tailored to a specific economic question, and to the particular endogenous variable included in the model. In this case, the scenario considered in Section 2.3 (that there are unobserved shocks $\nu_n$ which enter $\varepsilon_\ell$ through the shares) may be less of a concern, making shares more plausible instruments. Mohnen (2019), for example, uses the age profile of older workers in local labor markets as the exposure shares in a shift-share instrument for the change in the local elderly employment rate the following decade. He argues, based on economic intuition, that these tailored shares are uncorrelated [...]

Footnote: In principle, shares and shocks may simultaneously provide valid identifying variation, but in practice it would seem unlikely for both sources of variation to be a priori plausible in the same setting, because these approaches to identification are substantively different.

[...] $g_n$, which only affect power of the instrument; in fact, the shocks are dispensed with altogether in robustness checks that directly instrument with the shares and report 2SLS and LIML estimates. Similarly, Algan et al. (2017) use the lagged share of the construction sector in the regional economy as an instrument for unemployment growth during the Great Recession, arguing that it does not predict changes in voting outcomes in other ways.
With a single industry considered, the identification assumption reduces to that of conventional difference-in-differences with continuous treatment intensity, and our approach cannot be applied.

In contrast, our framework is more appropriate in settings where shocks are tailored to a specific question while the shares are “generic,” in that they could conceivably measure an observation’s exposure to multiple shocks (both observed and unobserved). Both Autor et al. (2013) and Acemoglu and Restrepo (Forthcoming), for example, build shift-share instruments with similar lagged employment shares but different shocks – rising trade with China and the adoption of industrial robots, respectively. According to the shares view, these papers use essentially the same instruments (lagged employment shares) for different endogenous variables (growth of import competition and growth of robot usage), and are therefore mutually inconsistent. Our framework helps reconcile these identification strategies, provided the variation in each set of shocks can be described as arising from a natural experiment.

In sum, our analysis formalizes the claim that SSIV identification may “come from” the exogeneity of shocks, while providing new guidance for SSIV estimation and inference that may be applied across a number of economic fields, including international trade, labor economics, urban economics, macroeconomics, and public finance. Our shock-level assumptions connect SSIV in these settings to conventional shock-level IV estimation, bringing shift-share instruments to more familiar econometric territory and facilitating the assessment of SSIV credibility in practice.

Table 1: Shock Summary Statistics in the Autor et al. (2013) Setting

                                                  (1)      (2)      (3)      (4)
Mean                                              1.79     7.37     0        0
Standard deviation                               10.79    20.92    20.44    19.39
Interquartile range                               0        6.61     6.11     5.50
Specification
  Excluding service industries                             ✓        ✓        ✓
  Residualized on manufacturing-by-period FE                        ✓        ✓
  Residualized on 10-sectors-by-period FE                                    ✓
Effective sample size (1/HHI of $s_{nt}$ weights)
  Across industries and periods                   3.5    191.6    191.6    191.6
  Across SIC3 groups                              1.7     58.4     58.4     58.4
Largest $s_{nt}$ weight
  Across industries and periods                   0.398    0.035    0.035    0.035
  Across SIC3 groups                              0.757    0.066    0.066    0.066
Observation counts

Notes: This table summarizes the distribution of China import shocks $g_{nt}$ across industries $n$ and periods $t$ in the Autor et al. (2013) replication. Shocks are measured using flows of imports from China in eight developed economies outside of the United States. All statistics are weighted by the average industry exposure shares $s_{nt}$, computed as in Proposition 1; shares are measured from lagged manufacturing employment, as described in the text. Column 1 includes the non-manufacturing industry aggregate in each period with a shock of 0, while columns 2–4 restrict to manufacturing industries. The following columns residualize the shocks on period indicators (column 3) or the indicators for each of the 10 sectors defined in Acemoglu et al. (2016) interacted with period indicators (column 4). We report the effective sample size (the inverse renormalized Herfindahl index of the share weights, as described in the text) with and without the non-manufacturing industry, at both the industry-by-period level and at the level of aggregate SIC3 groups (across periods), along with the largest share weights.

                                                  (1)      (2)      (3)      (4)
Mean                                              1.79     0        0        0
Standard deviation                                1.55     1.51     1.06     0.87
Interquartile range                               1.74     1.75     0.69     0.51
Controls
  Lagged mfg. share                                        ✓        ✓        ✓
  Period-specific lagged mfg. share                                 ✓        ✓
  Period-specific lagged 10-sector shares                                    ✓
Observation counts
Notes: This table summarizes the distribution of the shift-share instrument $z_{\ell t}$ across commuting zones $\ell$ and periods $t$ in the Autor et al. (2013) replication. The shocks used in the instrument are measured using flows of imports from China in eight developed economies outside of the United States and the exposure shares are measured from lagged manufacturing employment, as described in the text. All statistics are weighted by the start-of-period commuting zone population, as in Autor et al. (2013). Columns 2–4 residualize the instrument on the lagged commuting zone manufacturing share, the lagged manufacturing share interacted with period indicators, and the lagged manufacturing share for each of the 10 sectors defined in Acemoglu et al. (2016), interacted with period indicators. These correspond to the share-weighted averages of industry-level controls included in each column of Table 1.

                     Estimate     SE
                     (1)          (2)
Shock ICCs
  10 sectors          0.016      (0.022)
  SIC2                0.047      (0.052)
  SIC3                0.073      (0.057)
  Industry            0.169      (0.047)
Period means
  1990s               4.65       (1.38)
  2000s              16.87       (3.34)

Notes: This table reports shock intra-class correlation coefficients in the Autor et al. (2013) replication, estimated from the hierarchical model described in the text. Estimates come from a maximum likelihood procedure with an exchangeable covariance structure for each industry random effect and with period fixed effects. Robust standard errors are reported in parentheses.
                                                          Effects                          Pre-trends
                                              (1)       (2)       (3)       (4)       (5)       (6)
Coefficient                                  -0.596    -0.489    -0.267    -0.252    -0.028     0.142
                                             (0.114)   (0.100)   (0.099)   (0.136)   (0.092)   (0.090)
CZ-level controls ($w_{\ell t}$)
  Autor et al. (2013) baseline                ✓         ✓         ✓         ✓         ✓         ✓
  Start-of-period mfg. share                  ✓
  Lagged mfg. share                                     ✓         ✓         ✓         ✓         ✓
  Period-specific lagged mfg. share                               ✓         ✓         ✓         ✓
  Period-specific lagged 10-sector shares                                   ✓                   ✓
Industry-level controls ($q_{nt}$)
  Period indicators                                               ✓         ✓         ✓         ✓
  Sector-by-period indicators                                               ✓                   ✓
SSIV first stage F-stat.                     185.59    166.73    123.64     46.50    123.64     46.50

Notes: This table reports shift-share IV coefficients from regressions of regional manufacturing employment growth in the U.S. on the growth of import competition from China, instrumented with predicted China import growth as described in the text. Column 1 replicates column 6 of Table 3 in Autor et al. (2013) by controlling for period fixed effects, Census division fixed effects, beginning-of-period conditions (% college educated, % foreign-born, % employment among women, % employment in routine occupations, and the average offshorability index), and the start-of-period manufacturing share. Column 2 replaces the start-of-period manufacturing shares control with the lagged manufacturing shares underlying the instrument, while column 3 interacts this control with period indicators. Column 4 adds lagged shares of the 10 industry sectors defined in Acemoglu et al. (2016), again interacted with period indicators. Columns 5 and 6 report falsification tests using manufacturing employment growth lagged by two decades as an outcome, in specifications that parallel those in columns 3 and 4. Exposure-robust standard errors and first-stage F-statistics are obtained from equivalent industry-level IV regressions, as described in the text, with the indicated industry-level controls and allowing for clustering of shocks at the level of three-digit SIC codes. The sample in columns 2–6 includes 722 locations (commuting zones) and 397 industries, each observed in two periods; the estimate in column 1 implicitly includes an additional two periods of the non-manufacturing industry with a shock of zero.

References

Abadie, Alberto, Susan Athey, Guido W. Imbens, and Jeffrey M. Wooldridge.
Working Paper.
Acemoglu, Daron, David H. Autor, David Dorn, Gordon H. Hanson, and Brendan Price.
Journal of Labor Economics
Acemoglu, Daron, and Pascual Restrepo.
Forthcoming. “Robots and Jobs: Evidence from US Labor Markets.”
Journal of Political Economy.
Adão, Rodrigo, Michal Kolesár, and Eduardo Morales.
The Quarterly Journal of Economics
Algan, Yann, Sergei Guriev, Elias Papaioannou, and Evgenia Passari.
Brookings Papers on Economic Activity
Angrist, Joshua D., Kate Graddy, and Guido W. Imbens.
The Review of Economic Studies
Angrist, Joshua D., Guido W. Imbens, and Alan B. Krueger.
Journal of Applied Econometrics
Angrist, Joshua D., and JS Pischke.
Mostly Harmless Econometrics: An Empiricist’s Companion.
Princeton University Press.
Autor, David H., David Dorn, and Gordon H. Hanson.
American Economic Review
American Economic Review: Insights
Autor, David H., David Dorn, Gordon H. Hanson, and Jae Song.
The Quarterly Journal of Economics
Autor, David H., and Mark G. Duggan.
The Quarterly Journal of Economics
Bartik, Timothy J.
Who Benefits from State and Local Economic Development Policies?
Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Bekker, Paul A.
Econometrica
Berman, Nicolas, Mathieu Couttenier, Dominic Rohner, and Mathias Thoenig.
American Economic Review
Blanchard, Olivier Jean, Lawrence F. Katz, Robert E. Hall, and Barry Eichengreen.
Brookings Papers on Economic Activity
Bombardini, Matilde, and Bingjing Li.
Working Paper.
Borusyak, Kirill, and Xavier Jaravel.
Bound, John, David A. Jaeger, and Regina M. Baker.
Journal of the American Statistical Association
Boustan, Leah, Fernando Ferreira, Hernan Winkler, and Eric M. Zolt.
The Review of Economics and Statistics
Broxterman, Daniel A, and William D Larson.
Cameron, A. Colin, and Douglas L. Miller.
The Journal of Human Resources
Card, David.
Journal of Labor Economics
Chaisemartin, Clément de, and Xavier D’Haultfoeuille.
Diamond, Rebecca.
American Economic Review
Goldsmith-Pinkham, Paul, Isaac Sorkin, and Henry Swift.
Goodman-Bacon, Andrew.
Greenstone, Michael, Alexandre Mas, and Hoai-Luu Nguyen.
Hansen, Lars Peter.
Econometrica
Hornbeck, Richard, and Enrico Moretti.
Hudson, Sally, Peter Hull, and Jack Liebersohn.
Hull, Peter.
Hummels, David, Rasmus Jorgensen, Jakob Munch, and Chong Xiang.
American Economic Review
Imbert, Clement, Marlon Seror, Yifan Zhang, and Yanos Zylberberg.
Jaeger, David A, Joakim Ruist, and Jan Stuhler.
Jaravel, Xavier.
The Quarterly Journal of Economics
Kolesar, Michal, Raj Chetty, John Friedman, Edward Glaeser, and Guido W. Imbens.
Journal of Business and Economic Statistics
Kovak, Brian K.
American Economic Review
Luttmer, Erzo F.P.
The Quarterly Journal of Economics
Mohnen, Paul.
Working Paper.
Montiel Olea, Jose Luis, and Carolin Pflueger.
Journal of Business and Economic Statistics
Moulton, Brent R.
Journal of Econometrics
Nakamura, Emi, and Jón Steinsson.
American Economic Review
Nunn, Nathan, and Nancy Qian.
American Economic Review
Oberfield, Ezra, and Devesh Raval.
Saiz, Albert.
The Quarterly Journal of Economics
Stuen, Eric T., Ahmed Mushfiq Mobarak, and Keith E. Maskus.
The Economic Journal
Suárez, Juan Carlos, and Owen Zidar.
American Economic Review
Xu, Chenzi.

Appendix to “Quasi-Experimental Shift-Share Research Designs”
Kirill Borusyak, UCL
Peter Hull, University of Chicago
Xavier Jaravel, London School of Economics
December 2019
Contents
A Appendix Results
A.1 Heterogeneous Treatment Effects
A.2 Comparing SSIV and Native Shock-Level Regression Estimands
A.3 Unobserved n-level Shocks Violate Share Exogeneity
A.4 Connection to Rotemberg Weights
A.5 SSIV Consistency in Short Panels
A.6 SSIV Relevance with Panel Data
A.7 Estimated Shocks
A.8 Equilibrium Industry Growth in a Model of Local Labor Markets
A.9 SSIV with Multiple Endogenous Variables or Instruments
A.10 Finite-Sample Performance of SSIV: Monte-Carlo Evidence
B Appendix Proofs
B.1 Proposition 3 and Extensions
B.2 Proposition 4
B.3 Proposition 5
B.4 Proposition A1
B.5 Proposition A2
B.6 Proposition A3
B.7 Proposition A4
B.8 Proposition A5
B.9 Proposition A6
C Appendix Figures and Tables

A Appendix Results
A.1 Heterogeneous Treatment Effects
In this appendix we consider what a linear SSIV identifies when the structural relationship between $y_\ell$ and $x_\ell$ is nonlinear. We show that under a first-stage monotonicity condition the large-sample SSIV coefficient estimates a convexly weighted average of heterogeneous treatment effects. This holds even when the instrument has different effects on the outcome depending on the underlying realization of shocks, for example when $y_\ell = \sum_n s_{\ell n} \tilde\beta_{\ell n} x_{\ell n} + \varepsilon_\ell$ with $\tilde\beta_{\ell n}$ capturing the effects of (possibly unobserved) observation- and shock-specific treatments $x_{\ell n}$ making up the observed $x_\ell = \sum_n s_{\ell n} x_{\ell n}$.

Consider a general structural outcome model of

$$y_\ell = y(x_{\ell 1}, \dots, x_{\ell R}, \varepsilon_\ell), \tag{14}$$

where the $R$ treatments are similarly defined as $x_{\ell r} = x_r(g, \eta_{\ell r})$ with $g$ collecting the vector of shocks $g_n$ and $\eta_\ell = (\eta_{\ell 1}, \dots, \eta_{\ell R})$ capturing first-stage heterogeneity. We consider an IV regression of $y_\ell$ on some aggregated treatment $x_\ell = \sum_r \alpha_{\ell r} x_{\ell r}$ with $\alpha_{\ell r} \ge 0$. Note that this nests the case of a single aggregate treatment ($R = 1$ and $\alpha_{\ell 1} = 1$) with arbitrary effect heterogeneity, as well as the special case above ($R = N$ and $\alpha_{\ell r} = s_{\ell n}$). We abstract away from controls $w_\ell$ and assume each shock is as-good-as-randomly assigned (mean-zero and mutually independent) conditional on the vector of second-stage unobservables $\varepsilon$ and the matrices of first-stage unobservables $\eta$, exposure shares $s$, importance weights $e$, and aggregation weights $\alpha$, collected in $\mathcal{I} = (\varepsilon, \eta, s, e, \alpha)$. This assumption is stronger than A1 but generally necessary in a non-linear setting while still allowing for the endogeneity of exposure shares. For further notational simplicity we assume that $y(\cdot, \varepsilon_\ell)$ and each $x_r(\cdot, \eta_{\ell r})$ are almost surely continuously differentiable, such that $\beta_{\ell r}(\cdot) = \frac{\partial}{\partial x_r} y(\cdot, \varepsilon_\ell)$ captures the effect, for observation $\ell$, of marginally increasing treatment $r$ on the outcome and $\pi_{\ell n r}(\cdot) = \frac{\partial}{\partial g_n} x_r(\cdot, \eta_{\ell r})$ captures the effect of marginally increasing the $n$th shock on the $r$th treatment at $\ell$.

Under an appropriate law of large numbers, the shift-share IV estimator approximates a ratio of sums of “reduced-form” and “first-stage” expectations:

$$\hat\beta = \frac{E\left[\sum_\ell e_\ell z_\ell y_\ell\right]}{E\left[\sum_\ell e_\ell z_\ell x_\ell\right]} + o_p(1) = \frac{\sum_\ell \sum_n E\left[s_{\ell n} e_\ell g_n y_\ell\right]}{\sum_\ell \sum_n \sum_r E\left[s_{\ell n} e_\ell g_n \alpha_{\ell r} x_{\ell r}\right]} + o_p(1). \tag{15}$$

Given this, we then have the following result:

Prop. A1 When $\pi_{\ell n r}([\gamma; g_{-n}]) \ge 0$ almost surely, equation (15) can be written

$$\hat\beta = \frac{\sum_\ell \sum_n \sum_r E\left[\int_{-\infty}^{\infty} \tilde\beta_{\ell n r}(\gamma)\, \omega_{\ell n r}(\gamma)\, d\gamma\right]}{\sum_\ell \sum_n \sum_r E\left[\int_{-\infty}^{\infty} \omega_{\ell n r}(\gamma)\, d\gamma\right]} + o_p(1), \tag{16}$$

where $\omega_{\ell n r}(\gamma) \ge 0$ almost surely and

$$\tilde\beta_{\ell n r}(\gamma) = \frac{\beta_{\ell r}\left(x_1([\gamma; g_{-n}], \eta_{\ell 1}), \dots, x_R([\gamma; g_{-n}], \eta_{\ell R})\right)}{\alpha_{\ell r}} \tag{17}$$

is a rescaled treatment effect, evaluated at $\left(x_1([\gamma; g_{-n}], \eta_{\ell 1}), \dots, x_R([\gamma; g_{-n}], \eta_{\ell R})\right)$ for $[g_n; g_{-n}] = (g_1, \dots, g_{n-1}, g_n, g_{n+1}, \dots, g_N)'$.

Proof: See Appendix B.4.

This shows that in large samples $\hat\beta$ estimates a convex average of rescaled treatment effects, $\tilde\beta_{\ell n r}(\cdot)$, when each $x_r(\cdot, \eta_{\ell r})$ is almost surely monotone in each shock. Appendix B.4 shows that the weights $\omega_{\ell n r}(\gamma)$ are proportional to the first-stage effects $\pi_{\ell n r}([\gamma; g_{-n}])$, exposure shares $s_{\ell n}$, regression weights $e_\ell$, treatment aggregation weights $\alpha_{\ell r}$, and a function of the shock distribution. In the case without aggregation, i.e. $R = \alpha_{\ell r} = 1$, there is no rescaling in the $\tilde\beta_{\ell n r}(\gamma)$. Equation (16) then can be seen as generalizing the result of Angrist et al. (2000), on the identification of heterogeneous effects of continuous treatments, to the continuous shift-share instrument case. Intuition for the $\omega_{\ell n r}(\gamma)$ weights follows similarly from this connection. With aggregation – that is, when the realization of shocks may have heterogeneous effects on $y_\ell$ holding the aggregated $x_\ell$ fixed – equation (16) shows that SSIV captures a convex average of treatment effects per aggregated unit. Thus in the leading example of $y_\ell = \sum_n s_{\ell n} \tilde\beta_{\ell n} x_{\ell n} + \varepsilon_\ell$ and $x_\ell = \sum_n s_{\ell n} x_{\ell n}$, this result establishes identification of a convex average of the $\tilde\beta_{\ell n}$. In this way the result generalizes Adão et al. (2019b), who establish the identification of convex averages of rescaled treatment effects in reduced form shift-share regressions.

A.2 Comparing SSIV and Native Shock-Level Regression Estimands
In this appendix we illustrate economic differences between the estimands of two regressions that researchers may consider: SSIV using outcome and treatment observations $y_\ell$ and $x_\ell$ (which we show in Proposition 1 are equivalent to certain shock-level IV regressions), and more conventional shock-level IV regressions using “native” $y_n$ and $x_n$. These outcomes and treatments capture the same economic concepts as the original $y_\ell$ and $x_\ell$, in contrast to the constructed $\bar y_n$ and $\bar x_n$ discussed in Section 2.2.

In line with the labor supply and other key SSIV examples, we will for concreteness refer to the $\ell$ and $n$ as indexing regions and industries, respectively. We consider the case where both the outcome and treatment can be naturally defined at the level of region-by-industry cells (henceforth, cells) – $y_{\ell n}$ and $x_{\ell n}$, respectively – and are thus suitable for aggregation across either dimension with some weights $E_{\ell n}$ (e.g., cell employment growth rates aggregated with lagged cell employment weights): $y_\ell = \sum_n s_{\ell n} y_{\ell n}$ for $s_{\ell n} = E_{\ell n} / \sum_{n'} E_{\ell n'}$ and $y_n = \sum_\ell \omega_{\ell n} y_{\ell n}$ for $\omega_{\ell n} = E_{\ell n} / \sum_{\ell'} E_{\ell' n}$, with analogous expressions for $x_\ell$ and $x_n$. We further define $E_\ell = \sum_n E_{\ell n}$ and $E_n = \sum_\ell E_{\ell n}$ for conciseness.

[Footnote: This formulation nests reduced-form shift-share regressions when $x_{\ell n} = g_n$ for each $\ell$. The labor supply example of Section 2.1 fits only partially in this formal setup because the industry or regional wage growth $y_n$ is not equal to a weighted average of wage growth across cells: reallocation of employment affects the average wage growth even in the absence of wage changes in any given cell.]

We consider the estimands of two regression specifications: $\beta$ from the regional level model (1), instrumented by $z_\ell$ and weighted by $e_\ell = E_\ell / E$ for $E = \sum_\ell E_\ell$, and $\beta_{ind}$ from a simpler industry-level IV regression of

$$y_n = \beta_{ind} x_n + \varepsilon_n, \tag{18}$$

instrumented by the industry shock $g_n$ and weighted by $s_n = E_n / E$. For simplicity we do not include any controls in either specification and implicitly condition on $\{E_{\ell n}\}_{\ell, n}$ (and some other variables as described below), viewing them as non-stochastic.

[Footnote: Note that we thereby condition on the shares $s_{\ell n}$ and importance weights $e_\ell$. Yet we still allow for share endogeneity by not restricting $E[\varepsilon_{\ell n}]$ to be zero.]

We show that $\beta$ and $\beta_{ind}$ generally differ when there are within-region spillover effects or when treatment effects are heterogeneous. We study these cases in turn, maintaining several assumptions: (i) a first stage relationship analogous to the one considered in Section 3.1:

$$x_{\ell n} = \pi_{\ell n} g_n + \eta_{\ell n}, \tag{19}$$

for non-stochastic $\pi_{\ell n} \ge \bar\pi > 0$, (ii) a stronger version of our Assumption 1 that imposes $E[g_n] = E[g_n \varepsilon_{\ell n'}] = E[g_n \eta_{\ell n'}] = 0$ for all $\ell$, $n$, and $n'$, with $\varepsilon_{\ell n'}$ denoting the structural cell-level residual of each model, (iii) the assumption that $g_n$ is uncorrelated with $g_{n'}$ for all $n \neq n'$, and (iv) that all appropriate laws of large numbers hold.

Within-Region Spillover Effects

Suppose the structural model at the cell level is given by

$$y_{\ell n} = \beta_0 x_{\ell n} - \beta_1 \sum_{n'} s_{\ell n'} x_{\ell n'} + \varepsilon_{\ell n}. \tag{20}$$

Here $\beta_0$ captures the direct effect of the shock on the cell outcome, and $\beta_1$ captures a within-region spillover effect. The local employment effects of industry demand shocks from the model in Appendix A.8 fit in this framework; see equation (47). The following proposition shows that the SSIV estimand $\beta$ captures the effect of treatment net of spillovers (i.e. $\beta_0 - \beta_1$), whereas $\beta_{ind}$ subtracts the spillover only partially; this is intuitive since the spillover effect is fully contained within regions but not within industries.

[Footnote: In the labor supply example from the main text $y_{\ell n}$ is the cell wage, which is equalized within the region, and $x_{\ell n}$ is cell employment. Equation (20) therefore holds for $\beta_0 = 0$ and $-\beta_1$ being the inverse labor supply elasticity.]

Prop. A2 Suppose equation (20) holds and the average local concentration index $H_L = \sum_{\ell, n} e_\ell s_{\ell n}^2$ is bounded from below by a constant $\bar H_L > 0$. Further assume $\pi_{\ell n} = \bar\pi$ and $\mathrm{Var}[g_n] = \sigma_g^2$ for all $\ell$ and $n$. Then the SSIV estimator satisfies

$$\hat\beta = \beta_0 - \beta_1 + o_p(1) \tag{21}$$

and the industry-level IV estimator satisfies

$$\hat\beta_{ind} = \beta_0 - \beta_1 H_L + o_p(1). \tag{22}$$

If $\beta_1 \neq 0$ (i.e. in the presence of within-region spillovers), $\hat\beta$ and $\hat\beta_{ind}$ asymptotically coincide if and only if $H_L \xrightarrow{p} 1$, which corresponds to the case where the average region is asymptotically concentrated in one industry.

Proof: See Appendix B.5.
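The gap between the two limits in Proposition A2 can be made concrete with a small numerical sketch. The code below is an illustration only, not part of the paper's replication material: the function names and the example weight matrix are hypothetical. It computes $H_L$ from a matrix of cell weights $E_{\ell n}$, evaluates the limits in (21) and (22), and cross-checks the industry-level limit against the population moments implied by (19) and (20) under the proposition's assumptions of a common $\bar\pi$ and $\sigma_g^2$.

```python
import numpy as np


def local_concentration(E):
    """Average local concentration H_L = sum_{l,n} e_l * s_ln**2.

    E is an (L x N) array of cell weights E_ln (e.g. lagged cell employment);
    every region (row) must have a positive total.
    """
    E = np.asarray(E, dtype=float)
    e = E.sum(axis=1) / E.sum()             # regional weights e_l = E_l / E
    s = E / E.sum(axis=1, keepdims=True)    # exposure shares s_ln = E_ln / E_l
    return float((e[:, None] * s**2).sum())


def prop_a2_limits(E, beta0, beta1):
    """Large-sample limits of the SSIV and industry-level IV estimators,
    as stated in equations (21) and (22)."""
    return beta0 - beta1, beta0 - beta1 * local_concentration(E)


def industry_limit_from_moments(E, beta0, beta1, pi=1.0, sigma2=1.0):
    """Industry-level IV estimand built from population moments, assuming
    pi_ln = pi, Var[g_n] = sigma2, and shocks uncorrelated with eta and eps.
    Every industry (column) must have a positive total."""
    E = np.asarray(E, dtype=float)
    s = E / E.sum(axis=1, keepdims=True)        # s_ln
    omega = E / E.sum(axis=0, keepdims=True)    # omega_ln = E_ln / E_n
    s_n = E.sum(axis=0) / E.sum()               # industry weights s_n = E_n / E
    # E[g_n y_n] = pi * sigma2 * (beta0 - beta1 * sum_l omega_ln * s_ln)
    spill = (omega * s).sum(axis=0)
    reduced_form = (s_n * pi * sigma2 * (beta0 - beta1 * spill)).sum()
    first_stage = (s_n * pi * sigma2).sum()     # sum_n s_n * E[g_n x_n]
    return reduced_form / first_stage


if __name__ == "__main__":
    E = np.array([[3.0, 1.0], [1.0, 5.0]])      # illustrative weights only
    print(prop_a2_limits(E, beta0=1.0, beta1=0.4))
    print(industry_limit_from_moments(E, beta0=1.0, beta1=0.4))
```

With fully specialized regions (an identity weight matrix) $H_L = 1$ and the two limits coincide; with equal weights across two industries $H_L = 0.5$, so the industry-level regression nets out only half of the spillover.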
Treatment Effect Heterogeneity
Now consider a different structural model which allows for heterogeneity in treatment effects:

$$y_{\ell n} = \beta_{\ell n} x_{\ell n} + \varepsilon_{\ell n}. \tag{23}$$

We also allow the first-stage coefficients $\pi_{\ell n}$ and shock variance $\sigma_n^2$ to vary. The following proposition shows that $\beta$ and $\beta_{ind}$ differ in how they average the effects $\beta_{\ell n}$ (here treated as non-stochastic) across the $(\ell, n)$ cells. The weights corresponding to the SSIV estimand $\beta$ are relatively higher for cells that represent a larger fraction of the regional economy. This follows because in the regional regression $s_{\ell n}$ determines the cell’s weight in both the outcome and the shift-share instrument, while in the industry regression only the former argument applies. Heterogeneity in the $\pi_{\ell n}$ and $\sigma_n^2$ has equivalent effects on the weighting scheme of both estimands.

Prop. A3 In the causal model (23),

$$\hat\beta = \frac{\sum_{\ell, n} E_{\ell n} s_{\ell n} \pi_{\ell n} \sigma_n^2 \cdot \beta_{\ell n}}{\sum_{\ell, n} E_{\ell n} s_{\ell n} \pi_{\ell n} \sigma_n^2} + o_p(1) \tag{24}$$

and

$$\hat\beta_{ind} = \frac{\sum_{\ell, n} E_{\ell n} \pi_{\ell n} \sigma_n^2 \cdot \beta_{\ell n}}{\sum_{\ell, n} E_{\ell n} \pi_{\ell n} \sigma_n^2} + o_p(1). \tag{25}$$

Proof: See Appendix B.6.
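The two weighting schemes in Proposition A3 are easy to compare numerically. The following sketch is illustrative only; the function name and the inputs are hypothetical and chosen for the example. It evaluates the weighted averages in (24) and (25) for given arrays of cell weights, first-stage coefficients, shock variances, and cell-level effects.

```python
import numpy as np


def prop_a3_limits(E, pi, sigma2, beta):
    """Large-sample limits (SSIV, industry-level IV) from equations (24)-(25).

    E, pi, beta : (L x N) arrays of cell weights E_ln, first-stage
                  coefficients pi_ln, and treatment effects beta_ln.
    sigma2      : length-N array of shock variances sigma_n^2.
    """
    E, pi, beta = (np.asarray(a, dtype=float) for a in (E, pi, beta))
    sigma2 = np.asarray(sigma2, dtype=float)
    s = E / E.sum(axis=1, keepdims=True)      # exposure shares s_ln
    w_ind = E * pi * sigma2[None, :]          # industry-level weights, eq. (25)
    w_ssiv = w_ind * s                        # SSIV weights add the factor s_ln, eq. (24)
    ssiv = (w_ssiv * beta).sum() / w_ssiv.sum()
    ind = (w_ind * beta).sum() / w_ind.sum()
    return ssiv, ind


if __name__ == "__main__":
    # Illustrative inputs: region 1 is concentrated in industry 1,
    # where the treatment effect is largest.
    E = np.array([[9.0, 1.0], [5.0, 5.0]])
    beta = np.array([[2.0, 0.0], [1.0, 1.0]])
    print(prop_a3_limits(E, np.ones((2, 2)), np.ones(2), beta))
```

In this example the SSIV limit places more weight on the dominant cell of the concentrated region than the industry-level limit does, which is the pattern the proposition describes. With homogeneous effects the two limits coincide.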
A.3 Unobserved n-level Shocks Violate Share Exogeneity

In this appendix, we show that the assumption of SSIV share exogeneity from Goldsmith-Pinkham et al. (2019) is violated when there are unobserved shocks $\nu_n$ that affect outcomes via the exposure shares $s_{\ell n}$, i.e. when the residual has the structure

$$\varepsilon_\ell = \sum_n s_{\ell n} \nu_n + \check\varepsilon_\ell. \tag{26}$$

This is clear in the simple case of fixed $N$: even when shares are randomly assigned across observations (i.e. each $s_{\ell n}$ is independent of each $\nu_n$ and $\check\varepsilon_\ell$), the structure of (26) will ensure $E[s_{\ell n} \varepsilon_\ell] \neq 0$ and thus $\bar\varepsilon_n \not\xrightarrow{p} 0$ for each $n$.

We next show that this result generalizes to the case of increasing $N$, where the contribution of each $\nu_n$ to the variation in $\varepsilon_\ell$ becomes small. The intuition is that the SSIV relevance condition typically requires individual observations to be sufficiently concentrated in a small number of shocks (see Section 3.1), and under this condition the share exclusion restriction violations remain asymptotically non-ignorable even as $N \to \infty$.

In this case we define share endogeneity as non-vanishing $\mathrm{Var}[\bar\varepsilon_n]$ at least for some $n$. This will tend to violate SSIV exclusion, unless shocks are as-good-as-randomly assigned (Assumption 1), even if the importance weights of individual shocks, $s_n$, converge to zero (Assumption 2). As in the previous section, we continue to treat $e_\ell$ and $s_{\ell n}$ as non-stochastic to show this result with simple notation.

Prop. A4 Suppose condition (26) holds with the $\nu_n$ mean-zero and uncorrelated with the $\check\varepsilon_\ell$ and with each other, and with $\mathrm{Var}[\nu_n] = \sigma_n^2 \ge \sigma_\nu^2$ for a fixed $\sigma_\nu^2 > 0$. Also assume $H_L = \sum_\ell e_\ell \sum_n s_{\ell n}^2 \to \bar H > 0$ such that first-stage relevance can be satisfied. Then there exists a constant $\delta > 0$ such that $\max_n \mathrm{Var}[\bar\varepsilon_n] > \delta$ for sufficiently large $L$.

Proof: See Appendix B.7.
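A short numerical sketch may help illustrate the role of local concentration in Proposition A4. It is an illustration under stated assumptions rather than the paper's own computation: the function name is hypothetical, shares and weights are treated as fixed, and only the component of $\bar\varepsilon_n = \sum_\ell e_\ell s_{\ell n} \varepsilon_\ell / \sum_\ell e_\ell s_{\ell n}$ driven by the $\nu_n$ is computed, which bounds $\mathrm{Var}[\bar\varepsilon_n]$ from below because the $\nu_n$ are uncorrelated with the $\check\varepsilon_\ell$.

```python
import numpy as np


def share_driven_variance(s, e, sigma2_nu):
    """Variance of the nu-driven part of eps_bar_n under equation (26).

    s         : (L x N) array of exposure shares s_ln (treated as fixed).
    e         : length-L array of importance weights e_l.
    sigma2_nu : length-N array of variances Var[nu_n].
    Returns a length-N array; entry n is sum_n' C[n, n']**2 * sigma2_nu[n'],
    where C[n, n'] = sum_l e_l * s_ln * s_ln' / s_n and s_n = sum_l e_l * s_ln.
    """
    s = np.asarray(s, dtype=float)
    e = np.asarray(e, dtype=float)
    sigma2_nu = np.asarray(sigma2_nu, dtype=float)
    s_n = e @ s                                  # shock-level weights s_n
    C = (s * e[:, None]).T @ s / s_n[:, None]    # loading of eps_bar_n on each nu_n'
    return (C**2) @ sigma2_nu


if __name__ == "__main__":
    N = 50
    e = np.full(N, 1.0 / N)
    specialized = np.eye(N)                  # each region exposed to one shock: H_L = 1
    diffuse = np.full((N, N), 1.0 / N)       # equal exposure to all shocks: H_L = 1/N
    print(share_driven_variance(specialized, e, np.ones(N)).max())
    print(share_driven_variance(diffuse, e, np.ones(N)).max())
```

With fully specialized regions the variance stays at $\sigma_\nu^2$ however large $N$ is, whereas with perfectly diffuse shares it shrinks at rate $1/N$ together with $H_L$. This matches the proposition's message that the share-level violation does not vanish when $H_L$ is bounded away from zero.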
A.4 Connection to Rotemberg Weights
In this appendix we rewrite the decomposition of the SSIV coefficient $\hat\beta$ from Goldsmith-Pinkham et al. (2019) that gives rise to their “Rotemberg” weight interpretation, and show that these weights measure the leverage of shocks in our equivalent shock-level IV regression. We then show that, in our framework, skewed Rotemberg weights do not measure sensitivity to misspecification (of share exogeneity) and do not pose a problem for consistency. We finally discuss the implications of high-leverage observations for SSIV inference.

Proposition 1 implies the following decomposition:

$$\hat\beta = \frac{\sum_n s_n g_n \bar y_n^\perp}{\sum_n s_n g_n \bar x_n^\perp} = \sum_n \alpha_n \hat\beta_n, \tag{27}$$

where

$$\hat\beta_n = \frac{\bar y_n^\perp}{\bar x_n^\perp} = \frac{\sum_\ell e_\ell s_{\ell n} y_\ell^\perp}{\sum_\ell e_\ell s_{\ell n} x_\ell^\perp} \tag{28}$$

and

$$\alpha_n = \frac{s_n g_n \bar x_n^\perp}{\sum_{n'} s_{n'} g_{n'} \bar x_{n'}^\perp}. \tag{29}$$

This is a shock-level version of the decomposition discussed in Goldsmith-Pinkham et al. (2019): $\hat\beta_n$ is the IV estimate of $\beta$ that uses share $s_{\ell n}$ as the instrument, and $\alpha_n$ is the so-called Rotemberg weight. To see the connection with leverage (defined, typically in the context of OLS, as the derivative of each observation’s fitted value with respect to its outcome) in our equivalent IV regression, note that

$$\frac{\partial \left(\bar x_n^\perp \hat\beta\right)}{\partial \bar y_n^\perp} = \frac{\bar x_n^\perp s_n g_n}{\sum_{n'} s_{n'} g_{n'} \bar x_{n'}^\perp} = \alpha_n. \tag{30}$$

In this way, $\alpha_n$ measures the sensitivity of $\hat\beta$ to $\hat\beta_n$. For Goldsmith-Pinkham et al. (2019), exposure to each shock should be a valid instrument and thus $\hat\beta_n \xrightarrow{p} \beta$ for each $n$. However, in our framework deviations of $\hat\beta_n$ from $\beta$ reflect nonzero $\bar\varepsilon_n$ in large samples, and such share endogeneity is not ruled out; thus $\alpha_n$ does not have the same sensitivity-to-misspecification interpretation. Moreover, a high leverage of certain shocks (“skewed Rotemberg weights,” in the language of Goldsmith-Pinkham et al. (2019)) is not a problem for consistency in our framework, provided it results from a heavy-tailed and high-variance distribution of shocks (that still satisfies our regularity conditions, such as finite shock variance), and each $s_n$ is small as required by Assumption 2.

Nevertheless, skewed $\alpha_n$ may cause issues with SSIV inference, as may high leverage observations in any regression. In general, the estimated residuals $\hat{\bar\varepsilon}_n^\perp$ of high-leverage observations will tend to be biased toward zero, which may lead to underestimation of the residual variance and too small standard errors (e.g., Cameron and Miller 2015). This issue can be addressed, for instance, by computing confidence intervals with the null imposed, as Adão et al. (2019b) recommend and as we discuss in Section 4.1. In practice our Monte-Carlo simulations in Appendix A.10 find the coverage of conventional exposure-robust confidence intervals to be satisfactory even with Rotemberg weights as skewed as those reported in the applications of the Goldsmith-Pinkham et al. (2019) analysis.

A.5 SSIV Consistency in Short Panels
This appendix shows how alternative shock exogeneity assumptions imply the consistency of panel SSIV regressions with many inconsistently-estimated fixed effects (FEs). We consider this incidental parameters problem in “short” panels, with fixed $T$ and $L \to \infty$ and with unit FEs. Similar arguments apply with $L$ fixed and $T \to \infty$ and with period FEs.

Suppose for the linear causal model $y_{\ell t} = \beta x_{\ell t} + \epsilon_{\ell t}$ and control vector $w_{\ell t}$ (which includes the FEs), $\sum_\ell e_{\ell t} w_{\ell t}^\Delta z_{\ell t} \xrightarrow{p} \Omega_{zw}$, $\sum_\ell e_{\ell t} w_{\ell t}^\Delta \epsilon_{\ell t}^\Delta \xrightarrow{p} \Omega_{w\epsilon}$, and $\sum_\ell e_{\ell t} w_{\ell t}^\Delta w_{\ell t}^{\Delta\prime} \xrightarrow{p} \Omega_{ww}$ for full-rank $\Omega_{ww}$, where $v_{\ell t}^\Delta$ is a subvector of the (weighted) unit-demeaned observation of variable $v_{\ell t}$, $v_{\ell t} - \frac{\sum_{t'} e_{\ell t'} v_{\ell t'}}{\sum_{t'} e_{\ell t'}}$, that drops any elements that are identically zero (e.g. those corresponding to the unit FEs in $w_{\ell t}$). Then defining $\gamma = \Omega_{ww}^{-1} \Omega_{w\epsilon}$ we can write $y_{\ell t}^\Delta = \beta x_{\ell t}^\Delta + w_{\ell t}^{\Delta\prime} \gamma + \varepsilon_{\ell t}^\Delta$. Suppose also that $\sum_\ell e_{\ell t} z_{\ell t} x_{\ell t}^\perp \xrightarrow{p} \pi$ for some $\pi \neq 0$. Then, following the proof to Proposition 2, $\hat\beta$ is consistent if and only if

$$\sum_n \sum_t s_{nt} g_{nt} \bar\varepsilon_{nt}^\Delta \xrightarrow{p} 0, \tag{31}$$

where $s_{nt} = \sum_\ell e_{\ell t} s_{\ell n t}$ and $\bar\varepsilon_{nt}^\Delta = \frac{\sum_\ell e_{\ell t} s_{\ell n t} \varepsilon_{\ell t}^\Delta}{\sum_\ell e_{\ell t} s_{\ell n t}}$. This condition is satisfied when analogs of Assumptions 1 and 2 and the regularity conditions in Proposition 3 hold, as well as under the various extensions discussed in Section 3. In particular when $w_{\ell t}$ contains $t$-specific FEs the key assumption of quasi-experimental shock assignment is $E\left[g_{nt} \mid \bar\varepsilon_{nt}^\Delta, s_{nt}\right] = \mu_t$, $\forall n, t$, allowing endogenous period-specific shock means $\mu_t$ via Proposition 4. This assumption avoids the incidental parameters problem by considering shocks as-good-as-randomly assigned given an unobserved $\bar\varepsilon_{nt}^\Delta$, which is a function of the time-varying $\varepsilon_{\ell p}$ across all periods $p$.

An intuitive special case of this approach is when the exposure shares and importance weights are fixed: $s_{\ell n t} = s_{\ell n}$ and $e_{\ell t} = e_\ell$. Then the weights in (31) are time-invariant, $s_{nt} = s_n$, and

$$\bar\varepsilon_{nt}^\Delta = \frac{\sum_\ell e_\ell s_{\ell n} \varepsilon_{\ell t}^\Delta}{\sum_\ell e_\ell s_{\ell n}} = \frac{\sum_\ell e_\ell s_{\ell n} \left(\varepsilon_{\ell t} - \frac{1}{T} \sum_{t'} \varepsilon_{\ell t'}\right)}{\sum_\ell e_\ell s_{\ell n}} = \bar\varepsilon_{nt} - \frac{1}{T} \sum_{t'} \bar\varepsilon_{nt'}, \tag{32}$$

where $\bar\varepsilon_{nt} = \frac{\sum_\ell e_\ell s_{\ell n} \varepsilon_{\ell t}}{\sum_\ell e_\ell s_{\ell n}}$ is an aggregate of period-specific unobservables $\varepsilon_{\ell t}$. It is then straightforward to extend Propositions 3 and 4 under a shock-level assumption of strong exogeneity, i.e. that $E\left[g_{nt} \mid \bar\varepsilon_{n1}, \dots, \bar\varepsilon_{nT}, s_n\right] = \mu_n + \tau_t$ for all $n$ and $t$. Here endogenous $n$-specific shock means are permitted by the observation in Section 3.3, that share-weighted $n$-specific FEs at the shock level are subsumed by $\ell$-specific FEs in the SSIV regression when shares and weights are fixed.

A.6 SSIV Relevance with Panel Data
This appendix shows that holding the exposure shares fixed in a pre-period is likely to weaken the SSIV first-stage in panel regressions. Consider a panel extension of the first stage model used in Section 3.1, where $x_{\ell t} = \sum_n s_{\ell n t} x_{\ell n t}$ with $x_{\ell n t} = \pi_{\ell n t} g_{nt} + \eta_{\ell n t}$, $\pi_{\ell n t} \ge \bar\pi$ for some fixed $\bar\pi > 0$, and the $g_{nt}$ are mutually independent and mean-zero with variance $\sigma_{nt}^2 \ge \bar\sigma_g^2$ for fixed $\bar\sigma_g^2 > 0$, independently of $\{\eta_{\ell n t}\}_{\ell, n, t}$. As in other appendices, we here treat $s_{\ell n t}$, $e_{\ell t}$, and $\pi_{\ell n t}$ as non-stochastic. Then again omitting controls for simplicity, an SSIV regression with $z_{\ell t} = \sum_{n=1}^N s_{\ell n t}^* g_{nt}$ as an instrument, where $s_{\ell n t}^*$ is either $s_{\ell n t}$ (updated shares) or $s_{\ell n 0}$ (fixed shares), yields an expected first-stage covariance of

$$E\left[\sum_\ell \sum_t e_{\ell t} z_{\ell t} x_{\ell t}^\perp\right] \ge \bar\sigma_g^2 \bar\pi \sum_\ell \sum_t e_{\ell t} \sum_n s_{\ell n t}^* s_{\ell n t}. \tag{33}$$

For panel SSIV relevance we require the $e_{\ell t}$-weighted average of $\sum_n s_{\ell n t}^* s_{\ell n t}$ to not vanish asymptotically. With updated shares this is satisfied when the Herfindahl index of an average observation-period (across shocks) is non-vanishing, while in the fixed shares case the overlap of shares in periods $0$ and $t$, $\sum_n s_{\ell n 0} s_{\ell n t}$, may become weak or even vanish as $T \to \infty$, on average across observations.

A.7 Estimated Shocks
This appendix establishes the formal conditions for the SSIV estimator, with or without a leave-one-out correction, to be consistent when shocks $g_n$ are noisy estimates of some latent $g^*_n$ satisfying Assumptions 1 and 2. We also propose a heuristic measure that indicates whether the leave-one-out correction is likely to be important and compute it for the Bartik (1991) setting. Straightforward extensions to other split-sample estimators follow.

Suppose a researcher estimates shocks via a weighted average of variables $g_{\ell n}$. That is, given weights $\omega_{\ell n} \ge 0$ such that $\sum_\ell \omega_{\ell n} = 1$ for all $n$, she computes
$$g_n = \sum_\ell \omega_{\ell n} g_{\ell n}. \tag{34}$$
A leave-one-out (LOO) version of the shock estimator is instead
$$g_{n,-\ell} = \frac{\sum_{\ell' \ne \ell} \omega_{\ell' n} g_{\ell' n}}{\sum_{\ell' \ne \ell} \omega_{\ell' n}}. \tag{35}$$
We assume that each $g_{\ell n}$ is a noisy version of the same latent shock $g^*_n$:
$$g_{\ell n} = g^*_n + \psi_{\ell n}, \tag{36}$$
where $g^*_n$ satisfies Assumptions 1 and 2 and $\psi_{\ell n}$ is estimation error (in Section 3.4 we considered the special case of $\psi_{\ell n} \propto \varepsilon_\ell$). This implies a feasible shift-share instrument of $z_\ell = z^*_\ell + \psi_\ell$ and its LOO version $z^{LOO}_\ell = z^*_\ell + \psi^{LOO}_\ell$, where $z^*_\ell = \sum_n s_{\ell n} g^*_n$, $\psi_\ell = \sum_n s_{\ell n} \sum_{\ell'} \omega_{\ell' n} \psi_{\ell' n}$, and $\psi^{LOO}_\ell = \sum_n s_{\ell n} \frac{\sum_{\ell' \ne \ell} \omega_{\ell' n} \psi_{\ell' n}}{\sum_{\ell' \ne \ell} \omega_{\ell' n}}$. The orthogonality condition for these instruments requires that $\sum_\ell e_\ell \varepsilon_\ell \psi_\ell \xrightarrow{p} 0$ and $\sum_\ell e_\ell \varepsilon_\ell \psi^{LOO}_\ell \xrightarrow{p} 0$, respectively.

We now present three sets of results.
First, we establish a simple sufficient condition under which the LOO instrument satisfies exclusion. We also propose stronger conditions that guarantee consistency of LOO-SSIV. Second, we explore the conditions under which the covariance between $\varepsilon_\ell$ and $\psi_{\ell n}$ is ignorable, i.e. asymptotically does not lead to a "mechanical" bias of the conventional (non-leave-one-out) estimator. We propose a heuristic measure that is large when the bias is likely to be small. Lastly, we apply these ideas to the setting of Bartik (1991) using the data from Goldsmith-Pinkham et al. (2019).

In line with previous appendices, we condition on $s_{\ell n}$, $\omega_{\ell n}$, and $e_\ell$ and treat them as non-stochastic for notational convenience. We also assume the SSIV regressions are estimated without controls $w_\ell$.

LOO Exclusion and Consistency
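As a concrete reference for the objects used in this subsection, the following minimal sketch computes the full-sample shock estimates in (34), the leave-one-out estimates in (35), and the resulting LOO shift-share instrument. The function names and the toy arrays are hypothetical illustrations, not part of the replication code, and the sketch assumes every shock has positive estimation weight on at least two observations so that (35) is well-defined.

```python
import numpy as np

def estimated_shocks(omega, g_obs):
    """Full-sample estimates g_n = sum_l omega_ln * g_ln, as in (34)."""
    return (omega * g_obs).sum(axis=0)

def loo_shocks(omega, g_obs):
    """Leave-one-out estimates as in (35); entry [l, n] excludes observation l."""
    num = (omega * g_obs).sum(axis=0, keepdims=True) - omega * g_obs
    den = omega.sum(axis=0, keepdims=True) - omega
    return num / den

def loo_instrument(s, omega, g_obs):
    """LOO shift-share instrument: z_l = sum_n s_ln * g_{n,-l}."""
    return (s * loo_shocks(omega, g_obs)).sum(axis=1)

# Toy example with L = 3 observations and N = 2 shocks (made-up numbers).
omega = np.array([[0.5, 0.2], [0.3, 0.3], [0.2, 0.5]])  # columns sum to one
g_obs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # observation-level g_ln
s = np.array([[0.7, 0.3], [0.5, 0.5], [0.1, 0.9]])      # exposure shares

g_hat = estimated_shocks(omega, g_obs)
g_loo = loo_shocks(omega, g_obs)
z_loo = loo_instrument(s, omega, g_obs)
```

Subtracting each observation's own contribution from the column totals avoids an explicit loop over $\ell$; the conventional instrument is obtained instead as `s @ g_hat`.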
The following proposition establishes three results. The first is the most important one, providing the condition for exclusion to hold in expectation, which we discuss below. The second strengthens this condition so that the estimator converges, which naturally requires that most shocks are estimated with a sufficient amount of data. A tractable case of complete specialization is considered in the last part, where there should be many more observations than shocks.
Prop. A5
1. If $E[\varepsilon_\ell \psi_{\ell' n}] = 0$ for all $\ell \ne \ell'$ and $n$, then $E\left[\sum_\ell e_\ell \varepsilon_\ell \psi^{LOO}_\ell\right] = 0$.
2. If $E\left[(\varepsilon_\ell, \psi_{\ell n}) \mid \{(\varepsilon_{\ell'}, \psi_{\ell' n'})\}_{\ell' \ne \ell, n'}\right] = 0$ for all $\ell$ and $n$, then the LOO estimator is consistent, provided it has a first stage and two regularity conditions hold: $E\left[\left|\varepsilon_{\ell_1} \varepsilon_{\ell_2} \psi_{\ell'_1 n_1} \psi_{\ell'_2 n_2}\right|\right] \le B$ for a constant $B$ and all $(\ell_1, \ell_2, \ell'_1, \ell'_2, n_1, n_2)$, and
$$\sum_{(\ell_1, \ell_2, \ell'_1, \ell'_2) \in \mathcal{J},\, n_1, n_2} e_{\ell_1} e_{\ell_2} s_{\ell_1 n_1} s_{\ell_2 n_2} \frac{\omega_{\ell'_1 n_1}}{\sum_{\ell \ne \ell_1} \omega_{\ell n_1}} \frac{\omega_{\ell'_2 n_2}}{\sum_{\ell \ne \ell_2} \omega_{\ell n_2}} \to 0, \tag{37}$$
with $\mathcal{J}$ denoting the set of tuples $(\ell_1, \ell_2, \ell'_1, \ell'_2)$ for which one of the two conditions holds: (i) $\ell_1 = \ell_2$ and $\ell'_1 = \ell'_2 \ne \ell_1$, (ii) $\ell_1 = \ell'_2$ and $\ell_2 = \ell'_1 \ne \ell_1$.
3. Condition (37) is satisfied if $N/L \to 0$ in the special case where each region is specialized in one industry, i.e. $s_{\ell n} = 1[n = n(\ell)]$ for some $n(\cdot)$, there are no importance weights ($e_\ell = 1/L$), and shocks are estimated by simple LOO averaging among observations exposed to a given shock ($\omega_{\ell n} = 1/L_n$ for $L_n = \sum_\ell 1[n(\ell) = n]$), assuming further that $L_n \ge 2$ for each $n$ so that the LOO estimator is well-defined.

Proof: See Appendix B.8.

The condition of (1) would be quite innocuous in random samples of $\ell$, the environment in which leave-one-out adjustments are often considered (e.g. Angrist et al. (1999)), but is strong without random sampling.
It requires $\varepsilon_\ell$ and $\psi_{\ell' n}$ to be uncorrelated for $\ell' \ne \ell$, which may easily be violated when both $\ell$ and $\ell'$ are exposed to the same shocks, a situation in which excluding one's own observation is not sufficient. Moreover, since we have conditioned on the exposure shares throughout, $E[\varepsilon_\ell \psi_{\ell' n}] = 0$ generally requires either $\varepsilon_\ell$ or $\psi_{\ell' n}$ to have a zero conditional mean: the share exogeneity assumption applied to either the residuals or the estimation error. At the same time, this condition does not require $E[\varepsilon_\ell \psi_{\ell' n}] = 0$ for $\ell = \ell'$, which reflects the benefit of LOO: eliminating the mechanical bias from the residual directly entering shock estimates.

Heuristic for Importance of LOO Correction
We now return to the non-LOO SSIV estimator. As in Proposition A5, we assume that $E[\varepsilon_\ell \psi_{\ell' n}] = 0$ for $\ell' \ne \ell$ and all $n$, so the LOO estimator is consistent under the additional regularity conditions. Then the "mechanical bias" mentioned in Section 3.4 is the only potential problem: under appropriate regularity conditions (similar to those in part 2 of Proposition A5),
$$\hat\beta - \beta = \frac{E\left[\sum_\ell e_\ell \varepsilon_\ell \psi_\ell\right]}{E\left[\sum_\ell e_\ell z^{\perp}_\ell x_\ell\right]} + o_p(1) = \frac{\sum_{\ell,n} e_\ell s_{\ell n} \omega_{\ell n} E[\varepsilon_\ell \psi_{\ell n}]}{E\left[\sum_\ell e_\ell z^{\perp}_\ell x_\ell\right]} + o_p(1). \tag{38}$$
With $|E[\varepsilon_\ell \psi_{\ell n}]|$ bounded by some $B > 0$ for all $\ell$ and $n$, the numerator of (38) is bounded by $H_N B$, for an observable composite of the relevant shares $H_N = \sum_{\ell,n} e_\ell s_{\ell n} \omega_{\ell n}$. The structure of the shares also influences the strength of the first stage in the denominator.
We assume, without loss of generality, that $z_\ell$ is mean-zero and impose our standard model of the first stage from Section 3.1 (but specified based on the latent shock $g^*_n$): $x_\ell = \sum_n s_{\ell n} x_{\ell n}$ for $x_{\ell n} = \pi_{\ell n} g^*_n + \eta_{\ell n}$, $\eta_{\ell n}$ mean-zero and uncorrelated with $g^*_{n'}$ for all $\ell, n, n'$, $\mathrm{Var}[g^*_n] \ge \bar\sigma^2_g > 0$ and $\pi_{\ell n} \ge \bar\pi > 0$:
$$E\left[\sum_\ell e_\ell z^{\perp}_\ell x_\ell\right] = \sum_\ell e_\ell E\left[\left(\sum_n s_{\ell n} (g^*_n + \psi_{\ell n})\right)\left(\sum_{n'} s_{\ell n'} (\pi_{\ell n'} g^*_{n'} + \eta_{\ell n'})\right)\right] = \sum_{\ell,n} e_\ell s^2_{\ell n} \pi_{\ell n} \mathrm{Var}[g^*_n] + \sum_\ell e_\ell \sum_{n,n'} s_{\ell n} s_{\ell n'} E\left[\psi_{\ell n} (\pi_{\ell n'} g^*_{n'} + \eta_{\ell n'})\right]. \tag{39}$$
Excepting knife-edge cases where the two terms in (39) cancel out, $E\left[\sum_\ell e_\ell z_\ell x_\ell\right] \not\to 0$ provided $H_L = \sum_{\ell,n} e_\ell s^2_{\ell n} \ge \bar H$ for some fixed $\bar H > 0$.

We thus define the following heuristic:
$$H = \frac{H_L}{H_N} = \frac{\sum_{\ell,n} e_\ell s^2_{\ell n}}{\sum_{\ell,n} e_\ell s_{\ell n} \omega_{\ell n}}. \tag{40}$$
When $H$ is large, we expect the non-LOO SSIV estimator to be relatively insensitive to the mechanical bias generated by the average covariance between $\psi_{\ell n}$ and $\varepsilon_\ell$, and thus similar to the LOO estimator.

We note an important special case. Suppose all weights are derived from a variable $E_{\ell n}$ (e.g. lagged employment level in region $\ell$ and industry $n$) as $s_{\ell n} = \frac{E_{\ell n}}{E_\ell}$, $\omega_{\ell n} = \frac{E_{\ell n}}{E_n}$, and $e_\ell = \frac{E_\ell}{E}$, for $E_\ell = \sum_n E_{\ell n}$, $E_n = \sum_\ell E_{\ell n}$, and $E = \sum_\ell E_\ell$.
Then
$$H_N = \sum_{\ell,n} \frac{E_\ell}{E} \frac{E_{\ell n}}{E_\ell} \frac{E_{\ell n}}{E_n} = \sum_{\ell,n} \frac{E_n}{E} \left(\frac{E_{\ell n}}{E_n}\right)^2 = \sum_n s_n \sum_\ell \omega^2_{\ell n}, \tag{41}$$
where $s_n = \frac{E_n}{E}$ is the weight in our equivalent shock-level regression. Therefore, $H_N$ is the weighted average across $n$ of $n$-specific Herfindahl concentration indices, while $H_L$ is the weighted average across $\ell$ of the $\ell$-specific Herfindahl indices. With $E_{\ell n}$ denoting lagged employment, $H$ is high (and thus we expect the LOO correction to be unnecessary) when employment is much more concentrated across industries in a typical region than it is concentrated across regions for a typical industry.

The formula simplifies further with $E_{\ell n} = 1[n = n(\ell)]$ for all $\ell, n$, corresponding to the case of complete specialization of observations in shocks with no regression or shock estimation weights, as in part 3 of Proposition A5. In that case,
$$H = \frac{1}{\sum_\ell \frac{1}{L} \frac{1}{L_{n(\ell)}}} = \frac{1}{\frac{1}{L} \sum_n \sum_{\ell: n(\ell) = n} \frac{1}{L_n}} = \frac{L}{N}. \tag{42}$$
Our heuristic is therefore large when there are many observations per estimated shock.

Application to Bartik (1991)
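Before turning to the estimates, it may help to see how the heuristic in (40) can be computed from an employment-like matrix $E_{\ell n}$. The sketch below is illustrative only: the function name `loo_heuristic` and the toy data are assumptions, and the default weights follow the special case in which $s_{\ell n}$, $\omega_{\ell n}$, and $e_\ell$ are all derived from $E_{\ell n}$.

```python
import numpy as np

def loo_heuristic(E, e=None):
    """Heuristic H = H_L / H_N from (40), with shares built from an L x N matrix E.

    If e is None, importance weights are e_l = E_l / E; pass e explicitly
    (e.g. uniform weights) for the unweighted variant.
    """
    E = np.asarray(E, dtype=float)
    s = E / E.sum(axis=1, keepdims=True)      # s_ln = E_ln / E_l
    omega = E / E.sum(axis=0, keepdims=True)  # omega_ln = E_ln / E_n
    if e is None:
        e = E.sum(axis=1) / E.sum()
    e = np.asarray(e, dtype=float)[:, None]
    H_L = (e * s ** 2).sum()
    H_N = (e * s * omega).sum()
    return H_L / H_N

# Complete specialization: 6 observations, 2 shocks, with L_n = (4, 2).
n_of = np.array([0, 0, 0, 0, 1, 1])
E_spec = np.zeros((6, 2))
E_spec[np.arange(6), n_of] = 1.0
H_spec = loo_heuristic(E_spec)
```

In this toy case the function returns $L/N = 3$, in line with (42). The sketch assumes every row and column of `E` has a positive sum.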
We finally apply our insights to the Bartik (1991) setting, using the Goldsmith-Pinkham et al. (2019) replication code and data. Table C5 reports the results. Column 1 shows the estimates of the inverse local labor supply elasticity using SSIV estimators with and without the LOO correction and using population weights, replicating Table 3, column 2, of Goldsmith-Pinkham et al. (2019). Column 2 repeats the analysis without the population weights. We find all estimates to range between 1.2 and 1.3, showing that in practice for Bartik (1991) the LOO correction does not play a substantial role.

This is especially true without weights, however, where the LOO and conventional SSIV estimators are 1.30 and 1.29, respectively. Our heuristic provides an explanation: $H$ is almost 8 times bigger when computed without weights. The intuition is that large commuting zones, such as Los Angeles and New York, may constitute a substantial fraction of employment in industries of their comparative advantage. This generates a potential for the mechanical bias: labor supply shocks in those regions affect shock estimates; this bias is avoided by LOO estimators. However, the role of the largest commuting zones is only significant in weighted regressions (by employment or, as in Goldsmith-Pinkham et al. (2019), population).

Footnote (on equation (42)): Here $1/H = N/L$ is proportional to the "bias" of the non-LOO estimator, which is similar to how the finite-sample bias of conventional 2SLS is proportional to the number of instruments over the sample size (Nagar 1959).

Footnote (on Column 2): Industry growth shocks in this column are the same as in Column 1, again estimated with employment weights.

A.8 Equilibrium Industry Growth in a Model of Local Labor Markets

This appendix develops a simple model of regional labor supply and demand, similar to the model in Adão et al. (2019a).
Our goal is to show how the national growth rate of industry employment can be viewed as a noisy version of the national industry-specific labor demand shocks, and how regional labor supply shocks (along with some other terms) generate the "estimation error."

Consider an economy that consists of a set of $L$ regions. In each region $\ell$ there is a prevailing wage $W_\ell$, and labor supply has constant elasticity $\phi$:
$$E_\ell = M_\ell W^{\phi}_\ell, \tag{43}$$
where $E_\ell$ is total regional employment and $M_\ell$ is the supply shifter that depends on the working-age population, the outside option, and other factors. Labor demand in each industry $n$ is given by a constant-elasticity function
$$E_{\ell n} = A_n \xi_{\ell n} W^{-\sigma}_\ell, \tag{44}$$
where $E_{\ell n}$ is employment, $A_n$ is the national industry demand shifter, $\xi_{\ell n}$ is its idiosyncratic component, and $\sigma$ is the elasticity of labor demand. The equilibrium is given by
$$\sum_n E_{\ell n} = E_\ell. \tag{45}$$
Now consider small changes in fundamentals $A_n$, $\xi_{\ell n}$ and $M_\ell$. We use log-linearization around the observed equilibrium and employ the Jones (1965) hat algebra notation, with $\hat v$ denoting the relative change in $v$ between the equilibria. We then establish:

Prop. A6
After a set of small changes to fundamentals, the national industry employment growth is characterized by
$$g_n = \sum_\ell \omega_{\ell n} g_{\ell n}, \tag{46}$$
for $\omega_{\ell n} = E_{\ell n} / \sum_{\ell'} E_{\ell' n}$ denoting the share of region $\ell$ in industry employment, and the change in region-by-industry employment $g_{\ell n}$ is characterized by
$$g_{\ell n} = g^*_n + \frac{\sigma}{\sigma + \phi} \varepsilon_\ell + \hat\xi_{\ell n} - \frac{\sigma}{\sigma + \phi} \sum_{n'} s_{\ell n'} \left(g^*_{n'} + \hat\xi_{\ell n'}\right), \tag{47}$$
where $g^*_n = \hat A_n$ is the national industry labor demand shock, $\varepsilon_\ell = \hat M_\ell$ is the regional labor supply shock, and $s_{\ell n} = E_{\ell n} / \sum_{n'} E_{\ell n'}$.

Proof: See Appendix B.9.

The first term in (47) justifies our interpretation of the observed industry employment growth as a noisy estimate of the latent labor demand shock $g^*_n$. The other terms constitute the "estimation error." The first of them is proportional to the structural residual of the labor supply equation, $\varepsilon_\ell$; we have previously established the conditions under which it may or may not confound SSIV estimation. The other terms, which we abstracted away from in Section 3.4, include the idiosyncratic demand shock $\hat\xi_{\ell n}$ and shift-share averages of both national and idiosyncratic demand shocks. If the model is correct, all of these are uncorrelated with $\varepsilon_\ell$, thus not affecting Assumption 1.

A.9 SSIV with Multiple Endogenous Variables or Instruments
This appendix first generalizes our equivalence result for SSIV regressions with multiple endogenous variables and instruments, and discusses corresponding extensions of our quasi-experimental framework via the setting of Jaeger et al. (2018). We also describe how to construct the effective first-stage $F$-statistic of Montiel Olea and Pflueger (2013) for SSIV with one endogenous variable but multiple instruments. We then consider new shock-level IV procedures in this framework, which can be used for efficient estimation and specification testing. Finally, we illustrate these new procedures in the Autor et al. (2013) setting.

Generalized Equivalence and SSIV Consistency
We consider a class of SSIV estimators of an outcome model with multiple treatment channels,
$$y_\ell = \beta' x_\ell + \gamma' w_\ell + \varepsilon_\ell, \tag{48}$$
where $x_\ell = (x_{1\ell}, \dots, x_{K\ell})'$ is instrumented by $z_\ell = (z_{1\ell}, \dots, z_{J\ell})'$, for $z_{j\ell} = \sum_n s_{\ell n} g_{jn}$ and $J \ge K$, and observations are weighted by $e_\ell$. Members of this class are parameterized by a (possibly stochastic) full-rank $K \times J$ matrix $c$, which is used to combine the instruments into a vector of length $K$, $c z_\ell$. For example, the two-stage least squares (2SLS) estimator sets $c = x^{\perp\prime} e z (z^{\perp\prime} e z^{\perp})^{-1}$, where $z^{\perp}$ stacks observations of the residualized $z^{\perp\prime}_\ell$. IV estimates using a given combination are written as
$$\hat\beta = (c z' e x^{\perp})^{-1} c z' e y^{\perp}, \tag{49}$$
where $y^{\perp}$ and $x^{\perp}$ stack observations of the residualized $y^{\perp}_\ell$ and $x^{\perp\prime}_\ell$, $z$ stacks observations of $z'_\ell$, and $e$ is an $L \times L$ diagonal matrix of $e_\ell$ weights. In just-identified IV models (i.e. $J = K$) the two $c$'s cancel in this expression and all IV estimators are equivalent. Note that while the shocks $g_{jn}$ are different across the multiple instruments, we assume here that the exposure shares $s_{\ell n}$ are all the same.

As in Proposition 1, $\hat\beta$ can be equivalently obtained by a particular shock-level IV regression. Intuitively, when the shares are the same, $c z_\ell$ also has a shift-share structure based on a linear combination of shocks $c g_n$, and thus Proposition 1 extends.
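The equivalence can also be checked numerically. The sketch below uses simulated data (all dimensions, variable names, and the data-generating process are arbitrary assumptions made for illustration), residualizes on a constant only, and compares the observation-level estimator (49) with an $s_n$-weighted shock-level IV that uses exposure-weighted averages of the residualized outcome and treatments, for an arbitrary combination matrix $c$.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, K, J = 50, 8, 2, 3

s = rng.dirichlet(np.ones(N), size=L)   # L x N exposure shares, rows sum to one
e = rng.dirichlet(np.ones(L))           # importance weights e_l, summing to one
g = rng.normal(size=(N, J))             # J shocks for each n
z = s @ g                               # L x J shift-share instruments
x = z @ rng.normal(size=(J, K)) + rng.normal(size=(L, K))
y = x @ np.array([1.0, -0.5]) + rng.normal(size=L)

# Residualize on a constant using e-weights (the only control in this sketch).
x_perp = x - e @ x
y_perp = y - e @ y
c = rng.normal(size=(K, J))             # arbitrary K x J combination matrix

# Observation-level SSIV: (c z' e x_perp)^{-1} c z' e y_perp.
ze = z.T * e
beta_ssiv = np.linalg.solve(c @ ze @ x_perp, c @ ze @ y_perp)

# Shock-level aggregates: s_n = sum_l e_l s_ln and exposure-weighted averages.
s_n = e @ s
x_bar = (s.T * e) @ x_perp / s_n[:, None]
y_bar = (s.T * e) @ y_perp / s_n

# Shock-level IV: (c g' S x_bar)^{-1} c g' S y_bar.
gS = g.T * s_n
beta_shock = np.linalg.solve(c @ gS @ x_bar, c @ gS @ y_bar)
```

The two coefficient vectors agree up to floating-point error because $z' e = g' s' e$ and $S \bar{x}^{\perp} = s' e x^{\perp}$, so the identity does not depend on how the data were generated.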
Formally, write $z = sg$ where $s$ is an $L \times N$ matrix of exposure shares and $g$ stacks observations of the shock vector $g'_n$; then,
$$\hat\beta = (c g' s' e x^{\perp})^{-1} c g' s' e y^{\perp} = (c g' S \bar{x}^{\perp})^{-1} (c g' S \bar{y}^{\perp}), \tag{50}$$
where $S$ is an $N \times N$ diagonal matrix with elements $s_n$, $\bar{x}^{\perp}$ is an $N \times K$ matrix with elements $\bar{x}^{\perp}_{kn}$, and $\bar{y}^{\perp}$ is an $N \times 1$ vector of $\bar{y}^{\perp}_n$. This is the formula for an $s_n$-weighted IV regression of $\bar{y}^{\perp}_n$ on $\bar{x}^{\perp}_{1n}, \dots, \bar{x}^{\perp}_{Kn}$ with shocks as instruments, no constant, and the same $c$ matrix. Furthermore, as in Proposition 1,
$$\iota' S \bar{y}^{\perp} = \sum_n s_n \bar{y}^{\perp}_n = \sum_\ell e_\ell \left(\sum_n s_{\ell n}\right) y^{\perp}_\ell = \sum_\ell e_\ell y^{\perp}_\ell = 0, \tag{51}$$
and similarly for $\iota' S \bar{x}^{\perp}$, where $\iota$ is an $N \times 1$ vector of ones. Therefore, the same estimate is obtained by including a constant in this IV procedure (and the same result holds including a shock-level control vector $q_n$ provided $\sum_n s_{\ell n} q_n$ has been included in $w_\ell$, as in Proposition 5). The $c$ matrix is again redundant in the just-identified case.

A natural generalization of the orthogonality condition from Section 2.3 and the quasi-experimental framework of Section 3 follows. Rather than rederiving all of these results, we discuss them intuitively in the setting of Jaeger et al. (2018). Here $y_\ell$ denotes the growth rate of wages in region $\ell$ in a given period (residualized on Mincerian controls), $x_{1\ell}$ is the immigrant inflow rate in that period, and $x_{2\ell}$ is the previous period's immigration rate. The residual $\varepsilon_\ell$ captures changes to local productivity and other regional unobservables. Jaeger et al.
(2018, Table 5) estimate this model with two "past settlement" instruments $z_{1\ell} = \sum_n s_{\ell n} g_{1n}$ and $z_{2\ell} = \sum_n s_{\ell n} g_{2n}$, where $s_{\ell n}$ is the share of immigrants from country of origin $n$ in location $\ell$ at a previous reference date and $g_n = (g_{1n}, g_{2n})'$ gives the current and previous period's national immigration rate from $n$. When this path of immigration shocks is as-good-as-randomly assigned with respect to the aggregated productivity shocks $\bar\varepsilon_n$ (satisfying a generalized Assumption 1), the $g_n$ are uncorrelated across countries and $E\left[\sum_n s^2_n\right] \to 0$ (satisfying a generalized Assumption 2), and appropriately generalized regularity conditions from Proposition 3 hold, the multiple-treatment shock orthogonality condition is satisfied: $\sum_n s_n g_{kn} \bar\varepsilon_n \xrightarrow{p} 0$ for each $k$. Then under the relevance and regularity conditions from Proposition 2, again appropriately generalized, the SSIV estimates are consistent: $\hat\beta \xrightarrow{p} \beta$.

Effective First-Stage F-statistics

With one endogenous variable and multiple instruments, the Montiel Olea and Pflueger (2013) effective first-stage $F$-statistic provides a state-of-the-art heuristic for detecting a weak first stage. Here we describe a correction to it for SSIV that generalizes the $F$-statistic in the single instrument case discussed in Section 4.2. The Stata command weakssivtest, provided with our replication archive, implements this correction. Consider a structural first stage with multiple instruments and one endogenous variable:
$$x_\ell = \pi' z_\ell + \rho' w_\ell + \eta_\ell. \tag{52}$$
Suppose each of the shocks satisfies Assumption 3, i.e. $E[g_{jn} \mid q_n, \bar\varepsilon_n, s_n] = \mu'_j q_n$, where $\sum_n s_{\ell n} q_n$ is included in $w_\ell$, and the residual shocks $g^*_{jn} = g_{jn} - \mu'_j q_n$ are independent from $\{\eta_\ell\}_{\ell=1}^{L}$.
The Montiel Olea and Pflueger (2013) effective $F$-statistic for the 2SLS regression of $y_\ell$ on $x_\ell$, instrumenting with $z_{1\ell}, \dots, z_{J\ell}$, controlling for $w_\ell$, and weighting by $e_\ell$, is given by
$$F_{\mathrm{eff}} = \frac{\left(\sum_\ell e_\ell x^{\perp}_\ell z^{\perp}_\ell\right)' \left(\sum_\ell e_\ell x^{\perp}_\ell z^{\perp}_\ell\right)}{\mathrm{tr}\, \hat V}, \tag{53}$$
where $\hat V$ estimates $V = \mathrm{Var}\left[\sum_\ell e_\ell z^{\perp}_\ell \eta_\ell\right]$. Note that, as before, the first-stage covariance of the original SSIV regression equals that of the equivalent shock-level one from Proposition 5:
$$\sum_\ell e_\ell x^{\perp}_\ell z^{\perp}_\ell = \sum_\ell e_\ell x^{\perp}_\ell z_\ell = \sum_n s_n g_n \bar{x}^{\perp}_n = \sum_n s_n g_{n\perp} \bar{x}^{\perp}_n, \tag{54}$$
where $g_{n\perp}$ is the residual from an $s_n$-weighted projection of $g_n$ on $q_n$, which consistently estimates $g^*_n$. A natural extension of Proposition 5 to many mutually-uncorrelated shocks further implies that $V$ is well-approximated by
$$\hat V = \sum_n s^2_n g_{n\perp} g'_{n\perp} \bar{\hat\eta}^2_n, \tag{55}$$
where, per the discussion in Section 4.2, $\bar{\hat\eta}_n$ denotes the residuals from an IV regression of $\bar{x}^{\perp}_n$ on $\bar{z}^{\perp}_{1n}, \dots, \bar{z}^{\perp}_{Jn}$, instrumented with $g_{1n}, \dots, g_{Jn}$, weighted by $s_n$ and controlling for $q_n$. Plugging this $\hat V$ into (53) yields the corrected effective first-stage $F$-statistic.

Efficient Shift-Share GMM
In overidentified settings ($J > K$), it is natural to consider which estimators are most efficient; for quasi-experimental SSIV, this can be answered by combining the asymptotic results of Adão et al. (2019b) with the classic generalized method of moments (GMM) theory of Hansen (1982). Here we show how standard shock-level IV procedures, such as 2SLS, may yield efficient coefficient estimates $\hat\beta^*$, depending on the variance structure of multiple quasi-randomly assigned shocks.

We first note that the equivalence result (50) applies to SSIV-GMM estimators as well:
$$\hat\beta = \arg\min_b \left(y^{\perp} - x^{\perp} b\right)' e z W z' e \left(y^{\perp} - x^{\perp} b\right) = \arg\min_b \left(\bar{y}^{\perp} - \bar{x}^{\perp} b\right)' S g W g' S \left(\bar{y}^{\perp} - \bar{x}^{\perp} b\right), \tag{56}$$
where $W$ is a $J \times J$ moment-weighting matrix. This leads to an IV estimator with $c = \bar{x}^{\perp\prime} S g W$. For 2SLS estimation, for example, $W = (z^{\perp\prime} e z^{\perp})^{-1}$. Under appropriate regularity conditions, the efficient choice of $W^*$ consistently estimates the inverse asymptotic variance of $z' e \left(y^{\perp} - x^{\perp} \beta\right) = g' S \bar\varepsilon + o_p(1)$. Generalizations of results in Adão et al. (2019b) can then be used to characterize this $W^*$ when shocks are as-good-as-randomly assigned with respect to $\bar\varepsilon$. Given an estimate $\hat W^*$, an efficient coefficient estimate $\hat\beta^*$ is given by shock-level IV regressions (50) that set $c^* = \bar{x}^{\perp\prime} S g \hat W^*$. A $\chi^2_{J-K}$ test statistic based on the minimized objective in (56) can be used for specification testing.

Footnote (on the weakssivtest command): Our package extends the weakivtest command developed by Pflueger and Wang (2015).

As an example, suppose shocks are conditionally homoskedastic with the same variance-covariance matrix across $n$, $\mathrm{Var}[g_n \mid \bar\varepsilon_n, s_n] = G$ for a constant $J \times J$ matrix $G$. Then the optimal $\hat\beta^*$ is obtained by a shock-level 2SLS regression of $\bar{y}^{\perp}_n$ on all $\bar{x}^{\perp}_{kn}$ (instrumented by $g_{jn}$ and weighted by $s_n$).
We show this in the case of no controls (and mean-zero shocks) for notational simplicity. Then,
$$\mathrm{Var}\left[g' S \left(\bar{y}^{\perp} - \bar{x}^{\perp} \beta\right)\right] = E\left[g' S \bar\varepsilon \bar\varepsilon' S g\right] = \mathrm{tr}\left(E\left[S \bar\varepsilon \bar\varepsilon' S\right]\right) G = k G \tag{57}$$
for $k = \mathrm{tr}\left(E[S \bar\varepsilon \bar\varepsilon' S]\right)$. The optimal weighting matrix thus should consistently estimate $G^{-1}$ up to scale, which is satisfied by the inverse of $\hat G = g' S g$. Under appropriate regularity conditions, a feasible optimal GMM estimate is thus given by
$$\hat\beta^* = \left(\bar{x}^{\perp\prime} S g \hat G^{-1} g' S \bar{x}^{\perp}\right)^{-1} \left(\bar{x}^{\perp\prime} S g \hat G^{-1} g' S \bar{y}^{\perp}\right) = \left(\left(P_g \bar{x}^{\perp}\right)' S \bar{x}^{\perp}\right)^{-1} \left(P_g \bar{x}^{\perp}\right)' S \bar{y}^{\perp}, \tag{58}$$
where $P_g = g (g' S g)^{-1} g' S$ is an $s_n$-weighted shock projection matrix. This is the formula for an $s_n$-weighted IV regression of $\bar{y}^{\perp}_n$ on the fitted values from projecting the $\bar{x}^{\perp}_{kn}$ on the shocks, corresponding to the 2SLS regression above. Straightforward extensions of this equivalence between optimally-weighted estimates of $\beta$ and shock-level overidentified IV procedures follow in the case of heteroskedastic or clustered shocks, in which case the 2SLS estimator (58) is replaced by the estimator of White (1982). We emphasize that these shock-level estimators are generally different from 2SLS or White (1982) estimators at the level of original observations, which are optimal under conditional homoskedasticity and independence assumptions placed on the residual $\varepsilon_\ell$, assumptions which are generally violated in our quasi-experimental framework.

Many Shocks in Autor et al. (2013)
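The shock-level 2SLS procedure (58) used in this subsection can be sketched in a few lines. This is a minimal illustration with simulated inputs, not the replication code: the function names, dimensions, and data are assumptions, and controls are omitted as in the derivation of (58). The second function evaluates the general GMM formula so that the two expressions in (58) can be compared.

```python
import numpy as np

def shock_level_2sls(g, s_n, x_bar, y_bar):
    """s_n-weighted 2SLS at the shock level, second expression in (58)."""
    S = np.diag(s_n)
    G_hat = g.T @ S @ g                          # J x J, estimate of G
    P_g = g @ np.linalg.solve(G_hat, g.T @ S)    # weighted projection matrix
    x_fit = P_g @ x_bar                          # first-stage fitted values
    return np.linalg.solve(x_fit.T @ S @ x_bar, x_fit.T @ S @ y_bar)

def shock_level_gmm(g, s_n, x_bar, y_bar, W):
    """Shock-level GMM estimate for a given J x J weighting matrix W."""
    S = np.diag(s_n)
    A = x_bar.T @ S @ g @ W                      # K x J, plays the role of c
    return np.linalg.solve(A @ g.T @ S @ x_bar, A @ g.T @ S @ y_bar)

# Simulated shock-level data: N shocks, J instruments, K treatments.
rng = np.random.default_rng(0)
N, J, K = 30, 3, 2
s_n = rng.dirichlet(np.ones(N))
g = rng.normal(size=(N, J))
x_bar = g @ rng.normal(size=(J, K)) + rng.normal(size=(N, K))
y_bar = x_bar @ np.array([0.5, -1.0]) + rng.normal(size=N)

beta_2sls = shock_level_2sls(g, s_n, x_bar, y_bar)
W_2sls = np.linalg.inv(g.T @ (s_n[:, None] * g))
beta_gmm = shock_level_gmm(g, s_n, x_bar, y_bar, W_2sls)
```

With $W = (g' S g)^{-1}$ the GMM formula reproduces the projection-based 2SLS estimate; a different `W` (for instance a heteroskedasticity-robust estimate) can be passed to `shock_level_gmm` to obtain other members of the class.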
Appendix Table C4 illustrates different shock-level overidentified IV estimators in the setting of Autor et al. (2013), introduced in Section 5.1. ADH construct their shift-share instrument based on the growth of Chinese imports in eight economies comparable to the U.S., together. We separate them to produce eight sets of industry shocks $g_{jn}$, each reflecting the growth of Chinese imports in one of those countries. As in Section 5, the outcome of interest is a commuting zone's growth in total manufacturing employment with the single treatment variable measuring a commuting zone's local exposure to the growth of imports from China (see footnote 26 for precise variable definitions). The vector of controls coincides with that of column 3 of Table 4, isolating within-period variation in manufacturing shocks. Per Section 4, exposure-robust standard errors are obtained by controlling for period main effects in the shock-level IV procedures, and we report corrected first stage $F$-statistics constructed as detailed above.

Column 1 reports estimates of the ADH coefficient $\beta$ using the industry-level two-stage least squares procedure (58). At -0.238, this estimate is very similar to the just-identified estimate in column 3 of Table 4. Column 2 shows that we also obtain a very similar coefficient of -0.247 with an industry-level limited information maximum likelihood (LIML) estimator. Finally, in column 3 we report a two-step optimal IV estimate of $\beta$ using an industry-level implementation of the White (1982) estimator. Both the coefficient and standard error fall somewhat, with the latter consistent with the theoretical improvement in efficiency relative to columns 1 and 2. From this efficient estimate we obtain an omnibus overidentification test statistic of 10.92, distributed as chi-squared with seven degrees of freedom under the null of correct specification. This yields a $p$-value for the test of joint orthogonality of all eight ADH shocks of 0.142.
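The reported $p$-value can be reproduced from the test statistic alone. The short sketch below evaluates the upper tail of a chi-squared distribution with seven degrees of freedom using the standard closed form for odd degrees of freedom; the helper name `chi2_sf_df7` is a hypothetical choice, and only the Python standard library is used.

```python
import math

def chi2_sf_df7(x):
    """Upper-tail probability P(X > x) for X ~ chi-squared with 7 degrees of freedom."""
    poly = 1.0 + x / 3.0 + x ** 2 / 15.0
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * poly
    return math.erfc(math.sqrt(x / 2.0)) + tail

# Overidentification statistic reported in the text (J - K = 8 - 1 = 7).
p_value = chi2_sf_df7(10.92)
```

Rounded to three decimals, `p_value` matches the 0.142 reported above.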
Table C4 also reports the corrected effective first-stage $F$-statistic which measures the strength of the relationship between the endogenous variable and the eight shift-share instruments across regions. At 15.10 it is substantially lower than with one instrument in column 3 of Table 4 but still above the conventional heuristic threshold of 10.

A.10 Finite-Sample Performance of SSIV: Monte-Carlo Evidence
In this appendix we study the finite-sample performance of the SSIV estimator via Monte-Carlo simulation. We base this simulation on the data of Autor et al. (2013), as described in Section 5. For comparison, we also simulate more conventional shock-level IV estimators, similar to those used in Acemoglu et al. (2016), which also estimate the effects of import competition with China on U.S. employment. We begin by describing the design of these simulations and the benchmark Monte-Carlo results. We then explore how the simulation results change with various deviations from the benchmark: with different levels of industry concentration, different numbers of industries and regions, and with many shock instruments. Besides showing the general robustness of our framework, these extensions allow us to see how informative some conventional rules of thumb are on the finite-sample performance of shift-share estimators.
Simulation design
We base our benchmark data-generating process for SSIV on the specification in column 3 of Table 4. The outcome variable $y_{\ell t}$ corresponds to the change in manufacturing employment as a fraction of working-age population of region $\ell$ in period $t$, treatment $x_{\ell t}$ is a measure of regional import competition with China, and the shift-share instrument is constructed by combining the industry-level growth of China imports in eight developed economies, $g_{nt}$, with lagged regional employment weights of different industries $s_{\ell nt}$. We also include pre-treatment controls $w_{\ell t}$ as in column 3 of Table 4 and estimate regressions with regional employment weights $e_{\ell t}$; see Section 5 for more detail on the Autor et al. (2013) setting.

In a first step we obtain an estimated SSIV second and first stage of
$$y_{\ell t} = \hat\beta x_{\ell t} + \hat\gamma' w_{\ell t} + \hat\varepsilon_{\ell t}, \tag{59}$$
$$x_{\ell t} = \hat\pi z_{\ell t} + \hat\rho' w_{\ell t} + \hat{u}_{\ell t}. \tag{60}$$
We then generate 10,000 simulated samples by drawing shocks $g^*_{nt}$, as detailed below, and constructing the simulated shift-share instrument $z^*_{\ell t} = \sum_n s_{\ell nt} g^*_{nt}$ and treatment $x^*_{\ell t} = \hat\pi z^*_{\ell t} + \hat{u}_{\ell t}$. Imposing a true causal effect of $\beta^* = 0$, we use the same $y^*_{\ell t} \equiv \hat\varepsilon_{\ell t}$ as the outcome in each simulation (note that it is immaterial whether we include $\hat\gamma' w_{\ell t}$ and $\hat\rho' w_{\ell t}$, since all our specifications control for $w_{\ell t}$). By keeping $\hat\varepsilon_{\ell t}$ and $\hat{u}_{\ell t}$ fixed, we study the finite sample properties of the estimator that arise from the randomness of shocks, which is the basis of the inferential framework of Adão et al.
(2019b); we also avoid having to take a stand on the joint data generating process of $(\varepsilon_{\ell t}, u_{\ell t})$, which this inference framework does not restrict.

We estimate SSIV specifications that parallel (59)-(60) from the simulated data
$$y^*_{\ell t} = \beta^* x^*_{\ell t} + \gamma^{*\prime} w_{\ell t} + \varepsilon^*_{\ell t}, \tag{61}$$
$$x^*_{\ell t} = \pi^* z^*_{\ell t} + \rho^{*\prime} w_{\ell t} + u^*_{\ell t}, \tag{62}$$
using the original weights $e_{\ell t}$ and controls $w_{\ell t}$. We then test the (true) hypothesis $\beta^* = 0$ using either the heteroskedasticity-robust standard errors from the equivalent industry-level regression or their version with the null imposed, as in Section 4.1. As in column 3 of Table 4, we control for period indicators as $q_{nt}$ in the industry-level regression.

Footnote (on the robust standard errors): Note that there is no need for clustering since we generate the shocks independently across industries in all simulations. We have verified, however, that allowing for correlation in shocks within industry groups and using clustered standard errors yields similar results.

Our comparison estimator is a conventional industry-level IV inspired by Acemoglu et al. (2016). However, we try to keep the IV regression as similar to the SSIV as possible, thus diverging from Acemoglu et al. (2016) in some details. Specifically, the outcome $y_{nt}$ is the industry employment growth as measured by these authors. It is defined for 392 out of the 397 industries in Autor et al. (2013), so we drop the remaining five industries in each period. The endogenous regressor $x_{nt} \equiv g^{US}_{nt}$ (growth of U.S. imports from China per worker) and the instrument $g_{nt}$ (growth of China imports in eight developed economies) [...] $q_{nt}$ and taking identical regression importance weights $s_{nt}$.

The Monte-Carlo strategy for the conventional shock-level IV parallels the one for SSIV; we obtain an estimated industry-level second and first stage of
$$y_{nt} = \hat\beta_{ind} x_{nt} + \hat\gamma' q_{nt} + \hat\varepsilon_{nt}, \tag{63}$$
$$x_{nt} = \hat\pi_{ind} g_{nt} + \hat\rho' q_{nt} + \hat{u}_{nt}, \tag{64}$$
using the $s_{nt}$ importance weights.
We then perform 10,000 simulations where we regenerate shocks $g^*_{nt}$ and regress $y^*_{nt} = \hat\varepsilon_{nt}$ (consistent with a true causal effect of $\beta_{ind} = 0$, given that we control for $q_{nt}$) on $x^*_{nt} = \hat\pi_{ind} g^*_{nt} + \hat u_{nt}$, instrumenting by $g^*_{nt}$, controlling for $q_{nt}$, and weighting by $s_{nt}$. We test $\beta_{ind} = 0$ by using robust standard errors in this IV regression or the version with the null imposed, which corresponds to a standard Lagrange multiplier test for this true null hypothesis.

In both simulations we report the rejection rate of nominal 5% level tests for $\beta = 0$ and $\beta_{ind} = 0$ to gauge the quality of each asymptotic approximation. We do not report the bias of the estimators because they are all approximately unbiased (more precisely, the simulated median bias is at most 1% of the estimator's standard deviation). However, we return to the question of bias at the end of the section, where we extend the analysis to having many instruments with a weak first stage.

Main results
Table C6 reports the rejection rates for shift-share IV (columns 1 and 2) and conventional industry-level IV (columns 3 and 4) in various simulations. Specifically, column 1 corresponds to using exposure-robust standard errors from the equivalent industry-level IV, and column 2 implements the version with the null hypothesis imposed. Columns 3 and 4 parallel columns 1 and 2 when applied to conventional IV: the former uses heteroskedasticity-robust standard errors and the latter tests $\beta_{ind} = 0$ with the null imposed, which amounts to using the Lagrange multiplier test.

The simulations in Panel A vary the data-generating process of the shocks. Following Adão et al. (2019b), in row (a) we draw the shocks iid from a normal distribution with the variance matched to the sample variance of the shocks in the data after de-meaning by year. The rejection rate is close to the nominal rate of 5% for both SSIV and conventional IV (7.6% and 6.8%, respectively), and in both cases it becomes even closer when the null is imposed (5.2% and 5.0%).

This simulation may not approximate the data-generating process well because of heteroskedasticity: smaller industries have more volatile shocks. To match unrestricted heteroskedasticity, in row (b) we use a wild bootstrap, generating $g^*_{nt} = g_{nt}\nu^*_{nt}$ by multiplying the year-demeaned observed shocks $g_{nt}$ by $\nu^*_{nt} \overset{iid}{\sim} N(0,1)$ (Liu 1988). This approach also provides a better approximation for the marginal distribution of shocks than the normality assumption. Here the relative performance of SSIV is even better: the rejection rate is 8.0% vs. 14.2% for conventional IV.

[Footnote: This is established by unreported regressions of $|g_{nt}|$ on $s_{nt}$, for year-demeaned $g_{nt}$ from ADH, with or without weights. The negative relationship is significant at conventional levels.]

We now depart from the row (b) simulation in several directions, as a case study for the sensitivity of the asymptotic approximation to different features of the SSIV setup.
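The simulation design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the replication code: the data are synthetic stand-ins for the ADH shares, shocks, and fixed residuals; the only control is a constant; and the function names `ssiv_estimate` and `rejection_rate` are hypothetical. The standard error is the no-controls exposure-robust formula from the equivalent shock-level regression (derived in Appendix B.3).

```python
import numpy as np

def ssiv_estimate(y, x, shares, e, g):
    """SSIV coefficient with a constant as the only control, and the
    exposure-robust SE from the equivalent shock-level regression.
    Assumes e sums to one and each row of shares sums to one."""
    z = shares @ g                       # z_l = sum_n s_ln g_n
    y_perp = y - e @ y                   # e-weighted demeaning
    x_perp = x - e @ x
    denom = e @ (x_perp * z)             # sum_l e_l x_l^perp z_l
    beta = (e @ (y_perp * z)) / denom
    resid = y_perp - beta * x_perp       # SSIV residuals
    s_n = shares.T @ e                   # shock importance weights
    g_hat = g - s_n @ g                  # s_n-weighted demeaned shocks
    agg = shares.T @ (e * resid)         # sum_l e_l s_ln resid_l
    se = np.sqrt(np.sum(agg**2 * g_hat**2)) / abs(denom)
    return beta, se

def rejection_rate(n_sims=500, L=200, N=100, pi_hat=2.0, seed=0):
    """Share of simulations rejecting the true null beta* = 0 at the 5% level."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-ins (assumptions): exposure shares, regional weights,
    # observed demeaned shocks, and first-step residuals held fixed across draws.
    shares = rng.dirichlet(np.full(N, 0.2), size=L)
    e = np.full(L, 1.0 / L)
    g_obs = rng.normal(size=N) * rng.uniform(0.5, 2.0, size=N)
    g_obs -= g_obs.mean()
    eps_hat = rng.normal(size=L)
    u_hat = rng.normal(size=L)
    rejections = 0
    for _ in range(n_sims):
        g_star = g_obs * rng.normal(size=N)      # wild bootstrap draw
        x_star = pi_hat * (shares @ g_star) + u_hat
        beta, se = ssiv_estimate(eps_hat, x_star, shares, e, g_star)
        rejections += int(abs(beta / se) > 1.96)
    return rejections / n_sims
```

Calling `rejection_rate()` returns the simulated size of the nominal 5% test for this synthetic design; it is not expected to reproduce the Table C6 numbers, which depend on the actual ADH data.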
Specifically, we study the role of the Herfindahl concentration index across industries, the number of regions and industries, and the many weak instrument bias. We uniformly find that the performance of the SSIV estimator is similar to that of industry-level IV. Our results also suggest that the Herfindahl index is a useful statistic for measuring the effective number of industries in SSIV, and the first-stage $F$-statistic is informative about the weak instrument bias, as usual.

The Role of Industry Concentration
Since Assumption 2 requires small concentration of industry importance weights, measured using the Herfindahl index $\sum_{n,t} s_{nt}^2 / \left(\sum_{n,t} s_{nt}\right)^2$, Panel B of Table C6 studies how increasing the skewness of $s_{nt}$ towards the bigger industries affects coverage of the tests. For conventional IV this simply amounts to reweighting the regression. Specifically, for a parameter $\alpha >$ , we use weights
$$\tilde s_{nt} = s_{nt}^{\alpha} \cdot \frac{\sum_{n',t'} s_{n't'}}{\sum_{n',t'} s_{n't'}^{\alpha}}.$$
We choose the unique $\alpha$ to match the target level of $\widetilde{HHI}$ by solving, numerically,
$$\frac{\sum_{n,t} (\tilde s_{nt})^2}{\left(\sum_{n,t} \tilde s_{nt}\right)^2} = \widetilde{HHI}. \tag{65}$$

Matching the Herfindahl index in SSIV is more complicated since we need to choose how exactly to amend shares $\tilde s_{\ell nt}$ and regional weights $\tilde e_{\ell t}$ that would yield $\tilde s_{nt}$ from (65). We proceed as follows: we consider the lagged level of manufacturing employment by industry $E_{\ell nt} = e_{\ell t} s_{\ell nt}$ and the total regional non-manufacturing employment $E_{\ell 0 t} = e_{\ell t}\left(1 - \sum_n s_{\ell nt}\right)$. We then define $\tilde E_{\ell nt} = E_{\ell nt} \cdot \tilde s_{nt} / s_{nt}$ for manufacturing industries (and leave non-manufacturing employment unchanged, $\tilde E_{\ell 0 t} = E_{\ell 0 t}$). This increases employment in large manufacturing industries proportionately in all regions, while … $\tilde s_{\ell nt}$ and weights $\tilde e_{\ell t}$ accordingly:
$$\tilde e_{\ell t} = \sum_{n=0}^{N} \sum_t \tilde E_{\ell nt}, \qquad \tilde s_{\ell nt} = \frac{\tilde E_{\ell nt}}{\tilde e_{\ell t}}.$$

[Footnote: Note that in ADH $\sum_{n=1}^{N} s_{\ell nt}$ equals the lagged share of regional manufacturing employment, which is below one. We thus renormalize the shares when computing the Herfindahl.]

[Footnote: The interpretation of $E_{\ell nt}$ as the lagged level is approximate since $e_{\ell t}$ is measured at the beginning of period in ADH, while $s_{\ell nt}$ is lagged.]
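The numerical step in (65) for the conventional IV weights can be sketched as follows. This is an illustrative implementation under assumptions, not the authors' code: the function names are hypothetical, the solver is plain bisection, and it relies on the Herfindahl index of the power-transformed weights being increasing in $\alpha$ over the bracketing interval.

```python
import numpy as np

def herfindahl(s):
    """Herfindahl index of (possibly unnormalized) importance weights."""
    return np.sum(s**2) / np.sum(s)**2

def reweight(s, alpha):
    """s_tilde = s^alpha * sum(s) / sum(s^alpha); dividing by max(s) first
    leaves the result unchanged and avoids under/overflow for large alpha."""
    p = (s / s.max()) ** alpha
    return p * s.sum() / p.sum()

def match_herfindahl(s, target, lo=1e-6, hi=64.0, tol=1e-12):
    """Find alpha such that the reweighted Herfindahl equals `target`.
    The bracket [lo, hi] is an assumption of this sketch."""
    gap = lambda a: herfindahl(reweight(s, a)) - target
    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("target Herfindahl is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid      # concentration still too low: raise alpha
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, reweight(s, alpha)
```

The reweighting preserves the total $\sum_{n,t} s_{nt}$ by construction, so only the concentration of the weights changes. The SSIV version additionally requires rebuilding the shares and regional weights as described in the text, which this sketch does not attempt.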
Rows (c)–(e) of Table C6 Panel B implement this procedure for target Herfindahl levels of / , / , and / , respectively. For comparison, the Herfindahl in the actual ADH data is / . (Table 1, column 2). The table finds that even with the Herfindahl index of $1/20$ (corresponding to the "effective" number of shocks of 20 in both periods total) the rejection rate is still around 7%, a level that may be considered satisfactory. It also shows that the rejection rate grows when the Herfindahl is even higher, at / , suggesting that the Herfindahl can be used as an indicative rule of thumb. More importantly, the rejection rates are similar for SSIV and conventional industry-level IV, as before.

Varying the Number of Industries and Regions
The asymptotic sequence we consider in Section 3.1 relies on both $N$ and $L$ growing. Here we study how the quality of the asymptotic approximation depends on these parameters.

First, to consider the case of small $N$, we aggregate industries in a natural way: from 397 four-digit manufacturing SIC industries into 136 three-digit ones and further into 20 two-digit ones, and reconstruct the endogenous right-hand side variable and the instrument using aggregated data. Rows (f) and (g) of Table C6 Panel C report simulation results based on the aggregated data. They show that rejection rates are similar to the case of detailed industries, and between SSIV and conventional IV. This does not mean that disaggregated data are not useful: the dispersion of the simulated distribution (not reported) increases with industry aggregation, reducing test power. However, standard errors correctly reflect this variability, resulting in largely unchanged test coverage rates.

Second, to study the implications of having fewer regions $L$, we select a random subset of them in each simulation. The results are presented in Rows (h) and (i) of Panel C for $L = 100$ and , compared to the original $L = 722$, respectively. They show once again that rejection rates are not significantly affected (even though unreported standard errors expectedly increase).
Many Weak Instruments
In this final simulation we return to the question of SSIV bias. Since our previous simulations confirm that just-identified SSIV is median-unbiased, we turn to the case of … $F$-statistics, when properly constructed, can serve as useful heuristics.

[Footnote: Specifically, we aggregate imports from China to the U.S. and eight developed economies as well as the number of U.S. workers by manufacturing industry to construct the new $g_{nt}$ and $g^{US}_{nt}$. We then aggregate the shares $s_{\ell nt}$ and $s^{current}_{\ell nt}$ to construct $x_{\ell t}$ and $z_{\ell t}$ (see footnote 26 for formulas). We do not change the regional outcome, controls, or importance weights. For conventional IV, we additionally reconstruct the outcome (industry employment growth) by aggregating employment levels by year in the Acemoglu et al. (2016) data and measuring growth according to their formulas.]

[Footnote: When we select regions, we always keep observations from both periods for each selected region. We keep the second- and first-stage coefficients from the full sample to focus on the noise that arises from shock randomness.]

For clarity, we begin by describing the procedure for the conventional shock-level IV that is a small departure from Column 3 of Table C6. For a given number of instruments $J \ge 1$, in each simulation we generate $g^*_{jnt}$, $j = 1, \ldots, J$, independently across $j$ using wild bootstrap (as in Table C6 Row (b)). We make only the first instrument relevant by setting $x^*_{nt} = \hat\pi_{ind} g^*_{1nt} + \sum_{j=2}^{J} 0 \cdot g^*_{jnt} + \hat u_{nt}$. We then estimate the IV regression of $y^*_{nt} \equiv \hat\varepsilon_{nt}$ on $x^*_{nt}$, instrumenting with $g^*_{1nt}, \ldots, g^*_{Jnt}$, controlling for $q_{nt}$, and weighting by $s_{nt}$. We use robust standard errors and compute the effective first-stage $F$-statistic using the Montiel Olea and Pflueger (2013) method.

The procedure for SSIV is more complex but as usual parallels the one for the conventional shock-level IV as much as possible.
Given simulated shocks $g^*_{jnt}$, we construct shift-share instruments $z^*_{j\ell t} = \sum_n s_{\ell nt} g^*_{jnt}$ and make only the first of them relevant, $x^*_{\ell t} = \hat\pi z^*_{1\ell t} + \sum_{j=2}^{J} 0 \cdot z^*_{j\ell t} + \hat u_{\ell t}$. Since the equivalence result from Section 2.2 need not hold for overidentified SSIV, we rely on the results in Appendix A.9: we estimate $\beta^*$ from the industry-level regression of $\bar y^{*\perp}_{nt}$ (based on $y^*_{\ell t} = \hat\varepsilon_{\ell t}$ as before) on $\bar x^{*\perp}_{nt}$ by 2SLS, instrumenting by $g^*_{1nt}, \ldots, g^*_{Jnt}$, controlling for $q_{nt}$ and weighting by $s_{nt}$. We compute robust standard errors from this regression to test $\beta^* = 0$. For effective first-stage $F$-statistics, we follow the procedure described in Appendix A.9 and implemented via our weakssivtest command in Stata.

Table C7 reports the result for $J = 1$ , , , , and , presenting the rejection rate corresponding to the 5% nominal level, the median bias as a percentage of the simulated standard deviation, and the median first-stage $F$-statistic. Panel A corresponds to SSIV and Panel B to the conventional shock-level IV. For higher comparability, we adjust the first-stage coefficient $\hat\pi_{ind}$ in the latter in order to make the $F$-statistics approximately match between the two panels. We find that the median bias is now non-trivial and grows with $J$, at the same time as the $F$-statistic declines. However, the level of bias is similar for the two estimators. The rejection rates tend to be higher for conventional IV than SSIV, although they converge as $J$ grows.

[Footnote: For computational reasons we perform only $10{,}000/J$ simulations when $J > 1$ (but 10,000 for $J = 1$ as before).]

Appendix Proofs
B.1 Proposition 3 and Extensions
This section proves Proposition 3 and its extensions, which allow for certain forms of mutual shock dependence. As previewed in footnote 5, we do so under weaker regularity conditions than in the main text: we assume that there is a sequence of vectors $\gamma_L$ such that $\|\hat\gamma - \gamma_L\| = o_p(1)$, that $\max_m \left|\sum_\ell e_\ell w_{\ell m}\right| = O_p(1)$, and that $\max_m \left|\sum_\ell e_\ell w_{\ell m} z_\ell\right| = O_p(1)$. These are implied by the regularity conditions of Proposition 3, while they also allow the dimension of $\hat\gamma$ to grow with $L$.

We first verify that Proposition 2 holds under these weaker regularity conditions, such that if shock orthogonality holds and the SSIV relevance condition is satisfied then $\hat\beta$ is consistent. The relevant step from the main text proof is
$$\sum_n s_n g_n \bar\varepsilon_n - \sum_n s_n g_n \bar\varepsilon^{\perp}_n = \sum_\ell e_\ell z_\ell \left(\varepsilon_\ell - \varepsilon^{\perp}_\ell\right) = \left(\sum_\ell e_\ell z_\ell w_\ell'\right)(\hat\gamma - \gamma_L) \xrightarrow{p} 0, \tag{66}$$
since $\|\hat\gamma - \gamma_L\| \xrightarrow{p} 0$ and $\max_m \left|\sum_\ell e_\ell w_{\ell m} z_\ell\right| = O_p(1)$.

We next prove that $\sum_n s_n g_n \bar\varepsilon_n$ is asymptotically mean-zero. Since $\sum_n s_{\ell n} = 1$ and $\sum_\ell e_\ell \varepsilon^{\perp}_\ell = 0$,
$$\sum_n s_n \bar\varepsilon_n = \sum_\ell e_\ell \varepsilon_\ell = \sum_\ell e_\ell \left(\varepsilon_\ell - \varepsilon^{\perp}_\ell\right) = \left(\sum_\ell e_\ell w_\ell'\right)(\hat\gamma - \gamma_L) \xrightarrow{p} 0, \tag{67}$$
when $\|\hat\gamma - \gamma_L\| \xrightarrow{p} 0$ and $\max_m \left|\sum_\ell e_\ell w_{\ell m}\right| = O_p(1)$.
Thus
$$\sum_n s_n g_n \bar\varepsilon_n = \sum_n s_n (g_n - \mu)\bar\varepsilon_n + o_p(1), \tag{68}$$
with
$$E\left[\sum_n s_n (g_n - \mu)\bar\varepsilon_n\right] = 0 \tag{69}$$
under Assumption 1.

Finally, note that when $E\left[(g_n - \mu)^2 \mid s_n, \bar\varepsilon_n\right]$ and $E\left[\bar\varepsilon_n^2 \mid s_n\right]$ are uniformly bounded by finite $B_g$ and $B_\varepsilon$, respectively, we have by Assumption 2 and the Cauchy-Schwarz inequality
$$\begin{aligned}
Var\left[\sum_n s_n (g_n - \mu)\bar\varepsilon_n\right] &= E\left[\left(\sum_n s_n (g_n - \mu)\bar\varepsilon_n\right)^2\right] \\
&= \sum_n E\left[s_n^2\, E\left[E\left[(g_n - \mu)^2 \mid s_n, \bar\varepsilon_n\right]\bar\varepsilon_n^2 \mid s_n\right]\right] \\
&\le B_g B_\varepsilon\, E\left[\sum_n s_n^2\right] \to 0.
\end{aligned} \tag{70}$$
This implies $L^2$-convergence and so weak convergence: $\sum_n s_n (g_n - \mu)\bar\varepsilon_n \xrightarrow{p} 0$, proving Proposition 3.

Similar last steps apply when Assumption 2 is replaced by either Assumption 5 or 6, maintaining analogous regularity conditions. Under Assumption 5 we have, for $N(c) = \{n : c(n) = c\}$,
$$\begin{aligned}
Var\left[\sum_n s_n (g_n - \mu)\bar\varepsilon_n\right] &= E\left[\left(\sum_c \sum_{n \in N(c)} s_n (g_n - \mu)\bar\varepsilon_n\right)^2\right] \\
&= E\left[\sum_c s_c^2\, E\left[\left(\sum_{n \in N(c)} \frac{s_n}{s_c}(g_n - \mu)\bar\varepsilon_n\right)^2 \,\Big|\, s_c\right]\right] \\
&= E\left[\sum_c s_c^2 \sum_{n,n' \in N(c)} \frac{s_n}{s_c}\frac{s_{n'}}{s_c}\, E\left[(g_n - \mu)(g_{n'} - \mu)\bar\varepsilon_n \bar\varepsilon_{n'} \mid s_c\right]\right] \\
&\le B_g B_\varepsilon\, E\left[\sum_c s_c^2\right] \to 0,
\end{aligned} \tag{71}$$
when $E\left[(g_n - \mu)^2 \mid \{\bar\varepsilon_{n'}\}_{n' \in N(c(n))}, s_c\right] \le B_g$ and $E\left[\bar\varepsilon_n^2 \mid s_c\right] \le B_\varepsilon$ uniformly. Here the last line used the Cauchy-Schwarz inequality twice: to establish, for $n, n' \in N(c)$,
$$E\left[(g_n - \mu)(g_{n'} - \mu) \mid \bar\varepsilon_n, \bar\varepsilon_{n'}, s_c\right] \le \sqrt{E\left[(g_n - \mu)^2 \mid \bar\varepsilon_n, \bar\varepsilon_{n'}, s_c\right] E\left[(g_{n'} - \mu)^2 \mid \bar\varepsilon_n, \bar\varepsilon_{n'}, s_c\right]} \le B_g \tag{72}$$
and
$$E\left[|\bar\varepsilon_n|\,|\bar\varepsilon_{n'}| \mid s_c\right] \le \sqrt{E\left[\bar\varepsilon_n^2 \mid s_c\right] E\left[\bar\varepsilon_{n'}^2 \mid s_c\right]} \le B_\varepsilon.$$
(73)

If we instead replace Assumption 2 with Assumption 6, we have
$$\begin{aligned}
Var\left[\sum_n s_n (g_n - \mu)\bar\varepsilon_n\right] &= E\left[\left(\sum_n s_n (g_n - \mu)\bar\varepsilon_n\right)^2\right] \\
&= \sum_n \sum_{n'} E\left[s_n s_{n'}\, E\left[(g_n - \mu)(g_{n'} - \mu) \mid s_n, \bar\varepsilon_n, s_{n'}, \bar\varepsilon_{n'}\right]\bar\varepsilon_n \bar\varepsilon_{n'}\right] \\
&\le B_L \sum_n \sum_{n'} f(|n' - n|)\, E\left[|s_n \bar\varepsilon_n s_{n'} \bar\varepsilon_{n'}|\right] \\
&= B_L \left(\sum_n E\left[(s_n \bar\varepsilon_n)^2\right] f(0) + 2\sum_{r=1}^{N-1}\sum_{n=1}^{N-r} E\left[|s_{n+r}\bar\varepsilon_{n+r}| \cdot |s_n \bar\varepsilon_n|\right] f(r)\right) \\
&\le \left(B_L \sum_n E\left[s_n^2\, E\left[\bar\varepsilon_n^2 \mid s_n\right]\right]\right)\left(f(0) + 2\sum_{r=1}^{N-1} f(r)\right) \\
&\le B_\varepsilon \left(f(0) + 2\sum_{r=1}^{N-1} f(r)\right)\left(B_L\, E\left[\sum_n s_n^2\right]\right) \to 0,
\end{aligned} \tag{74}$$
provided $E\left[\bar\varepsilon_n^2 \mid s_n\right] < B_\varepsilon$ uniformly. Here the second-to-last line follows because for any sequence of numbers $a_1, \ldots, a_N$ and any $r > 0$,
$$\sum_n a_n^2 \ge \frac{1}{2}\left(\sum_{n=1}^{N-r} a_n^2 + \sum_{n=1}^{N-r} a_{n+r}^2\right) = \frac{1}{2}\sum_{n=1}^{N-r}(a_n - a_{n+r})^2 + \sum_{n=1}^{N-r} a_n a_{n+r} \ge \sum_{n=1}^{N-r} a_n a_{n+r}, \tag{75}$$
and the same is true in expectation if $a_n = |s_n \bar\varepsilon_n|$ are random variables. We note that allowing $B_L$ to grow in the asymptotic sequence imposes much weaker conditions on the correlation structure of shocks. For example, with shock importance weights $s_n$ approximately equal, i.e. $\sum_n s_n^2 = O_p(1/N)$, it is enough to have $\left|Cov\left[g_n, g_{n'} \mid \bar\varepsilon_n, \bar\varepsilon_{n'}, s_n, s_{n'}\right]\right| \le B N^{-\alpha}$ for any $\alpha > 0$: in this case one can satisfy Assumption 6 by setting $B_L = B N^{1-\alpha/2}$ and $f(r) = r^{-1-\alpha/2}$.

B.2 Proposition 4
This section proves Proposition 4, again allowing for a growing number of controls (as in the above proof) under the following regularity conditions: that $\|\hat\gamma - \gamma_L\| = o_p(1)$, $\max_m \left|\sum_\ell e_\ell w_{\ell m} w^{*\prime}_\ell \mu\right| = O_p(1)$, and $\max_m \left|\sum_\ell e_\ell w_{\ell m} z_\ell\right| = O_p(1)$. Note that these are the same as in the proof of Proposition 3 when $q_n$ just includes a constant (so $w^*_\ell = 1$), and Proposition 2 can be similarly shown to hold.

As in the proof to Proposition 3, we first note that
$$\sum_n s_n g_n \bar\varepsilon_n = \sum_n s_n (g_n - q_n'\mu)\bar\varepsilon_n + \left(\sum_n s_n q_n' \bar\varepsilon_n\right)\mu. \tag{76}$$
By a straightforward extension of the proof to Proposition 2, the first term of this expression is $o_p(1)$ when $Var\left[g^*_n \mid \bar\varepsilon_n, q_n, s_n\right]$ and $E\left[\bar\varepsilon_n^2 \mid q_n, s_n\right]$ are uniformly bounded and Assumption 4 holds. Moreover, when $w^*_\ell$ is an included control, $\sum_\ell e_\ell w^{*\prime}_\ell \bar\varepsilon^{\perp}_\ell = 0$, such that
$$\left(\sum_n s_n q_n' \bar\varepsilon_n\right)\mu = \left(\sum_\ell e_\ell w^{*\prime}_\ell \varepsilon_\ell\right)\mu = \left(\sum_\ell e_\ell w^{*\prime}_\ell \left(\varepsilon_\ell - \bar\varepsilon^{\perp}_\ell\right)\right)\mu = (\hat\gamma - \gamma_L)'\left(\sum_\ell e_\ell w_\ell w^{*\prime}_\ell\right)\mu \xrightarrow{p} 0, \tag{77}$$
by the assumed regularity conditions.

B.3 Proposition 5
This section first shows that shock-level IV coefficients obtained from estimating (12) are numerically equivalent to the SSIV estimate $\hat\beta$, and that its heteroskedasticity-robust standard error is numerically equivalent to the baseline IV standard error of Adão et al. (2019b) when there are no controls in $w_\ell$. We then prove that these two standard errors asymptotically coincide more generally under the assumptions that Adão et al. (2019b) use for valid conditional SSIV inference, and that the shock-level heteroskedasticity-robust standard error remains valid without the conditions of $L > N$ and non-collinear exposure shares which Adão et al. (2019b) require. We conclude with a discussion of the likely finite-sample difference between the two standard error formulas.

To establish the equivalence of IV coefficients, note that when $\sum_n s_{\ell n} q_n$ is included in $w_\ell$,
$$\sum_n s_n q_n \bar y^{\perp}_n = \sum_\ell e_\ell y^{\perp}_\ell \left(\sum_n s_{\ell n} q_n\right) = 0 \tag{78}$$
and similarly for $\sum_n s_n q_n \bar x^{\perp}_n$. The $s_n$-weighted regression of $\bar y^{\perp}_n$ and $\bar x^{\perp}_n$ on $q_n$ thus produces a coefficient vector that is numerically zero, implying the $s_n$-weighted and $g_n$-instrumented regression of $\bar y^{\perp}_n$ on $\bar x^{\perp}_n$ is unchanged with the addition of $q_n$ controls.
Proposition 1 shows that the IV coefficient from this regression is equivalent to the SSIV estimate $\hat\beta$.

To establish standard error equivalence in the case without controls, note that the conventional heteroskedasticity-robust standard error for the $s_n$-weighted shock-level IV regression of $\bar y^{\perp}_n$ on $\bar x^{\perp}_n$ and a constant, instrumented by $g_n$, is given by
$$\widehat{se}_{equiv} = \frac{\sqrt{\sum_n s_n^2 \hat\varepsilon_n^2 \hat g_n^2}}{\left|\sum_n s_n \bar x^{\perp}_n g_n\right|}, \tag{79}$$
where $\hat\varepsilon_n = \bar y^{\perp}_n - \hat\beta \bar x^{\perp}_n$ is the estimated shock-level regression residual (where we used the fact that the estimated constant in that regression is numerically zero) and $\hat g_n = g_n - \sum_n s_n g_n$ is the $s_n$-weighted demeaned shock. By Proposition 1, $\hat\varepsilon_n$ coincides with the share-weighted aggregate of the SSIV estimated residuals $\hat\varepsilon_\ell = y^{\perp}_\ell - \hat\beta x^{\perp}_\ell$:
$$\hat\varepsilon_n = \frac{\sum_\ell e_\ell s_{\ell n} y^{\perp}_\ell}{\sum_\ell e_\ell s_{\ell n}} - \hat\beta \cdot \frac{\sum_\ell e_\ell s_{\ell n} x^{\perp}_\ell}{\sum_\ell e_\ell s_{\ell n}} = \frac{\sum_\ell e_\ell s_{\ell n} \hat\varepsilon_\ell}{\sum_\ell e_\ell s_{\ell n}}. \tag{80}$$
Since $s_n = \sum_\ell e_\ell s_{\ell n}$, the squared numerator of (79) can thus be rewritten as
$$\sum_n s_n^2 \hat\varepsilon_n^2 \hat g_n^2 = \sum_n \left(\sum_\ell e_\ell s_{\ell n} \hat\varepsilon_\ell\right)^2 \hat g_n^2. \tag{81}$$
The expression in the denominator of (79) estimates the magnitude of the shock-level first-stage covariance, which matches the $e_\ell$-weighted sample covariance of $x_\ell$ and $z_\ell$:
$$\sum_n s_n \bar x^{\perp}_n g_n = \sum_n \left(\sum_\ell e_\ell s_{\ell n} x^{\perp}_\ell\right) g_n = \sum_\ell e_\ell x^{\perp}_\ell z_\ell. \tag{82}$$
Thus
$$\widehat{se}_{equiv} = \frac{\sqrt{\sum_n \left(\sum_\ell e_\ell s_{\ell n} \hat\varepsilon_\ell\right)^2 \hat g_n^2}}{\left|\sum_\ell e_\ell x^{\perp}_\ell z_\ell\right|}. \tag{83}$$
We now compare this expression to the standard error formula from Adão et al. (2019b).
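The identities in (80) and (82), and the coefficient equivalence they rest on, can be checked numerically. The sketch below uses synthetic data with a constant as the only control; all variable names and the function `check_equivalence` are assumptions made for illustration, and shares are drawn so that each row sums to one.

```python
import numpy as np

def check_equivalence(y, x, shares, e, g):
    """Compare region-level SSIV quantities with their shock-level analogues.
    Assumes e sums to one and each row of shares sums to one."""
    z = shares @ g                               # shift-share instrument
    y_perp, x_perp = y - e @ y, x - e @ x        # residualize on a constant
    s_n = shares.T @ e                           # s_n = sum_l e_l s_ln
    y_bar = shares.T @ (e * y_perp) / s_n        # exposure-weighted averages
    x_bar = shares.T @ (e * x_perp) / s_n
    denom_region = e @ (x_perp * z)              # right side of (82)
    denom_shock = np.sum(s_n * x_bar * g)        # left side of (82)
    beta_ssiv = (e @ (y_perp * z)) / denom_region
    beta_shock = np.sum(s_n * y_bar * g) / denom_shock
    resid_shock = y_bar - beta_shock * x_bar     # shock-level residuals
    resid_agg = shares.T @ (e * (y_perp - beta_ssiv * x_perp)) / s_n  # (80)
    return {
        "beta_ssiv": beta_ssiv, "beta_shock": beta_shock,
        "denom_region": denom_region, "denom_shock": denom_shock,
        "resid_shock": resid_shock, "resid_agg": resid_agg,
    }

rng = np.random.default_rng(0)
L, N = 50, 8
shares = rng.dirichlet(np.ones(N), size=L)
e = rng.dirichlet(np.ones(L))
g = rng.normal(size=N)
x = rng.normal(size=L)
y = rng.normal(size=L)
out = check_equivalence(y, x, shares, e, g)
```

In this setup the two coefficients, the two first-stage denominators, and the two residual vectors agree up to floating-point error, because each shock-level sum is a reordering of the corresponding region-level double sum.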
Absent controls and incorporating the $e_\ell$ importance weights, equation (39) in their paper yields
$$\widehat{se}_{AKM} = \frac{\sqrt{\sum_n \left(\sum_\ell e_\ell s_{\ell n} \hat\varepsilon_\ell\right)^2 \ddot g_n^2}}{\left|\sum_\ell e_\ell x^{\perp}_\ell z_\ell\right|}, \tag{84}$$
where $\ddot g_n$ denotes the coefficients from regressing the demeaned instrument $z_\ell - \sum_\ell e_\ell z_\ell$ on all shares $s_{\ell n}$, without a constant. It thus remains to show that $\ddot g_n = \hat g_n$. Note that
$$\sum_\ell e_\ell z_\ell = \sum_\ell e_\ell \sum_n s_{\ell n} g_n = \sum_n s_n g_n, \tag{85}$$
so that, with $\sum_n s_{\ell n} = 1$,
$$z_\ell - \sum_\ell e_\ell z_\ell = \sum_n s_{\ell n} g_n - \sum_n s_n g_n = \sum_n s_{\ell n} \hat g_n. \tag{86}$$
This means that the auxiliary regression in Adão et al. (2019b) has exact fit and produces $\ddot g_n = \hat g_n$.

With controls, $\widehat{se}_{equiv}$ and $\widehat{se}_{AKM}$ differ only with respect to the construction of shock residuals. The former sets $\hat g_n = g_n - q_n'\hat\mu$, where $\hat\mu = \left(\sum_n s_n q_n q_n'\right)^{-1}\sum_n s_n q_n g_n$ is the coefficient vector from the auxiliary projection of the instrument in equation (12) on the control vector $q_n$. The Adão et al. (2019b) standard error formula sets $\ddot g_n$ to the $N$ coefficients from regressing the residualized instrument $z^{\perp}_\ell$ on all shares $s_{\ell n}$, without a constant; note that computing this requires $L > N$ and that the matrix of exposure shares $s_{\ell n}$ is full rank.

We establish the general asymptotic equivalence of $\widehat{se}_{equiv}$ and $\widehat{se}_{AKM}$ under assumptions which mirror those developed by Adão et al. (2019b) to show that $\widehat{se}_{AKM}$ captures the conditional asymptotic variance of $\hat\beta$.
This variance is conditioned on $I_L = \left\{\{q_n\}_n, \{u_\ell, \epsilon_\ell, \eta_\ell, \{s_{\ell n}, \pi_{\ell n}\}_n, e_\ell\}_\ell\right\}$, where $w_\ell = \left[\sum_n s_{\ell n} q_n', u_\ell'\right]'$ and $x_\ell = \sum_n s_{\ell n}\pi_{\ell n} g_n + \eta_\ell$; we note that the resulting confidence intervals are asymptotically valid unconditionally, since if $Pr(\beta \in \widehat{CI} \mid I_L) = \alpha$ then $Pr(\beta \in \widehat{CI}) = E\left[E\left[\mathbf{1}[\beta \in \widehat{CI}] \mid I_L\right]\right] = \alpha$ by the law of iterated expectations. We follow Adão et al. (2019b) in implicitly treating the set of shares $s_{\ell n}$ (and, for us, importance weights $e_\ell$) as non-stochastic. Along with Assumption 7, we consider two additional conditions:

Assumption A1: The $g_n$ are mutually independent given $I_L$, $\max_n s_n \to 0$, and $\max_n s_n^2 / \sum_{n'} s_{n'}^2 \to 0$;

Assumption A2: $E\left[|g_n|^{v} \mid I_L\right]$ is uniformly bounded for some $v >$ and $\sum_\ell e_\ell \sum_n s_{\ell n}^2\, Var\left[g_n \mid I_L\right]\pi_{\ell n} \neq 0$. The support of $\pi_{\ell n}$ is bounded, the fourth moments of $\epsilon_\ell$, $\eta_\ell$, $\tilde u_\ell$, and $\tilde q_n$ exist and are uniformly bounded, and $\sum_\ell e_\ell w_\ell w_\ell' \xrightarrow{p} \Omega_{ww}$ for positive definite $\Omega_{ww}$. For $\gamma = E\left[\sum_\ell e_\ell w_\ell w_\ell'\right]^{-1} E\left[\sum_\ell e_\ell w_\ell \epsilon_\ell\right]$ and $\hat\gamma = \left(\sum_\ell e_\ell w_\ell w_\ell'\right)^{-1}\sum_\ell e_\ell w_\ell \epsilon_\ell$, $\hat\gamma \xrightarrow{p} \gamma$.

Here Assumption A1 corresponds to Assumption 2 in Adão et al. (2019b), and Assumption A2 includes the relevant conditions from their Assumptions 4 and A.3. Together, Assumptions 7, A1, and A2 imply our Assumptions 3 and 4. Under these assumptions, Proposition A.1 of Adão et al.
(2019b) shows that
$$\sqrt{r_L}\,(\hat\beta - \beta) = N\left(0, \frac{r_L V_L}{\pi^2}\right) + o_p(1), \tag{87}$$
where $r_L = 1/\left(\sum_n s_n^2\right)$ and $V_L = \sum_n \left(\sum_\ell e_\ell s_{\ell n}\varepsilon_\ell\right)^2 Var\left[g_n \mid I_L\right]$, provided $r_L V_L$ converges to a non-random limit. To establish the asymptotic validity of $\widehat{se}_{AKM}$, i.e. that $r_L\left(\sum_n \left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 \ddot g_n^2 - V_L\right) \xrightarrow{p} 0$, Adão et al. (2019b) further assume that $L \ge N$, the matrix of $s_{\ell n}$ is always full rank, and additional regularity conditions (see their Proposition 5).

We establish $r_L\left(\sum_n \left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 \hat g_n^2 - V_L\right) \xrightarrow{p} 0$ for $\hat g_n = g_n - q_n'\hat\mu$, and thus the asymptotic validity of $\widehat{se}_{equiv}$ under Assumptions 7, A1, and A2, without imposing the additional regularity conditions in Adão et al. (2019b)'s Proposition 5 and thereby allowing for $N > L$ or for collinear exposure shares. To start, write $g^*_n = g_n - q_n'\mu$ and decompose
$$\begin{aligned}
r_L\left(\sum_n \left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 \hat g_n^2 - V_L\right) ={}& r_L\left(\sum_n \left(\sum_\ell e_\ell s_{\ell n}\varepsilon_\ell\right)^2 g_n^{*2} - V_L\right) \\
&+ r_L \sum_n \left(\left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 - \left(\sum_\ell e_\ell s_{\ell n}\varepsilon_\ell\right)^2\right) g_n^{*2} \\
&+ r_L \sum_n \left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 \left(\hat g_n^2 - g_n^{*2}\right).
\end{aligned} \tag{88}$$
Adão et al. (2019b) show that the second term of this expression is $o_p(1)$ under our assumptions, using the fact (their Lemma A.3, again generalized to include importance weights) that for a triangular array $\{A_{L1}, \ldots, A_{LL}, B_{L1}, \ldots, B_{LL}, C_{L1}, \ldots, C_{LN_L}\}_{L=1}^{\infty}$ with $E\left[A_{L\ell}^4 \mid \{\{s_{\ell' n}\}_n, e_{\ell'}\}_{\ell'}\right]$, $E\left[B_{L\ell}^4 \mid \{\{s_{\ell' n}\}_n, e_{\ell'}\}_{\ell'}\right]$, and $E\left[C_{Ln}^2 \mid \{\{s_{\ell' n}\}_n, e_{\ell'}\}_{\ell'}\right]$ uniformly bounded,
$$r_L \sum_\ell \sum_{\ell'} \sum_n e_\ell e_{\ell'} s_{\ell n} s_{\ell' n} A_{L\ell} B_{L\ell'} C_{Ln} = O_p(1). \tag{89}$$
Here with $D_\ell = (z_\ell, w_\ell')'$, $\theta = (\beta, \gamma')'$, and $\hat\theta = (\hat\beta, \hat\gamma')'$ we can write
$$\left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 = \left(\sum_\ell e_\ell s_{\ell n}\varepsilon_\ell\right)^2 + 2\sum_\ell \sum_{\ell'} e_\ell e_{\ell'} s_{\ell n} s_{\ell' n} D_\ell'\left(\theta - \hat\theta\right)\varepsilon_{\ell'} + \sum_\ell \sum_{\ell'} e_\ell e_{\ell'} s_{\ell n} s_{\ell' n} D_\ell'(\theta - \hat\theta)\, D_{\ell'}'(\theta - \hat\theta), \tag{90}$$
and both $D_\ell$ and $\varepsilon_\ell$ have bounded fourth moments by the assumption of bounded fourth moments of $\epsilon_\ell$, $\eta_\ell$, $u_\ell$, and $p_n$ in Assumption A2.
Thus by the lemma
$$\begin{aligned}
r_L \sum_n \left(\left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 - \left(\sum_\ell e_\ell s_{\ell n}\varepsilon_\ell\right)^2\right) g_n^{*2} ={}& 2\left(\theta - \hat\theta\right)'\left(r_L \sum_\ell \sum_{\ell'} \sum_n e_\ell e_{\ell'} s_{\ell n} s_{\ell' n}\, g_n^{*2} D_\ell \varepsilon_{\ell'}\right) \\
&+ \left(\theta - \hat\theta\right)'\left(r_L \sum_\ell \sum_{\ell'} \sum_n e_\ell e_{\ell'} s_{\ell n} s_{\ell' n}\, g_n^{*2} D_\ell D_{\ell'}'\right)\left(\theta - \hat\theta\right) \\
={}& \left(\theta - \hat\theta\right)' O_p(1) + \left(\theta - \hat\theta\right)' O_p(1)\left(\theta - \hat\theta\right),
\end{aligned} \tag{91}$$
which is $o_p(1)$ by the consistency of $\hat\theta$ (implied by Assumptions 7, A1, and A2). Adão et al. (2019b) further show the first term of equation (88) is $o_p(1)$, without using the additional regularity conditions of Proposition 5.

It thus remains for us to show the third term of (88) is also $o_p(1)$. Note that
$$\hat g_n^2 = (g_n - q_n'\hat\mu)^2 = g_n^{*2} + \left(q_n'(\hat\mu - \mu)\right)^2 - 2 g^*_n q_n'(\hat\mu - \mu), \tag{92}$$
so that
$$\begin{aligned}
r_L \sum_n \left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2\left(\hat g_n^2 - g_n^{*2}\right) ={}& r_L \sum_n \left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 \left(q_n'(\hat\mu - \mu) - 2g^*_n\right) q_n'(\hat\mu - \mu) \\
={}& r_L \sum_n \left(\sum_\ell e_\ell s_{\ell n}\varepsilon_\ell\right)^2 \left(q_n'(\hat\mu - \mu) - 2g^*_n\right) q_n'(\hat\mu - \mu) \\
&+ r_L \sum_n \left(\left(\sum_\ell e_\ell s_{\ell n}\hat\varepsilon_\ell\right)^2 - \left(\sum_\ell e_\ell s_{\ell n}\varepsilon_\ell\right)^2\right)\left(q_n'(\hat\mu - \mu) - 2g^*_n\right) q_n'(\hat\mu - \mu).
\end{aligned} \tag{93}$$
Using the previous lemma, the first term of this expression is $O_p(1)(\hat\mu - \mu)$ since $\varepsilon_\ell$, $p_n$, and $g^*_n$ have bounded fourth moments under Assumption A2.
The second term is similarly $O_p(1)(\hat\mu - \mu)$ by the lemma and the decomposition used in equation (91). Noting that $\hat\mu - \mu = \left(\sum_n s_n q_n q_n'\right)^{-1}\sum_n s_n q_n g^*_n \xrightarrow{p} 0$ under the assumptions completes the proof.

This characterization also suggests that in finite samples it is likely that $\widehat{se}_{equiv} < \widehat{se}_{AKM}$. To see this, consider versions of the two standard error formulas obtained under shock homoskedasticity (i.e. $Var\left[g_n \mid I_L\right] = \sigma_g^2$):
$$\widehat{se}_{equiv} = \frac{\sqrt{\left(\sum_n s_n^2 \hat\varepsilon_n^2\right)\left(\sum_n s_n \hat g_n^2\right)}}{\left|\sum_n s_n \bar x^{\perp}_n g_n\right|}, \tag{94}$$
$$\widehat{se}_{AKM} = \frac{\sqrt{\left(\sum_n s_n^2 \hat\varepsilon_n^2\right)\left(\sum_n s_n \ddot g_n^2\right)}}{\left|\sum_\ell e_\ell x^{\perp}_\ell z_\ell\right|}, \tag{95}$$
which differ by a factor of $\sqrt{\sum_n s_n \hat g_n^2 / \sum_n s_n \ddot g_n^2}$. When the SSIV controls have an exact shift-share structure, $w_\ell = \sum_n s_{\ell n} q_n$, the share projection producing $\ddot g_n$ has exact fit such that one can represent $\ddot g_n = g_n - q_n'\hat\mu_{AKM}$ for some $\hat\mu_{AKM}$. In this case the $s_n$-weighted sum of squares of shock residuals is lower in our equivalent regression by construction of $\hat\mu$: $\sum_n s_n \hat g_n^2 \le \sum_n s_n \ddot g_n^2$ (with strict inequality when $\hat\mu_{AKM} \neq \hat\mu$). Similarly, when $w_\ell$ instead contains controls that are included for efficiency only and are independent of the shocks, projection of $z_\ell$ on the shares produces a noisy estimate of $g_n - \bar g$, which again has a higher sum of squares. In practice, we find that the heteroskedastic standard errors of Adão et al. (2019b) are also larger in the application in Section 5.

B.4 Proposition A1
We consider each expectation in equation (15) in turn. For each $n$, write
$$\kappa_n(g_{-n}, \varepsilon_\ell, \eta_\ell) = \lim_{g_n \to -\infty} y\left(x_1([g_n; g_{-n}], \eta_{\ell 1}), \ldots, x_M([g_n; g_{-n}], \eta_{\ell M}), \varepsilon_\ell\right) \tag{96}$$
such that
$$s_{\ell n} e_\ell g_n y_\ell = s_{\ell n} e_\ell g_n \kappa_n(g_{-n}, \varepsilon_\ell, \eta_\ell) + s_{\ell n} e_\ell g_n \int_{-\infty}^{g_n} \frac{\partial}{\partial g_n} y\left(x_1([\gamma; g_{-n}], \eta_{\ell 1}), \ldots, x_M([\gamma; g_{-n}], \eta_{\ell M}), \varepsilon_\ell\right) d\gamma. \tag{97}$$
By as-good-as-random shock assignment, the expectation of the first term is
$$E\left[s_{\ell n} e_\ell g_n \kappa_n(g_{-n}, \varepsilon_\ell, \eta_\ell)\right] = E\left[s_{\ell n} e_\ell\, E\left[g_n \mid s, e, g_{-n}, \varepsilon, \eta_\ell\right]\kappa_n(g_{-n}, \varepsilon_\ell, \eta_\ell)\right] = 0, \tag{98}$$
while the expectation of the second is
$$\begin{aligned}
& E\left[s_{\ell n} e_\ell g_n \int_{-\infty}^{g_n} \frac{\partial}{\partial g_n} y\left(x_1([\gamma; g_{-n}], \eta_{\ell 1}), \ldots, x_M([\gamma; g_{-n}], \eta_{\ell M}), \varepsilon_\ell\right) d\gamma\right] \\
&\quad = E\left[s_{\ell n} e_\ell \int_{-\infty}^{\infty}\int_{-\infty}^{g_n} g_n \frac{\partial}{\partial g_n} y\left(x_1([\gamma; g_{-n}], \eta_{\ell 1}), \ldots, x_M([\gamma; g_{-n}], \eta_{\ell M}), \varepsilon_\ell\right) d\gamma\, dF_n(g_n \mid I)\right] \\
&\quad = E\left[s_{\ell n} e_\ell \int_{-\infty}^{\infty} \frac{\partial}{\partial g_n} y\left(x_1([\gamma; g_{-n}], \eta_{\ell 1}), \ldots, x_M([\gamma; g_{-n}], \eta_{\ell M}), \varepsilon_\ell\right)\int_{\gamma}^{\infty} g_n\, dF_n(g_n \mid I)\, d\gamma\right],
\end{aligned} \tag{99}$$
where $F_n(\cdot \mid I)$ denotes the conditional distribution of $g_n$. Thus
$$\begin{aligned}
E\left[s_{\ell n} e_\ell g_n y_\ell\right] &= E\left[s_{\ell n} e_\ell \int_{-\infty}^{\infty} \frac{\partial}{\partial g_n} y\left(x_1([\gamma; g_{-n}], \eta_{\ell 1}), \ldots, x_M([\gamma; g_{-n}], \eta_{\ell M}), \varepsilon_\ell\right)\mu_n(\gamma \mid I)\, d\gamma\right] \\
&= \sum_m E\left[\int_{-\infty}^{\infty} s_{\ell n} e_\ell \alpha_{\ell m}\pi_{\ell mn}([\gamma; g_{-n}])\,\mu_n(\gamma \mid I)\,\tilde\beta_{\ell mn}(\gamma)\, d\gamma\right],
\end{aligned} \tag{100}$$
where
$$\mu_n(\gamma \mid I) \equiv \int_{\gamma}^{\infty} g_n\, dF_n(g_n \mid I) = \left(E\left[g_n \mid g_n \ge \gamma, I\right] - E\left[g_n \mid g_n < \gamma, I\right]\right) Pr(g_n \ge \gamma \mid I)\left(1 - Pr(g_n \ge \gamma \mid I)\right) \ge 0 \text{ a.s.}$$
(101)Similarly E [ s (cid:96)n e (cid:96) g n α (cid:96)m x (cid:96)m ] = (cid:88) m E (cid:20)(cid:90) ∞−∞ s (cid:96)n e (cid:96) α (cid:96)m π (cid:96)mn ([ γ ; g − n ]) µ n ( γ | I ) dγ (cid:21) . (102)Combining equations (100) and (102) completes the proof, with ω (cid:96)mn ( γ ) = s (cid:96)n e (cid:96) α (cid:96)m µ n ( γ | I ) π (cid:96)mn ([ γ ; g − n ]) ≥ a.s. (103)69 .5 Proposition A2 To prove (21), we aggregate (20) across industries within a region using E (cid:96)n weights: y (cid:96) = ( β − β ) x (cid:96) + ε (cid:96) , (104)where ε (cid:96) = (cid:80) n s (cid:96)n ε (cid:96)n . The shift-share instrument z (cid:96) is relevant because E (cid:34)(cid:88) (cid:96) e (cid:96) x (cid:96) z (cid:96) (cid:35) = (cid:88) (cid:96) e (cid:96) E (cid:34)(cid:88) n s (cid:96)n (¯ πg n + η (cid:96)n ) · (cid:88) n (cid:48) s (cid:96)n (cid:48) g n (cid:48) (cid:35) = (cid:88) (cid:96),n e (cid:96) s (cid:96)n ¯ πσ g ≥ ¯ H L ¯ πσ g , (105)while exclusion holds because E (cid:34)(cid:88) (cid:96) e (cid:96) z (cid:96) ε (cid:96) (cid:35) = (cid:88) (cid:96) e (cid:96) E (cid:34)(cid:88) n s (cid:96)n ε (cid:96)n · (cid:88) n (cid:48) s (cid:96)n (cid:48) g n (cid:48) (cid:35) = 0 . (106)Thus by an appropriate law of large numbers, ˆ β = β − β + o p (1) .To study ˆ β ind , we aggregate (20) across regions (again with E (cid:96)n weights): y n = β x n − β (cid:88) (cid:96) ω (cid:96)n (cid:88) n (cid:48) s (cid:96)n (cid:48) x (cid:96)n (cid:48) + ε n , (107)for ε n = (cid:80) (cid:96) ω (cid:96)n ε (cid:96)n . The resulting IV estimate yields ˆ β ind − β = (cid:80) n s n y n g n (cid:80) n s n x n g n − β = (cid:80) n s n ( − β (cid:80) (cid:96) ω (cid:96)n (cid:80) n (cid:48) s (cid:96)n (cid:48) x (cid:96)n (cid:48) + ε n ) g n (cid:80) n s n x n g n . 
(108)The expected denominator of ˆ β ind is non-zero: E (cid:34)(cid:88) n s n x n g n (cid:35) = (cid:88) n s n E (cid:34)(cid:88) (cid:96) ω (cid:96)n (¯ πg n + η (cid:96)n ) g n (cid:35) = (cid:88) n s n ω (cid:96)n ¯ πσ = (cid:88) n E n E · E (cid:96)n E ¯ πσ = ¯ πσ , (109)70hile the expected numerator is E (cid:34)(cid:88) n s n (cid:32) − β (cid:88) (cid:96) ω (cid:96)n (cid:88) n (cid:48) s (cid:96)n (cid:48) x (cid:96)n (cid:48) + ε n (cid:33) g n (cid:35) = − β (cid:88) n,(cid:96) s n ω (cid:96)n s (cid:96)n ¯ πσ = − β H L ¯ πσ , (110)where the last equality follows because (cid:88) n,(cid:96) s n ω (cid:96)n s (cid:96)n = (cid:88) n,(cid:96) E n E E (cid:96)n E n E (cid:96)n E (cid:96) = (cid:88) n,(cid:96) E (cid:96) E (cid:18) E (cid:96)n E (cid:96) (cid:19) = (cid:88) n,(cid:96) e (cid:96) s (cid:96)n = H L . (111)Thus by an appropriate law of large numbers, ˆ β ind = β − β H L + o p (1) . (112) B.6 Proposition A3
By the appropriate law of large numbers,

$$\begin{aligned}
\hat\beta &= \frac{E\left[\sum_\ell E_\ell \left(\sum_n s_{\ell n} y_{\ell n}\right)\left(\sum_{n'} s_{\ell n'} g_{n'}\right)\right]}{E\left[\sum_\ell E_\ell \left(\sum_n s_{\ell n} x_{\ell n}\right)\left(\sum_{n'} s_{\ell n'} g_{n'}\right)\right]} + o_p(1) \\
&= \frac{\sum_{\ell, n} E_\ell s_{\ell n}^2 \pi_{\ell n} \sigma_n^2 \beta_{\ell n}}{\sum_{\ell, n} E_\ell s_{\ell n}^2 \pi_{\ell n} \sigma_n^2} + o_p(1) \\
&= \frac{\sum_{\ell, n} E_{\ell n} s_{\ell n} \pi_{\ell n} \sigma_n^2 \beta_{\ell n}}{\sum_{\ell, n} E_{\ell n} s_{\ell n} \pi_{\ell n} \sigma_n^2} + o_p(1)
\end{aligned} \tag{113}$$

while

$$\begin{aligned}
\hat\beta_{ind} = \frac{\sum_n E_n y_n g_n}{\sum_n E_n x_n g_n} &= \frac{E\left[\sum_n E_n \left(\sum_\ell \omega_{\ell n} y_{\ell n}\right) g_n\right]}{E\left[\sum_n E_n \left(\sum_\ell \omega_{\ell n} x_{\ell n}\right) g_n\right]} + o_p(1) \\
&= \frac{\sum_{\ell, n} E_n \omega_{\ell n} \pi_{\ell n} \sigma_n^2 \beta_{\ell n}}{\sum_{\ell, n} E_n \omega_{\ell n} \pi_{\ell n} \sigma_n^2} + o_p(1) \\
&= \frac{\sum_{\ell, n} E_{\ell n} \pi_{\ell n} \sigma_n^2 \beta_{\ell n}}{\sum_{\ell, n} E_{\ell n} \pi_{\ell n} \sigma_n^2} + o_p(1).
\end{aligned} \tag{114}$$

B.7 Proposition A4

By definition of $\bar\varepsilon_n$,

$$\bar\varepsilon_n = \frac{\sum_\ell e_\ell s_{\ell n} \left(\sum_{n'} s_{\ell n'} \nu_{n'} + \check\varepsilon_\ell\right)}{\sum_\ell e_\ell s_{\ell n}} \equiv \sum_{n'} \alpha_{nn'} \nu_{n'} + \bar{\check\varepsilon}_n, \tag{115}$$

for $\alpha_{nn'} = \frac{\sum_\ell e_\ell s_{\ell n} s_{\ell n'}}{\sum_\ell e_\ell s_{\ell n}}$ and $\bar{\check\varepsilon}_n = \frac{\sum_\ell e_\ell s_{\ell n} \check\varepsilon_\ell}{\sum_\ell e_\ell s_{\ell n}}$. Therefore,

$$\mathrm{Var}[\bar\varepsilon_n] = \sum_{n'} \sigma_{n'}^2 \alpha_{nn'}^2 + \mathrm{Var}[\bar{\check\varepsilon}_n] \geq \sigma_\nu^2 \alpha_{nn}^2, \tag{116}$$

and

$$\max_n \mathrm{Var}[\bar\varepsilon_n] \geq \sigma_\nu^2 \max_n \alpha_{nn}^2. \tag{117}$$

To establish a lower bound on this quantity, observe that the $s_n$-weighted average of $\alpha_{nn}$ satisfies:

$$\sum_n s_n \alpha_{nn} = \sum_n s_n \frac{\sum_\ell e_\ell s_{\ell n}^2}{s_n} = H_L. \tag{118}$$

Since $\sum_n s_n = 1$, it follows that $\max_n \alpha_{nn} \geq H_L$ and therefore $\max_n \mathrm{Var}[\bar\varepsilon_n] \geq \sigma_\nu^2 H_L^2$. Since $H_L \to \bar H > 0$, we conclude that, for sufficiently large $L$, $\max_n \mathrm{Var}[\bar\varepsilon_n]$ is bounded from below by any positive $\delta < \sigma_\nu^2 \bar H^2$.

B.8 Proposition A5
We prove each part of this proposition in turn.

1. Expanding the exclusion condition yields:

$$E\left[\sum_\ell e_\ell \varepsilon_\ell \psi_{\ell, LOO}\right] = \sum_\ell E\left[e_\ell \varepsilon_\ell \sum_n s_{\ell n} \frac{\sum_{\ell' \neq \ell} \omega_{\ell' n} \psi_{\ell' n}}{\sum_{\ell' \neq \ell} \omega_{\ell' n}}\right] = \sum_\ell e_\ell \sum_n s_{\ell n} \frac{\sum_{\ell' \neq \ell} \omega_{\ell' n} E[\varepsilon_\ell \psi_{\ell' n}]}{\sum_{\ell' \neq \ell} \omega_{\ell' n}} = 0. \tag{119}$$

2. The assumption of part (1) is satisfied here, so $E\left[\sum_\ell e_\ell \varepsilon_\ell \psi_{\ell, LOO}\right] = 0$. We now establish that $E\left[\left(\sum_\ell e_\ell \varepsilon_\ell \psi_{\ell, LOO}\right)^2\right] \to 0$, which implies the exclusion condition $\sum_\ell e_\ell \varepsilon_\ell \psi_{\ell, LOO} \xrightarrow{p} 0$ and thus consistency of the LOO SSIV estimator provided it has a first stage:

$$\begin{aligned}
E\left[\left(\sum_\ell e_\ell \varepsilon_\ell \psi_{\ell, LOO}\right)^2\right] &= \sum_{\substack{\ell_1, \ell_2, n_1, n_2, \\ \ell_1' \neq \ell_1,\, \ell_2' \neq \ell_2}} e_{\ell_1} e_{\ell_2} s_{\ell_1 n_1} s_{\ell_2 n_2} \frac{\omega_{\ell_1' n_1}}{\sum_{\ell \neq \ell_1} \omega_{\ell n_1}} \frac{\omega_{\ell_2' n_2}}{\sum_{\ell \neq \ell_2} \omega_{\ell n_2}} E\left[\varepsilon_{\ell_1} \varepsilon_{\ell_2} \psi_{\ell_1' n_1} \psi_{\ell_2' n_2}\right] \\
&\leq \sum_{\substack{(\ell_1, \ell_2, \ell_1', \ell_2') \in \mathcal{J}, \\ n_1, n_2}} e_{\ell_1} e_{\ell_2} s_{\ell_1 n_1} s_{\ell_2 n_2} \frac{\omega_{\ell_1' n_1}}{\sum_{\ell \neq \ell_1} \omega_{\ell n_1}} \frac{\omega_{\ell_2' n_2}}{\sum_{\ell \neq \ell_2} \omega_{\ell n_2}} \cdot B \to 0.
\end{aligned} \tag{120}$$

Here the second line used the first regularity condition, which implies that $E\left[\varepsilon_{\ell_1} \varepsilon_{\ell_2} \psi_{\ell_1' n_1} \psi_{\ell_2' n_2}\right] = 0$ whenever there is at least one index among $\{\ell_1, \ell_2, \ell_1', \ell_2'\}$ which is not equal to any of the others, i.e. for all $(\ell_1, \ell_2, \ell_1', \ell_2') \notin \mathcal{J}$.

3. We show that under the given assumptions on $s_{\ell n}$, $e_\ell$, and $\omega_{\ell n}$, the expression in (37) is bounded by $4N/L^2$:

$$\begin{aligned}
& \sum_{\substack{(\ell_1, \ell_2, \ell_1', \ell_2') \in \mathcal{J}, \\ n_1, n_2}} e_{\ell_1} e_{\ell_2} s_{\ell_1 n_1} s_{\ell_2 n_2} \frac{\omega_{\ell_1' n_1}}{\sum_{\ell \neq \ell_1} \omega_{\ell n_1}} \frac{\omega_{\ell_2' n_2}}{\sum_{\ell \neq \ell_2} \omega_{\ell n_2}} \\
&\quad = \sum_{(\ell_1, \ell_2, \ell_1', \ell_2') \in \mathcal{J}} \frac{1}{L^2} \frac{\omega_{\ell_1' n(\ell_1)}}{\sum_{\ell \neq \ell_1} \omega_{\ell n(\ell_1)}} \frac{\omega_{\ell_2' n(\ell_2)}}{\sum_{\ell \neq \ell_2} \omega_{\ell n(\ell_2)}} \\
&\quad = \frac{1}{L^2} \sum_{\substack{(\ell_1, \ell_2, \ell_1', \ell_2') \in \mathcal{J}, \\ n(\ell_1') = n(\ell_1),\, n(\ell_2') = n(\ell_2)}} \frac{1}{L_{n(\ell_1)} - 1} \cdot \frac{1}{L_{n(\ell_2)} - 1} \\
&\quad = \frac{1}{L^2} \sum_n \frac{2 L_n (L_n - 1)}{(L_n - 1)^2} \leq \frac{4N}{L^2}.
\end{aligned} \tag{121}$$

Here the second line plugs in the expressions for $s_{\ell n}$ and $e_\ell$, and the third line plugs in $\omega_{\ell n}$. The last line uses the fact that any tuple $(\ell_1, \ell_2, \ell_1', \ell_2') \in \mathcal{J}$ such that $n(\ell_1') = n(\ell_1)$ and $n(\ell_2') = n(\ell_2)$ has all four elements exposed to the same shock $n$. Moreover, it is easily verified that all of these tuples have a structure $(\ell_A, \ell_B, \ell_A, \ell_B)$ or $(\ell_A, \ell_B, \ell_B, \ell_A)$ for any $\ell_A \neq \ell_B$ exposed to the same shock. Therefore, there are $2 L_n (L_n - 1)$ of them for each $n$. Finally, $\frac{L_n}{L_n - 1} \leq 2$ as $L_n \geq 2$.

B.9 Proposition A6

National industry employment satisfies $E_n = \sum_\ell E_{\ell n}$; log-linearizing this immediately implies (46). To solve for $g_{\ell n}$, log-linearize (43), (44), and (45):

$$\hat E_\ell = \phi \hat W_\ell + \varepsilon_\ell, \tag{122}$$

$$g_{\ell n} = g_n^* + \hat\xi_{\ell n} - \sigma \hat W_\ell, \tag{123}$$

$$\hat E_\ell = \sum_n s_{\ell n} g_{\ell n}. \tag{124}$$

Solving this system of equations yields

$$\hat W_\ell = \frac{1}{\sigma + \phi} \left(\sum_n s_{\ell n} \left(g_n^* + \hat\xi_{\ell n}\right) - \varepsilon_\ell\right) \tag{125}$$

and expression (47).

C Appendix Figures and Tables
Figure C1: Industry-level variation in ADH
[Figure: two binned scatterplots. Panel titles: "First stage" and "Reduced form". Axis labels: "Avg Regional China Shock Residual" and "Avg Manufacturing Employment Growth Residual".]

Notes: The figure shows binned scatterplots of shock-level outcome and treatment residuals, $\bar y^\perp_{nt}$ and $\bar x^\perp_{nt}$, corresponding to the SSIV specification in column 3 of Table 4. The shocks, $g_{nt}$, are residualized on year indicators (with the full-sample mean added back) and grouped into fifty weighted bins, with each bin representing around 2% of total share weight $s_{nt}$. Lines of best fit, indicated in red, are weighted by the same $s_{nt}$. The ratio of the reduced-form slope to the first-stage slope (-0.267) equals the SSIV coefficient in column 3 of Table 4.

                                             (1)       (2)       (3)       (4)
Unemployment growth                         0.221     0.217     0.063     0.100
                                           (0.049)   (0.046)   (0.060)   (0.083)
NILF growth                                 0.553     0.534     0.098     0.126
                                           (0.185)   (0.183)   (0.133)   (0.134)
Log weekly wage growth                     -0.759    -0.607     0.227     0.133
                                           (0.258)   (0.226)   (0.242)   (0.281)
CZ-level controls ($w_{\ell t}$)
  Autor et al. (2013) baseline                ✓         ✓         ✓         ✓
  Start-of-period mfg. share                  ✓
  Lagged mfg. share                                     ✓         ✓         ✓
  Period-specific lagged mfg. share                               ✓         ✓
  Period-specific lagged 10-sector shares                                   ✓
Notes: This table extends the analysis of Table 4 to different regional outcomes in Autor et al. (2013): unemployment growth, labor force non-participation (NILF) growth, and log average weekly wage growth. The specifications in columns 1–4 are otherwise the same as those in Table 4. Exposure-robust standard errors are computed using equivalent industry-level IV regressions and allow for clustering of shocks at the level of three-digit SIC codes.

Table C2: Alternative Exposure-Robust Standard Errors
                                           Effects                      Pre-trends
                               (1)       (2)       (3)       (4)       (5)       (6)
Coefficient                  -0.596    -0.489    -0.267    -0.252    -0.028     0.142
Table 4 standard error (SE)  (0.114)   (0.100)   (0.099)   (0.136)   (0.092)   (0.090)
State-clustered SE           (0.099)   (0.086)   (0.086)   (0.095)   (0.105)   (0.111)
Adão et al. (2019b) SE       (0.126)   (0.116)   (0.113)   (0.152)   (0.125)   (0.121)
Confidence interval         [-1.059,  [-0.832,  [-0.568,  [-0.709,  [-0.254,  [-0.047,
with the null imposed        -0.396]   -0.309]   -0.028]    0.012]    0.244]    0.418]
Notes: This table extends the analysis of Table 4 by reporting conventional state-clustered SE, the Adão et al. (2019b) SIC3-clustered standard errors, and confidence intervals based on the equivalent industry-level IV regression with the null imposed, as discussed in Section 4.1. For comparison we repeat the coefficient estimates and exposure-robust standard errors from Table 4, also obtained by the equivalent shock-level regression.

                           (1)        (2)       (3)       (4)
                        Mfg. emp.   Unemp.     NILF      Wages
Coefficient (1990s)      -0.491      0.329     1.209    -0.649
                         (0.266)    (0.155)   (0.347)   (0.571)
Coefficient (2000s)      -0.225      0.014    -0.109     0.391
                         (0.103)    (0.083)   (0.123)   (0.288)
Notes: This table estimates the shift-share IV regression from column 3 of Table 4 and Table C1, for different outcomes, allowing the treatment coefficient to vary by period. This specification uses two endogenous treatment variables (treatment interacted with period indicators) and two corresponding shift-share instruments. Controls include lagged manufacturing shares interacted with period dummies, period fixed effects, Census division fixed effects, and start-of-period conditions (% college educated, % foreign-born, % employment among women, % employment in routine occupations, and the average offshorability index). Exposure-robust standard errors, obtained by the equivalent shock-level regressions, allow for clustering at the level of three-digit SIC codes.

                            (1)       (2)       (3)
Coefficient               -0.238    -0.247    -0.158
                          (0.099)   (0.105)   (0.078)
Shock-level estimator      2SLS      LIML      GMM

Effective first stage $F$-statistic: 15.10
$\chi^2(7)$ overid. test stat. [$p$-value]: 10.92 [0.142]

Notes: Column 1 of this table reports an overidentified estimate of the coefficient corresponding to column 3 of Table 4, obtained with a two-stage least squares regression of shock-level average manufacturing employment growth residuals $\bar y^\perp_{nt}$ on shock-level average Chinese import competition growth residuals $\bar x^\perp_{nt}$, instrumenting by the growth of imports (per U.S. worker) in eight non-U.S. countries from ADH $g_{nk}$, controlling for period fixed effects $q_{nt}$, and weighting by average industry exposure $s_{nt}$. Column 2 reports the corresponding limited information maximum likelihood estimate, while column 3 reports a two-step optimal GMM estimate. Standard errors, the optimal GMM weight matrix, and the Hansen (1982) $\chi^2$ test of overidentifying restrictions all allow for clustering of shocks at the SIC3 industry group level. The first-stage $F$-statistic is computed by a shift-share version of the Montiel Olea and Pflueger (2013) method described in Appendix A.9.
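As a hedged illustration of the shock-level regressions referenced in these notes, the following Python sketch checks the underlying numerical equivalence on simulated data: the region-level SSIV coefficient coincides with an IV coefficient computed from exposure-weighted shock-level averages, using the shocks as the instrument and $s_n$ as weights. The sample sizes, variable names, and data-generating process are assumptions made for the sketch and are not taken from the replication code; only a constant is partialled out, and a single instrument is used rather than the eight in the table above.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 200, 30                      # hypothetical numbers of regions and shocks

# Exposure shares s_{ln}; rows sum to one (an assumption of this sketch).
shares = rng.dirichlet(np.ones(N), size=L)
e = rng.dirichlet(np.ones(L))       # regression weights e_l, summing to one
g = rng.normal(size=N)              # shocks g_n
z = shares @ g                      # shift-share instrument z_l = sum_n s_{ln} g_n

# Simulated treatment and outcome; the coefficients 0.8 and -0.3 are arbitrary.
x = 0.8 * z + rng.normal(size=L)
y = -0.3 * x + rng.normal(size=L)


def residualize(v, weights):
    """Residual from a weighted projection of v on a constant."""
    return v - np.average(v, weights=weights)


x_perp, y_perp = residualize(x, e), residualize(y, e)

# Region-level SSIV coefficient with a constant as the only control.
beta_region = np.sum(e * z * y_perp) / np.sum(e * z * x_perp)

# Shock-level aggregates: s_n = sum_l e_l s_{ln}, and exposure-weighted
# averages of the residualized treatment and outcome.
s = e @ shares
x_bar = (e * x_perp) @ shares / s
y_bar = (e * y_perp) @ shares / s

# Shock-level IV: g_n instruments x_bar_n, with s_n as weights.
beta_shock = np.sum(s * g * y_bar) / np.sum(s * g * x_bar)

print(beta_region, beta_shock)
```

The two printed values agree up to floating-point error because $\sum_\ell e_\ell z_\ell y^\perp_\ell = \sum_n s_n g_n \bar y^\perp_n$ holds identically once $z_\ell$ is expanded; the same rearrangement applies to the denominator.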
                             (1)       (2)
Leave-one-out estimator     1.277     1.300
                           (0.150)   (0.124)
Conventional estimator      1.215     1.286
                           (0.139)   (0.121)
H heuristic                 1.32     10.50
Population weights            ✓

Region-by-period observations: 2,166
Notes: Column 1 replicates column 2 of Table 3 from Goldsmith-Pinkham et al. (2019), reporting two SSIV estimators of the inverse labor supply elasticity, with and without the LOO adjustment. Regions are U.S. commuting zones; periods are 1980s, 1990s, and 2000s; all specifications include controls for 1980 regional characteristics interacted with period indicators (see Goldsmith-Pinkham et al. (2019) for more details). Standard errors allow for clustering by commuting zones. Column 1 uses 1980 population weights, while column 2 repeats the same analysis without population weights. The table also reports the H heuristic for the importance of the leave-one-out adjustment proposed in Appendix A.7 (equation (40)).

                                            SSIV                  Shock-level IV
                                     Exposure-Robust SE              Robust SE
                                    Null not      Null         Null not      Null
                                    Imposed      Imposed       Imposed      Imposed
                                      (1)          (2)           (3)          (4)
Panel A: Benchmark Monte-Carlo Simulation
(a) Normal shocks                     7.6%         5.2%          6.8%         5.0%
(b) Wild bootstrap (benchmark)        8.0%         4.9%         14.2%         4.0%
Panel B: Higher Industry Concentration
(c) 1/HHI = 50
(d) 1/HHI = 20
(e) 1/HHI = 10
Panel C: Smaller Numbers of Industries or Regions
(f) N = 136 (SIC3 industries)         5.4%         4.5%          7.7%         4.3%
(g) N = 20 (SIC2 industries)          7.7%         3.7%          7.9%         3.2%
(h) L = 100 (random regions)          9.7%         4.5%              N/A
(i) L = 25 (random regions)          10.4%         4.3%              N/A

Notes: This table summarizes the result of the Monte-Carlo analysis described in Appendix A.10, reporting the rejection rates for a nominal 5% level test of the true null that $\beta^* = 0$. In all panels, columns 1 and 2 are simulated from the SSIV design based on Autor et al. (2013), as in column 3 of Table 4, while columns 3 and 4 are based on the conventional industry-level IV in Acemoglu et al. (2016). Column 1 uses exposure-robust standard errors from the equivalent industry-level IV and column 2 implements the version with the null hypothesis imposed. Columns 3 and 4 parallel columns 1 and 2 when applied to conventional IV. In Panel A, the simulations approximate the data-generating process using a normal distribution in row (a), with the variance matched to the sample variance of the shocks in the data after de-meaning by year, while wild bootstrap is used in row (b), following Liu (1988). Panel B documents the role of the Herfindahl concentration index across industries, varying 1/HHI from 50 to 10 in rows (c) to (e), compared with 191.6 for shift-share IV and 189.7 for conventional IV. Panel C documents the role of the number of regions and industries. We aggregate industries from 397 four-digit manufacturing SIC industries into 136 three-digit industries in row (f) and further into 20 two-digit industries in row (g). In rows (h) and (i), we select a random subset of regions in each simulation. See Appendix A.10 for a complete discussion.

$F$-statistics as a Rule of Thumb: Monte-Carlo Evidence

                                         Number of Instruments
                                  1        5       10       25       50
                                 (1)      (2)      (3)      (4)      (5)
Panel A: SSIV
5% rejection rate                8.0%     8.9%    11.5%    15.0%    23.0%
Median bias, % of std. dev.      0.3%    14.6%    28.3%    43.2%    72.1%
Median first-stage F
Panel B: Conventional Shock-Level IV
5% rejection rate               13.6%    13.9%    14.9%    17.7%    22.0%
Median bias, % of std. dev.     -0.3%    10.1%    27.1%    57.0%    80.2%
Median first-stage F

Notes: This table reports the result of the Monte-Carlo analysis with many weak instruments, described in Appendix A.10. Panel A is simulated from the SSIV design based on Autor et al. (2013), as in column 3 of Table 4, while Panel B is based on the conventional industry-level IV in Acemoglu et al. (2016). The five columns gradually increase the number of shocks $J = 1, 5, 10, 25$, and $50$, with only one shock relevant to treatment. The table reports the rejection rates corresponding to a nominal 5% level test of the true null that $\beta^* = 0$, the median bias of the estimator as a percentage of the simulated standard deviation, and the median first-stage $F$-statistic obtained via the Montiel Olea and Pflueger (2013) method (extended to shift-share IV in Panel A, following Appendix A.9). See Appendix A.10 for a complete discussion.

References

Acemoglu, Daron, David H. Autor, David Dorn, Gordon H. Hanson, and Brendan Price.
Journal of Labor Economics
Adão, Rodrigo, Costas Arkolakis, and Federico Esposito.
NBER Working Paper 25544.
Adão, Rodrigo, Michal Kolesár, and Eduardo Morales.
The Quarterly Journal of Economics
Angrist, Joshua D., Kate Graddy, and Guido W. Imbens.
The Review of Economic Studies
Angrist, Joshua D., Guido W. Imbens, and Alan B. Krueger.
Journal of Applied Econometrics
Autor, David H., David Dorn, and Gordon H. Hanson.
American Economic Review
Bartik, Timothy J.
Who Benefits from State and Local Economic Development Policies?
Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Cameron, A. Colin, and Douglas L. Miller.
The Journal of Human Resources
Goldsmith-Pinkham, Paul, Isaac Sorkin, and Henry Swift.
Hansen, Lars Peter.
Econometrica
Jaeger, David A., Joakim Ruist, and Jan Stuhler.
Jones, Ronald W.
Journal of Political Economy
Liu, Regina Y.
The Annals of Statistics
Montiel Olea, Jose Luis, and Carolin Pflueger.
Journal of Business and Economic Statistics
Nagar, Anirudh L.
Econometrica
Pflueger, Carolin E., and Su Wang.
Stata Journal
White, Halbert.