Addressing Spatially Structured Interference in Causal Analysis Using Propensity Scores
Keith W. Zirkle, Marie-Abele Bind, Jenise L. Swall, David C. Wheeler
Keith W. Zirkle^a, Marie-Abèle Bind^b, Jenise L. Swall^c, David C. Wheeler^a

a Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
b Department of Statistics, Harvard University, Cambridge, MA, USA
c Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, VA, USA
Abstract
Environmental epidemiologists are increasingly interested in establishing causality between exposures and health outcomes. A popular model for causal inference is the Rubin Causal Model (RCM), which typically seeks to estimate the average difference in study units' potential outcomes. An important assumption under RCM is no interference; that is, the potential outcomes of one unit are not affected by the exposure status of other units. The no interference assumption is violated if we expect spillover or diffusion of exposure effects based on units' proximity to other units, and several other causal estimands then arise. Air pollution epidemiology typically violates this assumption when we expect upwind events to affect downwind or nearby locations. This paper adapts causal assumptions from social network research to address interference and allow estimation of both direct and spillover causal effects. We use propensity score-based methods to estimate these effects when considering the effects of the Environmental Protection Agency's 2005 nonattainment designations for particulate matter with aerodynamic diameter less than 2.5 µm (PM2.5) on lung cancer incidence, using county-level data obtained from the Surveillance, Epidemiology, and End Results (SEER) Program. We compare these methods in a rigorous simulation study that considers spatially autocorrelated variables, interference, and missing confounders. We find that pruning and matching based on the propensity score produce the highest probability coverage of the true causal effects and lower mean squared error. When applied to the research question, we found protective direct and spillover causal effects.

Keywords:
Causal inference, interference, propensity scores, spillover effects, air pollution epidemiology, environmental exposure

Preprint submitted to Statistics in Medicine, January 26, 2021

Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award Number T32ES007334, by VCU Massey Cancer Center's Cancer Prevention and Control 2018 Scholarship, by the Office of the Director, National Institutes of Health under Award Number DP5OD021412, and by the John Harvard Distinguished Science Fellows Program, within the FAS Division of Science of Harvard University. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
1. Introduction
Many public health studies aim to estimate causal relationships between some exposure or treatment and health-related outcomes; however, most epidemiologic studies are observational. Observational data present a distinct, but well-studied, challenge to addressing causality. The ideal paradigm for causal analysis is randomization, which, on average, leads to similar treated and non-treated subjects so that the treatment effect can be estimated without bias. In observational studies, the treatment or exposure is not typically randomized and must be treated as a conditionally randomized experiment (1). Generally, the treatment assignment mechanism is characterized to emulate randomization and usually involves adjusting for multiple covariates known to affect the observed treatment assignment. In other words, the treatment assignment Z should be independent of the potential outcomes Y(Z) given covariates X. This is called ignorability and is expressed as

P(Z | Y(0), Y(1), X) = P(Z | X).    (1)

Rubin's Causal Model (RCM) is the most popular paradigm for defining causal effects (2). Several popular methods exist that utilize RCM, including balancing scores, outcome regression, and principal stratification. For every unit, the potential outcomes are considered, i.e., the unit's outcome under exposure to the treatment and the unit's outcome not under exposure to the treatment. The difference in these potential outcomes is considered a causal effect, where all other things are held constant except for exposure to the treatment. RCM relies on several assumptions whose validity varies from study to study, including the stable unit treatment value assumption (SUTVA), first proposed by Cox in 1958 (3,4). SUTVA states that (i) there are not multiple forms of a treatment (called "consistency") and (ii) a unit's exposure or treatment does not affect the outcome of other units (called "no interference").
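The ignorability condition (1) can be illustrated with a small conditionally randomized simulation. The sketch below is our own illustration in Python, not from the paper, and every numeric value in it is invented: the naive treated–control contrast is confounded by X, while averaging contrasts within levels of X recovers the unit-level effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Binary confounder X affects both treatment probability and outcome.
x = rng.binomial(1, 0.5, n)
# Conditionally randomized assignment: P(Z=1|X=0)=0.2, P(Z=1|X=1)=0.8.
z = rng.binomial(1, np.where(x == 1, 0.8, 0.2))
# Potential outcomes: the true causal effect Y(1)-Y(0) is 2 for every unit.
y0 = 1.0 * x + rng.normal(0, 1, n)
y1 = y0 + 2.0
y = np.where(z == 1, y1, y0)  # observed outcome

# The naive contrast is confounded by X...
naive = y[z == 1].mean() - y[z == 0].mean()
# ...but averaging within-stratum contrasts recovers the effect, since
# Z is independent of the potential outcomes given X.
adjusted = np.mean([y[(z == 1) & (x == v)].mean() - y[(z == 0) & (x == v)].mean()
                    for v in (0, 1)])
print(round(naive, 2), round(adjusted, 2))
```

With these made-up parameters the naive contrast overshoots the true effect of 2 because treated units over-represent the high-outcome stratum.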
Until recently, causal inference methods either struggled to handle, or introduced restrictive assumptions to handle, data that may have interference.

We typically expect interference in studies when the outcome or exposure is related to other observations. Readily available examples of interference include infectious diseases (5), social networks (6,7), educational programs (8), crime prevention strategies (9), and air pollution (10). This interference cannot be ignored without making erroneous inferences (11,12). Moreover, the interference is often more than a nuisance. Different causal estimands arise under interference, including a direct effect and an indirect (spillover) effect. This spillover effect between units or groups of units can be of research interest (13).

In the last decade, causal methods that handle interference have grown (8,9,10,11). Most methods make assumptions about the interference structure to allow for relevant estimands. These assumptions are not always appropriate for our application. Hong and Raudenbush characterized unit-specific interference as low or high; they assumed potential outcomes are a function of a subject's treatment status and the interference status (8). Sobel proposed a partial interference assumption where subjects are grouped into classes and no interference is assumed between subjects in different classes (12). Zigler et al. further extended Sobel's partial interference assumption into an assignment group interference assumption (AGIA) where locations within a regulated area do not affect locations in an unregulated area (10). They found AGIA may not hold if interference is expected to occur between exposed and control units. Random effects with spatial structure have also been used to account for interference (14), but may be insufficient for capturing spillover between subjects.

We motivate the methods presented in this paper by considering air pollution studies.
Spatial interference exists in air pollution applications by air pollution's very nature. We expect upwind events to affect downwind locations and, thus, expect interference to be directional based on wind patterns. We expect to find a direct causal effect of air pollution (or air pollution regulation) in an area exposed directly to treatment (e.g., regulation) and may expect a spillover effect in downwind areas. Researchers are increasingly interested in these effects because they allow for more strategic initiatives and program implementations and also may provide stronger evidence for regulatory control (15).

In this paper, we propose methods for estimating air pollution interference by elucidating spatial relationships and adapting certain causal assumptions to a spatial setting where there is expected spillover of an exposure. We characterize both a treatment assignment mechanism and an interference mechanism and adjust for confounders of both using propensity score modeling. Our assumptions differ from previous work in air pollution epidemiology, specifically Zigler et al., by allowing interference to occur between exposed and control units and also by adjusting for confounding of the interference mechanism (10). To evaluate our approach, we compare the methods when applied to simulated datasets in which we introduce both spatially autocorrelated covariates and treatment assignment. We finally apply the methods to an air pollution dataset.
2. Methods
Consider a study population of i = 1, ..., N contiguous areal units at a certain time. Let Y_i be a count outcome, such as the number of deaths per unit. Denote treatment Z_i ∈ {0, 1}, where Z_i = 1 if unit i receives the treatment and Z_i = 0 otherwise, and let Z denote the treatment allocation for all units. We illustrate these concepts in Figure 1 for a population of N = 4 units, with a treatment allocation Z for the study area represented in panel A. Under no interference, each unit has two potential outcomes Y_i(Z_i), expressed as [Y_i(1), Y_i(0)]. We only ever observe one of these outcomes, dependent on the observed value of Z_i. We define a causal effect as Y_i(1) − Y_i(0). In Figure 1, this would be akin to finding the difference in outcomes for a given area between two allocations that differ only in that area's own treatment.

Under interference, a unit's potential outcomes depend on the full allocation, Y_i(Z) = Y_i(Z_i, Z_{−i}), where Z_i is the individual treatment for unit i and Z_{−i} is the treatment vector for all other units in the population. We may reasonably expect interference to be limited to a spatial neighborhood structure. We denote this structure through an N × N network matrix A where element A_ij > 0 if unit j may affect unit i. We note that A_ij may not equal A_ji if the relationship is directional. For example, we expect air pollution interference to vary with, or to be a function of, wind direction, so that pollution in unit i blows downwind to unit j, but not vice versa.

Let N_i represent all units that may affect unit i, which we call a neighborhood. While other units in a population may affect unit i, accounting for them is computationally intensive and challenging to interpret, so we often restrict analysis to first-order (also called adjacent) neighbors. Let N_i represent the neighborhood, and let A represent a network matrix for these neighbors. Then A_ij > 0 if units i and j are first-order neighbors, i.e., immediately adjacent. Denote the size of the neighborhood of unit i as m_i = Σ_{j=1}^N A_ij = |N_i|.

Figure 1: An example study population of N = 4 areal units (panels A–D show four treatment allocations). For area i = 3, we define the individual direct causal effect to be the contrast in outcomes between panels A and C. Similarly, for area i = 3, we define individual indirect causal effects to be the contrast in outcomes between panels A and B and between panels C and D. The difference between these two individual indirect causal effects is the individual treatment status Z_3.

In Figure 1, each area has three first-order neighbors: N_1 contains areas 2, 3, and 4; N_2 contains units 1, 3, and 4 as first-order neighbors; etc. Because every area is adjacent to every other area, A has all off-diagonal elements positive. For all the areas, m_i = 3. Let Z_{N_i} = [z_j]_{j ∈ N_i} ∈ {0, 1}^{m_i} denote the treatment allocation of unit i's neighborhood. The set of treated units in neighborhood N_i is N_i^z, and we consider interference to be a function of N_i^z. We will call this a functional neighborhood interference assumption, which is similar to assumptions proposed by several others (9,16,8,17,18). We define our potential outcomes as

Y_i(Z_i, f(Z_{N_i})).    (Definition 1)

In other words, a unit-level potential outcome depends both on the unit's own treatment assignment and on some function of the treatment assigned to its first-order neighbors. In Figure 1, we may write the potential outcomes for area 3 in panel A as Y_3(Z_3, f({Z_1, Z_2, Z_4})), where {Z_1, Z_2, Z_4} represents the treatment allocation of areas 1, 2, and 4 (the neighbors of area 3).

Until this point, we have assumed that interference occurs between units i and j based on proximity, e.g., neighborhood structure. We have also suggested that the relationship can be directional, e.g., based on wind direction in air pollution studies. From this, we argue that there exist both a treatment assignment mechanism and an interference mechanism, which can be characterized through the network A.
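The network matrix A and neighborhood sizes m_i can be built directly from areal adjacency. Below is a small Python sketch for a hypothetical four-area region laid out on a 2 × 2 grid (our own toy layout, chosen so that, as in Figure 1, every area is a first-order neighbor of every other area under queen adjacency):

```python
import numpy as np

# Hypothetical 2x2 grid of areal units numbered 1..4 (row-wise).
coords = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}
n = len(coords)

# Queen adjacency: units sharing an edge or a corner are neighbours,
# giving A_ij = 1 for every pair in this toy region.
A = np.zeros((n, n), dtype=int)
for i, (ri, ci) in coords.items():
    for j, (rj, cj) in coords.items():
        if i != j and max(abs(ri - rj), abs(ci - cj)) == 1:
            A[i - 1, j - 1] = 1

m = A.sum(axis=1)  # neighbourhood sizes m_i = |N_i|
neighbours = {i + 1: [j + 1 for j in np.where(A[i])[0]] for i in range(n)}
print(neighbours[1], m.tolist())
```

A directional network (e.g., wind-driven interference) would simply make A asymmetric, with A_ij set from the upwind/downwind relationship rather than contiguity.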
Defining an interference mechanism is not well established in the literature, and the mechanism will be contextual to the data and the expected "spillover" of a treatment or exposure. Information on the spatial neighborhood structure, as well as the observed outcomes, may inform this network (6).

Using typical causal inference estimand language, where we estimate effects either from randomized experiment data or after meeting the other causal assumptions necessary for causal analysis with observational data, we define a direct causal effect as:

DE_i(z) = Y_i(Z_i = 1, f(Z_{N_i} = z)) − Y_i(Z_i = 0, f(Z_{N_i} = z))    (2)

where z represents an observed treatment vector. Kao termed this a "primary" treatment causal effect in social influence networks (6). In Figure 1, we would compare the outcomes for area 3 in panels A and C to estimate an individual direct causal effect. The average direct causal effect can be defined as:

DE = (1/N) Σ_{i=1}^N DE_i(z).    (3)

We define an indirect causal effect as:

IE_i(z) = Y_i(Z_i = z_i, f(Z_{N_i} = z)) − Y_i(Z_i = z_i, f(Z_{N_i} = 0)).    (4)

This indirect effect is defined based on which kth-order neighbors are assumed to influence unit i. First-order neighbors are areas immediately adjacent to area i; second-order neighbors are areas adjacent to the first-order neighbors of area i; this logic continues for kth-order neighbors. We assume the direct and indirect effects to be additive, so that Z_i = z_i does not affect indirect effect estimation. We also note that this indirect effect is not comparable to the indirect effects estimated in mediation analysis. Kao termed these "peer influence" causal effects (6). In Figure 1, we can estimate the individual indirect effect by comparing the outcomes for area 3 between panels A and B or between panels C and D, where area 3's treatment is held constant in both comparisons.
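The contrasts in (2) and (4) can be read off a potential-outcome surface when one is available. The toy sketch below uses an invented response surface (the −0.5 and −0.3 effect sizes are made up, not the paper's estimates) to show that the direct effect varies Z_i while holding f(Z_{N_i}) fixed, and the indirect effect does the opposite:

```python
# Toy potential outcomes for one unit i under the functional
# neighbourhood interference assumption: Y_i depends on Z_i and on
# f(Z_{N_i}), the proportion of treated first-order neighbours.
def Y_i(z_i: int, f_neigh: float) -> float:
    # hypothetical additive surface: direct effect -0.5, spillover -0.3
    return 2.0 - 0.5 * z_i - 0.3 * f_neigh

f = 2 / 3  # e.g. two of three first-order neighbours treated

# Direct effect (2): vary Z_i, hold f(Z_{N_i}) fixed.
DE_i = Y_i(1, f) - Y_i(0, f)
# Indirect (spillover) effect (4): hold Z_i fixed, vary f(Z_{N_i}).
IE_i = Y_i(0, f) - Y_i(0, 0.0)
print(round(DE_i, 2), round(IE_i, 2))
```

Because the toy surface is additive, the direct effect does not depend on the neighborhood allocation, mirroring the additivity assumption stated below equation (4).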
We define the average total indirect effect of a unit's kth-order neighbors as:

IE = (1/N) Σ_{i=1}^N [Y_i(Z_i = z_i, f(Z_{N_i} = z)) − Y_i(Z_i = z_i, f(Z_{N_i} = 0))].    (5)

If we consider an outcome Y that is the number of outcome events occurring in an area, e.g., the number of deaths, number of cancer cases, etc., then we can model the individual potential outcomes for area i as:

Y_i(Z_i, f(Z_{N_i})) ∼ Poisson(θ_i E_i)    (6)

where θ_i is the relative risk of event Y happening in unit i and E_i is the expected number of events in unit i. We further model the relative risk as:

log(θ_i) = τ · Z_i + γ · f(Z_{N_i}) + Σ_{p=1}^P β_p X_ip + ε_i    (7)

where X represents P confounders and ε_i ∼ N(0, σ²) is an iid random effect. We note that τ = DE and γ = IE. We also assume no interaction between the direct and spillover effects.

In RCM without interference, we typically assume P(Z | Y(0), Y(1), X) = P(Z | X). In observational studies, it is often assumed that covariates X are sufficient to adjust for confounding in the treatment–outcome relationship. In other words, the potential outcomes are independent of the treatment assignment conditional on the covariates. With interference, we must assume that the potential outcomes Y(z) are independent of the treatment assignment Z given the covariates X and the influence network A. That is,

Y(z) ⊥ Z | X, A.    (A1)

We make the additional assumption that the potential outcomes are independent of the influence network conditional on the covariates, or

Y(z) ⊥ A | X.    (A2)

Kao called these assumptions the unconfounded treatment assignment and network influence assumptions under network interference (6). We collectively call assumptions A1 and A2 ignorability under interference.

Practically, ignorability under interference entails defining both a treatment assignment mechanism and an interference mechanism. Covariates relevant to both mechanisms need to be identified.
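The count model in equations (6) and (7) can be simulated directly. In the sketch below, every parameter value is an illustrative assumption (τ = −0.2, γ = −0.1, etc., are not the paper's values); the check at the end confirms that the empirical rate ratio between treated and control units tracks exp(τ) when the other terms are independent of treatment:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000

# Illustrative inputs: treatment, spillover function value, one
# confounder, and expected counts E_i (all made up for this sketch).
Z = rng.binomial(1, 0.4, N)
fZ = rng.uniform(0, 1, N)        # f(Z_{N_i}): prop. of treated neighbours
X = rng.normal(0, 1, N)
E = rng.integers(50, 500, N)     # expected events per unit

tau, gamma, beta = -0.2, -0.1, 0.05
eps = rng.normal(0, 0.01, N)                          # iid random effect
log_theta = tau * Z + gamma * fZ + beta * X + eps     # equation (7)
Y = rng.poisson(np.exp(log_theta) * E)                # equation (6)

# Y_i / E_i estimates theta_i; with Z independent of the other terms
# here, the ratio of average rates is roughly exp(tau) ~ 0.82.
ratio = (Y / E)[Z == 1].mean() / (Y / E)[Z == 0].mean()
print(round(ratio, 2))
```

In the paper's setting Z is of course *not* independent of the confounders, which is exactly why the propensity score machinery below is needed.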
Characterizing a treatment assignment mechanism is well established in the literature (19), but the interference mechanism is a novel concept and should be viewed as "the vehicle through which exposures to peer treatments are delivered" (6). That is, the interference mechanism is how the treatment status of one unit affects the other units. The covariates necessary to meet unconfoundedness for both mechanisms will depend on the data and research question, and the relevant covariates may overlap. We partition the relevant covariates X as [X_Z, X_A] to reflect this.

Rubin first outlined the use of Bayesian inference for estimating causal effects (20). Under the Bayesian framework, potential outcomes are treated as random variables and can be partitioned as [Y_obs, Y_mis] for the observed and missing potential outcomes. Bayesian imputation is used to compute the posterior distribution for the missing potential outcomes and causal estimands of interest. Ignorability under interference simplifies modeling and inference by dropping Z and A from the posterior distribution.

While ignorability under interference allows us to estimate causal effects with interference, we must still address other challenges to estimating causal effects that occur in observational studies, such as covariate imbalance. Because treatment is not randomized in an observational study, we expect treated units to display different characteristics than the control units. This covariate imbalance must be addressed in order to estimate causal effects. Propensity scores are typically estimated as predicted values from a logistic regression model where the outcome is a binary variable indicating which units receive treatment Z. The logistic regression model should adjust for the covariates X relevant to the treatment assignment and interference mechanisms. Treated and control units are likely to be similar if they have the same propensity score value.
By controlling for covariates relevant both to the treatment assignment and to the interference mechanism, we expect units to be similar both in the probability of receiving treatment and in the probability of spillover exposure based on those covariate values. We illustrate this by assuming that the treatment assignment Z is independent of the potential outcomes given the covariates [X_Z, X_A] (A1), or P(Z = 1 | Y(Z), X_A, X_Z) = P(Z = 1 | X_A, X_Z), and that the influence network A is independent of the potential outcomes given the covariates X_A (A2), or P(A | Y(Z), X_A) = P(A | X_A). Then it follows that

P(Z = 1 | Y(Z), X_A, X_Z, PS(X_A, X_Z)) = P(Z = 1 | PS(X_A, X_Z))

where PS(X_A, X_Z) is the propensity score that accounts for the covariates [X_Z, X_A]. The proof extends from Imbens and Rubin (19) and uses iterated expectations:

P(Z = 1 | Y(Z), X_A, X_Z, PS(X_A, X_Z))
  = E[Z | Y(Z), X_A, X_Z, PS(X_A, X_Z)]
  = E[ E{Z | Y(Z), X_A, X_Z, PS(X_A, X_Z)} | Y(Z), PS(X_A, X_Z) ]
  = E[ E{Z | X_A, X_Z, PS(X_A, X_Z)} | Y(Z), PS(X_A, X_Z) ]
  = E[ E{Z | PS(X_A, X_Z)} | Y(Z), PS(X_A, X_Z) ]
  = E[ Z | PS(X_A, X_Z) ]
  = P(Z = 1 | PS(X_A, X_Z)).

In other words, given a vector of covariates [X_Z, X_A] that ensures unconfoundedness, adjusting for the treatment difference in a propensity score model removes the biases associated with differences in the covariates because, conditional on the propensity score, the treatment assignment should be independent of the covariates. Rosenbaum and Rubin showed that the difference in outcomes for treated and control units at the same propensity score value is an unbiased average treatment effect (21). Propensity score use to address confounding in spatial settings has been limited (22).
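The balancing property derived above is easy to verify numerically: conditional on the propensity score, treatment prevalence should match the score regardless of the covariates. The sketch below uses an invented data-generating process (coefficients 0.8, 0.5, −0.2 are assumptions) and a hand-rolled Newton–Raphson logistic fit rather than a library routine:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Covariates relevant to the treatment assignment (X_Z) and to the
# interference mechanism (X_A); assignment depends on both.
XZ = rng.normal(0, 1, n)
XA = rng.normal(0, 1, n)
Z = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * XZ + 0.5 * XA - 0.2))))

# Fit PS(X_A, X_Z) by Newton-Raphson logistic regression.
D = np.column_stack([np.ones(n), XZ, XA])
b = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-D @ b))
    b += np.linalg.solve(D.T @ (D * (p * (1 - p))[:, None]), D.T @ (Z - p))
ps = 1 / (1 + np.exp(-D @ b))

# Balancing check: within propensity-score deciles, the observed
# treated fraction should track the average propensity score.
bins = np.digitize(ps, np.quantile(ps, np.arange(0.1, 1.0, 0.1)))
gaps = [abs(Z[bins == k].mean() - ps[bins == k].mean()) for k in range(10)]
print(max(gaps) < 0.1)
```

The small decile-level gaps are a finite-sample reflection of P(Z = 1 | PS) = PS; in practice one would also check covariate balance directly, as the simulation study below does.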
Once propensity scores are estimated for a study population, we consider three methods to ensure balance between treated and control units: (i) pruning, (ii) grouping, and (iii) matching. Pruning involves omitting units where there is no propensity score overlap; in other words, the omitted units have no comparable features relative to the comparison group. Grouping, or stratifying, involves identifying subgroups of treated and control units with similar propensity score values after pruning (21,8). Matching occurs when units are matched 1:n, where n is the number of units matched to a particular unit (23); e.g., we may match one treated unit to two control units, so that the analysis dataset will include 3 × n units (n treated units and 2 × n control units). In matching, both n and a caliper need to be set. The caliper avoids matching units that are dissimilar beyond a specified threshold. We expect that if the propensity score model includes the covariates X necessary for ignorability under interference, then we may account for pertinent differences between treated and control units and improve the credibility of estimated causal effects.

We compare propensity score methods to traditional outcome regression of the log relative risk. In outcome regression, we keep all units in the analysis dataset, and we control for X as confounders with linear adjustment terms. We argue that inference from data with limited overlap may extrapolate causal effects with no observed information for certain confounders for certain units. Bayesian inference may handle some of this uncertainty, but propensity score methods reduce bias in causal estimates. The tradeoff is that as our study population changes and we shrink our sample size to reduce covariate imbalance between units, causal estimates can only be interpreted within that reduced study population.
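The three balancing strategies can be sketched as follows. The propensity scores and caliper below are invented, and the greedy nearest-neighbor matcher is just one of several matching algorithms (the paper does not specify its algorithm, only 1:1 matching with a caliper):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Hypothetical propensity scores and treatments (all values made up).
ps = np.clip(rng.beta(2, 2, n) + 0.15 * rng.binomial(1, 0.5, n), 0.01, 0.99)
Z = rng.binomial(1, ps)

# (i) Pruning: keep only units inside the common support of the PS.
lo = max(ps[Z == 1].min(), ps[Z == 0].min())
hi = min(ps[Z == 1].max(), ps[Z == 0].max())
keep = (ps >= lo) & (ps <= hi)

# (ii) Grouping: quintile strata of the pruned propensity scores.
strata = np.digitize(ps[keep], np.quantile(ps[keep], [0.2, 0.4, 0.6, 0.8]))

# (iii) Greedy 1:1 matching with a caliper on the PS difference.
caliper = 0.2 * ps.std()
treated, controls = np.where(Z == 1)[0], list(np.where(Z == 0)[0])
pairs = []
for t in treated:
    if not controls:
        break
    d = np.abs(ps[controls] - ps[t])
    j = int(np.argmin(d))
    if d[j] <= caliper:
        pairs.append((t, controls.pop(j)))  # match without replacement
print(len(pairs) > 0, len(set(strata.tolist())) == 5)
```

Each step shrinks or re-weights the sample, which is exactly the interpretability tradeoff noted above: the estimand applies to the pruned or matched population, not the original one.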
3. Simulation Studies
We used a simulation study to compare the effectiveness of propensity score methods to outcome regression when estimating causal effects. We performed the simulation under a variety of scenarios that we describe below. In general, we simulated data for N =
500 areal units based on real counties at the geographic center of the contiguous United States (Figure 2). We conducted 100 simulations for each scenario. For each of the 100 simulated datasets, we specified nine covariates X = X_1, ..., X_9 for the 500 fixed units. For each covariate, we randomly generated a mixture distribution of treated and control units using package distr in R, with overlap specified in Table 1 (24,25). We computed the empirical overlap between the treated and control distributions for each covariate by generating 10,000 values from each specified distribution using package overlapping. We report the average percentage overlap over the 1,000 generations in Table 1 (26). We also incorporated spatial autocorrelation into each covariate by computing a simultaneous autoregressive (SAR) weights matrix using function invIrW in library spdep and multiplying each generated variable by the matrix with a specified spatial dependence parameter (Table 1) (27,28).

Table 1: Simulated covariates, where r and p are the failure rate and success probability for the negative binomial distribution, respectively; α and β are the shape and scale parameters for the gamma distribution, respectively; µ and σ are the mean and standard deviation parameters for the normal distribution, respectively; and π is the probability parameter for the Bernoulli distribution.

Covariate | Treated Distribution (X_T) | Control Distribution (X_C) | Spatial Autocorrelation | Empirical Overlap (%)
X_1 | NB(r = ·, p = 0.5) | NB(r = ·, p = 0.4) | 0.90 | 84.42
X_2 | Γ(α = ·, β = ·)    | Γ(α = ·, β = 1)    | 0.70 | 79.66
X_3 | N(µ = ·, σ = 1)    | N(µ = ·, σ = 2)    | 0.50 | 95.62
X_4 | N(µ = ·, σ = 2)    | N(µ = ·, σ = ·)    | ·    | ·
X_5 | Γ(α = ·, β = ·)    | Γ(α = ·, β = 3)    | 0.75 | 59.33
X_6 | Bern(π = 0.5)      | Bern(π = 0.9)      | 0.30 | 74.17
X_7 | N(µ = ·, σ = 0.2)  | N(µ = ·, σ = ·)    | ·    | ·
X_8 | Γ(α = ·, β = ·)    | Γ(α = ·, β = 2)    | 0.50 | 77.80
X_9 | Γ(α = ·, β = 2)    | N(µ = ·, σ = 1)    | 0.60 | 48.32
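The SAR construction above multiplies a generated variable by (I − ρW)^{-1}, which is what spdep's invIrW returns. A numpy analogue on a hypothetical 10 × 10 lattice (the lattice, seed, and ρ = 0.9 are stand-ins for the paper's county geography), with Moran's I as a quick check that positive spatial autocorrelation was induced:

```python
import numpy as np

rng = np.random.default_rng(4)
side = 10                      # toy 10x10 lattice stands in for the counties
n = side * side

# Rook-adjacency W, row-standardised.
W = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                W[r * side + c, rr * side + cc] = 1
W /= W.sum(axis=1, keepdims=True)

rho = 0.9                                 # spatial dependence parameter
M = np.linalg.inv(np.eye(n) - rho * W)    # the invIrW matrix
x = M @ rng.normal(0, 1, n)               # spatially autocorrelated covariate

# Moran's I: positive values indicate positive spatial autocorrelation.
xc = x - x.mean()
I = (n / W.sum()) * (xc @ W @ xc) / (xc @ xc)
print(round(I, 2))
```

Covariates built this way mimic the spatially smooth confounders in Table 1; applying the same multiplier to a random effect mimics the unobserved spatial confounding used in the treatment assignment model below.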
Five of the simulated covariates, X_1, ..., X_5, served as confounders for the treatment assignment Z. We represent the mixture distributions for the confounders as X_All, the treated distributions for the confounders as X_T, and the control distributions for the confounders as X_C. We generated treatment Z as a binary variable by modeling logit(Pr(Z_i = 1)).

In our simulations, we compared two treatment assignment models. First, we specified Z as a Bernoulli trial and modeled the logit of Z using the confounders' mixture distributions X_All, which we modeled as

logit(Pr(Z_i = 1)) = X_{All,i} α + u_i + c · e_i    (8)

where α is a vector of five fixed coefficients and c < 0 is a fixed scalar; u_i ∼ Normal(0, 1) and was multiplied by the SAR matrix with autocorrelation 0.9 to mimic unobserved spatial confounding in the treatment assignment; and the random effect e_i was drawn iid from a normal distribution. Second, we specified Z using the covariate distributions of the treated units, X_T, instead of X_All, expressed as

logit(Pr(Z_i = 1)) = X_{T,i} α + u_i + c · e_i    (9)

where all other values remained the same as in equation 8. Using only the distributions of the confounders of the treated units to specify Z reflects real-life observational studies.

Figure 2: The study area of N = 500 areal units, where the red units indicate the treated units from one simulation.

As we assumed the outcome Y to be Poisson-distributed, we modeled the log relative risk θ_i and assumed a population risk of 0.001, so that the expected count in each area was E_i = 0.001 × Population_i. We specified the outcome with and without interference. Under no interference, we generated relative risks for each unit as

log(θ_i) = τ · Z_i.    (10)

Under interference, we included a spillover effect γ and specified the relative risk model as

log(θ_i) = τ · Z_i + γ · f(Z_{−i})    (11)

where f(Z_{−i}) was the proportion of neighbors receiving treatment Z. In all instances, we specified fixed true values for the direct causal effect τ and the spillover effect γ.

The aim of our simulation study was to evaluate the impact of several factors on estimating direct and spillover effects. In our simulations that had no interference, i.e., where we modeled the log relative risk of Y without interference (equation 10), we aimed only to estimate the direct effect, τ̂. In simulations with interference, we modeled the log relative risk of Y as in equation 11 and estimated both τ̂ and the spillover effect, γ̂, except in cases where we sought to assess the effect of ignoring interference. We compared estimation across four methods. The first of these methods was standard outcome regression as previously described. The remaining models were the propensity score-based methods: pruning, stratification, and matching.
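The generative steps in equations (8) and (11) can be sketched end to end: assign Z from a logit model, form f(Z_{−i}) as the proportion of treated first-order neighbors, and draw Poisson counts. All coefficients, the lattice geography, and the population range below are illustrative stand-ins for the paper's design:

```python
import numpy as np

rng = np.random.default_rng(5)
side, tau, gamma = 20, -0.2, -0.1   # illustrative "true" effects
n = side * side

# Rook adjacency on a toy lattice.
A = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            if 0 <= r + dr < side and 0 <= c + dc < side:
                A[r * side + c, (r + dr) * side + (c + dc)] = 1

X = rng.normal(0, 1, n)                                   # one confounder
Z = rng.binomial(1, 1 / (1 + np.exp(-(0.7 * X - 0.3))))   # logit assignment
f = (A @ Z) / A.sum(axis=1)        # f(Z_-i): proportion of treated neighbours

pop = rng.integers(10_000, 100_000, n)
E = 0.001 * pop                    # expected counts, population risk 0.001
log_theta = tau * Z + gamma * f    # equation (11)
Y = rng.poisson(np.exp(log_theta) * E)
print(Y.sum() > 0)
```

Fitting the four estimation methods to many replicates of data like this is what the scenario grid in Table 2 systematizes.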
For the propensity score-based methods, we considered two different models: (1) the propensity score model (PSM) and (2) the potential outcomes model, which we model on the scale of log(θ̂_i). No PSM was used for standard outcome regression.

In order to achieve ignorability under interference, the relevant covariates for the treatment assignment mechanism consisted of the confounders X_Z = X_1, ..., X_5. The relevant information for the interference mechanism, X_A, was simply first-order proximity. We included this information explicitly when we modeled the spillover effect by estimating the spillover effect only in first-order neighbors of treated units. This matches how we specified interference in the true outcome (equation 11).

For the outcome regression method (which we label "Full" in the results, Tables 4 and 5), we modeled the potential outcomes for the entire sample (N = 500) as

log(θ̂_i) = τ̂ · Z_i + Σ_{p=1}^P β̂_p X_ip.    (12)

Alternatively, for simulations that aimed to capture interference, we modeled the log relative risk as

log(θ̂_i) = τ̂ · Z_i + γ̂ · f(Z_{−i}) + Σ_{p=1}^P β̂_p X_ip.    (13)

We determined which covariates X_p to adjust for in the outcome regression model by assessing the balance of each covariate between the treated and control units using an unequal-variance Student's t-test. We always assessed balance on the covariates' mixture distributions because in real life we cannot readily separate the treated and control units' covariate distributions.

For the pruning method (labeled "Pruned" in Tables 4 and 5), we estimated the propensity score for each unit via logistic regression as

logit(Pr(Z_i = 1)) = X α̂    (14)

where the P covariates contained in X varied depending on the simulation setup (described in the next section). We iteratively dropped units where there was no propensity score overlap and re-fit a PSM until all units were within the propensity score overlap.
We used these N_pruned units to model log(θ̂_i), adjusting only for the covariates that were not balanced within the N_pruned units, again using the mixture distributions.

For the stratified propensity score method (labeled "Stratified" in Tables 4 and 5), we used the same N_pruned units and categorized the treated and control units into quintiles based on their propensity scores. We adjusted for the strata in the potential outcomes model along with any covariates that were not balanced within the strata, i.e., we modeled

log(θ̂_i) = τ̂ · Z_i + γ̂ · f(Z_{−i}) + Σ_{p=1}^P β̂_p X_ip + Σ_{d=1}^5 υ̂_d    (15)

where υ̂_d is the linear adjustment for each propensity score stratum. To determine covariate selection, we calculated Student's t-test for each covariate in each stratum and evaluated whether |max(t)| > 1.96, where t is the observed test statistic; if so, we included that particular covariate in the potential outcomes model. We chose quintiles due to their frequent use in the literature (21,29).

In the matching method (labeled "Matched" in Tables 4 and 5), we matched treated and control units 1:1 with a caliper of 1.0 standard deviation of the propensity score. That is, we matched only units where the difference in propensity score values did not exceed one standard deviation of the PSM. Because not every unit can be matched this way, the sample size was reduced to N_matched.

We estimated all models using Markov chain Monte Carlo (MCMC) in OpenBUGS and the R computing environment (30). For each model, we ran two chains for 20,000 iterations with a burn-in of 10,000 and thinned the remaining samples to every 15th iteration. We provided non-informative normal priors centered at 0 with variances equal to 1,000 for each parameter, such that

τ̂, γ̂, β̂_p, υ̂_d ∼ Normal(0, 1000).    (16)

We assessed the chains' convergence using the Gelman–Rubin diagnostic (31). If the chains did not converge, then we iteratively ran the chains for another 20,000 iterations with the same burn-in and thinning parameters until convergence was achieved. We assessed the mean posterior estimates for the direct and spillover effects and the variances of their distributions in order to calculate the mean squared error (MSE) and probability coverage of the estimates relative to the known true values. We considered the optimal method to be the one with the highest probability coverage and minimal MSE.

Our simulations explored a variety of possible scenarios when estimating causal effects with and without interference (Table 2). In scenarios 1 and 3a, we explored how interference affects estimation of the direct and spillover effects by modeling the log relative risk of outcome Y without interference using equation 10. For all other scenarios, we incorporated interference by modeling Y as specified in equation 11.
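The covariate-selection rule |max(t)| > 1.96 relies on the unequal-variance (Welch) t statistic, which is straightforward to compute directly. The sketch below uses toy data (sample sizes, means, and the 0.5 mean shift are invented) to show the decision for a clearly imbalanced covariate:

```python
import numpy as np

def welch_t(a: np.ndarray, b: np.ndarray) -> float:
    """Unequal-variance (Welch) t statistic for a covariate's balance."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

rng = np.random.default_rng(6)
# Toy treated/control samples of one covariate with a 0.5 mean shift.
x_t = rng.normal(0.5, 1, 300)
x_c = rng.normal(0.0, 1, 300)
t_stat = welch_t(x_t, x_c)

# Decision rule from the text: include the covariate in the potential
# outcomes model if |t| exceeds 1.96 (here, in any stratum).
include = abs(t_stat) > 1.96
print(include)
```

In the stratified method this statistic would be computed within each propensity score quintile and the maximum compared to 1.96.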
We further explored the impact of different treatment assignment models by comparing scenarios 1, 2a, and 2b to all other scenarios. In scenarios 1, 2a, and 2b, we generated treatment assignment Z as in equation 8, where the confounders' mixture distribution informed the treatment assignment generation. In the other scenarios, only the confounder distributions of the treated units contributed to Z, as specified in equation 9. In all cases, the PSM relied on the confounders' mixture distributions, as it would in practice.

Table 2: Outline of the simulated scenarios. Each column represents a factor that was altered between scenarios.
| Scenario | Treatment Assignment Covariates | Interference Present? | Estimands Estimated? | Spatial Random Effect? | Additional Covariates in PSM? | Missing Confounders in PSM? |
|---|---|---|---|---|---|---|
| 1 | X_All | No | τ̂ | No | - | - |
| 2a | X_All | Yes | τ̂ | No | - | - |
| 2b | X_All | Yes | τ̂, γ̂ | No | - | - |
| 3a | X_T | No | τ̂ | No | - | - |
| 3b | X_T | Yes | τ̂, γ̂ | No | - | - |
| 4a | X_T | Yes | τ̂ | Yes | - | - |
| 4b | X_T | Yes | τ̂, γ̂ | Yes | - | - |
| 5 | X_T | Yes | τ̂, γ̂ | No | X, …, X | - |
| 6a | X_T | Yes | τ̂, γ̂ | No | - | Omit X |
| 6b | X_T | Yes | τ̂, γ̂ | No | - | Omit X and X |

From this paradigm, scenario 1 may be described as a straightforward causal analysis with no interference, considered as a baseline case. Scenario 3a was a similarly straightforward causal analysis with no interference, but illustrated the effect of the treatment assignment Z being specified as it truly is in nature, i.e., treatment assignment informed only by the confounders' treated distributions rather than the observed mixtures.

Scenarios 2a and 2b described estimating the direct effect when there is interference and the analysis ignores it (2a) and when the analysis assumes and tackles it (2b). In both of these scenarios, we assumed Z was generated using the confounders' mixture distribution. Scenario 3b was arguably the most realistic scenario in practice: there was interference, the analysis assumed it, and the treatment assignment was informed only by the confounders' treated distributions. Scenario 3a contrasted with 3b by having no interference, with the analysis not assuming it.

We evaluated the use of spatial random effects in scenarios 4a and 4b by including such an effect in the potential outcomes model. We were interested in the random effects' ability to discern a spillover effect when we did and did not specify a spillover effect explicitly in our potential outcomes model (equation 12 versus 13, or 4a versus 4b). In both scenarios, we specified the true outcome with interference. We specified the spatial random effect to have an intrinsic conditionally autoregressive (CAR) prior (32), which conditions a variable on its neighbors' values using a spatial adjacency matrix.
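For intuition, the CAR conditioning just described can be sketched numerically: each unit's random effect is centered at the average of its neighbors' values, with conditional variance shrinking as the neighbor count grows. This Python fragment is our own illustration with a made-up four-unit adjacency structure, not the OpenBUGS implementation used in the paper.

```python
import numpy as np

def car_conditional(i, zeta, W, omega):
    """Conditional moments of zeta_i under an intrinsic CAR prior:
    mean is the average of the neighbors' values; the variance omega is
    divided by the number of neighbors. W is a binary adjacency matrix
    (w_ij = 1 if units i and j are spatially contiguous, 0 otherwise)."""
    n_i = W[i].sum()                 # number of neighbors of unit i
    mean = W[i] @ zeta / n_i         # neighborhood average
    var = omega / n_i                # conditional variance
    return mean, var

# Four areal units on a line: 0-1-2-3 (unit 1 borders units 0 and 2).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
zeta = np.array([0.2, -0.1, 0.4, 0.0])
m, v = car_conditional(1, zeta, W, omega=1.0)
# m = (0.2 + 0.4) / 2 = 0.3, v = 1.0 / 2 = 0.5
```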
The CAR random effect ζ may be specified as:

ζ_i | ζ_{−i}, W, ω ∼ N( (Σ_{j=1}^N w_ij ζ_j) / (Σ_{j=1}^N w_ij), ω / (Σ_{j=1}^N w_ij) ),    (17)

where W is an N × N adjacency matrix with w_ij = 1 if i and j are spatially contiguous (and 0 otherwise), and ω is the variance of the spatial random effects ζ.

In scenario 5, we considered the addition of covariates beyond the true confounders X_T. Specifically, we introduced X, …, X to the outcome regression model of the log relative risk and to the PSMs for each method. In contrast, we considered the omission of true confounders from the analysis in scenarios 6a and 6b. In 6a, we omitted X from the outcome regression and PSM, and in 6b, we omitted both X and X.

As expected, we found that the average sample size shrank as we moved from the full dataset to the pruned to the matched dataset in our simulation study. For the full method, which used the entire simulated sample, the analysis dataset was consistently the 500 counties. However, we found the pruned and stratified datasets to contain around 495 units, as shown in Table 3. The matched datasets showed the most drastic size reduction, with the lowest average sample sizes observed in scenarios 6a, 6b, and 5. These scenarios increased the difficulty of matching by reducing the propensity score overlap between treated and control units, either by using fewer covariates (scenarios 6a and 6b) or by incorporating irrelevant covariates (scenario 5). This contributed to more dissimilar treated and control units.

Table 3: The average sample size for each method from the 100 simulations for each scenario.
| Sample Size | 1 | 2a | 2b | 3a | 3b | 4a | 4b | 5 | 6a | 6b |
|---|---|---|---|---|---|---|---|---|---|---|
| N_Pruned | … | … | … | … | … | … | … | … | … | … |
| N_Matched | … | … | … | … | … | … | … | … | … | … |
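The caliper matching that produces N_Matched can be sketched as a greedy 1:1 nearest-neighbor pass over the propensity scores. This Python sketch is our own (with simulated scores), not the implementation used in the paper; any treated unit with no unmatched control inside the caliper is dropped, which is why N_Matched shrinks as overlap worsens.

```python
import numpy as np

def caliper_match(ps, treated, caliper_sd=1.0):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.
    Pairs each treated unit with the closest unmatched control whose
    score is within caliper_sd * SD(ps); unmatched units are dropped."""
    caliper = caliper_sd * np.std(ps)
    controls = list(np.where(~treated)[0])
    pairs = []
    for t in np.where(treated)[0]:
        if not controls:
            break
        dists = np.abs(ps[controls] - ps[t])
        j = int(np.argmin(dists))
        if dists[j] <= caliper:
            pairs.append((t, controls.pop(j)))
    return pairs

rng = np.random.default_rng(1)
treated = np.r_[np.ones(50, dtype=bool), np.zeros(450, dtype=bool)]
# Treated units tend to have higher scores, so overlap is imperfect:
ps = np.where(treated, rng.beta(4, 2, 500), rng.beta(2, 4, 500))
pairs = caliper_match(ps, treated, caliper_sd=1.0)
n_matched = 2 * len(pairs)   # analogous to the N_Matched reported above
```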
We found that the stratified and matched methods consistently outperformed the full and pruned methods when estimating direct and spillover effects simultaneously. In scenario 1, which contained no interference and only attempted to estimate a direct effect, we found the higher probability coverages in the pruned and matched methods (91% and 90%, respectively; Table 4); however, the matched method had the lowest MSE (1.012 × 10⁻…; Table 5). That is, the credible interval for the direct effect τ̂ contained the true value of τ in 90 of the 100 simulations using the matched dataset. In comparison, the credible interval for the direct effect contained the true value in only 65 of the 100 simulations using the full dataset.

In scenario 2a, we introduced interference into the dataset but made no attempt to estimate a spillover effect. In turn, we entirely missed the true direct effect (0% probability coverage across all methods), though the MSE remained relatively low (Table 5). In scenario 2b, we estimated both the direct and spillover effects when interference was present. The full method had marginal probability coverage when estimating the direct effect (26%); we found the most success with the stratified and matched methods, for which the credible interval for the direct effect contained the true value of τ in 94 of the 100 simulations (Table 4). Similarly, the credible interval for the spillover effect had 91% and 92% probability coverage for the stratified and matched methods, respectively.

From scenario 3a onward, we found that covariate imbalance played a role in successful estimation of the direct and spillover effects. The full method continued to underperform the other methods even when there was no interference, with lower probability coverage for capturing the true direct effect and higher MSE (scenario 3a in Tables 4 and 5). When interference was introduced, the full method severely failed: the credible interval for τ̂ captured the true direct effect in only 33 of the 100 simulations in scenario 3b. The full method was slightly better at estimating the spillover effect; the credible intervals for γ̂ contained the true spillover effect in 52 of the 100 simulations.

Spatial random effects, as demonstrated in scenarios 4a and 4b, did not improve estimation of the direct and spillover effects. When interference was present and a spillover effect was not included in the potential outcomes model (scenario 4a), the probability of capturing the true direct effect was 50% or lower for all methods (Table 4) and the MSE was five-fold higher when compared to all other scenarios (Table 5). For the pruned method, the MSEs may be highest because we dropped observations but still needed to estimate coefficients for imbalanced covariates. Even when the spillover effect was included in the potential outcomes model (scenario 4b), a comparison of scenario 4b to 3b showed that the model with a spatial random effect did not perform as well as the model with no spatial random effect.
In fact, the matched method with a spatial random effect captured the true direct effect 69 out of 100 times, compared to the matched method with no spatial random effect, which captured the true direct effect 91 out of 100 times. This was also true when estimating the spillover effect: 64% probability coverage compared to 94% probability coverage.

Incorporating extraneous covariates into the PSM increased the probability coverage for the direct effect when estimating the direct and spillover effects in the full method (scenario 5 compared to scenario 3b). The stratified method, however, had the highest probability coverage for estimating both effects (Table 4). We observed slightly higher MSE for estimating these effects when the PSM contained irrelevant covariates (Table 5).

The MSE increased even more in scenarios 6a and 6b as we omitted relevant confounders (X in scenario 6a and both X and X in scenario 6b). However, we still found high probability coverage (above 84%) for capturing the true direct and spillover effects in every method except the full method in these scenarios.

Table 4: The probability coverage for each simulation scenario. Each probability corresponds to the number of times out of 100 simulations that the method resulted in a 95% credible interval for the direct effect τ and spillover effect γ, when appropriate, that contained the true known value. We bolded the highest probability coverage corresponding to each method for each parameter.

| Parameter | Method | 1 | 2a | 2b | 3a | 3b | 4a | 4b | 5 | 6a | 6b |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Direct | Full | 65.0 | 0 | 26.0 | 70.0 | 33.0 | 5.0 | 49.0 | 55.0 | 44.0 | 40.0 |
| Direct | Pruned | … | … | … | … | … | … | … | … | … | … |
| Direct | Stratified | … | … | … | … | … | … | … | … | … | … |
| Direct | Matched | … | … | … | … | … | … | … | … | … | … |
| Spillover | Full | - | - | 51.0 | - | 52.0 | - | 45.0 | 71.0 | 61.0 | 67.0 |
| Spillover | Pruned | - | - | … | - | 93.0 | - | - | 37.0 | … | … |
| Spillover | Stratified | … | … | … | … | … | … | … | … | … | … |
| Spillover | Matched | - | - | 92.0 | - | 94.0 | - | 64.0 | 88.0 | … | … |
Table 5: The mean squared error (MSE) corresponding to estimates of the direct effect and spillover effect, when appropriate, for each simulation scenario. We report MSE scaled by 10⁻… except where marked Ψ, which we report unscaled. We bolded the lowest MSE corresponding to each method for each parameter.

| Parameter | Method | 1 | 2a | 2b | 3a | 3b | 4a | 4b | 5 | 6a | 6b |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Direct | Full | 8.386 | … | … | … | … | … | … | … | … | … |
| Direct | Pruned | … | … | … | … | … | … | … | … | … | … |
| Direct | Stratified | … | … | … | … | … | … | … | … | … | … |
| Direct | Matched | … | … | … | … | … | … | … | … | … | … |
| Spillover | Full | - | - | 7.526 | - | 7.245 | - | 3.594 | 3.880 | 6.169 | 5.645 |
| Spillover | Pruned | - | - | 1.382 | - | 2.105 | - | 56.888 | 1.396 | 1.953 | 1.232 |
| Spillover | Stratified | - | - | 1.322 | - | 1.123 | - | - | - | 8.532 | 1.518 |
| Spillover | Matched | … | … | … | … | … | … | … | … | … | … |

4. Data Application

We applied the methods to publicly available data from the Surveillance, Epidemiology, and End Results (SEER) program and the Environmental Protection Agency (EPA) to estimate the causal effects of air pollution regulation on lung cancer incidence. SEER is a cancer registry database covering approximately 26% of the U.S. population, with data registries including California, New Mexico, Iowa, Kentucky, Louisiana, Connecticut, New Jersey, and Georgia along with several metropolitan areas. The database contains information on year and county of cancer diagnosis and patient demographics (33).

In 1970, the Clean Air Act allowed the EPA to regulate air emissions and establish National Ambient Air Quality Standards (NAAQS) intended to protect public health and manage hazardous pollutant emissions (34). In 1990, under the Clean Air Act Amendments (CAAA), the EPA began to regulate air quality emissions and designate areas in violation of the NAAQS as "nonattainment", prompting these areas to take actions to improve air quality (10). In 1997, the EPA updated its standards for ambient concentrations of particulate matter with aerodynamic diameter < 2.5 µm (PM2.5). In 2005, the EPA designated certain counties as nonattainment or otherwise "attainment" (or "unclassifiable" if there were not enough data to classify) (35). The EPA made these designations based on nine factors, including historic air quality, population density and urbanization, traffic and commuting patterns, meteorology, and geography (36). They used available data from 2001 onward.

Several of these regulated areas overlap with the study area of the SEER Program. Our analysis focused specifically on lung cancer cases in California, Georgia, and Kentucky (Figure 3), based on the ratio of nonattainment counties to all counties in each state.
Multiple studies have established significant associations between air pollution exposure and lung cancer incidence (37,38,39). We evaluated lung cancer cases reported between 2005 and 2013. We aimed to estimate the causal effects of the PM2.5 nonattainment designations, made in 2005 under the 1997 standards, on lung cancer incidence starting in 2005. Based on previous research on particulate matter with aerodynamic diameter < 10 µm (PM10) and ozone (O3) (10), we expected interference and sought to estimate direct causal effects in the nonattainment counties and spillover causal effects in surrounding counties.

We imputed PM2.5 values from the Downscaler Model (40). The Downscaler Model allows for finer-scale predictions, rather than simply observed values from the EPA Air Quality System, because it utilizes both observed measurements and the Community Multi-Scale Air Quality Model (CMAQ). Because the data are reported as a mean value with a standard error at the census tract level, we treated PM2.5 as a random Gaussian process with mean and standard error specified by the Downscaler Model. For each ZIP code, we generated ten values in each year and aggregated to the county level using the mean. The final analysis was performed using the county-level aggregated data.

Figure 3: The study area of N = 337 counties in California, Georgia, and Kentucky, where counties designated as nonattainment by the U.S. Environmental Protection Agency under the 1997 National Ambient Air Quality Standards for particulate matter with aerodynamic diameter < 2.5 µm (PM2.5) are shaded red.

We obtained additional data from multiple sources, including nonattainment designations from the EPA Nonattainment Areas for Criteria Pollutants ("Green Book") (35), population demographics from the U.S. Census, meteorological variables from the Automated Surface Observing System (ASOS), elevation at county centroids from ESRI, and county-level smoking rates estimated from the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System (41). We considered lung cancer cases reported between 2002 and 2004 and ASOS climate metrics observed prior to 2005 to be baseline measurements.

We obtained ASOS weather data from 705 weather stations located across the three states in our study area, in addition to the bordering states for edge correction. The stations collected data with varying frequency (typically more than once a day) and with no consistent coverage across the study area. For each year, we averaged observed measurements for wind speed (knots), wind direction (degrees), relative humidity, dew point and air temperatures (Fahrenheit), visibility (miles), barometric pressure, and hourly precipitation (inches).
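Because wind direction is circular, a plain arithmetic mean of bearings can be badly wrong (350° and 10° average to 180° rather than 0°). A minimal sketch of the vector-averaging idea, assuming the usual meteorological "blowing from" convention; the function and data here are our illustration, not the paper's code:

```python
import numpy as np

def mean_wind(speed, direction_deg):
    """Average wind by decomposing each observation into U (eastward) and
    V (northward) components, averaging those, then converting back.
    direction_deg is assumed to follow the meteorological convention
    (direction the wind blows FROM, clockwise from north)."""
    th = np.deg2rad(direction_deg)
    u = -speed * np.sin(th)          # eastward component
    v = -speed * np.cos(th)          # northward component
    ub, vb = u.mean(), v.mean()
    mean_speed = np.hypot(ub, vb)
    mean_dir = np.degrees(np.arctan2(-ub, -vb)) % 360.0
    return mean_speed, mean_dir

# Two observations straddling north (350 deg and 10 deg) should average
# to roughly 0 deg, where a naive mean of the bearings would give 180.
s, d = mean_wind(np.array([10.0, 10.0]), np.array([350.0, 10.0]))
```

The averaged speed is slightly less than 10 knots because the two vectors partially cancel, which is exactly the behavior a scalar average of speeds would miss.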
We omitted observations where wind speed exceeded 50 knots, relative humidity exceeded 100%, or visibility exceeded 11 miles, believing such observations to be errors. To average circular wind direction, we converted wind speed and direction into U and V cosine direction vectors. For each variable, we interpolated observations at weather station sites across each state individually (California, Georgia, and Kentucky) using inverse distance weighting (IDW). We chose the optimal power and neighborhood size by comparing sums of squared residuals after 5-fold cross-validation. We also compared IDW to ordinary kriging and found that IDW had a lower sum of squared residuals. We assigned each county the interpolated value at the county's centroid. Finally, we reverted the U and V vectors to wind direction and speed (which we report in meters per second). This interpolation method was shown to be effective for directional data by Gumiaux (42).

The full dataset contained information on the 337 counties in the three states, of which 46 were designated nonattainment. To build the PSM for each method, we used Student's t-test for covariate selection and also considered practical knowledge of nonattainment designation and PM2.5 dispersion. While we made no inference from the PSM, we limited the number of covariates in the model to avoid overfitting and used mean values across the study period, i.e., we averaged values between 2002 and 2004 for the pre-treatment period and between 2005 and 2013 for the post-treatment period. The final PSM contained the covariates listed in Table 6. Of these, we identified the confounders relevant to the treatment assignment to be a county's ozone nonattainment designation made in 2004 based on 8-hour levels; mean PM2.5 between 2002 and 2004 and between 2005 and 2013, separately; mean temperature; percent urban housing units; mean relative humidity; elevation of the county centroid; population density in 2000; and the mean amount of time spent traveling to work by workers aged 16 or older (minutes). We also included dewpoint temperature in 2000 and 2013 as possible confounders based on imbalances found in exploratory analyses. For the interference mechanism, we identified mean wind speed between 2005 and 2013 and elevation as relevant covariates that could contribute to spillover. We chose these covariates to meet the ignorability under interference assumption. We believe the treatment (PM2.5 nonattainment designation) and the network influence mechanism to be unconfounded based on subject matter expertise, including the variables that determined nonattainment designations and the factors that influence downwind spillover.

In the full model, we used data from all 337 counties and modeled the log relative risk of lung cancer while adjusting for ozone nonattainment designation, mean PM2.5, mean temperature, mean relative humidity, population density, percentage urban housing, mean work travel time, dewpoint temperature in 2000 and 2013, smoking prevalence, and the mean number of lung cancer cases in 2002-2004.

To prune the data, we fit a logistic regression PSM that contained all confounders listed in Table 6. Within the pruned dataset, we checked for imbalances in the covariates before modeling the log relative risk in a potential outcomes model. We additionally used the pruned dataset for the stratified potential outcomes model. We identified three subgroups based on the PSM, splitting the data at the 86th and 96.5th percentiles of the PSM. We chose these cut points in order to have an equal number of nonattainment counties within each stratum.
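One simple way to realize "equal treated counts per stratum" is to cut at evenly spaced quantiles of the treated units' propensity scores and then bin every unit with those cut points. The following Python sketch is our illustration of that idea with simulated scores, not the authors' procedure:

```python
import numpy as np

def strata_with_equal_treated(ps, treated, n_strata=3):
    """Choose propensity-score cut points so each stratum holds (roughly)
    the same number of treated units: take evenly spaced quantiles of the
    TREATED units' scores, then bin every unit with those cut points."""
    qs = np.linspace(0, 1, n_strata + 1)[1:-1]   # interior quantiles
    cuts = np.quantile(ps[treated], qs)
    return np.digitize(ps, cuts)                  # stratum index 0..n_strata-1

rng = np.random.default_rng(3)
treated = rng.random(500) < 0.1                   # ~10% "nonattainment"
ps = np.where(treated, rng.beta(5, 2, 500), rng.beta(2, 5, 500))
strata = strata_with_equal_treated(ps, treated)
counts = [int(np.sum(treated & (strata == k))) for k in range(3)]
# counts of treated units per stratum are near-equal by construction
```

In the application the cut points landed at the 86th and 96.5th percentiles of the overall score distribution because the nonattainment counties concentrate in the upper tail.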
The potential outcomes model contained two indicators corresponding to these three subgroups. We again checked for covariate balance within these strata and adjusted for the appropriate confounders. Finally, we created two matched datasets from the pruned dataset. In the first matched dataset, we matched 1:1 with a caliper of 0.25 standard deviations of the PSM. In the second dataset, we matched 1:1 with a caliper of one standard deviation. Once more, we controlled for relevant confounders in the potential outcomes model for both matched datasets.

For all of the potential outcomes models, we included a direct effect term, τ̂, corresponding to nonattainment designation, and a spillover term, γ̂, corresponding to the proportion of nonattainment counties surrounding a county. We also included a temporal term for each year with an exchangeable prior to capture changes in lung cancer risk across time not captured by confounders or nonattainment designation. We provided normal priors to all estimated parameters, with gamma hyperpriors for the variance terms with shape and scale parameters of 0.1 and 0.1, respectively. For each model, we ran two MCMC chains with the same lengths, burn-in, and thinning as in the simulation studies until we achieved convergence. Throughout, nonattainment designation was static, based on the 2005 designations. For the final reported models, we assumed the spillover structure to be the proportion of adjacent nonattainment counties bordering a county.

We considered several spillover structures for this application. We initially assumed spillover would be directional based on wind patterns. We used the interpolated annual wind bearing for each county to identify first- and second-order neighboring counties of nonattainment counties whose centroids fell within 30° of the wind bearing. We expanded this threshold to 45° to compare whether it better captured spillover counties. We also evaluated two simpler spillover structures: an indicator for whether a county bordered a nonattainment county, and the proportion of nonattainment counties bordering a county. We compared models' deviance information criteria (DIC) (43) to assess which spillover structure best fit the matched data; models with lower DIC are considered better candidates for a dataset. In our evaluation, we also included models with spillover terms corresponding to a mixture of structures. We found that the proportion of nonattainment counties bordering a county resulted in the model with the lowest DIC. We also found the weight for this structure to be near 1 in models that included it in weighted mixtures.

Table 6: Descriptive statistics for confounders that we deemed necessary for ignorability under interference. We report mean (standard deviation) for all continuous covariates except ozone nonattainment designation (*), which we report as the proportion of counties designated nonattainment for 8-hour ozone in 2004. For continuous covariates, we report the p-value from a two-sample Student's t-test. For ozone nonattainment designation, we report a p-value from a two-proportion z-test.

| Covariate | Nonattainment Counties | Control Counties | P-value |
|---|---|---|---|
| Ozone nonattainment designation (2004)* | … | … | … |
| Mean PM2.5 (µg/m³) | 11.4 (2.18) | 11.4 (2.04) | 0.995 |
| … | … | … | … |
| Mean Wind Speed (m/second) | 0.8 (0.55) | 0.7 (0.54) | 0.237 |
| Mean Temperature 2005-2013 (°F) | 60.88 (3.16) | 59.70 (5.01) | 0.03 |
| Mean Relative Humidity (%) | 69.0 (7.18) | 72.1 (4.01) | 0.006 |
| Dewpoint Temperature (2000) (°F) | 46.4 (3.65) | 48.3 (4.75) | 0.002 |
| Dewpoint Temperature (2013) (°F) | 46.3 (5.31) | 48.6 (6.38) | 0.011 |
| Migration Rate (persons) | 7.4 (2.18) | 7.0 (3.79) | 0.301 |
| Urban Housing (%) | 76.5 (23.45) | 34.3 (29.24) | < … |
| … | … | … | … |

Similar to the simulation study results, the sample size for the applied dataset shrank between methods. The pruned dataset contained 239 counties (Figure 4). We adjusted for ozone nonattainment designation, mean relative humidity, mean work travel time, elevation, and smoking prevalence in the pruned model, along with a temporal term. For the stratified model, we used the same 239 counties and adjusted for mean relative humidity, dewpoint temperature in 2013, smoking prevalence, and ozone nonattainment designation in the potential outcomes model.
Figure 4: The pruned dataset contains N = 239 counties from California (a), Georgia (b), and Kentucky (c) (outlined in black). Counties colored red are retained in the pruned dataset and considered nonattainment, while blue counties are retained and considered control.
In the two matched models, we continued to control for mean PM2.5 …

Figure 5: The matched dataset (caliper 0.25) contains N = 30 counties from California (a), Georgia (b), and Kentucky (c) (outlined in black). Counties colored red are retained in the matched dataset and considered nonattainment, while blue counties are retained and considered control.

Figure 6: The matched dataset (caliper 1) contains N = 36 counties from California (a), Georgia (b), and Kentucky (c) (outlined in black). Counties colored red are retained in the matched dataset and considered nonattainment, while blue counties are retained and considered control.
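The spillover exposure used in these models, the proportion of a county's bordering counties that are nonattainment, reduces to a row-normalized adjacency product. A small illustrative sketch (our own toy adjacency, not the study's county network):

```python
import numpy as np

def spillover_exposure(W, z):
    """Proportion of each unit's first-order neighbors that are treated
    (nonattainment): the spillover structure selected by DIC in the text.
    W: binary adjacency matrix; z: 0/1 treatment indicator."""
    n_neighbors = W.sum(axis=1)
    return (W @ z) / np.maximum(n_neighbors, 1)   # guard against islands

# Five toy counties; counties 2 and 4 are nonattainment.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
z = np.array([0, 0, 1, 0, 1])
g = spillover_exposure(W, z)
# county 0 borders counties 1 and 2 -> exposure 1/2;
# county 3 borders counties 2 and 4 -> exposure 1.0
```

A fully surrounded county (exposure 1) receives the full spillover relative risk; all others receive it scaled by this proportion, matching the interpretation given below for Table 7.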
In general, we found direct and spillover effects that trended toward protective across the models (near or below 1 on the relative risk scale; Table 7). We found a significant direct effect in the full and stratified models. According to the full model, counties with nonattainment designation have between 0.90 and 0.91 times the risk of lung cancer incidence of control counties after adjusting for relevant covariates and the proportion of nonattainment counties bordering a county. Similarly, the stratified model identified counties with nonattainment designation as having between 0.93 and 0.96 times the risk of lung cancer incidence compared to counties without nonattainment designation.

For the spillover effect, the risk trended toward protective except in the case of the full model (posterior mean: 1.66; 95% CI: 1.61, 1.71). These results suggest a decreased risk of lung cancer incidence in counties bordering counties with nonattainment designation. This decreased relative risk would only be fully achieved if a county were entirely surrounded by nonattainment counties (only five counties in our entire dataset are); otherwise, the risk must be multiplied by the proportion of a county's neighbors that are designated nonattainment. The total causal effect is the sum of the direct and spillover effects. We remind the reader that the estimated effects could be misleading, and the true effects not protective, if we have not addressed all confounders, i.e., if we have violated our ignorability under interference assumption. More research is needed to validate these findings. Further research may also consider including second- or higher-order neighbor spillover effects. However, because counties are larger than the ZIP codes, census tracts, or block groups often used in spatial analyses, first-order neighbor counties may capture a substantial portion of PM2.5's range.

Because the matched sample sizes were so small, we conducted an additional simulation study to assess how sample size may affect causal effect estimation. Using the template of scenario 3b, which we believed to best approximate the applied analysis, we sampled 30 units from the complete matched dataset in each of the 100 simulations. We found probability coverages of 88% and 93% and MSE of 9 × 10⁻… and 0.00019, respectively, for the direct and spillover effects. Based upon the results of this simulation study, the observed results in our application are credible.

Table 7: Results from application to SEER data. We report posterior mean estimates (95% credible interval) on the relative risk scale.
| Estimate | Full Model | Pruned Model | Stratified Model | Matched Model (Caliper 0.25) | Matched Model (Caliper 1) |
|---|---|---|---|---|---|
| N | 337 | 239 | 239 | 30 | 36 |
| Direct Effect | … | … | … | … | … |
| Spillover Effect | … | … | … | … | … |

5. Discussion

In this paper, we have adapted new assumptions to spatial settings so that we may estimate causal effects in the presence of spatial interference. We have illustrated how these assumptions may be met when addressing causal questions in air pollution epidemiology and shown how to apply them to a specific dataset. We also proposed methods to accurately and precisely estimate direct and spillover causal effects in the presence of interference and demonstrated the scenarios in which these methods may outperform one another.

In general, we argue that researchers should characterize both a treatment assignment and an interference mechanism when dealing with data under interference (6). We also encourage using propensity scores to handle imbalances in confounders, which are inherent in observational studies (23,29). The use of propensity scores in spatial causal analysis has been limited; we know of only one other paper that utilizes propensity scores to address spatial confounding (22), and in that case the distance between study units was utilized within the propensity score model.

We have also shown that including spatial random effects in a potential outcomes model is inadequate for addressing spatial confounding. Even when the spillover was explicitly and correctly modeled, we found higher bias and variance in the estimates of the direct and spillover effects, if the analysis captured the true parameter value at all. Spatial random effects have been used previously to address spatial interference (14), but our simulations show this may lead to misestimation of direct and spillover effects. Estimating multiple spatially structured terms, including a treatment effect, spillover effect, and spatial random effect, may be too computationally challenging, as shown in scenarios 4a and 4b.
Similar results were previously demonstrated by Hodges and Reich when studying associations (44), but were not explored in a causal context. They found that including spatially correlated error terms may contribute to collinearity and inflate error variances. Further research may explore how a spillover effect corresponding to an asymmetric, directional neighborhood matrix and a CAR random effect based on first-order adjacency may interact. We suspect, however, that CAR random effects are not so flexible that they could capture the structure of the indirect effect as we specified it in our simulations (i.e., the proportion of treated neighbors). We also did not address in this paper how well CAR random effects estimate spatial dependence in data that were generated using a SAR, though using a binary weights matrix as we did is more similar to a CAR structure than row-standardized weights.

We also found that incorporating covariates that are not necessarily confounders, as may happen when the treatment assignment mechanism is not fully understood, does not severely hinder estimation of causal effects (scenario 5). Future research may include a simulation where true confounders are included in the outcome model; we omitted this detail for simplicity's sake, yet illustrated how confounders may be modeled for estimation purposes. We also demonstrated that successful estimation is still possible when known confounders are omitted, as long as covariate imbalances are appropriately addressed (scenarios 6a and 6b). In summary, rigorous methods may compensate for missing confounders that theoretically violate ignorability under interference.

Further research may extend our simulation designs to covariates that are spatially correlated with one another. This is the first simulation design we know of to incorporate spatially autocorrelated covariates when assessing such methods, but we also know that in reality covariates are often correlated with one another.
This paper does not address how that may affect the analysis. We also do not investigate how the size of an areal unit may affect results, though covariates such as population density and land mass may proxy for this information. We recognize that the size of an area, specifically the proportion of its border shared with neighboring areas, may impact spillover effects and the accuracy of spillover effect estimation.

We experienced some of these challenges in the applied dataset. Overall, we found evidence of protective direct and spillover effects from nonattainment designation at the county level. This suggests that county-level actions to reduce PM2.5 had a causal effect on lung cancer incidence, and further policy and action should be considered either in the already designated areas or in other areas with elevated lung cancer risk. It is important to note that the true causal effects may not be protective if we have not addressed all confounders, i.e., if we have violated our ignorability under interference assumption. More research needs to be conducted to validate these findings. We also recognize that lung cancer can have an extended latency, and we may see stronger evidence if this analysis considered exposure periods of a decade or longer (39). We theorize that the spillover effect may be stronger than the direct effect because counties receiving spillover from the nonattainment policy may have a history of lower PM2.5 levels (if they are not nonattainment counties themselves) and are benefiting from county-level actions taken to reduce PM2.5. The non-significant direct effects may be attributed to the fact that PM2.5 levels decreased in both control and nonattainment counties at nearly identical rates across the study period (Figure 7). This remained true for all the study populations. We especially note the uptick in PM2.5 from 2012 to 2013, most notable in the matched datasets. Possible explanations include PM2.5-reduction actions tapering off with time and PM2.5 output sources relocating to other counties not being actively regulated.

We additionally recognize that we made a structural assumption about the spillover from nonattainment counties into other counties. While we explored other spillover structures, we remained surprised that the proportion of nonattainment counties surrounding a county explained the most variation in lung cancer risk. Spatial analysis in meteorology remains its own challenge; further work may better model wind direction and PM2.5 trajectories, which could enhance spillover estimation in this application (42).

In summary, the methods outlined in this paper motivate further research in air pollution epidemiology. We are especially hopeful to apply these methods to estimate the effects of air pollution in China on the United States' West Coast. However, we also believe these methods may be applied beyond air pollution studies to a broader class of observational studies with interference. Researchers increasingly ask questions that involve spatially related units. While nuances and expert knowledge change from study to study, we believe assumptions and analytical approaches may be tailored to specific problems to answer causal questions where we can model the relationship between units and the effect of a treatment or an exposure on an outcome.

The data that support the findings of this study are available from the corresponding author upon reasonable request.
Figure 7: PM2.5 levels (µg/m³) for nonattainment (gray) and control (black) counties across the study period in (a) the full dataset and (b)–(d) the other study populations.
6. References

[1] M. A. Hernán, J. M. Robins, Estimating causal effects from epidemiologic data, Journal of Epidemiology and Community Health 60 (7) (2006) 578–586.
[2] D. B. Rubin, Estimating causal effects of treatments in randomized and nonrandomized studies, Journal of Educational Psychology 66 (1974) 688.
[3] D. R. Cox, Planning of Experiments, New York, NY: Wiley, 1958.
[4] D. B. Rubin, Bayesianly justifiable and relevant frequency calculations for the applied statistician, The Annals of Statistics 12 (4) (1984) 1151–1172.
[5] M. E. Halloran, C. J. Struchiner, Causal inference in infectious diseases, Epidemiology 6 (2) (1995) 142–151.
[6] E. K. Kao, Causal Inference Under Network Interference: A Framework for Experiments on Social Networks, Dissertation, Department of Statistics, Harvard University.
[7] P. Toulis, E. Kao, Estimation of causal peer influence effects, Proceedings of the 30th International Conference on Machine Learning 28 (2013) 1489–1497.
[8] G. Hong, S. W. Raudenbush, Evaluating kindergarten retention policy, Journal of the American Statistical Association 101 (475) (2006) 901–910.
[9] N. Verbitsky-Savitz, S. W. Raudenbush, Causal inference under interference in spatial settings: A case study evaluating community policing program in Chicago, Epidemiologic Methods 1 (1) (2012) 107–130.
[10] C. Zigler, F. Dominici, Y. Wang, Estimating causal effects of air quality regulations using principal stratification for spatially correlated multivariate intermediate outcomes, Biostatistics 13 (2) (2012) 289–302.
[11] E. J. T. Tchetgen, T. J. VanderWeele, On causal inference in the presence of interference, Statistical Methods in Medical Research 21 (1) (2010) 55–75.
[12] M. E. Sobel, What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference, Journal of the American Statistical Association 101 (476) (2006) 1398–1407.
[13] M. G. Hudgens, M. E. Halloran, Toward causal inference with interference, Journal of the American Statistical Association 103 (482) (2008) 832–842.
[14] C. M. Zigler, C. Choirat, F. Dominici, Impact of National Ambient Air Quality Standards nonattainment designations on particulate pollution and health, Epidemiology (2017) 1–30.
[15] C. M. Zigler, F. Dominici, Point: Clarifying policy evidence with potential-outcomes thinking—beyond exposure-response estimation in air pollution epidemiology, American Journal of Epidemiology 180 (12).
[16] D. L. Sussman, E. M. Airoldi, Elements of estimation theory for causal effects in the presence of network interference, arXiv e-prints arXiv:1702.03578.
[17] S. Athey, D. Eckles, G. Imbens, Exact p-values for network interference, arXiv e-prints arXiv:1506.02084.
[18] C. Manski, Identification of treatment response with social interactions, Econometrics Journal 16 (1). doi:10.1111/j.1368-423X.2012.00368.x.
[19] G. W. Imbens, D. B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, 2015.
[20] D. B. Rubin, Bayesian inference for causal effects: The role of randomization, The Annals of Statistics 6 (1978) 34–58.
[21] P. R. Rosenbaum, D. B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika 70 (1983) 41–55.
[22] G. Papadogeorgou, C. Choirat, C. M. Zigler, Adjusting for unmeasured spatial confounding with distance adjusted propensity score matching, Biostatistics.
[23] E. A. Stuart, Matching methods for causal inference: A review and a look forward, Statistical Science 25 (1) (2010) 1–21.
[24] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2017).
[25] …, Journal of Statistical Software 59 (4) (2014) 1–25.
[26] M. Pastore, overlapping: Estimation of Overlapping in Empirical Distributions, R package version 1.5.0 (2017). URL https://CRAN.R-project.org/package=overlapping
[27] A. D. Cliff, J. K. Ord, Spatial Processes, Pion, 1981.
[28] R. Bivand, G. Piras, Comparing implementations of estimation methods for spatial econometrics, Journal of Statistical Software 63 (18) (2015) 1–36.
[29] P. C. Austin, An introduction to propensity score methods for reducing the effects of confounding in observational studies, Multivariate Behavioral Research 46 (2011) 399–424.
[30] D. Lunn, D. Spiegelhalter, A. Thomas, N. Best, The BUGS project: Evolution, critique and future directions, Statistics in Medicine 28 (25) (2009) 3049–3067. doi:10.1002/sim.3680.
[31] A. Gelman, D. B. Rubin, Inference from iterative simulation using multiple sequences, Statistical Science 7 (4) (1992) 457–472. doi:10.1214/ss/1177011136.
[32] J. Besag, J. York, A. Mollié, Bayesian image restoration, with two applications in spatial statistics, Annals of the Institute of Statistical Mathematics 43 (1) (1991) 1–20. doi:10.1007/BF00116466.
[33] E. K. Cahoon, R. M. Pfeiffer, D. C. Wheeler, J. Arhancet, S.-W. Lin, B. H. Alexander, M. S. Linet, D. M. Freedman, Relationship between ambient ultraviolet radiation and non-Hodgkin lymphoma subtypes: A U.S. population-based study of racial and ethnic groups, International Journal of Cancer 136 (2015) 432–441.
[34] U.S. Environmental Protection Agency, Summary of the Clean Air Act, https://…/laws-regulations/summary-clean-air-act.
[35] U.S. Environmental Protection Agency, Green Book PM-2.5 (1997) Area Information, https://…/green-book/green-book-pm-25-1997-area-information.
[36] U.S. Environmental Protection Agency, Technical support for state and tribal air quality fine particle (PM2.5) designations (2004) 5-1–5-3.
[37] L. Gharibvand, D. Shavlik, M. Ghamsary, W. L. Beeson, S. Soret, R. Knutsen, S. F. Knutsen, The association between ambient fine particulate air pollution and lung cancer incidence: Results from the AHSMOG-2 study, Environmental Health Perspectives 125 (3).
[38] P. J. Villeneuve, M. Jerrett, D. Brenner, J. Su, H. Chen, J. R. McLaughlin, Villeneuve et al. respond to "Impact of air pollution on lung cancer", American Journal of Epidemiology 179 (4) (2014) 455–456.
[39] J. E. Hart, Invited commentary: Epidemiologic studies of the impact of air pollution on lung cancer, American Journal of Epidemiology 179 (4) (2014) 452–454.
[40] D. Holland, Downscaler model for predicting daily air pollution [online] (March 2019) [cited March 22, 2019].
[41] L. Dwyer-Lindgren, A. H. Mokdad, T. Srebotnjak, A. D. Flaxman, G. M. Hansen, C. J. Murray, Cigarette smoking prevalence in US counties: 1996–2012, Population Health Metrics 12 (1) (2014) 5. doi:10.1186/1478-7954-12-5.
[42] C. Gumiaux, D. Gapais, J. Brun, Geostatistics applied to best-fit interpolation of orientation data, Tectonophysics 376 (3) (2003) 241–259.
[43] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, A. van der Linde, Bayesian measures of model complexity and fit, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4) (2002) 583–639. doi:10.1111/1467-9868.00353.
[44] J. S. Hodges, B. J. Reich, Adding spatially-correlated errors can mess up the fixed effect you love, The American Statistician 64 (4) (2010) 325–334.