[PDF] Assessing the causal effects of a stochastic intervention in time series data: Are heat alerts effective in preventing deaths and hospitalizations?

Abstract

We introduce a new causal inference framework for time series data aimed at assessing the effectiveness of heat alerts in reducing mortality and hospitalization risks. We are interested in addressing the following question: how many deaths and hospitalizations could be averted if we were to increase the frequency of issuing heat alerts in a given location? In the context of time series data, the overlap assumption - each unit must have a positive probability of receiving the treatment - is often violated. This is because, in a given location, issuing a heat alert is a rare event on an average temperature day as heat alerts are almost always issued on extremely hot days. To overcome this challenge, first we introduce a new class of causal estimands under a stochastic intervention (i.e., increasing the odds of issuing a heat alert) for a single time series corresponding to a given location. We develop the theory to show that these causal estimands can be identified and estimated under a weaker version of the overlap assumption. Second, we propose nonparametric estimators based on time-varying propensity scores, and derive point-wise confidence bands for these estimators. Third, we extend this framework to multiple time series corresponding to multiple locations. Via simulations, we show that the proposed estimator has good performance with respect to bias and root mean squared error. We apply our proposed method to estimate the causal effects of increasing the odds of issuing heat alerts in reducing deaths and hospitalizations among Medicare enrollees in 2817 U.S. counties. We found weak evidence of a causal link between increasing the odds of issuing heat alerts during the warm seasons of 2006-2016 and a reduction in deaths and cause-specific hospitalizations across the 2817 counties.

Full PDF

AAssessing the causal eﬀects of a stochasticintervention in time series data: Are heat alerts eﬀective in preventing deaths and hospitalizations?

Xiao WuDepartment of Biostatistics, Harvard T.H. Chan School of Public HealthandKate R. WeinbergerSchool of Population and Public Health, University of British ColumbiaandGregory A. WelleniusDepartment of Environmental Health,Boston University School of Public HealthandFrancesca Dominici, Danielle BraunDepartment of Biostatistics, Harvard T.H. Chan School of Public HealthFebruary 23, 2021

Abstract

We introduce a new causal inference framework for time series data aimed atassessing the eﬀectiveness of heat alerts in reducing mortality and hospitalizationrisks. We are interested in addressing the following question: how many deaths andhospitalizations could be averted if we were to increase the frequency of issuing heatalerts in a given location? In the context of time series data, the overlap assumption– each unit must have a positive probability of receiving the treatment – is oftenviolated. This is because, in a given location, issuing a heat alert is a rare event onan average temperature day as heat alerts are almost always issued on extremely hotdays. To overcome this challenge, ﬁrst we introduce a new class of causal estimands a r X i v : . [ s t a t . A P ] F e b nder a stochastic intervention (i.e., increasing the odds of issuing a heat alert) for asingle time series corresponding to a given location. We develop the theory to showthat these causal estimands can be identiﬁed and estimated under a weaker versionof the overlap assumption. Second, we propose nonparametric estimators based ontime-varying propensity scores, and derive point-wise conﬁdence bands for these es-timators. Third, we extend this framework to multiple time series corresponding tomultiple locations. Via simulations, we show that the proposed estimator has goodperformance with respect to bias and root mean squared error. We apply our pro-posed method to estimate the causal eﬀects of increasing the odds of issuing heatalerts in reducing deaths and hospitalizations among Medicare enrollees in 2817 U.S.counties. We found weak evidence of a causal link between increasing the odds ofissuing heat alerts during the warm seasons of 2006-2016 and a reduction in deathsand cause-speciﬁc hospitalizations across the 2817 counties. Keywords:

Multiple time series; Propensity score; Time-varying confounding2

Introduction

Extreme heat events are widely recognized as a signiﬁcant threat to public health (US EPA2006). In the U.S., heat waves have been associated with very high morbidity and mortality(Bobb et al. 2014, Weinberger et al. 2020). In an eﬀort to reduce heat-related mortalityand morbidity, the U.S. National Weather Service (NWS) issues heat alerts in advanceof forecast extreme heat events in order to communicate these risks to the public andlocal government oﬃcials (Hawkins et al. 2017). However, it is largely unknown howeﬀective these heat alerts are in reducing adverse health outcomes such as deaths andhospitalizations. To ﬁll these knowledge gaps, we acquired daily time series data on: 1)daily maximum heat index (an index that combines air temperature and relative humidityto posit a human-perceived equivalent temperature); 2) issuance of heat alerts; and 3)number of deaths and hospitalizations among Medicare enrollees for 2817 U.S. countieswithin the warm season (April-October) from 2006 to 2016 (a total of 2354 days).Although randomized controlled experiments are considered the gold standard for eval-uating the eﬀectiveness of public policy (Athey & Imbens 2017), it would be unethicalto conduct them in this context since randomly suspending the heat alert system mayresult in excess deaths attributable to heat (Weinberger et al. 2018, Benmarhnia et al.2016). Therefore, evaluating the eﬀectiveness of NWS-issued heat alerts can only be con-ducted by analyzing observational data. In our time series data, the treatment (e.g., theissuance of a heat alert on a given day in a county) is determined by an unknown assign-ment mechanism rather than by controlled randomization. Consequentially, to eliminateconfounding bias, we must control for time varying covariates that are associated withboth the treatment (e.g., issuance of heat alerts) and outcomes of interest (e.g., deaths orhospitalizations). In addition, to reliably estimate causal eﬀects such as the average treat-ment eﬀect (ATE), i.e., the contrast of the potential outcome under a treatment W = 1vs. W = 0 ( AT E (1 ,

0) = E [ Y (1) − Y (0)]), the overlap assumption must be met, that is,any unit (e.g., an observation for a given location on a given day) must have a positiveprobability of receiving either treatment or control (Rosenbaum & Rubin 1983). In our3ontext, this assumption is likely violated because the assignment to the treatment is rareand unbalanced. Indeed, the issuance of a heat alert takes place on average in 2.52 %(standard deviation: 2.19 %) of the warm season days across U.S. counties. Moreover, aheat alert is highly unlikely on a relatively cool day, whereas it almost always happens onvery hot days. Even if the overlap assumption holds with limited overlap (i.e., chances ofassigning to the treatment are merely small), a large sample size is needed to learn aboutall possible treatment assignments and avoid large estimated variance (Kang et al. 2007).Causal inference in the context of time varying treatments has been extensively studiedin longitudinal settings by Robins & Hern´an (2009) where the unit of the analysis is theperson or the location (e.g., a county) and often the number of units ( N ) is much largerthan the number of repeated observations ( T ) for a single person or location. In contrast,our data set presents the unique feature of time series, where the unit of the analysis isthe observation for a given day in a given county and the number of days for a single timeseries might be much larger than the number of counties, that is T > N . Furthermore,it is plausible that the true causal eﬀects might be highly heterogeneous across counties.This presents a huge challenge: the causal estimand deﬁned on a super-population, e.g., anATE with respect to large N (e.g., counties), might not capture the heterogeneous natureof each county and cannot be estimated without stringent assumptions regarding modelextrapolation. Therefore, careful thought and consideration should be given to account forheterogeneous causal eﬀects across multiple counties.Previous research has focused on some aspects of the methodology gaps mentionedabove. Research on confounding adjustment in observational studies has relied extensivelyon the propensity score, that is, the probability of a unit being assigned to a particulartreatment given other pre-treatment covariates (Rosenbaum & Rubin 1983). Often, con-founding adjustment via propensity scores is used when the goal of the inference is toestimate a causal eﬀect of a deterministic intervention. That is, under a situation whereeach unit’s treatment can have a ﬁxed value (e.g., treatment W = 1 or control W = 0) suchas in the context of the ATE. Such estimands compare potential outcomes if units were4eterministically assigned treatment W = 1 vs. control W = 0. The overlap assumption,which is unlikely to hold in our setting, plays a central role when identifying the causalestimands under deterministic interventions. Stochastic interventions (e.g., changing theprobability of being assigned to a given treatment in some pre-speciﬁed way) have been pre-viously deﬁned to help overcome violations of the overlap assumption (Haneuse & Rotnitzky2013, Kennedy 2019, Naimi et al. 2021). Kim et al. (2019) have shown that a stochasticintervention framework and its corresponding causal estimand is appealing when the studycontains long time periods (large T ), as in our context. However, properties of this frame-work have only been demonstrated in the context of asymptotic arguments with respectto large sample size N in longitudinal settings. In other words, previous approaches (Kimet al. 2019) were designed to analyze longitudinal studies when N goes to inﬁnity ratherthan time series studies when T goes to inﬁnity. The causal inference literature on timeseries studies is sparse with a few exceptions (Bojinov & Shephard 2019, Papadogeorgouet al. 2020). Bojinov & Shephard (2019) only focused on the randomized experiment set-ting and proposed an ATE-type causal estimand under deterministic interventions, whichdoes not directly apply to our observational data. Papadogeorgou et al. (2020) is the ﬁrstto bridge stochastic intervention framework with observational spatial-temporal data, butthey focused on spatio-temporal point processes which are suitable for modelling a singletime series with a common treatment (e.g., airstrikes) rather than multiple time series withdiﬀerent treatment paths from heterogeneous sources (e.g., heat alerts issued by local NWSoﬃces).To the best of our knowledge, no existing causal inference approach answers causalquestions deﬁned by stochastic interventions under observational data that consists of mul-tiple time series. In this paper we introduce a novel statistical framework in the contextof a stochastic intervention for time series data. The key idea is to replace a deterministicintervention (i.e., issuing vs. not issuing a heat alert on day t , deterministically) by astochastic intervention (i.e., increasing the odds of issuing a heat alert on day t by a ratio δ t ). Speciﬁcally, our proposed incremental intervention characterizes an increased odds to reat (e.g., issuing a heart alert) by δ t -fold at time t conditional on past information up totime t . This novel causal inference framework allows us to deﬁne causal estimands on timeseries data and ease the identiﬁcation and estimation diﬃculties due to the violation ofoverlap assumption. Based on this framework, we further deﬁne a broad class of stochasticcausal estimands and propose nonparametric estimators based on time-varying propensityscores to estimate such causal eﬀects from observational data.In Section 2, we discuss the national heat alert, heat index and Medicare data thatmotivate our methodological developments. In Section 3, starting in the context of a sin-gle time series, we deﬁne the stochastic causal estimands based on incremental propensityscore interventions, and their corresponding assumptions for their identiﬁcation. In Sec-tion 4, we propose nonparametric estimators based on time-varying propensity scores. InSection 5, we extend our approach in the context of multiple time series, we introduceadditional identiﬁcation assumptions, and propose a random-eﬀect meta-analysis to poolcausal estimands across time series from multiple locations. In Section 6, we illustrate theﬁnite-sample performance of the proposed estimators via simulation studies. In Section 7,we apply our method to the national heat alert data set to estimate the eﬀectiveness of heatalerts in reducing morbidity and mortality among Medicare beneﬁciaries. In Section 8, weconclude with a summary and discussion.

For this study, we gathered text ﬁles containing records of all non-precipitation alerts issuedby NWS between 2006 and 2016 from the National Oceanic and Atmospheric Administra-tion (NOAA). We used the information in the ﬁle header, which contains information onthe type, location, and timing of each alert in a standard format, to identify the date andlocation of each heat alert issued between April 1st and October 31st for the years 2006to 2016 in the contiguous U.S.. We then created a daily time series containing a binaryvariable for the issuance of heat alerts for each of the 2817 U.S. counties. We deﬁned “heatalerts” to include both heat advisories (a type of heat alert issued when less severe heat is6orecast) and excessive heat warnings (a type of heat alert issued when more severe heat isforecast). While the exact criteria used to issue heat alerts varies across the jurisdictions oflocal NWS oﬃces, a key commonality across jurisdictions is that heat alerts are issued basedon forecasts of future weather conditions (mainly, heat index). Furthermore, in additionto using information from forecast models, NWS forecasters are encouraged to use theirexperience and judgment in deciding whether or not to issue a heat alert (Hawkins et al.2017). In summary, the assignment mechanism of issuing heat alerts is unknown and needsto be modelled based on observed covariates. We also obtained 4-km gridded estimates ofdaily maximum temperature and vapor-pressure deﬁcit using the Parameter-elevation Re-gressions on Independent Slopes Model (PRISM) (Daly et al. 2008, Daly 2013). From thesevariables, we calculate time series data of population-weighted daily maximum heat indexfor each county. This approach is described by Spangler et al. (2019) and implemented bythe R package by Anderson et al. (2013).As outcomes, we consider daily all-cause deaths among the entire Medicare enrolleesand cause-speciﬁc hospitalizations for ﬁve heat-related diseases (heat stroke, urinary tractinfections, septicemia, renal failure, ﬂuid and electrolyte disorders) using Clinical Classiﬁca-tions Software (CCS) groupings of principal discharge diagnosis codes among the MedicareFee-for-Service (FFS) enrollees, which have been previously reported in Bobb et al. (2014).Figure 1 presents the geographic distribution of the 2817 U.S. counties with heat indexdata available. The minimal threshold heat index (i.e., the lowest heat index to trigger aheat alert) of all heat alert days across 2006-2016 for each county is presented in the upperpanel, and the number of heat alert days for each county across 2006-2016 is presented inthe lower panel. We observe that the minimal threshold heat index tends to be lower in thenorth; whereas counties in Florida and Texas had the highest threshold heat index. Coun-ties in the south east and south California issued the most heat alerts. Table 1 summarizesthe characteristics for NWS-issued heat alerts and our health outcomes. We ﬁnd that theissuance of heat alerts is rare, that is, it occurs on average in less than 3% of warm seasondays. 7igure 1: The minimal threshold heat index of all heat alert days per county (upper panel)and the number of heat alerts days per county (lower panel) in all 2817 counties acrossApril-October of 2006-2016. We observe that the minimal threshold heat index tends to belower in the north; whereas counties in Florida and Texas had the highest threshold heatindex. Counties in the south east and south California issued the most heat alerts. Countiesin gray represent those counties with insuﬃcient heat alert data, which were excluded fromthis study. 8able 1: Characteristics for NWS-issued heat alerts, all-cause deaths among Medicareenrollees, and cause-speciﬁc hospitalizations for ﬁve heat-related diseases among MedicareFFS enrollees across April-October of 2006-2016Variables 2817 Counties 550 Populous Counties % Days with Heat Alerts 2.52 % 2.22 % > , δ t -fold on day t for each day in the warm season, how many deaths would have beenaverted and how many hospitalizations for heat-related diseases could have been avoided?”One could estimate the total number of deaths and hospitalizations avoided for the entirewarm season. Alternatively, there may be interest in estimating the number of deaths andhospitalizations avoided among extremely hot days rather than the entire warm season.We deﬁned ”extremely hot days” to be the top 5% hottest days with a heat index thatexceeded 95% of warm season days in the corresponding county. We believe deaths andhospitalizations attributable to heat (and avoidable to heat alert) are more likely to happenon extremely hot days, and therefore this quantity may have more policy relevant impact:help us to assess the eﬀectiveness of heat alerts and insights into how to improve them tobetter protect the public’s health from extreme heat events.Figure 2 illustrates the comparison between the observed frequency of the actual heatalerts and the anticipated heat alerts in one counterfactual setting where the odds of issuingheart alerts were increased by 10-times for each day in the warm season. It is worthclarifying that increasing the odds of issuing a heat alert by 10-times for each day in9igure 2: NWS-issued heat alerts for Los Angeles county 2006-2016. The blue x’s representthe actual heat alerts that have been issued and the red circles represents the anticipatedheat alerts under a counterfactual scenario where the odds of issuing heat alerts wereincreased by approximately 10-times for every warm season day resulting in approximately2 . t is 90%, theprobability of issuing a heat alert would increase to × × − ≈ .

9% under thestochastic intervention corresponding to a 10-fold increase in odds. Taking Los Angelescounty as an example, we ﬁnd that if we increase the odds of issuing a heat alert by 10-times on all warm season days, the local NWS oﬃce would have issued 128 heat alerts across2006-2016 under this counterfactual setting, compared to the total of 56 heat alerts that theNWS oﬃce actually issued. That is increasing the odds by 10-times results in increasingthe heat alerts issued by 2 . In this section, we introduce the notation in the context of time series data (in our motivat-ing example, i is the county). Let Y i,t , W i,t , C i,t be the outcome, treatment and additionalcovariates at time t , for t ∈ { , , ..., T } in county i . Because the focus of this section is tointroduce the causal estimands for a single time series we omit the index i . In Section 5,we will extend the notation and the methodology to multiple time series.Each time t can be assigned to the treatment W t = 1 or W t = 0, and subsequentlythe outcome Y t is observed. We deﬁne { C T = ( C , ..., C T ) , W T = ( W , ..., W T ) , Y T =( Y , ..., Y T ) } . We assume that W t is binary; Y t and C t can be binary, categorical or contin-uous, in which C t includes pre-treatment covariates prior to the treatment assignment attime t (e.g., the heat index of day t in our motivating example).We denote by F t the ﬁltration which is used to capture past information prior to thetreatment assignment at time t , that is F t = { C t , W t − , Y t − } . Following the po-tential outcome framework for time series data (Bojinov & Shephard 2019), we denote Y t ( w t ) the potential outcome at time t that would have been observed under treat-ment path w t = ( w , ..., w t ), and we introduce the potential outcome path Y t ( w t ) = { Y ( w ) , Y ( w ) , ..., , Y t ( w t } . See Figure 3 a as an example of all the potential outcomepaths, for T = 3. If we assume, w = (1 , , { Y (1) , Y (1 , , Y (1 , , } (see the bold blue arrows in Figure 3 a ).We deﬁne the time-varying propensity score at time t as p t ( w t , F t ) = P r ( W t = w t |F t ) , for w t = { , } .

11n our motivating example, the time-varying propensity score denotes the probability ofissuing a heat alert on day t conditional on all past information up to day t just prior tothe heat alert issuance.Following Kennedy (2019), instead of assuming that the intervention is deterministic ,we assume that the intervention is stochastic . An intervention is deterministic if a givenday is deterministically assigned to a treatment ( W t = 1) or not ( W t = 0). An interventionis stochastic if a given day t is randomly assigned to the treatment based on the incrementalpropensity score deﬁned as p Inc t ( w t = 1 , F t ) := δ t p t ( w t = 1 , F t ) δ t p t ( w t = 1 , F t ) + 1 − p t ( w t = 1 , F t ) (1)In other words, we assume that on a given day t , the probability of being assigned tothe treatment increases by δ t -fold conditional on past information prior to the treatmentassignment at time t .We now denote δ T = { δ , δ , ..., δ T } , the intervention path, that is a sequence of inter-ventions, where at each time t we assume that the odds of being treated increase by δ t -fold.We denote w Inc1: t ( δ t ) the post-intervened treatment path under the intervention path δ t .Let τ t ( δ t ) := Y t ( w Inc1: t ( δ t )), which denotes the potential outcome at time t with respectto the post-intervened treatment path w Inc1: t ( δ t ), which further depends on the stochasticintervention path δ t . Figure 3 provides an example where δ = { , , } , that is forthree consecutive days, we increase the odds of receiving the treatment by 2-fold (for sim-plicity we consider a common δ across the three days, but in practice these could vary).Figure 3 b represents the causal relationship between treatment path W T = ( W , ..., W T ),covariate path { C T = ( C , ..., C T ), and potential outcome path Y T = ( Y , ..., Y T ) } pre-intervention. The proportions of blue shading reﬂect the probabilities of receiving thetreatment on each day, that are p ( w = 1 , F ) = on day 1, p ( w = 1 , F ) = on day 2,and p ( w = 1 , F ) = on day 3. Figure 3 c represents the counterfactual time series afterthe proposed stochastic intervention, in which the probabilities of receiving the treatmentincrease due to the sequence of proposed stochastic interventions δ = { , , } . After theinterventions δ = { , , } , the probability of receiving the treatment W = 1 on day 1 in-12reases to p Inc1 ( w = 1 , F ) = × p ( w =1 , F )2 × p ( w =1 , F )+1 − p ( w =1 , F ) = ; similarly, p Inc2 ( w = 1 , F ) = and p Inc3 ( w = 1 , F ) = .Unlike the common causal estimands which are generally deﬁned as averages of subjectlevel potential outcomes, here we deﬁne the causal estimand on a set of potential out-comes in correspondence of a given intervention path. In our context, we deﬁne the causalestimand as the temporal average of the potential outcomes up to time T ,¯ τ ( δ t,T ) = 1 T T (cid:88) t =1 τ t ( δ t )This causal estimand can be explained as the average of potential outcome during thetime period up to T , had the odds of receiving treatment increased by δ t -fold for each day t = 1 , . . . , T . In our motivating example, the temporal average of the potential outcomesrepresents the daily average deaths or hospitalizations in a county had the odds of receivingtreatment increased by δ t -fold on day t for each day within a warm season. Moreover, analogto the conditional average causal estimand (CATE), we can calculate the average of deathsor hospitalizations over days with pre-speciﬁed conditions (e.g., extremely hot days).We also introduce another causal estimand which is the summation of potential out-comes during the time period up to T . This causal estimand characterizes the total numberof deaths or hospitalizations in a county, had the odds of receiving treatment increased by δ t -fold on day t . There are major limitations that are associated with the causal estimands ¯ τ ( δ t,T ) de-ﬁned in Section 3.1: First, τ T ( δ T ) depends on a treatment path that the length increasesthroughout time T , thus one needs to impose stringent modelling assumptions to extrapo-late its values; Second, the asymptotic theory for these causal estimands is hard to establishas T goes to inﬁnity, because τ T ( δ T ) depends on an inﬁnite set of variables. Motivatedby Bojinov & Shephard (2019), we deﬁne a t -step causal estimand conditional on the ob-served treatment path w obs t − t ) named τ t ( δ ( t − t +1): t ) and its temporal average counterpart13amed ¯ τ ( δ ( t − t +1): t,T ), that are τ t ( δ ( t − t +1): t ) = Y t ( w obs t − t ) , w Inc( t − t +1): t ( δ ( t − t +1): t )) , t = 1 , , ..., t − τ ( δ ( t − t +1): t,T ) = 1 T − t + 1 T (cid:88) t = t { τ t ( δ ( t − t +1): t ) } . By making the estimands dependent on the observed treatment path w obs t − t ) , we deﬁnecausal estimands that can be estimated nonparametrically (see in Section 4). The useof t -step causal estimands are consistent with time series models with pre-speciﬁed ﬁxedmemory (e.g., Autoregressive integrated moving average (ARIMA) model). They are alsocompatible with the causal inference literature since they can be treated as conditionalcausal estimands condition on partial historical information { W t − t ) = w obs t − t ) } (Imbens& Rubin 2015). We can also establish a central limit theorem about the t -step causalestimator with respect to T → ∞ (see Section 4).Besides the theoretical convenience described above, the use of t -step causal estimandshas important practical implications. In our motivating example, we may assume thatdeath or hospitalization risks for today don’t depend on heat alerts issued many daysbefore as heat alerts deliver instant information about extreme heat events to the public,and thus their eﬀects are expected to last a very short period of time. Following the potential outcomes framework (Rubin 1974) adapted to the time-varyingtreatment regime (Robins et al. 2000), we establish the following assumptions of causalidentiﬁcation:

Assumption 1 (SUTVA)

The potential outcomes are non-anticipating, and consistent,that is Y obs t = Y t ( w obs T ) = Y t ( w obs t ) ∀ t = 1 , ..., T. Assumption 1 generalizes the usual Stable Unit Treatment Value (SUTVA) assumption(Imbens & Rubin 2015) to allow for the potential outcome at time t to depend on the wholepath of past treatment assignments, yet not to be aﬀected by future treatment assignments14igure 3: Overview of notation, intervention and estimand of one time series. δ = { , , } , that is we increase the odds of receiving treatments by 2-fold for each day t = 1 , , . . a. We illustrate all the potential outcome paths for T = 3. The bluelines and circles represents the potential outcome path { Y ( w ) , Y ( w ) , Y ( w ) } giventhe treatment path W = w = (1 , , b. The causal relationship between treatmentpath W T = ( W , ..., W T ), covariate path { C T = ( C , ..., C T ), and potential outcome path Y T = ( Y , ..., Y T ) } pre-intervention. The proportions of blue shading reﬂect the probabil-ities of receiving treatments at each day, that are p ( w = 1 , F ) = on day 1, p ( w =1 , F ) = on day 2, and p ( w = 1 , F ) = on day 3. c. The counterfactual time serieswith outcome path { τ ( δ ) = Y ( w Inc1 ( δ )) , τ ( δ ) = Y ( w Inc1:2 ( δ )) , τ ( δ ) = Y ( w Inc1:3 ( δ )) } after the proposed stochastic intervention δ , in which the probabilities of receiving treat-ments altered. The probability of receiving the treatment W = 1 on day 1 increases to p Inc1 ( w = 1 , F ) = × p ( w =1 , F )2 × p ( w =1 , F )+1 − p ( w =1 , F ) = ; similarly, p Inc2 ( w = 1 , F ) = and p Inc3 ( w = 1 , F ) = . 15non-anticipating). We still assume that there is only one version of the treatment for eachtime t , and each treatment path up to time t realizes a unique observed outcome at time t (consistency). Assumption 2 (Unconfoundedness)

The assignment mechanism is unconfounded iffor all W T ∈ W = { , } T , W t ⊥ Y s ( w s ) | F t ∀ t = 1 , ..., T and t ≤ s ≤ T. Assumption 2 aligns with the ”sequential randomization” assumption introduced in longi-tudinal studies by Robins et al. (1994), which states that the treatment assignment onlydepends on the past information and thus is conditionally independent of future potentialoutcomes. This assumption also excludes the possibility that future potential outcomescould impact the current treatment assignment retrospectively (a phenomenon which hasbeen discussed in Granger (1980)).

Assumption 3 (Weak overlap)

The assignment mechanism weakly overlaps if for all t ∈ { , , ..., T } , p t ( w t | F t ) = 0 ⇒ p Inc t ( w t | F t ) = 0The usual overlap assumption in the standard deterministic intervention requires that theassignment at each time t ∈ { , , ..., T } has a positive probability of assigning each treat-ment conditional on the ﬁltration F t . On the other hand, the proposed stochastic interven-tion framework avoids the usual overlap assumption (Kennedy 2019) by not intervening attime points which have zero probabilities of receiving certain treatment conditions. Notingthat the stochastic intervention described in Section 3.1 always meets the weak overlapassumption, since p Inc t ( w t | F t ) = 0 always holds when the propensity p t ( w t | F t ) = 0 nomatter the value of δ t .We show the identiﬁcation of the proposed causal estimand under the SUTVA, uncon-foundedness and the weak overlap assumptions. At any time point t ∈ { , ..., T } , τ t ( δ t )can be deﬁned as τ t ( δ t ) = (cid:88) w t ∈ W t (cid:90) ∂ F t µ ( F t ) × t (cid:89) s =1 [ w s δ s p s (1 , F s ) + (1 − w s ) p s (0 , F s )] δ s p s (1 , F s ) + 1 − p s (1 , F s ) (cid:124) (cid:123)(cid:122) (cid:125) := p Inc s ( w s |F s ) × dP r ( ∂ F s |F s − )16here ∂ F t = ∂ F × ... × ∂ F t , ∂ F s = F s / F s − , s = 1 , ..., t , and µ ( F t ) = E ( Y t | W t , F t ).It is worth noting that although τ t ( δ ) is causally identiﬁable, the corresponding quantitymay be practically infeasible to estimate using a single time series. We instead focus onthe temporal average causal estimand¯ τ ( δ t,T ) = 1 T T (cid:88) t =1 τ t ( δ t ) , which can also be causally identiﬁed given τ t ( δ t ) is identiﬁable ∀ t = 1 , , ..., T . Likewise,the t -step causal estimand τ t ( δ ( t − t +1): t ) and its temporal average counterpart ¯ τ ( δ ( t − t +1): t,T )can be identiﬁed under the same assumptions. We focus on the estimation and inference of t -step causal estimands given the theoreticaland practical conveniences stated in Section 3.2. To estimate the proposed causal esti-mands, we propose the following estimator based on time-varying propensity scores. If thetime-varying propensity score p s ( w s , F s ) is known, as in Section 6 of Kim et al. (2019),we can have an unbiased estimator for a t -step estimand on observed treatment path τ t ( δ ( t − t +1): t ) deﬁned as,ˆ τ t ( δ ( t − t +1): t ) = t (cid:89) s = t − t +1 (cid:8) [ W s δ + (1 − W s )] δp s (1 , F s ) + 1 − p s (1 , F s ) (cid:124) (cid:123)(cid:122) (cid:125) := p Inc s ( w s |F s ) (cid:9) Y t When the time-varying propensity score p s ( w s , F s ) is unknown, the estimation strategyrequires two steps. First, at each time point s , we need to estimate the time-varyingpropensity score ˆ p s ( w s , F s ). Second, if the propensity scores can be modeled with cor-rect model speciﬁcations, we can construct the following unbiased estimator for a t -step17stimand on observed treatment path.ˆ τ t ( δ ( t − t +1): t ) = t (cid:89) s = t − t +1 (cid:8) [ W s δ + (1 − W s )] δ ˆ p s (1 , F s ) + 1 − ˆ p s (1 , F s ) (cid:124) (cid:123)(cid:122) (cid:125) := (cid:100) p Inc s ( w s |F s ) (cid:9) Y t ˆ¯ τ ( δ ( t − t +1): t,T ) = 1 T − t + 1 T (cid:88) t = t ˆ τ t ( δ ( t − t +1): t ) = 1 T − t + 1 T (cid:88) t = t (cid:110) t (cid:89) s = t − t +1 (cid:8) [ W s δ + (1 − W s )] δ ˆ p s (1 , F s ) + 1 − ˆ p s (1 , F s ) (cid:9) Y t (cid:111) One important feature of the proposed estimator is that the temporal observed outcomesare weighted by a time-varying weights which correspond to the product of estimatedincremental propensity scores (cid:100) p Inc s ( w s |F s ). Given the proposed estimator, we calculate the variance estimator for ˆ τ t ( δ ( t − t +1): t ) ),Σ t ( δ ( t − t +1): t ) = E (cid:110) t (cid:89) s = t − t +1 (cid:0) [ W s δ s + (1 − W s )] δ ˆ p s (1 , F s ) + 1 − ˆ p s (1 , F s ) (cid:1) Y t (cid:111)(cid:124) (cid:123)(cid:122) (cid:125) V − (cid:110) E (cid:104) t (cid:89) s = t − t +1 (cid:0) [ W s δ s + (1 − W s )] δ ˆ p s (1 , F s ) + 1 − ˆ p s (1 , F s ) (cid:1) Y t (cid:105)(cid:111) In which V = (cid:88) w ( t − t t ∈ W ( t − t t (cid:90) ∂ F ( t − t t E ( Y t | W ( t − t +1): t = w ( t − t +1): t , F t ) × t (cid:89) s =( t − t +1) (cid:110) [ w s δ s p s (1 , F s ) + (1 − w s ) p s (0 , F s )] δ s p s (1 , F s ) + 1 − p s (1 , F s ) (cid:111) (cid:124) (cid:123)(cid:122) (cid:125) := p Inc s ( w s |F s ) × dP r ( ∂ F s |F s − )ˆ V = t (cid:89) s = t − t +1 (cid:8) [ W s δ s + (1 − W s )][ δ s ˆ p s (1 , F s ) + 1 − ˆ p s (1 , F s )] (cid:9) Y t We then establish the asymptotic properties of the proposed estimators on one time serieswith respect to time T . Using martingale theory show in Bojinov & Shephard (2019), we18how the proposed estimators are √ T -asymptotically normal. Theorem 1 (Asymptotic Normal)

If Assumptions 1–3 hold, and there exist T − t + 1 T (cid:88) t = t ˆΣ t ( δ ( t − t +1): t ) −→ Σ , as T → ∞ . We have, √ T (cid:104) T − t + 1 ˆ¯ τ t ( δ { ( t − t +1): t,T } − T − t + 1 ¯ τ t ( δ { ( t − t +1): t,T } (cid:105) −→ N (0 , Σ) In many applications including our motivating example, we have a number of countieswith their distinct time series. To combine information across counties, we generalize ourstochastic intervention framework to the context of meta-analysis of heterogeneous timeseries. To formalize the causal identiﬁcation in the context of multiple time series, weintroduce and discuss new assumptions of identiﬁcation in Section 5.1. In subsequentSection 5.2, we describe estimation and inference of the pooled estimator based on meta-analysis models.

Before we formally introduce the new pooled estimator, additional assumptions are neededto allow causal inference from multiple time series. We introduce the subscript i to denotethe subject, that is, the county in our motivating application. Assumption 4 (Multiple Subject SUTVA)

The potential outcomes are non-anticipating,non-interfering, and consistency, that is Y obs i,t = Y i,t ( w obs N, T ) = Y i,t ( w obs N, t ) = Y i,t ( w obs i, t ) ∀ i =1 , ..., N ; t = 1 , ..., T. Assumption 5 (Multiple Subject Unconfoundedness)

The assignment mechanism isunconfounded if for all W i, T ∈ W = { , } T , and F t = {F ,t , ..., F N,t } , then W i,t ⊥ Y i,s ( w i, s ) | F t ∀ i = 1 , ..., N ; t = 1 , ..., T and t ≤ s ≤ T. Assumption 5 is similar to Assumption 2, which still states the treatment path for eachsubject only depends on the past information, yet it does not rule out the possibility thatthe propensity of one subject is treated at time t would reply on the confounders, treatment,or outcome of another subject before the treatment assignment at time t . There is no needto assume that the potential outcome of subject i , Y i, t ( w i, t ) is independent with thepotential outcome of subject j , Y j, t ( w k, t ) for i (cid:54) = j . Assumption 6 (Random Eﬀects under Identical Intervention Path)

The interven-tion path δ T is identical across multiple time series. The t -step causal estimand ¯ τ i ( δ ( t − t +1): t ) under the intervention path δ T are independent across i = 1 , ..., N , ∀ t = 1 , ..., T ; t =1 , ..., t . Assumption 6 is the assumption needed for random-eﬀect meta-analysis. Although theintervention paths δ i, T , i ∈ { , , ..., N } do not have to be the same for every subject i , ina meta-analysis settings, it is more practically meaningful to assume the same interventionpath δ T across all subjects. Importantly, Assumption 6 sets no constraint on the observedtreatment path W i, T , i ∈ { , , ..., N } , which can be distinct for each subjects. Under therandom-eﬀects meta-analysis framework we allow the true causal estimands to diﬀer fordiﬀerent subjects, yet to represent a random sample from a particular distribution.20ere we create a weighted average causal estimand across N units, as¯ τ N ( δ ( t − t +1): t,T ) = N (cid:88) i =1 c i ¯ τ i ( δ ( t − t +1): t,T ) , where N (cid:88) i =1 c i = 1 and { c i , i = 1 , , ..., N } are ﬁxed.When Assumption 4-6 hold, we show ¯ τ N ( δ ( t − t +1): t,T ) are causally identiﬁed using a modiﬁedversion of identiﬁcation equation from Section 3, that is, ∀ i ∈ { , , ..., N } τ i,t ( δ t ) = (cid:88) w i, t ∈ W i, t (cid:90) ∂ F t µ ( F t ) × t (cid:89) s = t − t +1 [ w i,s δ s p i,s (1 , F i,s ) + (1 − w i,s ) p i,s (0 , F s )] δ s p i,s ( F s ) + 1 − p i,s ( F s ) (cid:124) (cid:123)(cid:122) (cid:125) := p Inc i,s ( w i,s | F s ) × dP r ( ∂ F s | F s − ) , ¯ τ i ( δ ( t − t +1): t,T ) = 1 T − t + 1 T (cid:88) t = t τ i,t ( δ ( t − t +1): t ) , ¯ τ N ( δ ( t − t +1): t,T ) = N (cid:88) i =1 c i ¯ τ i ( δ ( t − t +1): t,T ) . where ∂ F t = ∂ F × ... × ∂ F t , and ∂ F s = F s / F s − , s = 1 , ..., t , µ ( F t ) = E ( Y t | W t , F t ). { c i , i = 1 , , ..., N } are the ﬁxed weights satisfying (cid:80) Ni =1 c i = 1. The remaining question ishow to choose suitable weights c i for each subject i . There are at least two popular statistical models for meta-analysis, the ﬁxed-eﬀect modeland the random-eﬀects model (Borenstein et al. 2010). In general, we found the random-eﬀects assumption is more plausible in our motivating example, given there is generallyno reason to assume that the causal eﬀects of heat alerts on health outcomes are identicalacross all counties. We propose meta-analysis methods relying on random-eﬀects modelsto obtain a pooled estimator that summarizes the causal eﬀects obtained by time seriesdata across multiple counties (Serghiou & Goodman 2019).In meta-analysis, for the purpose of obtaining a weighted average causal estimand thatcharacterizes the causal eﬀect obtained by multiple studies (i.e., counties), we want to21ssign more weight to studies that yield a more precise estimate of that eﬀect. Underrandom-eﬀects model, there are two sources of variance (Borenstein et al. 2010). First, theobserved causal eﬀect ˆ¯ τ i ( δ ( t − t +1): t,T ) for any time series diﬀers from that time series’s trueeﬀect because of within-study error variance, V i . Second, the true eﬀect from each timeseries diﬀers from the weighted average causal eﬀect because of between-study variance.Therefore, the weight assigned to each time series under the inverse variance scheme is c i = 1 V i + τ , i = 1 , , ..., N Note that the within-study error variance, V i , is unique to each time series, but the between-study variance τ is common to all time series. We study ﬁnite-sample properties of the proposed estimators via simulation study on asingle time series, in which we vary: a) the length of the time series T ; b) the step sizeof the estimator t ; and c) the assignment mechanisms of the treatment p t ( W t = 1 | F t ).To reﬂect the nature of treatment assignment in our motivation example, we generate thetreatment W t under a non-overlap setting (i.e., in general, more than 80% of time t with p t ( W t = 1 | F t ) > . p t ( W t = 1 | F t ) < . T . For each t = 1 , ..., T , C t = ( C ,t , C ,t , C ,t , C ,t , C ,t ) ∼ N ( , I ) p t ( W t = 1 | F t ) = expit(10 × C t − W t − + 0 . Y t | W t , W t − , C t ∼ N (3 × W t + W t − + C ,t + C ,t + C ,t + C ,t + C ,t , t , the assignment mechanism of the treatment W t is stochastic and depends onboth C t , and the treatment at time t − W t − . Also please note that the outcome Y t isaﬀected by the treatments both at time t and t −

1. The main quantity of interest is the t -step causal estimand on observed treatment path. We assess the performance of each22stimator by calculating the integrated bias and root mean squared error (RMSE) (cid:100) Integrated Bias = 1 J J (cid:88) j =1 (cid:12)(cid:12)(cid:12) I I (cid:88) i =1 ˆ¯ τ i ( δ t − t +1: t,T,j ) − ¯ τ i ( δ t − t +1: t,T,j ) (cid:12)(cid:12)(cid:12)(cid:100) RMSE = √ NJ J (cid:88) j =1 (cid:104) I I (cid:88) i =1 (cid:0) ˆ¯ τ i ( δ t − t +1: t,T,j ) − ¯ τ i ( δ t − t +1: t,T,j ) (cid:1) (cid:105) / where ¯ τ ( δ t − t +1: t,T,j ) := T − t +1 (cid:80) Tt = t ¯ τ ( δ t − t +1: t,j ) is temporal average causal quantity, andˆ¯ τ ( δ j ) is its estimator based on time-varying propensity score. We assess the performancesat J = 50 values of δ j equally spaced between 0 . I = 500 simulations.We vary the following combination of ( T, t ), T = (200 , , t = (2 , , W t | C t , W t − . We conduct additional simulations inthe Supplementary Materials to show the results when the form of the propensity scoremodel is misspeciﬁed.Table 2 shows the proposed estimator performed well in various settings. Under thesame step t , we observed that the integrated bias and RMSE of the estimator decreasewhen the length of time series T increase, regardless of the time-varying propensity scoreis estimated by logistic regression or Super Learner. This performance is expected sincewe have established a central limit theorem with respect to time T . We also observeda decreased performance of the estimator as the step t increases. This phenomenon isnot surprising, since the weights in the proposed estimator is deﬁned as the product of t terms of estimated incremental propensity scores. We, in general, expect the absolutebias and RMSE will likely be larger as more estimated product terms are included in theestimator. Noting that in our simulation setting, the outcome Y t only depends on thetreatment of W t and W t − , thus causal estimator with t = 2 could suﬃciently characterizethe causal relationship between time series Y t and W t , t ∈ { , , ..., T } . We also considered23able 2: Simulation results for the scenario assuming the time-varying propensity scoremodel is speciﬁed with a logit link. The Integrated Bias and RMSE (multiplied by 10 foreasier interpretation). T t Logistic Super Learner200 2 0.85 (1.17) 0.66 (1.02)200 5 1.78 (2.12) 1.26 (1.67)200 10 3.12 (3.45) 2.30 (2.71)1000 2 0.36 (0.51) 0.18 (0.38)1000 5 0.67 (0.79) 0.36 (0.51)1000 10 1.35 (1.43) 0.79 (0.89)5000 2 0.11 (0.19) 0.03 (0.16)5000 5 0.21 (0.28) 0.09 (0.19)5000 10 0.42 (0.47) 0.16 (0.29)the hypothetical scenarios of t = 5 and t = 10 since in data applications, the exactstep size t is unknown, thus one can only choose t based on substantive knowledge. Thesimulation results suggest the choices of t should be parsimonious, that is, choosing a stepsize t that is capable to capture the data complexity yet is as small as possible to achievethe better performances. We also found that using Super Learner model to estimate thetime-varying propensity score leads to improved performances compared to the logisticregression model even if the propensity score model is correctly speciﬁed with a logit link.This ﬁnding suggests that the Super Learner as an ensemble of ﬂexible parametric/non-parametric models are competent in various settings, and additional simulations in theSupplementary Materials indicate promising performances of the Super Learner when theunderlying data generating mechanism is unknown and potentially complex.Figure 4 further shows the curve of 2-step causal eﬀects when the odds of treatmentassignment were multiplied by factor δ ∈ [exp( − . , exp(2 . .52.02.53.0 −2 −1 0 1 2 log odds ratio d T = 200 log odds ratio d T = 1000 log odds ratio d T = 5000

Causal Effect Curve

Figure 4: The curve of 2-step causal eﬀects when the odds of treatment assignment weremultiplied by factor δ ∈ [exp( − . , exp(2 . T = 200 , , T increases. The point-wise 95% point-wise conﬁdencebands correctly capture the true curve in all three scenarios with diﬀerent number of timeperiod T = 200 , , T gets larger. Importantly, the 95% point-wise conﬁdence bands cap-ture the true curve in all settings, showing the proposed variance estimator also performreasonably well in the simulation study. We analyze the real data described in Section 2. First, we implement the methods proposedin Section 3 to estimate the stochastic causal estimands ¯ τ i ( δ ( t − t +1): t,T ) based on the dailytime series among warm season days for the years 2006 to 2016 in each county i for δ t ∈ [exp( − . , exp(2 . t in county i is 90%. Undera 10-fold increase in odds, that probability would become × × − ≈ . . × . × − ≈ . δ for diﬀerent times t .In the time-varying propensity score model, we included the following observed covari-ates: daily maximum heat index, lagged-1 day heat index, lagged-2 day heat index, movingaverage heat index during this warm season, lagged-1 day heat alert, lagged-2 day heatalert, running total number of heat alerts that have been issued during the correspondingwarm season, moving average number of deaths/hospitalizations during the correspondingwarm season, and an indicator for holidays.For each county, we estimate the counterfactual daily all-cause death counts and cause-speciﬁc hospitalizations counts for ﬁve heat-related diseases (heat stroke, urinary tractinfections, septicemia, renal failure, ﬂuid and electrolyte disorders) under several hypo-thetical scenarios ranging from reducing the odds of issuing heart alerts by 10-times (i.e.,26xp( − . . t = 3 as we assume the issuance of heat alerts is unlikely to aﬀect theoutcome three or more days later. We deﬁne the county-speciﬁc causal eﬀect curve forcounty i as ¯ τ i ( δ ( t − t,T ) − ¯ τ i (1 , , τ i ( δ ( t − t,T ) denotes the estimated counterfac-tual counts under various stochastic interventions ( δ ∈ [exp( − . , exp(2 . τ i (1 , , δ = 1). After obtaining all 2817 county-speciﬁc causal eﬀect curves, we utilize the meta-analysis approach proposed in Section 5to pool the estimated county-speciﬁc causal eﬀect curves across counties resulting in theestimation of the pooled causal eﬀect curve, (cid:80) Ni =1 c i [¯ τ i ( δ ( t − t,T ) − ¯ τ i (1 , , c i , i = 1 , , . . . , N are obtained by the random-eﬀect meta-analysis model.Figure 5 shows the estimated pooled causal eﬀects of average all-cause deaths andcause-speciﬁc hospitalizations for ﬁve heat-related diseases per day per county among 2817counties across 2000-2016. The curves represent the diﬀerences in counts of deaths andhospitalizations per day on average across 2817 counties comparing the counterfactual sce-narios where odds of issuing heat alerts were multiplied by factor δ ∈ [exp( − . , exp(2 . δ = 1).The corresponding point-wise 95% conﬁdence bands of the diﬀerences are presented bythe dotted lines. The black vertical lines represent the average number of deaths andhospitalizations that could be avoided on a warm season day for a county and their cor-responding conﬁdence intervals (CIs) if we had increased the odds of issuing heart alertsby [exp(2 . ≈ δ ) increase above 0 indi-cating the reductions of all-cause deaths and cause-speciﬁc hospitalizations for ﬁve heat-related diseases among Medicare enrollees per day per county as log( δ ) increases. Thecurves are relatively ﬂat, especially when log( δ ) <

0, and also the conﬁdence intervalscontain 0 throughout the range of δ ∈ [exp( − . , exp(2 . . ≈ .

08 (95% conﬁdence interval (CI): − .

02 to 0 .

17) deaths could be avoided for a county27see vertical black line in Figure 5 a ). In the calculation of the total events avoidableto heat alerts, we only take into account estimated deaths and hospitalizations avoidedamong extremely hot days (i.e., the top 5% hottest days with a heat index that exceeded95% of warm season days in the corresponding county). This gives us approximately2 ,

329 avoided deaths (95% CI: −

510 to 5 , ,

137 avoided deaths (95% CI: − ,

095 to 103 , . ≈ −

122 to 536) forheat stroke; 2 ,

508 (95% CI: − ,

110 to 7 , ,

674 (95% CI: − ,

229 to 13 , ,

164 (95% CI: − ,

642 to 5 , ,

733 (95% CI: − ,

235 to 4 , δ ∈ [exp( − . , exp(2 . δ = 1. The estimated numbers of averteddeaths and avoided hospitalizations are with substantial statistical uncertainty. We doﬁnd there is large between-county heterogeneity in the random-eﬀect meta-analysis model(P-value of heterogeneity test < . aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa −0.3−0.2−0.10.00.10.2 −2 −1 0 1 2 Log d C oun t s pe r da y pe r c oun t y Diff in

Deaths as d varies vs. d = 1 eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee −0.003−0.002−0.0010.0000.0010.002 −2 −1 0 1 2 Log d C oun t s pe r da y pe r c oun t y Diff in

Heat stroke hosps as d varies vs. d = 1 cccccccccccccccccccccccccccccccccccccccccccccccccc −0.04−0.020.000.02 −2 −1 0 1 2 Log d C oun t s pe r da y pe r c oun t y Diff in

Urinary tract infections hosps as d varies vs. d = 1 dddddddddddddddddddddddddddddddddddddddddddddddddd −0.08−0.040.000.04 −2 −1 0 1 2 Log d C oun t s pe r da y pe r c oun t y Diff in

Septicemia hosps as d varies vs. d = 1 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb −0.03−0.02−0.010.000.010.02 −2 −1 0 1 2 Log d C oun t s pe r da y pe r c oun t y Diff in

Renal failure hosps as d varies vs. d = 1 ffffffffffffffffffffffffffffffffffffffffffffffffff −0.02−0.010.000.010.02 −2 −1 0 1 2 Log d C oun t s pe r da y pe r c oun t y Diff in

Fluid and electrolyte disorders hosps as d varies vs. d = 1 Figure 5: The estimated pooled causal eﬀect curves of average all-cause deaths and cause-speciﬁc hospitalizations for ﬁve heat-related diseases (heat stroke, urinary tract infections,septicemia, renal failure, ﬂuid and electrolyte disorders) per day per county among 2817counties. The curves represent the diﬀerences in deaths and hospitalizations comparingthe counterfactual situations where odds of issuing heat alerts were multiplied by factor δ ∈ [exp( − . , exp(2 . δ = 1). The corresponding point-wise 95% conﬁdence bands of the diﬀerencesare presented by the dotted lines. The black vertical lines represent the daily averagenumber of deaths and hospitalizations that could be avoided for a county and their corre-sponding conﬁdence intervals (CIs) if we had increased the odds of issuing heart alerts by[exp(2 . ≈ > ,

000 population sizes. These counties included more than 70%of all-cause deaths and cause-speciﬁc hospitalizations observed among the 2817 counties.The motivation for this additional sensitivity analysis was the concern that combiningcounties with varying population sizes introduces too much heterogeneity in the pooledcausal estimates and therefore we examined the subset of populous counties as a sensitivityanalysis. As shown in Section S.2, the shapes of the causal eﬀect curves based on the subsetof 550 populous counties are very similar to the estimated curves based on data from all 2817counties. Notably, the width of conﬁdence bands is, in general, narrower than the curvesbased on data from all 2817 counties, which indicates reduced heterogeneity as expected.Additional sensitivity analysis was conducted where we varied the step size and estimatedpooled causal eﬀect curves of 1-step and 6-step estimands. As shown in SupplementaryMaterials Section S.2, the causal eﬀect of increasing the frequency of issuing heat alertsmay have more pronounced eﬀects on health outcomes when considering longer time lags.

We have developed a causal inference framework for time series data from observationalstudies. Under this framework, we introduced a new class of time series causal estimandsunder stochastic interventions. Our proposed nonparametric estimators based on time-varying propensity scores are shown to provide unbiased estimation of the causal estimandswith theoretical justiﬁcation. When observational studies contain time series data frommultiple locations (counties in our data application), we link our stochastic interventionframework to the random-eﬀect meta-analysis framework. The ultimate goal in this settingis to obtain a pooled causal estimator that summarizes the potentially heterogeneous causaleﬀects obtained from multiple time series.The proposed approaches were motivated from our data application where the goal wasto assess causal evidence of heat warning eﬀectiveness in reducing morbidity and mortality.We implement the proposed method to estimate the causal eﬀects of NWS-issued heat30lerts on daily average all-cause deaths among Medicare enrollees, and cause-speciﬁc hos-pitalizations for ﬁve heat-related diseases (heat stroke, urinary tract infection, septicemia,renal failure, ﬂuid and electrolyte disorders) among Medicare FFS enrollees residing in 2817counties from 2006 to 2016. The results suggest weak evidence of reductions in morbid-ity and mortality if NWS increased the odds of issuing heat alerts. However, the resultswere subject to large statistical uncertainties, probably due to 1) the distinct nature ofthe demographics for each county 2) the diﬀerent criteria of issuing heat alerts across thejurisdictions of local NWS oﬃces 3) the low prevalence of the heat alerts in most counties.Importantly, the random-eﬀect meta-analysis model indicated large between-county het-erogeneity of the causal eﬀects of NWS-issued heat alerts according to hypothesis test forassumption of heterogeneity in random-eﬀect meta analysis. When we restrict the analysiswithin 550 counties with > ,

000 population size, we ﬁnd reduced yet still statisticallysigniﬁcant heterogeneity of the causal eﬀects of NWS-issued heat alerts across counties.This ﬁnding provides important policy implications that the optimal health-based metricsfor issuing heat alerts might be speciﬁc to geographic locations.One key distinction of our approach compared to traditional environmental epidemi-ology methods in the ﬁeld, is that most time-series studies on heat alerts used matchedcase-crossover or diﬀerence-in-diﬀerence designs, where the focus is on the causal eﬀectacross case days (i.e., the days with heat alerts), and the number of heat alert days is ﬁxed(Chau et al. 2009, Benmarhnia et al. 2016, Weinberger et al. 2018). In contrast, our anal-ysis focuses on the causal eﬀect of a stochastic intervention, estimating the causal eﬀectas the frequency of the heat alert changes, and as such the counterfactual heat alert daysare not ﬁxed. Therefore our results are not directly comparable to previously publishedstudies. While our novel causal inference approach provides new insights about the policyevaluation, there are trade-oﬀs to be considered. First, we expect causal eﬀects deﬁned bystochastic interventions to answer a diﬀerent policy question than that of a deterministicintervention (Kennedy 2019). The proposed stochastic causal estimand is an intuitive quan-tity to answer the following causal question “how many adverse health outcomes could have31een averted if we shifted the frequency of heat alerts?” which was the focus of our work.However, other practitioners’ interests may be more aligned with estimating causal eﬀectsof deterministic interventions, for instance, “how many adverse health outcomes could havebeen avoided if the temperature was, on average, one degree lower during extremely hotdays (Weinberger et al. 2020)?” In that case, more traditional causal inference methodsbased on deterministic interventions (Bojinov & Shephard 2019, Bojinov et al. 2020, Ram-bachan & Shephard 2019) may be used. Second, the random-eﬀect meta-analysis createsa weighted average causal estimand and its corresponding pooled estimator, in which theweights are calculated by the combination of between-study and within-study error vari-ance. For this reason, the weighted average causal quantity is deﬁned on a hypotheticalweighted population. Dahabreh et al. (2020) has criticized that standard meta-analysesmay produce results that do not belong to a clear target population when each study (coun-ties in our data application) represents a diﬀerent population and the treatment eﬀect variesacross these populations. In the future work, we plan to develop approaches that allowinferences to be transported from multiple time series to a pre-speciﬁed target population(Li & Song 2020, Dahabreh et al. 2020). Third, the non-interference assumption may notbe met in many environmental health and climate change studies using spatial-temporaldata. For instance, the NWS-issued heat alerts in one county may impact people in ad-jacent counties. In future work, we plan to extend the time series intervention path to amultivariate intervention path deﬁned by random matrices (Papadogeorgou et al. 2020), topotentially overcome the violation of non-interference assumption, and identify direct andspillover eﬀects under this stochastic intervention framework. Fourth, the stable estima-tion of time-varying propensity score based on observational time series data is challengingsince both the time-varying treatment paths and observed confounder set are potentiallyhigh-dimensional. We will need to extend the modelling strategy for approximate residualbalancing for approximately balancing weights in high dimensions to these observationaltime series study settings (Athey et al. 2018).The stochastic intervention framework for time series data introduced in this paper32s the ﬁrst approach which allows for identiﬁcations and estimation of causal quantitiesdeﬁned by stochastic interventions on multiple time series. We believe this statisticalframework addresses one of the emerging methodological needs in climate change andenvironmental health research where researchers often collect time series data from multiplegeographic locations seeking causal evidence among representative regions (Liu et al. 2019,Lee et al. 2020). We expect the proposed framework can be adapted without methodologicalbarriers to science and policy-relevant research in political science, economics, and lawwhere considerable amount of spatial-temporal data are generated and collected.

Acknowledgement

The authors are grateful to Jose R. Zubizarreta and Guanbo Wang for helpful discussions.Funding was provided by National Institute of Health (NIH) grants R01 ES029950, R01AG066793-01R01, R01 ES030616, R01 AG060232-01A1, R01 ES028033, R01 MD012769,R01 ES026217; USEPA grants 83587201-0; and Harvard University Climate Change Solu-tions Fund. Dr. Wellenius and Dr. Dominici have received consulting income from theHealth Eﬀects Institute (Boston, MA). Dr. Wellenius recently served as a visiting scientistat Google, LLC (Mountain View, CA). The authors report that they have no conﬂicts ofinterest relevant to this work.

References

Anderson, G. B., Bell, M. L. & Peng, R. D. (2013), ‘Methods to calculate the heat index asan exposure metric in environmental health research’,

Environmental health perspectives (10), 1111–1119.Athey, S. & Imbens, G. W. (2017), ‘The state of applied econometrics: Causality and policyevaluation’,

Journal of Economic Perspectives (2), 3–32.Athey, S., Imbens, G. W. & Wager, S. (2018), ‘Approximate residual balancing: debiased33nference of average treatment eﬀects in high dimensions’, Journal of the Royal StatisticalSociety: Series B (Statistical Methodology) (4), 597–623.Benmarhnia, T., Bailey, Z., Kaiser, D., Auger, N., King, N. & Kaufman, J. S. (2016), ‘Adiﬀerence-in-diﬀerences approach to assess the eﬀect of a heat action plan on heat-relatedmortality, and diﬀerences in eﬀectiveness according to sex, age, and socioeconomic status(montreal, quebec)’, Environmental health perspectives (11), 1694–1699.Bobb, J. F., Obermeyer, Z., Wang, Y. & Dominici, F. (2014), ‘Cause-speciﬁc risk of hospitaladmission related to extreme heat in older adults’,

Jama (24), 2659–2667.Bojinov, I., Rambachan, A. & Shephard, N. (2020), ‘Panel experiments and dynamic causaleﬀects: A ﬁnite population perspective’, arXiv preprint arXiv:2003.09915 .Bojinov, I. & Shephard, N. (2019), ‘Time series experiments and causal estimands: exactrandomization tests and trading’,

Journal of the American Statistical Association pp. 1–36.Borenstein, M., Hedges, L. V., Higgins, J. P. & Rothstein, H. R. (2010), ‘A basic intro-duction to ﬁxed-eﬀect and random-eﬀects models for meta-analysis’,

Research synthesismethods (2), 97–111.Chau, P., Chan, K. & Woo, J. (2009), ‘Hot weather warning might help to reduce elderlymortality in hong kong’, International journal of biometeorology (5), 461–468.Dahabreh, I. J., Petito, L. C., Robertson, S. E., Hern´an, M. A. & Steingrimsson, J. A.(2020), ‘Toward causally interpretable meta-analysis: Transporting inferences from mul-tiple randomized trials to a new target population’, Epidemiology (3), 334–344.Daly, C. (2013), ‘Descriptions of prism spatial climate datasets for the conterminous unitedstates’, PRISM doc . 34aly, C., Halbleib, M., Smith, J. I., Gibson, W. P., Doggett, M. K., Taylor, G. H., Curtis,J. & Pasteris, P. P. (2008), ‘Physiographically sensitive mapping of climatological tem-perature and precipitation across the conterminous united states’, International Journalof Climatology: a Journal of the Royal Meteorological Society (15), 2031–2064.Granger, C. W. (1980), ‘Testing for causality: a personal viewpoint’, Journal of EconomicDynamics and control , 329–352.Haneuse, S. & Rotnitzky, A. (2013), ‘Estimation of the eﬀect of interventions that modifythe received treatment’, Statistics in medicine (30), 5260–5277.Hawkins, M. D., Brown, V. & Ferrell, J. (2017), ‘Assessment of noaa national weatherservice methods to warn for extreme heat events’, Weather, climate, and society (1), 5–13.Imbens, G. W. & Rubin, D. B. (2015), Causal inference in statistics, social, and biomedicalsciences , Cambridge University Press.Kang, J. D., Schafer, J. L. et al. (2007), ‘Demystifying double robustness: A comparison ofalternative strategies for estimating a population mean from incomplete data’,

Statisticalscience (4), 523–539.Kennedy, E. H. (2019), ‘Nonparametric causal eﬀects based on incremental propensity scoreinterventions’, Journal of the American Statistical Association (526), 645–656.Kim, K., Kennedy, E. H. & Naimi, A. I. (2019), ‘Incremental intervention eﬀectsin studies with many timepoints, repeated outcomes, and dropout’, arXiv preprintarXiv:1907.04004 .Lee, W., Kim, Y., Sera, F., Gasparrini, A., Park, R., Choi, H. M., Prifti, K., Bell, M. L.,Abrutzky, R., Guo, Y. et al. (2020), ‘Projections of excess mortality related to diurnaltemperature range under climate change scenarios: a multi-country modelling study’,

The Lancet Planetary Health (11), e512–e521.35i, X. & Song, Y. (2020), ‘Target population statistical inference with data integrationacross multiple sources—an approach to mitigate information shortage in rare diseaseclinical trials’, Statistics in Biopharmaceutical Research (3), 322–333.Liu, C., Chen, R., Sera, F., Vicedo-Cabrera, A. M., Guo, Y., Tong, S., Coelho, M. S.,Saldiva, P. H., Lavigne, E., Matus, P. et al. (2019), ‘Ambient particulate air pollutionand daily mortality in 652 cities’, New England Journal of Medicine (8), 705–715.Naimi, A. I., Rudolph, J. E., Kennedy, E. H., Cartus, A., Kirkpatrick, S. I., Haas, D. M.,Simhan, H. & Bodnar, L. M. (2021), ‘Incremental propensity score eﬀects for time-ﬁxedexposures’,

Epidemiology (2), 202–208.Papadogeorgou, G., Imai, K., Lyall, J. & Li, F. (2020), ‘Causal inference with spatio-temporal data: Estimating the eﬀects of airstrikes on insurgent violence in iraq’, arXivpreprint arXiv:2003.13555 .Rambachan, A. & Shephard, N. (2019), ‘Econometric analysis of potential outcomes timeseries: instruments, shocks, linearity and the causal response function’, arXiv preprintarXiv:1903.01637 .Robins, J. M. & Hern´an, M. A. (2009), ‘Estimation of the causal eﬀects of time-varyingexposures’, Longitudinal data analysis , 599.Robins, J. M., Hernan, M. A. & Brumback, B. (2000), ‘Marginal structural models andcausal inference in epidemiology’.Robins, J. M., Rotnitzky, A. & Zhao, L. P. (1994), ‘Estimation of regression coeﬃcientswhen some regressors are not always observed’,

Journal of the American statistical As-sociation (427), 846–866.Rosenbaum, P. R. & Rubin, D. B. (1983), ‘The central role of the propensity score inobservational studies for causal eﬀects’, Biometrika (1), 41–55.36ubin, D. B. (1974), ‘Estimating causal eﬀects of treatments in randomized and nonran-domized studies.’, Journal of educational Psychology (5), 688.Serghiou, S. & Goodman, S. N. (2019), ‘Random-eﬀects meta-analysis: summarizing evi-dence with caveats’, Jama (3), 301–302.Spangler, K. R., Weinberger, K. R. & Wellenius, G. A. (2019), ‘Suitability of griddedclimate datasets for use in environmental epidemiology’,

Journal of exposure science &environmental epidemiology (6), 777–789.US EPA, E. P. A. (2006), ‘Excessive heat events guidebook’.Van der Laan, M. J., Polley, E. C. & Hubbard, A. E. (2007), ‘Super learner’, Statisticalapplications in genetics and molecular biology (1).Weinberger, K. R., Harris, D., Spangler, K. R., Zanobetti, A. & Wellenius, G. A. (2020),‘Estimating the number of excess deaths attributable to heat in 297 united states coun-ties’, Environmental Epidemiology (3), e096.Weinberger, K. R., Zanobetti, A., Schwartz, J. & Wellenius, G. A. (2018), ‘Eﬀectiveness ofnational weather service heat alerts in preventing mortality in 20 us cities’, Environmentinternational116