[PDF] Estimating Sibling Spillover Effects with Unobserved Confounding Using Gain-Scores

Abstract

A growing area of research in epidemiology is the identification of health-related sibling spillover effects, or the effect of one individual's exposure on their sibling's outcome. The health and health care of family members may be inextricably confounded by unobserved factors, rendering identification of spillover effects within families particularly challenging. We demonstrate a gain-score regression method for identifying exposure-to-outcome spillover effects within sibling pairs in a linear fixed effects framework. The method can identify the exposure-to-outcome spillover effect if only one sibling's exposure affects the other's outcome; and it identifies the difference between the spillover effects if both siblings' exposures affect the others' outcomes. The method fails in the presence of outcome-to-exposure spillover and outcome-to-outcome spillover. Analytic results and Monte Carlo simulations demonstrate the method and its limitations. To exercise this method, we estimate the spillover effect of a child's preterm birth on an older sibling's literacy skills, measured by the Phonological Awarenesses Literacy Screening-Kindergarten test. We analyze 20,010 sibling pairs from a population-wide, Wisconsin-based (United States) birth cohort. Without covariate adjustment, we estimate that preterm birth modestly decreases an older sibling's test score (-2.11 points; 95% confidence interval: -3.82, -0.40 points). In conclusion, gain-scores are a promising strategy for identifying exposure-to-outcome spillovers in sibling pairs while controlling for sibling-invariant unobserved confounding in linear settings.

Full PDF

EEstimating Sibling Spillover Eﬀects with UnobservedConfounding Using Gain-Scores

David C. Mallinson , Felix Elwert , Department of Population Health Sciences, University of Wisconsin-Madison Center for Demography and Ecology, University of Wisconsin-Madison Department of Sociology, University of Wisconsin-MadisonFebruary 24, 2021

Abstract

A growing area of research in epidemiology is the identiﬁcation of health-related sibling spillover eﬀects,or the eﬀect of one individual’s exposure on their sibling’s outcome. The health and health care of familymembers may be inextricably confounded by unobserved factors, rendering identiﬁcation of spillover eﬀectswithin families particularly challenging. We demonstrate a gain-score regression method for identifyingexposure-to-outcome spillover eﬀects within sibling pairs in a linear ﬁxed eﬀects framework. The method canidentify the exposure-to-outcome spillover eﬀect if only one sibling’s exposure aﬀects the other’s outcome; andit identiﬁes the diﬀerence between the spillover eﬀects if both siblings’ exposures aﬀect the others’ outcomes.The method fails in the presence of outcome-to-exposure spillover and outcome-to-outcome spillover. Analyticresults and Monte Carlo simulations demonstrate the method and its limitations. To exercise this method,we estimate the spillover eﬀect of a child’s preterm birth on an older sibling’s literacy skills, measured bythe Phonological Awareness Literacy Screening-Kindergarten test. We analyze 20,010 sibling pairs from apopulation-wide, Wisconsin-based (United States) birth cohort. Without covariate adjustment, we estimatethat preterm birth modestly decreases an older sibling’s test score (-2.11 points; 95% conﬁdence interval:-3.82, -0.40 points). In conclusion, gain-scores are a promising strategy for identifying exposure-to-outcomespillovers in sibling pairs while controlling for sibling-invariant unobserved confounding in linear settings.

Acknowledgements : This work was supported by the Eunice Kennedy Shriver National Institute for Child Health and Human Development(T32 HD007014-42), the University of Wisconsin-Madison Clinical and Translational Science Award program through the National Institutesof Health National Center for Advancing Translational Sciences (Grant UL1TR00427), the University of Wisconsin-Madison School ofMedicine and Public Health’s Wisconsin Partnership Program, and the University of Wisconsin-Madison Institute for Research on Poverty.We thank the Wisconsin Department of Children and Families, Department of Health Services, and Department of Public Instruction for theuse of data. We also thank Steven T. Cook, Dan Ross, Jane A. Smith, Kristen Voskuil, and Lynn Wimer for data access and programmingassistance. We thank Michael Sobel for methodological discussions and Deborah B. Ehrenthal for feedback on this manuscript. The contentis solely the responsibility of the authors and does not necessarily represent the oﬃcial views of supporting agencies. Supporting agenciesdo not certify the accuracy of the analyses presented. Conﬂicts of interest: none. a r X i v : . [ s t a t . M E ] F e b ontents Introduction

A sibling spillover eﬀect (i.e., “interference” or “carryover eﬀect”) is the eﬀect of an individual’s exposure ontheir sibling’s outcome.

The past two decades of epidemiologic research witnessed a burgeoning interestin the role of family environments in childhood health, calling attention to the importance of spilloverswithin families.

Yet, sibling spillovers are largely unexamined in the epidemiologic literature, as mostﬁeld-speciﬁc advancements in spillover identiﬁcation have been restricted to infectious diseases.

Withgrowing interest on the familial interdependence of health, the need for analytical tools to identify siblingspillovers is apparent.Unobserved confounding is particularly salient with sibling spillovers. Siblings often share experiencesthat cultivate their development, which may be unmeasured even in data-rich contexts. Fixed eﬀect (FE)designs that control for unobserved time-invariant confounding are immediately appealing, but there islittle precedent for their use to identify sibling spillovers. Sj¨olander et al. (2016) investigated FE models withsibling pairs for identifying targeted eﬀects of one siblings’ exposure on their own outcome in the presence ofspillover, noting that spillover may be identiﬁable if only one sibling’s exposure aﬀects the other’s outcome. Black et al. (2020) employed a diﬀerence-in-diﬀerences model with three-sibling clusters for identifying alower-bound estimate of a child’s disability on an older sibling’s academic performance. In this paper, we demonstrate a method for the identiﬁcation of one- and two-sided exposure-to-outcomespillovers in sibling pairs with gain-scores (i.e., diﬀerence-in-diﬀerences, or diﬀerence scores), a staple of FEestimation that removes shared confounding by diﬀerencing outcomes. We evaluate the gain-score estimatorin identifying spillover eﬀects across various models with one-sided or two-sided spillovers. Consistent withthe applied FE literature, we focus on linear models with homogenous eﬀects.

This paper is organized as follows. First, we brieﬂy introduce causal directed acyclic graphs (DAGs),which illustrate our models. Second, we discuss various two-sibling models with one-or two-sided spilloverand explain how and when gain-score methods can identify spillover eﬀects. Third, we illustrate our resultswith simulations. Fourth, we apply the method to identify the eﬀect of a younger sibling’s preterm birth onan older sibling’s literacy test performance.

Causal DAGs are useful for explaining the identiﬁcation of causal eﬀects. We review necessary terminologyfor this exposition. Causal DAGs are diagrams consisting of nodes (variables) and directed edges (directcausal eﬀects) that represent the assumed data-generating process (causal model).

Paths are sequencesof adjacent edges, regardless of the arrows’ directions. On causal paths between exposure and outcome, allarrows point from the exposure to the outcome. On non- causal paths between an exposure and an outcome,at least one arrow points away from the outcome. Causal paths “transmit” causal eﬀects, whereas non-causalpaths may transmit spurious associations.

Colliders are variables that receive two inbound arrows on a path(a given variable may be a collider on one path but not on another). Pearl’s d-separation criterion determineswhich variables in data generated by the assumed DAG are conditionally or unconditionally independent:two variables are independent if all paths between them are closed; and a path is closed if it includes a non- ollider as intermediate variable that is conditioned on, or if it includes a collider as intermediate variablethat is not conditioned on. Conversely, two variables may be associated if at least one path betweenthem is open ( d-connected ); and a path is open if it is not closed.Typically, health researchers attempt to identify causal eﬀects by adjusting for observed variables viaregression analysis, matching, or inverse-probability weighting, so that all causal paths between exposure andoutcome are open and all non-causal paths between them are closed.

However, researchers may worryabout open non-causal paths with unobserved confounders that cannot be closed by covariate adjustment.In multilevel analyses in which observations are clustered into groups (e.g., children in sibling pairs), FEmethods can sometimes identify causal eﬀects by subtracting out certain types of group-level unobservedconfounding.

Next, we describe several sibling spillover models with unobserved confounding andshow when gain-score estimation—a FE approach—can identify the spillover eﬀect.

Model and Assumptions

We ﬁrst present our baseline sibling spillover model and subsequently introduce variations on this model.For illustration, we discuss the spillover eﬀect of a child’s early health shock (e.g., serious illness) on theirsibling’s later academic achievement (e.g., test scores). This example is purposefully generic but broadlyapplicable, and it draws upon prior work of health-related spillover eﬀects on academic performance whilemotivating our empirical application.Our baseline model is a linear two-sibling comparison design with one-sided spillover ( Figure 1A ).Subscript i = 1 , . . . , N indicates cluster (family) and subscript j = 1 , T ij represents abinary or continuous exposure (e.g., the health shock), Y ij represents a continuous outcome (e.g., academicperformance), U i represents unobserved family-level confounding (i.e., the FE), and D i represents the gain-score, D i = Y i − Y i . Causal eﬀects in this model include the spillover eﬀect, θ ( T i → Y i ), of sibling 1’sexposure on sibling 2’s outcome; the targeted eﬀects, δ ( T ij → Y ij ), of each sibling’s exposure on their ownoutcome; and confounding eﬀects of the unobserved family-level confounders on each sibling’s exposure andoutcome, ψχ ( T i ← U i → Y i ) and ψγ ( T i ← U i → Y i ).All models embed several simplifying assumptions. First, the targeted eﬀects, δ , of T ij on Y ij and theconfounding eﬀects, ψ , of U i on Y ij are equal for both siblings. Second, all eﬀects are linear and homoge-nous. Third, there is only partial interference (i.e., spillovers within sibling clusters but not between siblingclusters).

Aside from partial interference, these assumptions align with conventional FE models.

Notably, our presentation abstracts from sibling-speciﬁc observed baseline covariates, C ij . Covariatesmay be added to our baseline and subsequent models as long as one can condition on C ij without loss ofgenerality. i T i Y i T i Y i D i (1A) χ ψψγ δθδ -1+1 U i (1B) T i Y i T i Y i D i χ ψψγ δθδ -1+1 τ U i (1C) T i Y i T i Y i D i χ ψψγ δθδ -1+1 φ Figure 1.

Causal directed acyclic graphs for linear data-generating models with one-sided exposure-to-outcome sibling spillover. Subscripts i and j denote cluster and sibling, respectively. T ij is theexposure, Y ij is the outcome, D i is the diﬀerence score, and U i is an unobserved family-level con-founder. Greek letters denote eﬀects. (1A) does not have exposure-to-exposure spillover, whereas(1B) and (1C) have exposure-to-exposure spillovers.3 ain-Score Estimation and Identiﬁcation of Spillover Eﬀects This subsection details the gain-score estimation strategy and demonstrates when our estimator point-identiﬁes spillover eﬀects (i.e., recovers the estimand precisely) for nine sibling spillover models that diﬀer bywhether spillover is one- or two-sided and by whether additional spillovers originate from outcomes.

Gain-score estimation

We investigate the ability of a gain-score estimator to identify exposure-to-outcome spillover eﬀects. First,we regress the gain-score on both siblings’ exposures, D i = b T i + b T i + e i , (1)where b and b are partial regression coeﬃcients for T i and T i , respectively, and e i is an error term. Wethen sum the partial regression coeﬃcients to compute a “spillover coeﬃcient” ( SC ), SC = b + b . (2)We will now interrogate whether the SC identiﬁes causal spillover eﬀects in each of several commonlyassumed data generating processes in health research. Settings with one-sided spillover

The object of interest (estimand) is θ , or the direct spillover eﬀect of sibling 1’s health shock on sibling 2’sacademic performance. Under the baseline model ( Figure 1A ), three open paths connect T i and Y i . Theﬁrst path, T i → Y i , is the causal spillover eﬀect of interest. The other two paths are non-causal pathsthat may transmit spurious association. The ﬁrst non-causal path, T i ← U i → T i → Y i , can be closedby adjusting on T i . However, the second non-causal path, T i ← U i → Y i , cannot be closed by covariateadjustment because it only contains the unobserved variable U i .Nonetheless, we can identify θ through gain-score regression. Under the assumptions of Figure 1A ,it can be shown that b = θ − δ and b = δ , using elementary regression algebra. Therefore, the spillovercoeﬃcient equals SC = b + b = θ .The intuition for this result is that ﬁrst-diﬀerencing exactly oﬀsets confounding biases involving U i ,26and that the SC corrects for the contamination of the spillover estimate in b . Speciﬁcally, the coeﬃcient b on T i captures the association ﬂowing along the open paths from T i to D i . There are ﬁve paths from T i to D i (listed together with their corresponding path coeﬃcients):1. T i ← U i → T i → Y i → D i (non-causal): δχγ T i ← U i → Y i → D i (non-causal): − ψχ T i ← U i → Y i → D i (non-causal): ψχ T i → Y i → D i (causal): δ . T i → Y i → D i (causal): θ The ﬁrst path is closed because the regression conditions on T i . The second and third paths cancel eachother out exactly. The fourth path transmits the spillover eﬀect. The ﬁfth path transmits the negative ofthe targeted eﬀect. Hence, the regression coeﬃcient b = θ − δ identiﬁes the diﬀerence between the spilloverand targeted eﬀect.The coeﬃcient b on T i captures the association ﬂowing along the open paths from T i and D i . Thereare four paths from T i to D i :1. T i ← U i → T i → Y i → D i (non-causal): − δχγ T i ← U i → Y i → D i (non-causal): − ψγ T i ← U i → Y i → D i (non-causal): ψγ T i → Y i → D i (causal): δ The ﬁrst path is closed because the regression conditions on T i ; the second and third paths cancel eachother out; and the fourth path captures the targeted eﬀect. Thus, b = δ identiﬁes the targeted eﬀect, and SC = b + b = θ identiﬁes the causal spillover eﬀect.Many statistical software have functions for summing regression coeﬃcients and obtaining standard errors.Examples include Stata’s lincom command, R’s contrast package, and SAS’s SCORE procedure. The analysis is only slightly complicated in the presence of exposure-to-exposure spillover ( T ij → T ij (cid:48) ) –for example, when one child’s serious illness increases their sibling’s risk of illness. When T i → T i ( Figure1B ), the analysis does not change. However, if T i → T i ( Figure 1C ), then the interpretation of SC = θ changes from representing the entire spillover eﬀect of T i on Y i to capturing only the direct spillover eﬀect,since the indirect component of the spillover eﬀect that operates via the causal path T i → T i → Y i is closedbecause the regression controls for T i . See the Supplementary Material for details.

Settings with two-sided spillover

Analysts may also encounter scenarios with two-sided spillover. In our example, each siblings’ health shockcould aﬀect the other’s academic performance ( T i → Y i and T i → Y i ). Reﬂecting this possibility, Figure2A modiﬁes the baseline model of

Figure 1A to allow spillover T i → Y i with eﬀect κ . The partial regressioncoeﬃcients in the gain-score approach identify b = θ − δ and b = δ − κ , so that SC = b + b = θ − κ .Consequently, with two-sided exposure-to-outcome spillover, the SC does not identify the spillover eﬀect of T i on Y i but instead the diﬀerence between the two exposure-to-outcome spillover eﬀects. However, if theanalyst can defend assumptions about one or more of the signs of the two spillover eﬀects, then the SC remains informative even though it no longer point-identiﬁes θ . Speciﬁcally, if κ >

0, the SC underestimates(i.e., gives a lower bound for) θ . By contrast, if κ <

0, then the SC overestimates (gives an upper bound for) θ . One can make additional inferences about θ depending on the value of the SC and the assumed sign of κ . For example, if SC > κ >

0, then θ >

0. Of note, a ﬁnding that SC = 0 is uninformative, because i T i Y i T i Y i D i (2A) χ ψψγ δ θδ -1+1 κ U i (2B) T i Y i T i Y i D i χ ψψγ δ θδ -1+1 κτ U i (2C) T i Y i T i Y i D i χ ψψγ δ θδ -1+1 κφ Figure 2.

Causal directed acyclic graphs for linear data-generating models with two-sided exposure-to-outcome sibling spillover. Subscripts i and j denote cluster and sibling, respectively. T ij is theexposure, Y ij is the outcome, D i is the diﬀerence score, and U i is an unobserved family-level con-founder. Greek letters denote eﬀects. (2A) does not have exposure-to-exposure spillover, whereas(2B) and (2C) have exposure-to-exposure spillover.6 t is compatible with the possibility that the two spillover eﬀects are equal, θ = κ , and that they are bothzero, θ = κ = 0.If T ij → T ij (cid:48) in addition to two-sided spillover ( Figures 2B-C ), this does not aﬀect the interpretations of b and b , and SC still identiﬁes the diﬀerence between siblings’ unmediated spillover eﬀects. However, SC will not capture the mediated part of spillover eﬀect, T ij → T ij (cid:48) → Y ij (cid:48) . See the Supplementary Material for details.

Settings with spillovers from outcomes

Analysts may also encounter settings with outcome-to-outcome spillover ( Y ij → Y ij (cid:48) ) or outcome-to-exposurespillover ( Y ij → T ij (cid:48) ). In our setting, it is reasonable to assume that siblings’ academic outcomes may becausally related by outcome-to-outcome spillover. In contrast, an academic outcome causing a health shockis implausible, but exposure-to-outcome spillovers may be relevant elsewhere.If outcomes cause future exposures or outcomes ( Figure 3 ), then our estimator does not identify spilloversor simple functions of spillovers. See the

Supplementary Material for details.

We conducted nine Monte Carlo simulations, one simulation for each of the nine models in Figures 1-3 , todemonstrate when the method identiﬁes exposure-to-outcome spillover eﬀects. Our simulation model follows: U i , υ i , υ i ∼ N (0 , T i =  τ T i + χU i ≤ .

51 if τ T i + χU i > . T i =  φT i + ωY i + γU i ≤ .

21 if φT i + ωY i + γU i > . Y i = δT i + κT i + λY i + ψU i + υ i Y i = δT i + θT i + ηY i + ψU i + υ i D i = Y i − Y i We simulated each model with 1000 runs of 5000 observations each, where each observation representeda sibling pair. We set the following parameters at ﬁxed values: θ = 0 . δ = 1, ψ = 1, χ = 2, and γ = 3.Parameters distinguishing the models— κ , τ , φ , ω , η , and λ —were set to zero unless otherwise speciﬁed. Toavoid simultaneity, at least one parameter in each pair ( τ , φ ), ( η , λ ), and ( κ , ω ) was always set to zero.In each sample, we regressed the gain-score on siblings’ exposures and computed the spillover coeﬃcientaccording to equations (1) and (2). We conducted simulations in Stata Statistical Software: Release 16. Simulation code is in the

Supplementary Material . i T i Y i T i Y i D i (3A) χ ψψγ δ θδ -1+1 ω U i (3B) T i Y i T i Y i D i χ ψψγ δθδ -1+1 η U i (3C) T i Y i T i Y i D i χ ψψγ δθδ -1+1 λ Figure 3.

Causal directed acyclic graphs for linear data-generating models with one-sided exposure-to-outcome sibling spillover and spillover from outcomes. Subscripts i and j denote cluster and sibling,respectively. T ij is the exposure, Y ij is the outcome, D i is the diﬀerence score, and U i is an unobservedfamily-level confounder. Greek letters denote eﬀects. (3A) has outcome-to-exposure spillover, and(3B) and (3C) have outcome-to-outcome spillover.8 igure 4 displays the simulation results. The ﬁrst three rows conﬁrm that the spillover coeﬃcient isunbiased in the three settings with one-sided exposure-to-outcome spillover of Figure 1 , as the averageof estimated spillover coeﬃcient equals the known spillover eﬀect, (cid:100) SC F igure = 0 . Figure 2 with two-sided exposure-to-outcome spillover identiﬁes the diﬀerence between the two spillovers, (cid:100) SC F igure = 0 . − . . κ > (cid:100) SC F igure underestimates thespillover eﬀect, θ . The ﬁnal three simulations show that (cid:100) SC F igure is biased in all models of Figure 3 withspillovers from outcomes. Size and direction of the biases are fairly complicated functions of the coeﬃcientsin the data-generating model and can be large. The estimated ordinary least squares standard errors closelyresemble the empirical standard errors for each model, indicating that the built-in standard errors in Stata’slincom command are accurate. We applied the method to estimating the spillover eﬀect of a child’s preterm birth (gestational age < If a child is bornpreterm, parents may reallocate investments (time, ﬁnancial, or otherwise) from older siblings to support theyounger sibling’s health, thereby inhibiting the older siblings’ development, including early literacy.For this application, we analyzed Big Data for Little Kids (BD4LK), a longitudinal cohort of birth recordsfor all live in-state resident deliveries in Wisconsin during 2007-2016 (N > PALS-K evaluates readiness for kindergarten-level literacy instruction on six domains (rhyme awareness; beginning sound awareness; alphabet knowledge;letter sounds; spelling; word concept).

In Wisconsin, children must be ﬁve years-old at kindergartenenrollment to qualify for PALS-K testing. Our analysis includes 20,010 sibling pairs (40,020 children) thatwere sequentially-born from diﬀerent deliveries to the same biological mother and had non-missing English-language PALS-K test scores and covariates. The

Supplementary Material contains the full samplingdescription.We estimate the following gain-score regression model, D i = b P T B i + b P T B i + β C i + v i where D i = P ALSK i − P ALSK i . Subscripts i = 1 , . . . N and j = 1 , j = 1 is the younger sibling. P T B ij is a binary preterm birth indicator (1 if preterm;0 otherwise), P ALSK ij is the continuous PALS-K score (0-102 points), and C i is a vector of covariatesmeasured at the older sibling’s delivery, which may be empty. Covariates include maternal age (years),maternal education (no high school diploma; high school diploma/equivalent; 1-3 years college; 4+ yearscollege), and Medicaid delivery payment. D i is the gain-score estimator. q = 0.5)( q – k = 0.2)Figure 1A1B1CFigure 2A2B2CFigure 3A3B3C S i b li ng S p ill o v e r M ode l -.5 0 .5 1 1.5 Mean Spillover Coefficient (95% CI)

Figure 4.

Results (average spillover coeﬃcients and empirical 95% conﬁdence intervals [CI]) fromsimulations of the nine sibling spillover models in

Figures 1-3 . Each simulation consisted of 1000runs of 5000 observations, where each observation represented a sibling pair. Subscripts i and j indicate cluster and sibling, respectively. T ij is the exposure and Y ij is the outcome. The targetquantity is the spillover eﬀect ( T i → Y i ), set to θ = 0 . κ ( T i → Y i ), τ ( T i → T i ), φ ( T i → T i ), ω ( Y i → T i ), η ( Y i → Y i ), and λ ( Y i → Y i ). Exceptfor θ , all spillover parameters were set to zero except in the following cases: κ = 0 . τ = 0 . φ = 0 . ω = 0 . η = 0 . λ = 0 . θ in all models of one-sided spillover ( Figure 1 ); identiﬁes thediﬀerence between the two spillover eﬀects with two-sided spillover (

Figure 2 ); and is biased in thepresence of spillovers originating from outcomes (

Figure 3 ).10 e ran the model twice, once with and once without covariates. In each model, we summed the regressioncoeﬃcients on both siblings’ preterm birth indicators to compute the SC = b + b . Assuming the one-sidedspillover model of Figure 1A , SC from the regression without covariates identiﬁes the eﬀect of a youngersibling’s preterm birth on the older sibling’s PALS-K score. Additionally, b identiﬁes the eﬀect of eachsibling’s preterm birth on their own PALS-K score. We performed all analyses in Stata Statistical Software:Release 16. The University of Wisconsin-Madison minimal risk institutional review board approved ourproject.

Supplemental Tables 1 and 2 summarize baseline characteristics of our sample (

SupplementaryMaterial ). Preterm birth incidence was slightly greater among older siblings relative to younger siblings(6.78% vs. 6.65%). On average, older siblings received slightly lower PALS-K scores (mean 63.58 points; SD24.12 points) relative to younger siblings (mean 64.22 points; SD 23.83 points). Approximately 10% of siblingclusters had discordant preterm birth exposure. In the regression without covariate adjustment, the oldersibling’s preterm birth coeﬃcient was ˆ b = -2.49 points (95% CI -3.83, -1.15 points), the younger sibling’spreterm birth coeﬃcient was ˆ b = 0.38 points (95% CI: -0.97, 1.73 points), and the resulting (cid:100) SC was -2.11points (95% CI: -3.82, -0.40 points) ( Table 1 ). This indicates that a younger sibling’s preterm birth modestlyharmed their older sibling’s PALS-K performance.

Figure 5 displays these results graphically relative tothe assumed data-generating model. However, covariate adjustment attenuated the (cid:100) SC to -1.49 points (95%CI -3.21, 0.22 points). We described a simple approach to identifying spillovers with gain-scores in sibling pairs. This method canpoint-identify spillovers if only one sibling’s exposure aﬀects the other’s outcome, and it can identify thediﬀerence in siblings’ spillovers in the presence of two-sided spillover. The method leverages the primarybeneﬁt of FE estimation: controlling for family-level, sibling-invariant, unobserved confounding. Whereaspreceding epidemiologic research on spillover identiﬁcation primarily considered infectious diseases, our workcontributes to the growing literature on spillovers within families.We acknowledge some limitations. First, we restricted our attention to linear settings. This methoddoes not necessarily apply to contexts with nonlinear relationships, such as those with binary outcomes (seeSj¨olander et al. (2016) for binary outcomes in our Figure 1A ). Second, we did not consider clusters ofthree or more siblings. Spillovers that originate from larger sibling clusters may pose unique challenges thatare unaddressed here – for example, whether one can identify the eﬀect of a middle child’s exposure on theyoungest sibling’s outcome if an eldest sibling’s exposure aﬀects all siblings’ outcomes. Lastly, we did nottest the method in the presence of shared mediator or collider variables. Sj¨olander and Zetterqvist (2017)interrogated sibling comparison models with shared mediators and colliders, ﬁnding that such factors mayinduce bias. Nonetheless, our paper lays groundwork for subsequent research. Speciﬁc avenues that advance thismethod include testing in nonlinear settings or settings with shared mediator variables, as well as expandingmodels to allow three or more siblings. able 1. Ordinary least squares regression of the diﬀerence in siblings’ PALS-K scores a (points) ontheir preterm birth statuses (N = 20,010 sibling pairs) Unadjusted Regression Adjusted Regression b Coeﬃcient (95% CI) Coeﬃcient (95% CI)

Preterm birth (gestational age <

37 weeks)

Older sibling -2.49 (-3.83, -1.15) -2.28 (-3.62, -0.94)

Younger sibling c -2.11 (-3.82, -0.40) -1.49 (-3.21, 0.22) a The diﬀerence in PALS-K scores equals the older sibling’s PALS-K Score minus the younger sibling’sPALS-K score. b Covariates include maternal age at delivery (years), maternal education at delivery (no high schooldiploma; high school diploma/equivalent; 1-3 years college; 4+ years college) and Medicaid deliverypayment (no; yes), all of which were measured at the time of the older sibling’s delivery. c The spillover coeﬃcient is the sum of the partial regression coeﬃcients for the older sibling’s pretermbirth indicator and the younger sibling’s preterm birth indicator. Assuming one-sided spillover asin

Figure 1A , this identiﬁes the eﬀect of a younger sibling’s preterm birth on the older sibling’sPALS-K score.Abbreviations: “CI” conﬁdence interval; “PALS-K” Phonological Awareness Literacy Screening-Kindergarten. 12 i PTB i PALSK i PTB i PALSK i D i χ ψψγ ˆ δ = − . ˆ θ = − . ˆ δ = − . -1+1 Figure 5.

A directed acyclic graph of the relationship between siblings’ preterm birth (gestationalage <

37 weeks) and their score on the Phonological Awareness Literacy Assessment-Kindergartentest with overlaid estimates. Subscripts i and j denote cluster and sibling, respectively, where j = 1is the younger sibling and j = 2 is the older sibling. P T B ij is a preterm birth indicator, P ALSK ij is the test score, D i is a diﬀerence score, and U i is an unobserved confounder. Greek letters denoteeﬀects, and the values of θ and δ are estimated using gain-score regression.13 eferences [1] Ogburn EL, VanderWeele TJ. Causal diagrams for interference. Stat Sci :559-578.[2] VanderWeele TJ. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford:Oxford University Press, 2015.[3] Sj¨olander A, Frisell T, Kuja-Halkola R, ¨Oberg S, Zetterqvist J. Carryover eﬀects in sibling comparisondesigns. Epidemiol :852-858.[4] Kuh D, Ben-Shlomo Y, Lynch J, Hallqvist J, Power C. Life course epidemiology. J Epidemiol CommunityHealth :778-783.[5] Lawlor DA, Mishra GD. Family Matters: Designing, Analysing and Understanding Family Based Studiesin Life Course Epidemiology. 1st edn. Oxford: Oxford University Press, 2009.[6] Liu S, Jones RN, Glymour MM. Implications of lifecourse epidemiology for research on determinants ofadult disease. Public Health Rev :489–511.[7] Feinberg ME, Solmeyer AR, McHale SM. The third rail of family systems: sibling relationships, mentaland behavioral health, and preventive intervention in childhood and adolescence. Clin Child Fam PsycholRev :43–57.[8] Viner RM, Ross D, Hardy R, et al. Life course epidemiology: recognising the importance of adolescence. J Epidemiol Community Health :719-720.[9] Ben-Shlomo Y, Cooper R, Kuh D. The last two decades of life course epidemiology, and its relevance forresearch on ageing. Int J Epidemiol :973-988.[10] De Neve JW, Kawachi I. Spillovers between siblings and from oﬀspring to parents are understudied: areview and future directions for research. Soc Sci Med :56-61.[11] Morris AS, Robinson LR, Hays-Grudo J, Claussen AH, Hartwig SA, Treat AE. Targeting parenting inearly childhood: a public health approach to improve outcomes for children living in poverty.

Child Dev :388-397.[12] Halloran ME, Struchiner CJ. Study designs for dependent happenings. Epidemiol :331–38.[13] Halloran ME, Struchiner CJ. Causal inference in infectious diseases. Epidemiol :142–51.[14] Longini IM, Sagatelian K, Rida WN, Halloran ME. Optimal vaccine trial design when estimating vaccineeﬃcacy for susceptibility and infectiousness from multiple populations. Stat Med :1121–36.[15] Hudgens MG, Halloran ME. Toward causal inference with interference. J Am Stat Assoc :832-842.

16] VanderWeele TJ, Tchetgen Tchetgen EJ. Eﬀect partitioning under interference in two-stage randomizedvaccine trials.

Stat Probabil Lett :861–69.[17] Clemens J, Shin S, Ali M. New approaches to the assessment of vaccine herd protection in clinical trials. Lancet Infect Dis :482–87.[18] Halloran ME. The minicommunity design to assess indirect eﬀects of vaccination. Epidemiol Methods :83–105.[19] Tchetgen Tchetgen EJ, VanderWeele TJ. On causal inference in the presence of interference. Stat Meth-ods Med Res :55–75.[20] VanderWeele TJ, Tchetgen Tchetgen EJ, Halloran M. Components of the indirect eﬀect in vaccine trials:identiﬁcation of contagion and infectiousness eﬀects. Epidemiol :751–61.[21] Halloran ME, Hudgens MG. Dependents happenings: a recent methodological review. Curr EpidemiolRep :297-305.[22] Benjamin-Chung J, Arnold BF, Berger D, et al. Spillover eﬀects in epidemiology: parameters, studydesigns and methodological considerations. Int J Epidemiol :332-347.[23] Gunasekara FI, Richardson K, Carter K, Blakely T. Fixed eﬀects analysis of repeated measures data. Int J Epidemiol :264-269.[24] Imai K, Kim IS. When should we use unit ﬁxed eﬀects regression models for causal inference withlongitudinal data? Am J Pol Sci :467-490.[25] Black SE, Breining S, Figlio DN, et al. Sibling spillovers. Econ J

Sociol Methods Res

Epidemiol :37-48.[28] Pearl J. Causality: Models, Reasoning, and Inference. 2nd edn. Cambridge: Cambridge University Press,2009.[29] Shpitser I, VanderWeele T, Robins JM. On the validity of covariate adjustment for estimating causaleﬀects. arXiv, doi:1203.3515, 15 March 2012, preprint: not peer reviewed.[30] Elwert F. Graphical causal models. In: Morgan SL (ed). Handbook of Causal Analysis for Social Re-search. Dordrecht: Springer Netherlands, 2013, pp. 245-273.[31] Pearl J. Linear models: a useful “microscope” for causal analysis. J Causal Inference :155-170.[32] Morgan SL, Winship C. Counterfactuals and Causal Inference: Methods and Principles for Social Re-search. 2nd edn. Cambridge: Cambridge University Press, 2014.

33] Sobel ME. What do randomized studies of housing mobility demonstrate? Causal inference in the faceof interference.

J Am Stat Assoc

Adv Econ :429-77.[38] StataCorp. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC; 2019.[39] Mathiasen R, Hansen BM, Andersen AM, Forman JL, Greisen G. Gestational age and basic schoolachievements: a national follow- up study in Denmark. Pediatrics :e1553-e1561.[40] Mallinson DC, Grodsky E, Ehrenthal DB. Gestational age, kindergarten-level literacy, and eﬀect modiﬁ-cation by maternal socio-economic and demographic factors.

Paediatr Perinat Epidemiol :467-479.[41] Larson A, Berger LM, Mallinson DC, Grodsky E, Ehrenthal DB. Variable uptake of Medicaid-coveredPrenatal Care Coordination: the relevance of treatment level and service context. J Community Health :32-43.[42] Ford K, Invernizzi M. PALS Espa˜nol K Technical Reference. Charlottesville, VA: University of VirginiaCurry School of Education; 2014.[43] Invernizzi M, Juel C, Swank L, Meier J. PALS-K Technical Reference. Charlottesville: University ofVirginia Curry School of Education, 2015.[44] Wisconsin Department of Public Instruction. Admissions and early entrance to four- and ﬁve-year-oldkindergarten. https://dpi.wi.gov/early-childhood/kind/admission (1 September 2020, date last accessed).[45] Sj¨olander A, Zetterqvist J. Confounders, mediators, or colliders. Epidemiol :540-547. stimating Sibling Spillover Eﬀects with UnobservedConfounding Using Gain-Scores Supplementary Material

Derivations for Models with One-Sided Spillover

Base model with one-sided spillover (Figure 1A)

We ﬁrst consider models with one-sided exposure-to-outcome spillover ( T i → Y i , but T i does not directlyaﬀect Y i ). Under the data generating process in Figure 1A , we point identify the spillover eﬀect θ . Forreference, β denotes a regression coeﬃcient, ρ denotes a correlation coeﬃcient, σ denotes standard deviation,and σ denotes variance. We compute the following partial regression coeﬃcients: β D i T i · T i = ρ D i T i − ρ D i T i ∗ ρ T i T i − ρ T i T i ∗ σ D i σ T i β D i T i · T i = ρ D i T i − ρ D i T i ∗ ρ T i T i − ρ T i T i ∗ σ D i σ T i The correlation coeﬃcient between two variables is equal to the sum of the products of path coeﬃcientsand of the “root variable’s” variation divided by the product the variables’ standard deviations. A “rootvariable” is the variable on the path with no incoming arrows. See Pearl (2013) for details. We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ T i θ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ + σ T i θ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We plug the correlation coeﬃcients into the formulas for our partial regression coeﬃcients, and we derive thefollowing: S1 D i T i · T i = − σ Ti δ + σ Ti θ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ + σ Ti θ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = δ We can then compute the spillover coeﬃcient ( SC ) to point identify θ : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ = θ One-sided spillover and exposure-to-exposure spillover (Figures 1B-1C)

Under the data generating process in

Figure 1B , sibling 2’s exposure aﬀects sibling 1’s exposure ( T i → T i ).Regardless, we still point identify the spillover eﬀect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ U i ψγτ + ( − σ T i δ + σ T i θ + σ T i δτ + σ U i δχγ + σ U i ψγτ + σ U i ψχσ D i σ T i = − σ T i δ + σ T i θ + σ T i δτ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ T i δτ + σ U i θχγ + σ T i θτ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i δτ + σ U i θχγ + σ T i θτ + σ T i δσ D i σ T i ρ T i T i = σ U i χγ + σ T i τσ T i σ T i We compute β D i T i · T i and β D i T i · T i : S2 D i T i · T i = − σ Ti δ + σ Ti θ + σ Ti δτ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti δτ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ − σ Ti δτ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti − − σ Ti δ + σ Ti θ + σ Ti δτ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = δ Finally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ = θ The data generating process in

Figure 1C also has exposure-to-exposure spillover. In this setting, sibling1’s exposure aﬀects sibling 2’s exposure ( T i → T i ). Still, we point identify the spillover eﬀect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ T i θ + σ T i δφ + σ U i δχγ + σ U i ψχσ D i σ T i = − δσ T i + σ T i θ + σ T i δφ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχφ + ( − σ U i δχγ + ( − σ T i δφ + σ U i θχγ + σ T i θφ + σ T i δ + σ U i ψχφ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i δφ + σ U i θχγ + σ T i θφ + σ T i δσ D i σ T i ρ T i T i = σ U i χγ + σ T i φσ T i σ T i S3 e compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − δσ Ti + σ Ti θ + σ Ti δφ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti δφ + σ Ui θχγ + σ Ti θφ + σ Ti δσ Di σ Ti ∗ σ Ui χγ + σ Ti φσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ − σ Ti δφ + σ Ui θχγ + σ Ti θφ + σ Ti δσ Di σ Ti − − δσ Ti + σ Ti θ + σ Ti δφ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ti φσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i = δ Finally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ = θ S4 Derivations for Models with Two-Sided Spillover

Base model with two-sided spillover (Figure 2A)

We now consider models with two-sided exposure-to-outcome spillover ( T i → Y i and T i → Y i ). Underthe data generating process in Figure 2A , we can identify the diﬀerence between θ and κ . This may beinformative depending on the value of κ (for example, if κ >

0, then SC underestimates θ ). We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + ( − σ U i κχγ + σ T i θ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ − σ U i κχγ + σ T i θ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ T i κ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i κ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ − σ Ui κχγ + σ Ti θ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti κ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ − σ Ti κ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ − σ Ui κχγ + σ Ti θ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = δ − κ S5 inally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ − κ = θ − κ Two-sided spillover and exposure-to-exposure spillover (Figures 2B-2C)

Under the data generating process in

Figure 2B , we can still identify the diﬀerence between θ and κ evenwith exposure-to-exposure spillover. We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ U i ψτ γ + ( − σ T i δ + ( − σ T i κτ + ( − σ U i κχγ + σ U i θ + σ T i δτ + σ U i δχγ + σ U i ψτ γ + σ U i ψχσ D i σ T i = − σ T i δ − σ T i κτ − σ U i κχγ + σ U i θ + σ T i δτ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ T i δτ + ( − σ T i κ + σ U i θχγ + σ T i θτ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i δτ − σ T i κ + σ U i θχγ + σ T i θτ + σ T i δσ D i σ T i ρ T i T i = σ U i χγ + σ T i τσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ − σ Ti κτ − σ Ui κχγ + σ Ui θ + σ Ti δτ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti δτ − σ Ti κ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = θ − δ S6 D i T i · T i = − σ Ui δχγ − σ Ti δτ − σ Ti κ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti − − σ Ti δ − σ Ti κτ − σ Ui κχγ + σ Ui θ + σ Ti δτ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = δ − κ Finally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ − κ = θ − κ We also consider the data generating process in

Figure 2C . This is identical to that of

Figure 2B except T i → T i with an eﬀect of φ . Changing the direction of the exposure-to-exposure spillover does not aﬀectthe partial regression coeﬃcients and the spillover coeﬃcient. S7 Derivations for Models with Spillovers from Outcomes

Past outcome aﬀects future exposure (Figure 3A)

We lastly consider models in which spillovers originate from outcomes. Under the data generating process in

Figure 3A , we do not identify the spillover eﬀect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ T i θ + σ T i δ ω + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ + σ T i θ + σ T i δ ω + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ Y i ω + σ U i θχγ + σ U i θωψχ + σ T i δ + σ U i δωψχ + σ U i ωψ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ Y i ω + σ U i θχγ + σ U i θωψχ + σ T i δ + σ U i δωψχ + σ U i ωψ σ D i σ T i ρ T i T i = σ U i χγ + σ U i ωψχ + σ T i δωσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ + σ Ti θ + σ Ti δ ω + σ Ui δχγσ Di σ Ti − − σ U i δχγ − σ Y i ω + σ U i θχγ + σ U i θωψχ + σ T i δ + σ U i δωψχ + σ U i ωψ σ Di σ Ti ∗ σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti − ( σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti ) ∗ σ D i σ T i β D i T i · T i = − σ Ui δχγ − σ Yi ω + σ Ui θχγ + σ Ui θωψχ + σ Ti δ + σ Ui δωψχ + σ Ui ωψ σ Di σ Ti − − σ Ti δ + σ Ti θ + σ Ti δ ω + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti − ( σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti ) ∗ σ D i σ T i In this data generating process, simplifying the partial regression coeﬃcients will does not isolate the targetedeﬀect δ nor the spillover eﬀect θ . Additionally, computing SC by summing the previously calculated valuesof β D i T i · T i and β D i T i · T i will not identify θ . S8 utcome-to-outcome spillover (Figures 3B-3C) Under the data generating process in

Figure 3B , sibling 1’s outcome aﬀects sibling 2’s outcome ( Y i → Y i ),and we do not identify the spillover eﬀect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ U i ηψχ + σ T i ηδ + σ T i θ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ + σ U i ηψχ + σ T i ηδ + σ T i θ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + σ U i ηψγ + σ T i ηδ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ + σ U i ηψγ + σ T i ηδ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ + σ Ui ηψχ + σ Ti ηδ + σ Ti θ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ + σ Ui ηψγ + σ Ti ηδ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i β D i T i · T i = − σ Ui δχγ + σ Ui ηψγ + σ Ti ηδ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ + σ Ui ηψχ + σ Ti ηδ + σ Ti θ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i In this data generating process, simplifying the partial regression coeﬃcients will does not isolate the targetedeﬀect δ nor the spillover eﬀect θ . Additionally, computing SC by summing the previously calculated valuesof β D i T i · T i and β D i T i · T i will not identify θ . S9 he data generating process in Figure 3C has outcome-to-outcome spillover in which sibling 2’s outcomeaﬀects sibling 1’s outcome ( Y i → Y i ). Similarly, we do not identify the spillover eﬀect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + ( − σ T i λθ + ( − σ U i λδχγ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ − σ T i λθ − σ U i λδχγ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ U i λθχγ + ( − σ T i λδ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ U i λθχγ − σ T i λδ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ − σ Ti λθ − σ Ui λδχγ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ui λθχγ − σ Ti λδ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i β D i T i · T i = − σ Ui δχγ − σ Ui λθχγ − σ Ti λδ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ − σ Ti λθ − σ Ui λδχγ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i In this data generating process, simplifying the partial regression coeﬃcients will does not isolate the targetedeﬀect δ nor the spillover eﬀect θ . Additionally, computing SC by summing the previously calculated valuesof β D i T i · T i and β D i T i · T i will not identify θ . S10

Sampling Description for Empirical Application

For our application, we estimated the eﬀect of a younger sibling’s preterm birth (gestational age <

37 com-pleted weeks) on their older sibling’s Phonological Awareness Literacy Screening-Kindergarten (PALS-K) testscore. PALS-K evaluates fundamental literacy skills at kindergarten entry.

We used data from Big Datafor Little Kids (BD4LK), a cohort of birth records for live resident in-state deliveries in Wisconsin during2007-2016 (N > We sampled sibling pairs born sequentially to the same mother between January 1, 2007-September 1,2010 and took the English-language PALS-K test. We restricted eligibility on birthdate because children hadto be at least ﬁve years old by September 1, 2015 for PALS-K testing eligibility. PALS-K is available in twolanguages: English and Spanish.

However, test versions were developed separately and are not directlycomparable, so we only considered children who took the English-language test.BD4LK includes 252,883 unique deliveries during January 1, 2007-September 1, 2010. We identiﬁed806 records (0.3%) with multiple maternal or child identiﬁers that imperfectly matched to Medicaid claimsor PALS-K scores. Among those records, we randomly selected one match and excluded the remainder.We then identiﬁed 177,863 records (70.3%) that linked to PALS-K scores, of which 46,743 records were in-sample siblings (26.3% of test-linked records). Finally, we pulled eligible sibling pairs: sequentially-born fromdiﬀerent deliveries, took the English-language PALS-K test, and completed information on control variables(maternal age, maternal education, and Medicaid delivery coverage, all of which were measured at the oldersibling’s delivery). This generated a ﬁnal sample of 40,020 siblings (85.6% of tested siblings), or 20,010 siblingpairs.

S11

Supplemental Tables

Supplemental Table 1.

Descriptive statistics of sibling pairs for the empirical application (N =20,010 sibling pairs a ) Older Siblings Younger Siblings(N = 20,010) (N = 20,010)

PALS-K Score (Points) c , Mean (SD) 63.58 (24.12) 64.22 (23.83)Preterm Birth b , N (%) 1,357 (6.78%) 1,331 (6.65%)Maternal Age (Years) d , Mean (SD) 26.01 (5.12) –Maternal Education d , N (%) No high school degree

High school degree/equivalent only

4+ years college d , N (%) 7,528 (37.62%) – a Each sibling pair includes one older sibling and one younger sibling. The sample consists of 40,020children in total. b Preterm birth is deﬁned as gestational age <

37 completed weeks. c PALS-K has a score range of 0-102 points. d Measured at the older sibling’s delivery.Notes: “PALS-K” Phonological Awareness Literacy Screening-Kindergarten, “SD” standard devia-tion. S12 upplemental Table 2.

Cross-tabulation of preterm birth a by sibling within two-sibling clusters(N = 20,010 pairs b ) Younger Sibling Younger Sibling TotalNot Preterm PretermOlder Sibling

Not Preterm (94.63%) (5.37%) (100.00%)(94.50%) (74.21%) (93.22%)

Older Sibling

Preterm (75.68%) (24.32%) (100.00%)(5.50%) (24.79%) (6.78%)

Total a Preterm birth is deﬁned as gestational age <

37 completed weeks. b Each sibling pair includes one older sibling and one younger sibling. The sample consists of 40,020children in total.Notes: Whole numbers indicate the frequency of sibling pairs. Within a cell, the ﬁrst percentageindicates the row percentage (i.e., the percent of younger siblings who are preterm or not pretermby the older sibling’s preterm birth status), and the second percentage is a column percentage (i.e.,the percent of older siblings who are preterm or not preterm by the younger sibling’s preterm birthstatus). S13

Simulation Code

We provide the Stata Statistical Software code for replicating the simulation in our main text. Note thatwe conducted simulations in Stata Statistical Software: Release 16. This code may not be compatible witholder versions of the software. clearset more off/*******************************************************/*0. Description/*******************************************************//*This code simulates identification of sibling spillover effects in a fixedeffects data generating process. Monte Carlo simulations run theregressions over many randomly generated samples and then stores thespillover coefficients in a separate data set.The spillover of interest is the effect of sibling 1’s exposure on sibling 2’soutcome (sp).We run nine simulations in total, each of which correspond to a uniquespillover model (see manuscript).*//*Definitions:T1 = sibling 1’s exposureY1 = sibling 1’s outcomeT2 = sibling 2’s exposureY2 = sibling 2’s outcome*//*Sections:0. Description

S14 . Input, variable, and output definitions2. Simulations3. Output for tables and figures*//*******************************************************/*1. Input, variable, and output definitions/*******************************************************//*Input Information:1. ie_1 = targeted effect2. sp = spillover effect of sibling 1’s exposure on sibling 2’s outcome3. ts = exposure-to-exposure spillover4. sp2 = spillover effect of sibling 2’s exposure on sibling 1’s outcome5. sp3 = spillover effect of sibling 1’s outcome on sibling 2’s exposure6. os = outcome-to-outcome spillover*//*Variable Information:1. cluster = identifier for family/sibling cluster2. fe = family fixed effects3. t_1 = sibling 1’s exposure4. t_2 = sibling 2’s exposure5. y_1 = sibling 1’s outcome6. y_2 = sibling 2’s outcome7. d = difference of outcomes8. e_1 = error term for sibling 1’s outcome9. e_2 = error term for sibling 2’s outcome*//*Output Information:1. sc_mc

S15 imulated samples (

S16 orvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for sibling 1gen e_1=rnormal(0,1)*error term for sibling 2gen e_2=rnormal(0,1)

S17 generate sibling 1’s exposure**t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate sibling 2’s exposure**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate sibling 1’s outcome**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate sibling 2’s outcome**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore

S18 store simulation results in data setreplace sc_mc1=‘sc_sim’ in ‘i’replace sc_se_mc1=‘sc_se_sim’ in ‘i’replace sc_ub_mc1=‘sc_ub_sim’ in ‘i’replace sc_lb_mc1=‘sc_lb_sim’ in ‘i’replace cov1=0replace cov1=1 if sc_lb_mc1<=0.5 & sc_ub_mc1>=0.5}}/************/*SIMULATION 2*Spillover from T1 to Y2*Spillover from T2 to T1*Figure 1B/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc2=.**spillover coefficient standard errorgen sc_se_mc2=.**spillover coefficient 95% CI boundsgen sc_ub_mc2=.gen sc_lb_mc2=.**coverage indicatorgen cov2=.quietly{forvalues i = 1(1)‘mc’ {

S19 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)

S20 generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate T1**t_1 affected by fe, t_2gen t_1=(fe_t1*fe)+(ts*t_2)>0.5*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore

S21 store simulation results in data setreplace sc_mc2=‘sc_sim’ in ‘i’replace sc_se_mc2=‘sc_se_sim’ in ‘i’replace sc_ub_mc2=‘sc_ub_sim’ in ‘i’replace sc_lb_mc2=‘sc_lb_sim’ in ‘i’replace cov2=0replace cov2=1 if sc_lb_mc2<=0.5 & sc_ub_mc2>=0.5}}/************/*SIMULATION 3*Spillover from T1 to Y2*Spillover from T1 to T2*Figure 1C/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc3=.**spillover coefficient standard errorgen sc_se_mc3=.**spillover coefficient 95% CI boundsgen sc_ub_mc3=.gen sc_lb_mc3=.**coverage indicatorgen cov3=.quietly{forvalues i = 1(1)‘mc’ {

S22 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1

S23 *t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fe, t_1gen t_2=(fe_t2*fe)+(ts*t_1)>0.2*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data set

S24 eplace sc_mc3=‘sc_sim’ in ‘i’replace sc_se_mc3=‘sc_se_sim’ in ‘i’replace sc_ub_mc3=‘sc_ub_sim’ in ‘i’replace sc_lb_mc3=‘sc_lb_sim’ in ‘i’replace cov3=0replace cov3=1 if sc_lb_mc3<=0.5 & sc_ub_mc3>=0.5}}/************/*SIMULATION 4*Spillover from T1 to Y2*Spillover from T2 to Y1*Figure 2A/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc4=.**spillover coefficient standard errorgen sc_se_mc4=.**spillover coefficient 95% CI boundsgen sc_ub_mc4=.gen sc_lb_mc4=.**coverage indicatorgen cov4=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {

S25 oisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)gen sp2=0.3*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1**t_1 affected by fe

S26 en t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate Y1**y_1 affected by fe, t_1, t_2, e_1gen y_1=(ie*t_1)+fe+(sp2*t_2)+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data setreplace sc_mc4=‘sc_sim’ in ‘i’

S27 eplace sc_se_mc4=‘sc_se_sim’ in ‘i’replace sc_ub_mc4=‘sc_ub_sim’ in ‘i’replace sc_lb_mc4=‘sc_lb_sim’ in ‘i’replace cov4=0replace cov4=1 if sc_lb_mc4<=0.5 & sc_ub_mc4>=0.5}}/************/*SIMULATION 5*Spillover from T1 to Y2*Spillover from T2 to Y1*Spillover from T2 to T1*Figure 2B/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc5=.**spillover coefficient standard errorgen sc_se_mc5=.**spillover coefficient 95% CI boundsgen sc_ub_mc5=.gen sc_lb_mc5=.**coverage indicatorgen cov5=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {

S28 oisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)gen sp2=0.3*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T2**t_2 affected by fe

S29 en t_2=(fe_t2*fe)>0.2*generate T1**t_1 affected by fe, t_2gen t_1=(fe_t1*fe)+(ts*t_2)>0.5*generate Y1**y_1 affected by fe, t_1, t_2, e_1gen y_1=(ie*t_1)+fe+(sp2*t_2)+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data set

S30 eplace sc_mc5=‘sc_sim’ in ‘i’replace sc_se_mc5=‘sc_se_sim’ in ‘i’replace sc_ub_mc5=‘sc_ub_sim’ in ‘i’replace sc_lb_mc5=‘sc_lb_sim’ in ‘i’replace cov5=0replace cov5=1 if sc_lb_mc5<=0.5 & sc_ub_mc5>=0.5}}/************/*SIMULATION 6*Spillover from T1 to Y2*Spillover from T2 to Y1*Spillover from T1 to T2*Figure 2C/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc6=.**spillover coefficient standard errorgen sc_se_mc6=.**spillover coefficient 95% CI boundsgen sc_ub_mc6=.gen sc_lb_mc6=.**coverage indicatorgen cov6=.quietly{forvalues i = 1(1)‘mc’ {

S31 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)gen sp2=0.3*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)

S32 generate T1**t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fe, t_1gen t_2=(fe_t2*fe)+(ts*t_1)>0.2*generate Y1**y_1 affected by fe, t_1, t_2, e_1gen y_1=(ie*t_1)+fe+(sp2*t_2)+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore

S33 store simulation results in data setreplace sc_mc6=‘sc_sim’ in ‘i’replace sc_se_mc6=‘sc_se_sim’ in ‘i’replace sc_ub_mc6=‘sc_ub_sim’ in ‘i’replace sc_lb_mc6=‘sc_lb_sim’ in ‘i’replace cov6=0replace cov6=1 if sc_lb_mc6<=0.5 & sc_ub_mc6>=0.5}}/************/*SIMULATION 7*Spillover from T1 to Y2*Spillover from Y1 to T2*Figure 3A/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc7=.**spillover coefficient standard errorgen sc_se_mc7=.**spillover coefficient 95% CI boundsgen sc_ub_mc7=.gen sc_lb_mc7=.**coverage indicatorgen cov7=.quietly{forvalues i = 1(1)‘mc’ {

S34 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)gen sp3=0.3*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)

S35 generate T1**t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate T2**t_2 affected by fe, y_1gen t_2=(fe_t2*fe)+(sp3*y_1)>0.2*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data set

S36 eplace sc_mc7=‘sc_sim’ in ‘i’replace sc_se_mc7=‘sc_se_sim’ in ‘i’replace sc_ub_mc7=‘sc_ub_sim’ in ‘i’replace sc_lb_mc7=‘sc_lb_sim’ in ‘i’replace cov7=0replace cov7=1 if sc_lb_mc7<=0.5 & sc_ub_mc7>=0.5}}/************/*SIMULATION 8*Spillover from T1 to Y2*Spillover from Y1 to Y2*Figure 3B/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc8=.**spillover coefficient standard errorgen sc_se_mc8=.**spillover coefficient 95% CI boundsgen sc_ub_mc8=.gen sc_lb_mc8=.**coverage indicatorgen cov8=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {

S37 oisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)gen os=0.3*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1**t_1 affected by fe

S38 en t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate Y2**y_2 affected by fe, t_2, t_1, y_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+(os*y_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data setreplace sc_mc8=‘sc_sim’ in ‘i’replace sc_se_mc8=‘sc_se_sim’ in ‘i’

S39 eplace sc_ub_mc8=‘sc_ub_sim’ in ‘i’replace sc_lb_mc8=‘sc_lb_sim’ in ‘i’replace cov8=0replace cov8=1 if sc_lb_mc8<=0.5 & sc_ub_mc8>=0.5}}/************/*SIMULATION 9*Spillover from T1 to Y2*Spillover from Y1 to Y2*Figure 3C/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc9=.**spillover coefficient standard errorgen sc_se_mc9=.**spillover coefficient 95% CI boundsgen sc_ub_mc9=.gen sc_lb_mc9=.**coverage indicatorgen cov9=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}

S40 reserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)gen os=0.3*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1**t_1 affected by fegen t_1=(fe_t1*fe)>0.5

S41 generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate Y1**y_1 affected by fe, t_1, y_2, e_1gen y_1=(ie*t_1)+fe+(os*y_2)+e_1*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data setreplace sc_mc9=‘sc_sim’ in ‘i’replace sc_se_mc9=‘sc_se_sim’ in ‘i’replace sc_ub_mc9=‘sc_ub_sim’ in ‘i’replace sc_lb_mc9=‘sc_lb_sim’ in ‘i’

S42 eplace cov9=0replace cov9=1 if sc_lb_mc9<=0.5 & sc_ub_mc9>=0.5}}/*******************************************************/*3. Output for tables and figures/*******************************************************/**tabulate spillover coefficient and standard error by modeltabstat sc_mc1 sc_se_mc1, stat(mean)tabstat sc_mc2 sc_se_mc2, stat(mean)tabstat sc_mc3 sc_se_mc3, stat(mean)tabstat sc_mc4 sc_se_mc4, stat(mean)tabstat sc_mc5 sc_se_mc5, stat(mean)tabstat sc_mc6 sc_se_mc6, stat(mean)tabstat sc_mc7 sc_se_mc7, stat(mean)tabstat sc_mc8 sc_se_mc8, stat(mean)tabstat sc_mc9 sc_se_mc9, stat(mean)**tabulate coverage probability by modeltab cov1tab cov2tab cov3tab cov4tab cov5tab cov6tab cov7tab cov8tab cov9**create figure/*The figure will plot mean spillover coefficient estimates and95% CIs for each model. Save the mean spillover coefficient,standard errors, and CI bounds for each model.*/

S43 gen sc1=mean(sc_mc1)egen sc2=mean(sc_mc2)egen sc3=mean(sc_mc3)egen sc4=mean(sc_mc4)egen sc5=mean(sc_mc5)egen sc6=mean(sc_mc6)egen sc7=mean(sc_mc7)egen sc8=mean(sc_mc8)egen sc9=mean(sc_mc9)egen se1=mean(sc_se_mc1)egen se2=mean(sc_se_mc2)egen se3=mean(sc_se_mc3)egen se4=mean(sc_se_mc4)egen se5=mean(sc_se_mc5)egen se6=mean(sc_se_mc6)egen se7=mean(sc_se_mc7)egen se8=mean(sc_se_mc8)egen se9=mean(sc_se_mc9)egen ub1=mean(sc_ub_mc1)egen ub2=mean(sc_ub_mc2)egen ub3=mean(sc_ub_mc3)egen ub4=mean(sc_ub_mc4)egen ub5=mean(sc_ub_mc5)egen ub6=mean(sc_ub_mc6)egen ub7=mean(sc_ub_mc7)egen ub8=mean(sc_ub_mc8)egen ub9=mean(sc_ub_mc9)egen lb1=mean(sc_lb_mc1)egen lb2=mean(sc_lb_mc2)egen lb3=mean(sc_lb_mc3)egen lb4=mean(sc_lb_mc4)egen lb5=mean(sc_lb_mc5)egen lb6=mean(sc_lb_mc6)egen lb7=mean(sc_lb_mc7)egen lb8=mean(sc_lb_mc8)egen lb9=mean(sc_lb_mc9)

S44 *Keep the mean spillover coefficients, standard errors, andCI bounds only. Drop duplicates (one observation).*/keep sc1-sc9 se1-se9 ub1-ub9 lb1-lb9duplicates reportduplicates dropduplicates report/*Reshape data so that it’s vertical (one observation per model).*/gen id=1reshape long sc se ub lb, i(id)rename _j modellabel define model_lbl 1 "Figure 1A" 2 "Figure 1B" 3 "Figure 1C" ///4 "Figure 2A" 5 "Figure 2B" 6 "Figure 2C" ///7 "Figure 3A" 8 "Figure 3B" 9 "Figure 3C"label values model model_lbl/*Use a truncated label for models -- saves space on plot.*/label define model2_lbl 1 "Figure 1A" 2 "1B" 3 "1C" ///4 "Figure 2A" 5 "2B" 6 "2C" ///7 "Figure 3A" 8 "3B" 9 "3C"label values model model2_lbl/*Plot the figure.*/

S45 woway(scatter model sc, mcolor(black) ysc(reverse) ///graphregion(color(white)) ///xtitle(Mean Spillover Coefficient (95% CI)) ///ytitle(Sibling Spillover Model) ///xline(0.2, lcolor(black) lpattern(shortdash_dot)) ///xline(0.5, lcolor(black) lpattern(dash)) ///legend(off) xtick(woway(scatter model sc, mcolor(black) ysc(reverse) ///graphregion(color(white)) ///xtitle(Mean Spillover Coefficient (95% CI)) ///ytitle(Sibling Spillover Model) ///xline(0.2, lcolor(black) lpattern(shortdash_dot)) ///xline(0.5, lcolor(black) lpattern(dash)) ///legend(off) xtick(