Estimating Sibling Spillover Effects with Unobserved Confounding Using Gain-Scores
EEstimating Sibling Spillover Effects with UnobservedConfounding Using Gain-Scores
David C. Mallinson , Felix Elwert , Department of Population Health Sciences, University of Wisconsin-Madison Center for Demography and Ecology, University of Wisconsin-Madison Department of Sociology, University of Wisconsin-MadisonFebruary 24, 2021
Abstract
A growing area of research in epidemiology is the identification of health-related sibling spillover effects,or the effect of one individual’s exposure on their sibling’s outcome. The health and health care of familymembers may be inextricably confounded by unobserved factors, rendering identification of spillover effectswithin families particularly challenging. We demonstrate a gain-score regression method for identifyingexposure-to-outcome spillover effects within sibling pairs in a linear fixed effects framework. The method canidentify the exposure-to-outcome spillover effect if only one sibling’s exposure affects the other’s outcome; andit identifies the difference between the spillover effects if both siblings’ exposures affect the others’ outcomes.The method fails in the presence of outcome-to-exposure spillover and outcome-to-outcome spillover. Analyticresults and Monte Carlo simulations demonstrate the method and its limitations. To exercise this method,we estimate the spillover effect of a child’s preterm birth on an older sibling’s literacy skills, measured bythe Phonological Awareness Literacy Screening-Kindergarten test. We analyze 20,010 sibling pairs from apopulation-wide, Wisconsin-based (United States) birth cohort. Without covariate adjustment, we estimatethat preterm birth modestly decreases an older sibling’s test score (-2.11 points; 95% confidence interval:-3.82, -0.40 points). In conclusion, gain-scores are a promising strategy for identifying exposure-to-outcomespillovers in sibling pairs while controlling for sibling-invariant unobserved confounding in linear settings.
Acknowledgements : This work was supported by the Eunice Kennedy Shriver National Institute for Child Health and Human Development(T32 HD007014-42), the University of Wisconsin-Madison Clinical and Translational Science Award program through the National Institutesof Health National Center for Advancing Translational Sciences (Grant UL1TR00427), the University of Wisconsin-Madison School ofMedicine and Public Health’s Wisconsin Partnership Program, and the University of Wisconsin-Madison Institute for Research on Poverty.We thank the Wisconsin Department of Children and Families, Department of Health Services, and Department of Public Instruction for theuse of data. We also thank Steven T. Cook, Dan Ross, Jane A. Smith, Kristen Voskuil, and Lynn Wimer for data access and programmingassistance. We thank Michael Sobel for methodological discussions and Deborah B. Ehrenthal for feedback on this manuscript. The contentis solely the responsibility of the authors and does not necessarily represent the official views of supporting agencies. Supporting agenciesdo not certify the accuracy of the analyses presented. Conflicts of interest: none. a r X i v : . [ s t a t . M E ] F e b ontents Introduction
A sibling spillover effect (i.e., “interference” or “carryover effect”) is the effect of an individual’s exposure ontheir sibling’s outcome.
The past two decades of epidemiologic research witnessed a burgeoning interestin the role of family environments in childhood health, calling attention to the importance of spilloverswithin families.
Yet, sibling spillovers are largely unexamined in the epidemiologic literature, as mostfield-specific advancements in spillover identification have been restricted to infectious diseases.
Withgrowing interest on the familial interdependence of health, the need for analytical tools to identify siblingspillovers is apparent.Unobserved confounding is particularly salient with sibling spillovers. Siblings often share experiencesthat cultivate their development, which may be unmeasured even in data-rich contexts. Fixed effect (FE)designs that control for unobserved time-invariant confounding are immediately appealing, but there islittle precedent for their use to identify sibling spillovers. Sj¨olander et al. (2016) investigated FE models withsibling pairs for identifying targeted effects of one siblings’ exposure on their own outcome in the presence ofspillover, noting that spillover may be identifiable if only one sibling’s exposure affects the other’s outcome. Black et al. (2020) employed a difference-in-differences model with three-sibling clusters for identifying alower-bound estimate of a child’s disability on an older sibling’s academic performance. In this paper, we demonstrate a method for the identification of one- and two-sided exposure-to-outcomespillovers in sibling pairs with gain-scores (i.e., difference-in-differences, or difference scores), a staple of FEestimation that removes shared confounding by differencing outcomes. We evaluate the gain-score estimatorin identifying spillover effects across various models with one-sided or two-sided spillovers. Consistent withthe applied FE literature, we focus on linear models with homogenous effects.
This paper is organized as follows. First, we briefly introduce causal directed acyclic graphs (DAGs),which illustrate our models. Second, we discuss various two-sibling models with one-or two-sided spilloverand explain how and when gain-score methods can identify spillover effects. Third, we illustrate our resultswith simulations. Fourth, we apply the method to identify the effect of a younger sibling’s preterm birth onan older sibling’s literacy test performance.
Causal DAGs are useful for explaining the identification of causal effects. We review necessary terminologyfor this exposition. Causal DAGs are diagrams consisting of nodes (variables) and directed edges (directcausal effects) that represent the assumed data-generating process (causal model).
Paths are sequencesof adjacent edges, regardless of the arrows’ directions. On causal paths between exposure and outcome, allarrows point from the exposure to the outcome. On non- causal paths between an exposure and an outcome,at least one arrow points away from the outcome. Causal paths “transmit” causal effects, whereas non-causalpaths may transmit spurious associations.
Colliders are variables that receive two inbound arrows on a path(a given variable may be a collider on one path but not on another). Pearl’s d-separation criterion determineswhich variables in data generated by the assumed DAG are conditionally or unconditionally independent:two variables are independent if all paths between them are closed; and a path is closed if it includes a non- ollider as intermediate variable that is conditioned on, or if it includes a collider as intermediate variablethat is not conditioned on. Conversely, two variables may be associated if at least one path betweenthem is open ( d-connected ); and a path is open if it is not closed.Typically, health researchers attempt to identify causal effects by adjusting for observed variables viaregression analysis, matching, or inverse-probability weighting, so that all causal paths between exposure andoutcome are open and all non-causal paths between them are closed.
However, researchers may worryabout open non-causal paths with unobserved confounders that cannot be closed by covariate adjustment.In multilevel analyses in which observations are clustered into groups (e.g., children in sibling pairs), FEmethods can sometimes identify causal effects by subtracting out certain types of group-level unobservedconfounding.
Next, we describe several sibling spillover models with unobserved confounding andshow when gain-score estimation—a FE approach—can identify the spillover effect.
Model and Assumptions
We first present our baseline sibling spillover model and subsequently introduce variations on this model.For illustration, we discuss the spillover effect of a child’s early health shock (e.g., serious illness) on theirsibling’s later academic achievement (e.g., test scores). This example is purposefully generic but broadlyapplicable, and it draws upon prior work of health-related spillover effects on academic performance whilemotivating our empirical application.Our baseline model is a linear two-sibling comparison design with one-sided spillover ( Figure 1A ).Subscript i = 1 , . . . , N indicates cluster (family) and subscript j = 1 , T ij represents abinary or continuous exposure (e.g., the health shock), Y ij represents a continuous outcome (e.g., academicperformance), U i represents unobserved family-level confounding (i.e., the FE), and D i represents the gain-score, D i = Y i − Y i . Causal effects in this model include the spillover effect, θ ( T i → Y i ), of sibling 1’sexposure on sibling 2’s outcome; the targeted effects, δ ( T ij → Y ij ), of each sibling’s exposure on their ownoutcome; and confounding effects of the unobserved family-level confounders on each sibling’s exposure andoutcome, ψχ ( T i ← U i → Y i ) and ψγ ( T i ← U i → Y i ).All models embed several simplifying assumptions. First, the targeted effects, δ , of T ij on Y ij and theconfounding effects, ψ , of U i on Y ij are equal for both siblings. Second, all effects are linear and homoge-nous. Third, there is only partial interference (i.e., spillovers within sibling clusters but not between siblingclusters).
Aside from partial interference, these assumptions align with conventional FE models.
Notably, our presentation abstracts from sibling-specific observed baseline covariates, C ij . Covariatesmay be added to our baseline and subsequent models as long as one can condition on C ij without loss ofgenerality. i T i Y i T i Y i D i (1A) χ ψψγ δθδ -1+1 U i (1B) T i Y i T i Y i D i χ ψψγ δθδ -1+1 τ U i (1C) T i Y i T i Y i D i χ ψψγ δθδ -1+1 φ Figure 1.
Causal directed acyclic graphs for linear data-generating models with one-sided exposure-to-outcome sibling spillover. Subscripts i and j denote cluster and sibling, respectively. T ij is theexposure, Y ij is the outcome, D i is the difference score, and U i is an unobserved family-level con-founder. Greek letters denote effects. (1A) does not have exposure-to-exposure spillover, whereas(1B) and (1C) have exposure-to-exposure spillovers.3 ain-Score Estimation and Identification of Spillover Effects This subsection details the gain-score estimation strategy and demonstrates when our estimator point-identifies spillover effects (i.e., recovers the estimand precisely) for nine sibling spillover models that differ bywhether spillover is one- or two-sided and by whether additional spillovers originate from outcomes.
Gain-score estimation
We investigate the ability of a gain-score estimator to identify exposure-to-outcome spillover effects. First,we regress the gain-score on both siblings’ exposures, D i = b T i + b T i + e i , (1)where b and b are partial regression coefficients for T i and T i , respectively, and e i is an error term. Wethen sum the partial regression coefficients to compute a “spillover coefficient” ( SC ), SC = b + b . (2)We will now interrogate whether the SC identifies causal spillover effects in each of several commonlyassumed data generating processes in health research. Settings with one-sided spillover
The object of interest (estimand) is θ , or the direct spillover effect of sibling 1’s health shock on sibling 2’sacademic performance. Under the baseline model ( Figure 1A ), three open paths connect T i and Y i . Thefirst path, T i → Y i , is the causal spillover effect of interest. The other two paths are non-causal pathsthat may transmit spurious association. The first non-causal path, T i ← U i → T i → Y i , can be closedby adjusting on T i . However, the second non-causal path, T i ← U i → Y i , cannot be closed by covariateadjustment because it only contains the unobserved variable U i .Nonetheless, we can identify θ through gain-score regression. Under the assumptions of Figure 1A ,it can be shown that b = θ − δ and b = δ , using elementary regression algebra. Therefore, the spillovercoefficient equals SC = b + b = θ .The intuition for this result is that first-differencing exactly offsets confounding biases involving U i ,26and that the SC corrects for the contamination of the spillover estimate in b . Specifically, the coefficient b on T i captures the association flowing along the open paths from T i to D i . There are five paths from T i to D i (listed together with their corresponding path coefficients):1. T i ← U i → T i → Y i → D i (non-causal): δχγ T i ← U i → Y i → D i (non-causal): − ψχ T i ← U i → Y i → D i (non-causal): ψχ T i → Y i → D i (causal): δ . T i → Y i → D i (causal): θ The first path is closed because the regression conditions on T i . The second and third paths cancel eachother out exactly. The fourth path transmits the spillover effect. The fifth path transmits the negative ofthe targeted effect. Hence, the regression coefficient b = θ − δ identifies the difference between the spilloverand targeted effect.The coefficient b on T i captures the association flowing along the open paths from T i and D i . Thereare four paths from T i to D i :1. T i ← U i → T i → Y i → D i (non-causal): − δχγ T i ← U i → Y i → D i (non-causal): − ψγ T i ← U i → Y i → D i (non-causal): ψγ T i → Y i → D i (causal): δ The first path is closed because the regression conditions on T i ; the second and third paths cancel eachother out; and the fourth path captures the targeted effect. Thus, b = δ identifies the targeted effect, and SC = b + b = θ identifies the causal spillover effect.Many statistical software have functions for summing regression coefficients and obtaining standard errors.Examples include Stata’s lincom command, R’s contrast package, and SAS’s SCORE procedure. The analysis is only slightly complicated in the presence of exposure-to-exposure spillover ( T ij → T ij (cid:48) ) –for example, when one child’s serious illness increases their sibling’s risk of illness. When T i → T i ( Figure1B ), the analysis does not change. However, if T i → T i ( Figure 1C ), then the interpretation of SC = θ changes from representing the entire spillover effect of T i on Y i to capturing only the direct spillover effect,since the indirect component of the spillover effect that operates via the causal path T i → T i → Y i is closedbecause the regression controls for T i . See the Supplementary Material for details.
Settings with two-sided spillover
Analysts may also encounter scenarios with two-sided spillover. In our example, each siblings’ health shockcould affect the other’s academic performance ( T i → Y i and T i → Y i ). Reflecting this possibility, Figure2A modifies the baseline model of
Figure 1A to allow spillover T i → Y i with effect κ . The partial regressioncoefficients in the gain-score approach identify b = θ − δ and b = δ − κ , so that SC = b + b = θ − κ .Consequently, with two-sided exposure-to-outcome spillover, the SC does not identify the spillover effect of T i on Y i but instead the difference between the two exposure-to-outcome spillover effects. However, if theanalyst can defend assumptions about one or more of the signs of the two spillover effects, then the SC remains informative even though it no longer point-identifies θ . Specifically, if κ >
0, the SC underestimates(i.e., gives a lower bound for) θ . By contrast, if κ <
0, then the SC overestimates (gives an upper bound for) θ . One can make additional inferences about θ depending on the value of the SC and the assumed sign of κ . For example, if SC > κ >
0, then θ >
0. Of note, a finding that SC = 0 is uninformative, because i T i Y i T i Y i D i (2A) χ ψψγ δ θδ -1+1 κ U i (2B) T i Y i T i Y i D i χ ψψγ δ θδ -1+1 κτ U i (2C) T i Y i T i Y i D i χ ψψγ δ θδ -1+1 κφ Figure 2.
Causal directed acyclic graphs for linear data-generating models with two-sided exposure-to-outcome sibling spillover. Subscripts i and j denote cluster and sibling, respectively. T ij is theexposure, Y ij is the outcome, D i is the difference score, and U i is an unobserved family-level con-founder. Greek letters denote effects. (2A) does not have exposure-to-exposure spillover, whereas(2B) and (2C) have exposure-to-exposure spillover.6 t is compatible with the possibility that the two spillover effects are equal, θ = κ , and that they are bothzero, θ = κ = 0.If T ij → T ij (cid:48) in addition to two-sided spillover ( Figures 2B-C ), this does not affect the interpretations of b and b , and SC still identifies the difference between siblings’ unmediated spillover effects. However, SC will not capture the mediated part of spillover effect, T ij → T ij (cid:48) → Y ij (cid:48) . See the Supplementary Material for details.
Settings with spillovers from outcomes
Analysts may also encounter settings with outcome-to-outcome spillover ( Y ij → Y ij (cid:48) ) or outcome-to-exposurespillover ( Y ij → T ij (cid:48) ). In our setting, it is reasonable to assume that siblings’ academic outcomes may becausally related by outcome-to-outcome spillover. In contrast, an academic outcome causing a health shockis implausible, but exposure-to-outcome spillovers may be relevant elsewhere.If outcomes cause future exposures or outcomes ( Figure 3 ), then our estimator does not identify spilloversor simple functions of spillovers. See the
Supplementary Material for details.
We conducted nine Monte Carlo simulations, one simulation for each of the nine models in Figures 1-3 , todemonstrate when the method identifies exposure-to-outcome spillover effects. Our simulation model follows: U i , υ i , υ i ∼ N (0 , T i = τ T i + χU i ≤ .
51 if τ T i + χU i > . T i = φT i + ωY i + γU i ≤ .
21 if φT i + ωY i + γU i > . Y i = δT i + κT i + λY i + ψU i + υ i Y i = δT i + θT i + ηY i + ψU i + υ i D i = Y i − Y i We simulated each model with 1000 runs of 5000 observations each, where each observation representeda sibling pair. We set the following parameters at fixed values: θ = 0 . δ = 1, ψ = 1, χ = 2, and γ = 3.Parameters distinguishing the models— κ , τ , φ , ω , η , and λ —were set to zero unless otherwise specified. Toavoid simultaneity, at least one parameter in each pair ( τ , φ ), ( η , λ ), and ( κ , ω ) was always set to zero.In each sample, we regressed the gain-score on siblings’ exposures and computed the spillover coefficientaccording to equations (1) and (2). We conducted simulations in Stata Statistical Software: Release 16. Simulation code is in the
Supplementary Material . i T i Y i T i Y i D i (3A) χ ψψγ δ θδ -1+1 ω U i (3B) T i Y i T i Y i D i χ ψψγ δθδ -1+1 η U i (3C) T i Y i T i Y i D i χ ψψγ δθδ -1+1 λ Figure 3.
Causal directed acyclic graphs for linear data-generating models with one-sided exposure-to-outcome sibling spillover and spillover from outcomes. Subscripts i and j denote cluster and sibling,respectively. T ij is the exposure, Y ij is the outcome, D i is the difference score, and U i is an unobservedfamily-level confounder. Greek letters denote effects. (3A) has outcome-to-exposure spillover, and(3B) and (3C) have outcome-to-outcome spillover.8 igure 4 displays the simulation results. The first three rows confirm that the spillover coefficient isunbiased in the three settings with one-sided exposure-to-outcome spillover of Figure 1 , as the averageof estimated spillover coefficient equals the known spillover effect, (cid:100) SC F igure = 0 . Figure 2 with two-sided exposure-to-outcome spillover identifies the difference between the two spillovers, (cid:100) SC F igure = 0 . − . . κ > (cid:100) SC F igure underestimates thespillover effect, θ . The final three simulations show that (cid:100) SC F igure is biased in all models of Figure 3 withspillovers from outcomes. Size and direction of the biases are fairly complicated functions of the coefficientsin the data-generating model and can be large. The estimated ordinary least squares standard errors closelyresemble the empirical standard errors for each model, indicating that the built-in standard errors in Stata’slincom command are accurate. We applied the method to estimating the spillover effect of a child’s preterm birth (gestational age < If a child is bornpreterm, parents may reallocate investments (time, financial, or otherwise) from older siblings to support theyounger sibling’s health, thereby inhibiting the older siblings’ development, including early literacy.For this application, we analyzed Big Data for Little Kids (BD4LK), a longitudinal cohort of birth recordsfor all live in-state resident deliveries in Wisconsin during 2007-2016 (N > PALS-K evaluates readiness for kindergarten-level literacy instruction on six domains (rhyme awareness; beginning sound awareness; alphabet knowledge;letter sounds; spelling; word concept).
In Wisconsin, children must be five years-old at kindergartenenrollment to qualify for PALS-K testing. Our analysis includes 20,010 sibling pairs (40,020 children) thatwere sequentially-born from different deliveries to the same biological mother and had non-missing English-language PALS-K test scores and covariates. The
Supplementary Material contains the full samplingdescription.We estimate the following gain-score regression model, D i = b P T B i + b P T B i + β C i + v i where D i = P ALSK i − P ALSK i . Subscripts i = 1 , . . . N and j = 1 , j = 1 is the younger sibling. P T B ij is a binary preterm birth indicator (1 if preterm;0 otherwise), P ALSK ij is the continuous PALS-K score (0-102 points), and C i is a vector of covariatesmeasured at the older sibling’s delivery, which may be empty. Covariates include maternal age (years),maternal education (no high school diploma; high school diploma/equivalent; 1-3 years college; 4+ yearscollege), and Medicaid delivery payment. D i is the gain-score estimator. q = 0.5)( q – k = 0.2)Figure 1A1B1CFigure 2A2B2CFigure 3A3B3C S i b li ng S p ill o v e r M ode l -.5 0 .5 1 1.5 Mean Spillover Coefficient (95% CI)
Figure 4.
Results (average spillover coefficients and empirical 95% confidence intervals [CI]) fromsimulations of the nine sibling spillover models in
Figures 1-3 . Each simulation consisted of 1000runs of 5000 observations, where each observation represented a sibling pair. Subscripts i and j indicate cluster and sibling, respectively. T ij is the exposure and Y ij is the outcome. The targetquantity is the spillover effect ( T i → Y i ), set to θ = 0 . κ ( T i → Y i ), τ ( T i → T i ), φ ( T i → T i ), ω ( Y i → T i ), η ( Y i → Y i ), and λ ( Y i → Y i ). Exceptfor θ , all spillover parameters were set to zero except in the following cases: κ = 0 . τ = 0 . φ = 0 . ω = 0 . η = 0 . λ = 0 . θ in all models of one-sided spillover ( Figure 1 ); identifies thedifference between the two spillover effects with two-sided spillover (
Figure 2 ); and is biased in thepresence of spillovers originating from outcomes (
Figure 3 ).10 e ran the model twice, once with and once without covariates. In each model, we summed the regressioncoefficients on both siblings’ preterm birth indicators to compute the SC = b + b . Assuming the one-sidedspillover model of Figure 1A , SC from the regression without covariates identifies the effect of a youngersibling’s preterm birth on the older sibling’s PALS-K score. Additionally, b identifies the effect of eachsibling’s preterm birth on their own PALS-K score. We performed all analyses in Stata Statistical Software:Release 16. The University of Wisconsin-Madison minimal risk institutional review board approved ourproject.
Supplemental Tables 1 and 2 summarize baseline characteristics of our sample (
SupplementaryMaterial ). Preterm birth incidence was slightly greater among older siblings relative to younger siblings(6.78% vs. 6.65%). On average, older siblings received slightly lower PALS-K scores (mean 63.58 points; SD24.12 points) relative to younger siblings (mean 64.22 points; SD 23.83 points). Approximately 10% of siblingclusters had discordant preterm birth exposure. In the regression without covariate adjustment, the oldersibling’s preterm birth coefficient was ˆ b = -2.49 points (95% CI -3.83, -1.15 points), the younger sibling’spreterm birth coefficient was ˆ b = 0.38 points (95% CI: -0.97, 1.73 points), and the resulting (cid:100) SC was -2.11points (95% CI: -3.82, -0.40 points) ( Table 1 ). This indicates that a younger sibling’s preterm birth modestlyharmed their older sibling’s PALS-K performance.
Figure 5 displays these results graphically relative tothe assumed data-generating model. However, covariate adjustment attenuated the (cid:100) SC to -1.49 points (95%CI -3.21, 0.22 points). We described a simple approach to identifying spillovers with gain-scores in sibling pairs. This method canpoint-identify spillovers if only one sibling’s exposure affects the other’s outcome, and it can identify thedifference in siblings’ spillovers in the presence of two-sided spillover. The method leverages the primarybenefit of FE estimation: controlling for family-level, sibling-invariant, unobserved confounding. Whereaspreceding epidemiologic research on spillover identification primarily considered infectious diseases, our workcontributes to the growing literature on spillovers within families.We acknowledge some limitations. First, we restricted our attention to linear settings. This methoddoes not necessarily apply to contexts with nonlinear relationships, such as those with binary outcomes (seeSj¨olander et al. (2016) for binary outcomes in our Figure 1A ). Second, we did not consider clusters ofthree or more siblings. Spillovers that originate from larger sibling clusters may pose unique challenges thatare unaddressed here – for example, whether one can identify the effect of a middle child’s exposure on theyoungest sibling’s outcome if an eldest sibling’s exposure affects all siblings’ outcomes. Lastly, we did nottest the method in the presence of shared mediator or collider variables. Sj¨olander and Zetterqvist (2017)interrogated sibling comparison models with shared mediators and colliders, finding that such factors mayinduce bias. Nonetheless, our paper lays groundwork for subsequent research. Specific avenues that advance thismethod include testing in nonlinear settings or settings with shared mediator variables, as well as expandingmodels to allow three or more siblings. able 1. Ordinary least squares regression of the difference in siblings’ PALS-K scores a (points) ontheir preterm birth statuses (N = 20,010 sibling pairs) Unadjusted Regression Adjusted Regression b Coefficient (95% CI) Coefficient (95% CI)
Preterm birth (gestational age <
37 weeks)
Older sibling -2.49 (-3.83, -1.15) -2.28 (-3.62, -0.94)
Younger sibling c -2.11 (-3.82, -0.40) -1.49 (-3.21, 0.22) a The difference in PALS-K scores equals the older sibling’s PALS-K Score minus the younger sibling’sPALS-K score. b Covariates include maternal age at delivery (years), maternal education at delivery (no high schooldiploma; high school diploma/equivalent; 1-3 years college; 4+ years college) and Medicaid deliverypayment (no; yes), all of which were measured at the time of the older sibling’s delivery. c The spillover coefficient is the sum of the partial regression coefficients for the older sibling’s pretermbirth indicator and the younger sibling’s preterm birth indicator. Assuming one-sided spillover asin
Figure 1A , this identifies the effect of a younger sibling’s preterm birth on the older sibling’sPALS-K score.Abbreviations: “CI” confidence interval; “PALS-K” Phonological Awareness Literacy Screening-Kindergarten. 12 i PTB i PALSK i PTB i PALSK i D i χ ψψγ ˆ δ = − . ˆ θ = − . ˆ δ = − . -1+1 Figure 5.
A directed acyclic graph of the relationship between siblings’ preterm birth (gestationalage <
37 weeks) and their score on the Phonological Awareness Literacy Assessment-Kindergartentest with overlaid estimates. Subscripts i and j denote cluster and sibling, respectively, where j = 1is the younger sibling and j = 2 is the older sibling. P T B ij is a preterm birth indicator, P ALSK ij is the test score, D i is a difference score, and U i is an unobserved confounder. Greek letters denoteeffects, and the values of θ and δ are estimated using gain-score regression.13 eferences [1] Ogburn EL, VanderWeele TJ. Causal diagrams for interference. Stat Sci :559-578.[2] VanderWeele TJ. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford:Oxford University Press, 2015.[3] Sj¨olander A, Frisell T, Kuja-Halkola R, ¨Oberg S, Zetterqvist J. Carryover effects in sibling comparisondesigns. Epidemiol :852-858.[4] Kuh D, Ben-Shlomo Y, Lynch J, Hallqvist J, Power C. Life course epidemiology. J Epidemiol CommunityHealth :778-783.[5] Lawlor DA, Mishra GD. Family Matters: Designing, Analysing and Understanding Family Based Studiesin Life Course Epidemiology. 1st edn. Oxford: Oxford University Press, 2009.[6] Liu S, Jones RN, Glymour MM. Implications of lifecourse epidemiology for research on determinants ofadult disease. Public Health Rev :489–511.[7] Feinberg ME, Solmeyer AR, McHale SM. The third rail of family systems: sibling relationships, mentaland behavioral health, and preventive intervention in childhood and adolescence. Clin Child Fam PsycholRev :43–57.[8] Viner RM, Ross D, Hardy R, et al. Life course epidemiology: recognising the importance of adolescence. J Epidemiol Community Health :719-720.[9] Ben-Shlomo Y, Cooper R, Kuh D. The last two decades of life course epidemiology, and its relevance forresearch on ageing. Int J Epidemiol :973-988.[10] De Neve JW, Kawachi I. Spillovers between siblings and from offspring to parents are understudied: areview and future directions for research. Soc Sci Med :56-61.[11] Morris AS, Robinson LR, Hays-Grudo J, Claussen AH, Hartwig SA, Treat AE. Targeting parenting inearly childhood: a public health approach to improve outcomes for children living in poverty.
Child Dev :388-397.[12] Halloran ME, Struchiner CJ. Study designs for dependent happenings. Epidemiol :331–38.[13] Halloran ME, Struchiner CJ. Causal inference in infectious diseases. Epidemiol :142–51.[14] Longini IM, Sagatelian K, Rida WN, Halloran ME. Optimal vaccine trial design when estimating vaccineefficacy for susceptibility and infectiousness from multiple populations. Stat Med :1121–36.[15] Hudgens MG, Halloran ME. Toward causal inference with interference. J Am Stat Assoc :832-842.
16] VanderWeele TJ, Tchetgen Tchetgen EJ. Effect partitioning under interference in two-stage randomizedvaccine trials.
Stat Probabil Lett :861–69.[17] Clemens J, Shin S, Ali M. New approaches to the assessment of vaccine herd protection in clinical trials. Lancet Infect Dis :482–87.[18] Halloran ME. The minicommunity design to assess indirect effects of vaccination. Epidemiol Methods :83–105.[19] Tchetgen Tchetgen EJ, VanderWeele TJ. On causal inference in the presence of interference. Stat Meth-ods Med Res :55–75.[20] VanderWeele TJ, Tchetgen Tchetgen EJ, Halloran M. Components of the indirect effect in vaccine trials:identification of contagion and infectiousness effects. Epidemiol :751–61.[21] Halloran ME, Hudgens MG. Dependents happenings: a recent methodological review. Curr EpidemiolRep :297-305.[22] Benjamin-Chung J, Arnold BF, Berger D, et al. Spillover effects in epidemiology: parameters, studydesigns and methodological considerations. Int J Epidemiol :332-347.[23] Gunasekara FI, Richardson K, Carter K, Blakely T. Fixed effects analysis of repeated measures data. Int J Epidemiol :264-269.[24] Imai K, Kim IS. When should we use unit fixed effects regression models for causal inference withlongitudinal data? Am J Pol Sci :467-490.[25] Black SE, Breining S, Figlio DN, et al. Sibling spillovers. Econ J
Sociol Methods Res
Epidemiol :37-48.[28] Pearl J. Causality: Models, Reasoning, and Inference. 2nd edn. Cambridge: Cambridge University Press,2009.[29] Shpitser I, VanderWeele T, Robins JM. On the validity of covariate adjustment for estimating causaleffects. arXiv, doi:1203.3515, 15 March 2012, preprint: not peer reviewed.[30] Elwert F. Graphical causal models. In: Morgan SL (ed). Handbook of Causal Analysis for Social Re-search. Dordrecht: Springer Netherlands, 2013, pp. 245-273.[31] Pearl J. Linear models: a useful “microscope” for causal analysis. J Causal Inference :155-170.[32] Morgan SL, Winship C. Counterfactuals and Causal Inference: Methods and Principles for Social Re-search. 2nd edn. Cambridge: Cambridge University Press, 2014.
33] Sobel ME. What do randomized studies of housing mobility demonstrate? Causal inference in the faceof interference.
J Am Stat Assoc
Adv Econ :429-77.[38] StataCorp. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC; 2019.[39] Mathiasen R, Hansen BM, Andersen AM, Forman JL, Greisen G. Gestational age and basic schoolachievements: a national follow- up study in Denmark. Pediatrics :e1553-e1561.[40] Mallinson DC, Grodsky E, Ehrenthal DB. Gestational age, kindergarten-level literacy, and effect modifi-cation by maternal socio-economic and demographic factors.
Paediatr Perinat Epidemiol :467-479.[41] Larson A, Berger LM, Mallinson DC, Grodsky E, Ehrenthal DB. Variable uptake of Medicaid-coveredPrenatal Care Coordination: the relevance of treatment level and service context. J Community Health :32-43.[42] Ford K, Invernizzi M. PALS Espa˜nol K Technical Reference. Charlottesville, VA: University of VirginiaCurry School of Education; 2014.[43] Invernizzi M, Juel C, Swank L, Meier J. PALS-K Technical Reference. Charlottesville: University ofVirginia Curry School of Education, 2015.[44] Wisconsin Department of Public Instruction. Admissions and early entrance to four- and five-year-oldkindergarten. https://dpi.wi.gov/early-childhood/kind/admission (1 September 2020, date last accessed).[45] Sj¨olander A, Zetterqvist J. Confounders, mediators, or colliders. Epidemiol :540-547. stimating Sibling Spillover Effects with UnobservedConfounding Using Gain-Scores Supplementary Material
Derivations for Models with One-Sided Spillover
Base model with one-sided spillover (Figure 1A)
We first consider models with one-sided exposure-to-outcome spillover ( T i → Y i , but T i does not directlyaffect Y i ). Under the data generating process in Figure 1A , we point identify the spillover effect θ . Forreference, β denotes a regression coefficient, ρ denotes a correlation coefficient, σ denotes standard deviation,and σ denotes variance. We compute the following partial regression coefficients: β D i T i · T i = ρ D i T i − ρ D i T i ∗ ρ T i T i − ρ T i T i ∗ σ D i σ T i β D i T i · T i = ρ D i T i − ρ D i T i ∗ ρ T i T i − ρ T i T i ∗ σ D i σ T i The correlation coefficient between two variables is equal to the sum of the products of path coefficientsand of the “root variable’s” variation divided by the product the variables’ standard deviations. A “rootvariable” is the variable on the path with no incoming arrows. See Pearl (2013) for details. We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ T i θ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ + σ T i θ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We plug the correlation coefficients into the formulas for our partial regression coefficients, and we derive thefollowing: S1 D i T i · T i = − σ Ti δ + σ Ti θ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ + σ Ti θ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = δ We can then compute the spillover coefficient ( SC ) to point identify θ : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ = θ One-sided spillover and exposure-to-exposure spillover (Figures 1B-1C)
Under the data generating process in
Figure 1B , sibling 2’s exposure affects sibling 1’s exposure ( T i → T i ).Regardless, we still point identify the spillover effect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ U i ψγτ + ( − σ T i δ + σ T i θ + σ T i δτ + σ U i δχγ + σ U i ψγτ + σ U i ψχσ D i σ T i = − σ T i δ + σ T i θ + σ T i δτ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ T i δτ + σ U i θχγ + σ T i θτ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i δτ + σ U i θχγ + σ T i θτ + σ T i δσ D i σ T i ρ T i T i = σ U i χγ + σ T i τσ T i σ T i We compute β D i T i · T i and β D i T i · T i : S2 D i T i · T i = − σ Ti δ + σ Ti θ + σ Ti δτ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti δτ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ − σ Ti δτ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti − − σ Ti δ + σ Ti θ + σ Ti δτ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = δ Finally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ = θ The data generating process in
Figure 1C also has exposure-to-exposure spillover. In this setting, sibling1’s exposure affects sibling 2’s exposure ( T i → T i ). Still, we point identify the spillover effect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ T i θ + σ T i δφ + σ U i δχγ + σ U i ψχσ D i σ T i = − δσ T i + σ T i θ + σ T i δφ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχφ + ( − σ U i δχγ + ( − σ T i δφ + σ U i θχγ + σ T i θφ + σ T i δ + σ U i ψχφ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i δφ + σ U i θχγ + σ T i θφ + σ T i δσ D i σ T i ρ T i T i = σ U i χγ + σ T i φσ T i σ T i S3 e compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − δσ Ti + σ Ti θ + σ Ti δφ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti δφ + σ Ui θχγ + σ Ti θφ + σ Ti δσ Di σ Ti ∗ σ Ui χγ + σ Ti φσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ − σ Ti δφ + σ Ui θχγ + σ Ti θφ + σ Ti δσ Di σ Ti − − δσ Ti + σ Ti θ + σ Ti δφ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ti φσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i = δ Finally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ = θ S4 Derivations for Models with Two-Sided Spillover
Base model with two-sided spillover (Figure 2A)
We now consider models with two-sided exposure-to-outcome spillover ( T i → Y i and T i → Y i ). Underthe data generating process in Figure 2A , we can identify the difference between θ and κ . This may beinformative depending on the value of κ (for example, if κ >
0, then SC underestimates θ ). We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + ( − σ U i κχγ + σ T i θ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ − σ U i κχγ + σ T i θ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ T i κ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i κ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ − σ Ui κχγ + σ Ti θ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti κ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = θ − δβ D i T i · T i = − σ Ui δχγ − σ Ti κ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ − σ Ui κχγ + σ Ti θ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i = δ − κ S5 inally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ − κ = θ − κ Two-sided spillover and exposure-to-exposure spillover (Figures 2B-2C)
Under the data generating process in
Figure 2B , we can still identify the difference between θ and κ evenwith exposure-to-exposure spillover. We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ U i ψτ γ + ( − σ T i δ + ( − σ T i κτ + ( − σ U i κχγ + σ U i θ + σ T i δτ + σ U i δχγ + σ U i ψτ γ + σ U i ψχσ D i σ T i = − σ T i δ − σ T i κτ − σ U i κχγ + σ U i θ + σ T i δτ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ T i δτ + ( − σ T i κ + σ U i θχγ + σ T i θτ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ T i δτ − σ T i κ + σ U i θχγ + σ T i θτ + σ T i δσ D i σ T i ρ T i T i = σ U i χγ + σ T i τσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ − σ Ti κτ − σ Ui κχγ + σ Ui θ + σ Ti δτ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ti δτ − σ Ti κ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = θ − δ S6 D i T i · T i = − σ Ui δχγ − σ Ti δτ − σ Ti κ + σ Ui θχγ + σ Ti θτ + σ Ti δσ Di σ Ti − − σ Ti δ − σ Ti κτ − σ Ui κχγ + σ Ui θ + σ Ti δτ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ti τσ Ti σ Ti − ( σ Ui χγ + σ Ti τσ Ti σ Ti ) ∗ σ D i σ T i = δ − κ Finally, we compute SC : SC = β D i T i · T i + β D i T i · T i = θ − δ + δ − κ = θ − κ We also consider the data generating process in
Figure 2C . This is identical to that of
Figure 2B except T i → T i with an effect of φ . Changing the direction of the exposure-to-exposure spillover does not affectthe partial regression coefficients and the spillover coefficient. S7 Derivations for Models with Spillovers from Outcomes
Past outcome affects future exposure (Figure 3A)
We lastly consider models in which spillovers originate from outcomes. Under the data generating process in
Figure 3A , we do not identify the spillover effect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ T i θ + σ T i δ ω + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ + σ T i θ + σ T i δ ω + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ Y i ω + σ U i θχγ + σ U i θωψχ + σ T i δ + σ U i δωψχ + σ U i ωψ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ Y i ω + σ U i θχγ + σ U i θωψχ + σ T i δ + σ U i δωψχ + σ U i ωψ σ D i σ T i ρ T i T i = σ U i χγ + σ U i ωψχ + σ T i δωσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ + σ Ti θ + σ Ti δ ω + σ Ui δχγσ Di σ Ti − − σ U i δχγ − σ Y i ω + σ U i θχγ + σ U i θωψχ + σ T i δ + σ U i δωψχ + σ U i ωψ σ Di σ Ti ∗ σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti − ( σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti ) ∗ σ D i σ T i β D i T i · T i = − σ Ui δχγ − σ Yi ω + σ Ui θχγ + σ Ui θωψχ + σ Ti δ + σ Ui δωψχ + σ Ui ωψ σ Di σ Ti − − σ Ti δ + σ Ti θ + σ Ti δ ω + σ Ui δχγσ Di σ Ti ∗ σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti − ( σ Ui χγ + σ Ui ωψχ + σ Ti δωσ Ti σ Ti ) ∗ σ D i σ T i In this data generating process, simplifying the partial regression coefficients will does not isolate the targetedeffect δ nor the spillover effect θ . Additionally, computing SC by summing the previously calculated valuesof β D i T i · T i and β D i T i · T i will not identify θ . S8 utcome-to-outcome spillover (Figures 3B-3C) Under the data generating process in
Figure 3B , sibling 1’s outcome affects sibling 2’s outcome ( Y i → Y i ),and we do not identify the spillover effect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + σ U i ηψχ + σ T i ηδ + σ T i θ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ + σ U i ηψχ + σ T i ηδ + σ T i θ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + σ U i ηψγ + σ T i ηδ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ + σ U i ηψγ + σ T i ηδ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ + σ Ui ηψχ + σ Ti ηδ + σ Ti θ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ + σ Ui ηψγ + σ Ti ηδ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i β D i T i · T i = − σ Ui δχγ + σ Ui ηψγ + σ Ti ηδ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ + σ Ui ηψχ + σ Ti ηδ + σ Ti θ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i In this data generating process, simplifying the partial regression coefficients will does not isolate the targetedeffect δ nor the spillover effect θ . Additionally, computing SC by summing the previously calculated valuesof β D i T i · T i and β D i T i · T i will not identify θ . S9 he data generating process in Figure 3C has outcome-to-outcome spillover in which sibling 2’s outcomeaffects sibling 1’s outcome ( Y i → Y i ). Similarly, we do not identify the spillover effect θ . We compute ρ D i T i , ρ D i T i , and ρ T i T i : ρ D i T i = ( − σ U i ψχ + ( − σ T i δ + ( − σ T i λθ + ( − σ U i λδχγ + σ U i δχγ + σ U i ψχσ D i σ T i = − σ T i δ − σ T i λθ − σ U i λδχγ + σ U i δχγσ D i σ T i ρ D i T i = ( − σ U i ψγ + ( − σ U i δχγ + ( − σ U i λθχγ + ( − σ T i λδ + σ U i θχγ + σ T i δ + σ U i ψγσ D i σ T i = − σ U i δχγ − σ U i λθχγ − σ T i λδ + σ U i θχγ + σ T i δσ D i σ T i ρ T i T i = σ U i χγσ T i σ T i We compute β D i T i · T i and β D i T i · T i : β D i T i · T i = − σ Ti δ − σ Ti λθ − σ Ui λδχγ + σ Ui δχγσ Di σ Ti − − σ Ui δχγ − σ Ui λθχγ − σ Ti λδ + σ Ui θχγ + σ Ti δσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγ + σ Ti φσ Ti σ Ti ) ∗ σ D i σ T i β D i T i · T i = − σ Ui δχγ − σ Ui λθχγ − σ Ti λδ + σ Ui θχγ + σ Ti δσ Di σ Ti − − σ Ti δ − σ Ti λθ − σ Ui λδχγ + σ Ui δχγσ Di σ Ti ∗ σ Ui χγσ Ti σ Ti − ( σ Ui χγσ Ti σ Ti ) ∗ σ D i σ T i In this data generating process, simplifying the partial regression coefficients will does not isolate the targetedeffect δ nor the spillover effect θ . Additionally, computing SC by summing the previously calculated valuesof β D i T i · T i and β D i T i · T i will not identify θ . S10
Sampling Description for Empirical Application
For our application, we estimated the effect of a younger sibling’s preterm birth (gestational age <
37 com-pleted weeks) on their older sibling’s Phonological Awareness Literacy Screening-Kindergarten (PALS-K) testscore. PALS-K evaluates fundamental literacy skills at kindergarten entry.
We used data from Big Datafor Little Kids (BD4LK), a cohort of birth records for live resident in-state deliveries in Wisconsin during2007-2016 (N > We sampled sibling pairs born sequentially to the same mother between January 1, 2007-September 1,2010 and took the English-language PALS-K test. We restricted eligibility on birthdate because children hadto be at least five years old by September 1, 2015 for PALS-K testing eligibility. PALS-K is available in twolanguages: English and Spanish.
However, test versions were developed separately and are not directlycomparable, so we only considered children who took the English-language test.BD4LK includes 252,883 unique deliveries during January 1, 2007-September 1, 2010. We identified806 records (0.3%) with multiple maternal or child identifiers that imperfectly matched to Medicaid claimsor PALS-K scores. Among those records, we randomly selected one match and excluded the remainder.We then identified 177,863 records (70.3%) that linked to PALS-K scores, of which 46,743 records were in-sample siblings (26.3% of test-linked records). Finally, we pulled eligible sibling pairs: sequentially-born fromdifferent deliveries, took the English-language PALS-K test, and completed information on control variables(maternal age, maternal education, and Medicaid delivery coverage, all of which were measured at the oldersibling’s delivery). This generated a final sample of 40,020 siblings (85.6% of tested siblings), or 20,010 siblingpairs.
S11
Supplemental Tables
Supplemental Table 1.
Descriptive statistics of sibling pairs for the empirical application (N =20,010 sibling pairs a ) Older Siblings Younger Siblings(N = 20,010) (N = 20,010)
PALS-K Score (Points) c , Mean (SD) 63.58 (24.12) 64.22 (23.83)Preterm Birth b , N (%) 1,357 (6.78%) 1,331 (6.65%)Maternal Age (Years) d , Mean (SD) 26.01 (5.12) –Maternal Education d , N (%) No high school degree
High school degree/equivalent only
4+ years college d , N (%) 7,528 (37.62%) – a Each sibling pair includes one older sibling and one younger sibling. The sample consists of 40,020children in total. b Preterm birth is defined as gestational age <
37 completed weeks. c PALS-K has a score range of 0-102 points. d Measured at the older sibling’s delivery.Notes: “PALS-K” Phonological Awareness Literacy Screening-Kindergarten, “SD” standard devia-tion. S12 upplemental Table 2.
Cross-tabulation of preterm birth a by sibling within two-sibling clusters(N = 20,010 pairs b ) Younger Sibling Younger Sibling TotalNot Preterm PretermOlder Sibling
Not Preterm (94.63%) (5.37%) (100.00%)(94.50%) (74.21%) (93.22%)
Older Sibling
Preterm (75.68%) (24.32%) (100.00%)(5.50%) (24.79%) (6.78%)
Total a Preterm birth is defined as gestational age <
37 completed weeks. b Each sibling pair includes one older sibling and one younger sibling. The sample consists of 40,020children in total.Notes: Whole numbers indicate the frequency of sibling pairs. Within a cell, the first percentageindicates the row percentage (i.e., the percent of younger siblings who are preterm or not pretermby the older sibling’s preterm birth status), and the second percentage is a column percentage (i.e.,the percent of older siblings who are preterm or not preterm by the younger sibling’s preterm birthstatus). S13
Simulation Code
We provide the Stata Statistical Software code for replicating the simulation in our main text. Note thatwe conducted simulations in Stata Statistical Software: Release 16. This code may not be compatible witholder versions of the software. clearset more off/*******************************************************/*0. Description/*******************************************************//*This code simulates identification of sibling spillover effects in a fixedeffects data generating process. Monte Carlo simulations run theregressions over many randomly generated samples and then stores thespillover coefficients in a separate data set.The spillover of interest is the effect of sibling 1’s exposure on sibling 2’soutcome (sp).We run nine simulations in total, each of which correspond to a uniquespillover model (see manuscript).*//*Definitions:T1 = sibling 1’s exposureY1 = sibling 1’s outcomeT2 = sibling 2’s exposureY2 = sibling 2’s outcome*//*Sections:0. Description
S14 . Input, variable, and output definitions2. Simulations3. Output for tables and figures*//*******************************************************/*1. Input, variable, and output definitions/*******************************************************//*Input Information:1. ie_1 = targeted effect2. sp = spillover effect of sibling 1’s exposure on sibling 2’s outcome3. ts = exposure-to-exposure spillover4. sp2 = spillover effect of sibling 2’s exposure on sibling 1’s outcome5. sp3 = spillover effect of sibling 1’s outcome on sibling 2’s exposure6. os = outcome-to-outcome spillover*//*Variable Information:1. cluster = identifier for family/sibling cluster2. fe = family fixed effects3. t_1 = sibling 1’s exposure4. t_2 = sibling 2’s exposure5. y_1 = sibling 1’s outcome6. y_2 = sibling 2’s outcome7. d = difference of outcomes8. e_1 = error term for sibling 1’s outcome9. e_2 = error term for sibling 2’s outcome*//*Output Information:1. sc_mc
S15 imulated samples (
S16 orvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for sibling 1gen e_1=rnormal(0,1)*error term for sibling 2gen e_2=rnormal(0,1)
S17 generate sibling 1’s exposure**t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate sibling 2’s exposure**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate sibling 1’s outcome**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate sibling 2’s outcome**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore
S18 store simulation results in data setreplace sc_mc1=‘sc_sim’ in ‘i’replace sc_se_mc1=‘sc_se_sim’ in ‘i’replace sc_ub_mc1=‘sc_ub_sim’ in ‘i’replace sc_lb_mc1=‘sc_lb_sim’ in ‘i’replace cov1=0replace cov1=1 if sc_lb_mc1<=0.5 & sc_ub_mc1>=0.5}}/************/*SIMULATION 2*Spillover from T1 to Y2*Spillover from T2 to T1*Figure 1B/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc2=.**spillover coefficient standard errorgen sc_se_mc2=.**spillover coefficient 95% CI boundsgen sc_ub_mc2=.gen sc_lb_mc2=.**coverage indicatorgen cov2=.quietly{forvalues i = 1(1)‘mc’ {
S19 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)
S20 generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate T1**t_1 affected by fe, t_2gen t_1=(fe_t1*fe)+(ts*t_2)>0.5*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore
S21 store simulation results in data setreplace sc_mc2=‘sc_sim’ in ‘i’replace sc_se_mc2=‘sc_se_sim’ in ‘i’replace sc_ub_mc2=‘sc_ub_sim’ in ‘i’replace sc_lb_mc2=‘sc_lb_sim’ in ‘i’replace cov2=0replace cov2=1 if sc_lb_mc2<=0.5 & sc_ub_mc2>=0.5}}/************/*SIMULATION 3*Spillover from T1 to Y2*Spillover from T1 to T2*Figure 1C/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc3=.**spillover coefficient standard errorgen sc_se_mc3=.**spillover coefficient 95% CI boundsgen sc_ub_mc3=.gen sc_lb_mc3=.**coverage indicatorgen cov3=.quietly{forvalues i = 1(1)‘mc’ {
S22 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1
S23 *t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fe, t_1gen t_2=(fe_t2*fe)+(ts*t_1)>0.2*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data set
S24 eplace sc_mc3=‘sc_sim’ in ‘i’replace sc_se_mc3=‘sc_se_sim’ in ‘i’replace sc_ub_mc3=‘sc_ub_sim’ in ‘i’replace sc_lb_mc3=‘sc_lb_sim’ in ‘i’replace cov3=0replace cov3=1 if sc_lb_mc3<=0.5 & sc_ub_mc3>=0.5}}/************/*SIMULATION 4*Spillover from T1 to Y2*Spillover from T2 to Y1*Figure 2A/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc4=.**spillover coefficient standard errorgen sc_se_mc4=.**spillover coefficient 95% CI boundsgen sc_ub_mc4=.gen sc_lb_mc4=.**coverage indicatorgen cov4=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {
S25 oisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)gen sp2=0.3*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1**t_1 affected by fe
S26 en t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate Y1**y_1 affected by fe, t_1, t_2, e_1gen y_1=(ie*t_1)+fe+(sp2*t_2)+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data setreplace sc_mc4=‘sc_sim’ in ‘i’
S27 eplace sc_se_mc4=‘sc_se_sim’ in ‘i’replace sc_ub_mc4=‘sc_ub_sim’ in ‘i’replace sc_lb_mc4=‘sc_lb_sim’ in ‘i’replace cov4=0replace cov4=1 if sc_lb_mc4<=0.5 & sc_ub_mc4>=0.5}}/************/*SIMULATION 5*Spillover from T1 to Y2*Spillover from T2 to Y1*Spillover from T2 to T1*Figure 2B/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc5=.**spillover coefficient standard errorgen sc_se_mc5=.**spillover coefficient 95% CI boundsgen sc_ub_mc5=.gen sc_lb_mc5=.**coverage indicatorgen cov5=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {
S28 oisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)gen sp2=0.3*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T2**t_2 affected by fe
S29 en t_2=(fe_t2*fe)>0.2*generate T1**t_1 affected by fe, t_2gen t_1=(fe_t1*fe)+(ts*t_2)>0.5*generate Y1**y_1 affected by fe, t_1, t_2, e_1gen y_1=(ie*t_1)+fe+(sp2*t_2)+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data set
S30 eplace sc_mc5=‘sc_sim’ in ‘i’replace sc_se_mc5=‘sc_se_sim’ in ‘i’replace sc_ub_mc5=‘sc_ub_sim’ in ‘i’replace sc_lb_mc5=‘sc_lb_sim’ in ‘i’replace cov5=0replace cov5=1 if sc_lb_mc5<=0.5 & sc_ub_mc5>=0.5}}/************/*SIMULATION 6*Spillover from T1 to Y2*Spillover from T2 to Y1*Spillover from T1 to T2*Figure 2C/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc6=.**spillover coefficient standard errorgen sc_se_mc6=.**spillover coefficient 95% CI boundsgen sc_ub_mc6=.gen sc_lb_mc6=.**coverage indicatorgen cov6=.quietly{forvalues i = 1(1)‘mc’ {
S31 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)gen ts=0.3*spillover effect (T2 to Y1)gen sp2=0.3*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)
S32 generate T1**t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fe, t_1gen t_2=(fe_t2*fe)+(ts*t_1)>0.2*generate Y1**y_1 affected by fe, t_1, t_2, e_1gen y_1=(ie*t_1)+fe+(sp2*t_2)+e_1*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore
S33 store simulation results in data setreplace sc_mc6=‘sc_sim’ in ‘i’replace sc_se_mc6=‘sc_se_sim’ in ‘i’replace sc_ub_mc6=‘sc_ub_sim’ in ‘i’replace sc_lb_mc6=‘sc_lb_sim’ in ‘i’replace cov6=0replace cov6=1 if sc_lb_mc6<=0.5 & sc_ub_mc6>=0.5}}/************/*SIMULATION 7*Spillover from T1 to Y2*Spillover from Y1 to T2*Figure 3A/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc7=.**spillover coefficient standard errorgen sc_se_mc7=.**spillover coefficient 95% CI boundsgen sc_ub_mc7=.gen sc_lb_mc7=.**coverage indicatorgen cov7=.quietly{forvalues i = 1(1)‘mc’ {
S34 f floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)gen sp3=0.3*spillover effect (outcome to outcome)*N/A*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)
S35 generate T1**t_1 affected by fegen t_1=(fe_t1*fe)>0.5*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate T2**t_2 affected by fe, y_1gen t_2=(fe_t2*fe)+(sp3*y_1)>0.2*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data set
S36 eplace sc_mc7=‘sc_sim’ in ‘i’replace sc_se_mc7=‘sc_se_sim’ in ‘i’replace sc_ub_mc7=‘sc_ub_sim’ in ‘i’replace sc_lb_mc7=‘sc_lb_sim’ in ‘i’replace cov7=0replace cov7=1 if sc_lb_mc7<=0.5 & sc_ub_mc7>=0.5}}/************/*SIMULATION 8*Spillover from T1 to Y2*Spillover from Y1 to Y2*Figure 3B/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc8=.**spillover coefficient standard errorgen sc_se_mc8=.**spillover coefficient 95% CI boundsgen sc_ub_mc8=.gen sc_lb_mc8=.**coverage indicatorgen cov8=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {
S37 oisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}preserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)gen os=0.3*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1**t_1 affected by fe
S38 en t_1=(fe_t1*fe)>0.5*generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate Y1**y_1 affected by fe, t_1, e_1gen y_1=(ie*t_1)+fe+e_1*generate Y2**y_2 affected by fe, t_2, t_1, y_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+(os*y_1)+e_2*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data setreplace sc_mc8=‘sc_sim’ in ‘i’replace sc_se_mc8=‘sc_se_sim’ in ‘i’
S39 eplace sc_ub_mc8=‘sc_ub_sim’ in ‘i’replace sc_lb_mc8=‘sc_lb_sim’ in ‘i’replace cov8=0replace cov8=1 if sc_lb_mc8<=0.5 & sc_ub_mc8>=0.5}}/************/*SIMULATION 9*Spillover from T1 to Y2*Spillover from Y1 to Y2*Figure 3C/************/*set number of simulationslocal mc = 1000*set number of observations for the dataset that stores monte carlo output (i.e., the number of simulations)set obs ‘mc’*set monte carlo simulation ouputs**spillover coefficient valuegen sc_mc9=.**spillover coefficient standard errorgen sc_se_mc9=.**spillover coefficient 95% CI boundsgen sc_ub_mc9=.gen sc_lb_mc9=.**coverage indicatorgen cov9=.quietly{forvalues i = 1(1)‘mc’ {if floor((‘i’-1)/100)==((‘i’-1)/100) {noisily display "Working on ‘i’ out of ‘mc’ at $S_TIME"}
S40 reserveclear/*GENERATE SAMPLE*/*set number of observationsset obs 5000*set identifier for clustergen cluster=_n*targeted effect (both siblings)gen ie=1*spillover effect (T1 to Y2)gen sp=0.5*spillover effect (exposure to exposure)*N/A*spillover effect (T2 to Y1)*N/A*spillover effect (Y1 to T2)*N/A*spillover effect (outcome to outcome)gen os=0.3*fixed effectgen fe=rnormal(0,1)*fixed effect on T1gen fe_t1=2*fixed effect on T2gen fe_t2=3*error term for Y1gen e_1=rnormal(0,1)*error term for Y2gen e_2=rnormal(0,1)*generate T1**t_1 affected by fegen t_1=(fe_t1*fe)>0.5
S41 generate T2**t_2 affected by fegen t_2=(fe_t2*fe)>0.2*generate Y2**y_2 affected by fe, t_2, t_1, e_2gen y_2=(ie*t_2)+fe+(sp*t_1)+e_2*generate Y1**y_1 affected by fe, t_1, y_2, e_1gen y_1=(ie*t_1)+fe+(os*y_2)+e_1*generate difference scoresgen d=y_2-y_1/*RUN REGRESSION*/*regression**t_1 coefficient = Beta(DT1.T2)**t_2 coefficient = Beta(DT2.T1)regress d t_1 t_2*run lincom to generate spillover coefficient(t_1+t_2)**store coefficient, standard error, 95% CI boundslincom t_1+t_2local sc_sim=r(estimate)local sc_se_sim=r(se)local sc_ub_sim=r(ub)local sc_lb_sim=r(lb)restore*store simulation results in data setreplace sc_mc9=‘sc_sim’ in ‘i’replace sc_se_mc9=‘sc_se_sim’ in ‘i’replace sc_ub_mc9=‘sc_ub_sim’ in ‘i’replace sc_lb_mc9=‘sc_lb_sim’ in ‘i’
S42 eplace cov9=0replace cov9=1 if sc_lb_mc9<=0.5 & sc_ub_mc9>=0.5}}/*******************************************************/*3. Output for tables and figures/*******************************************************/**tabulate spillover coefficient and standard error by modeltabstat sc_mc1 sc_se_mc1, stat(mean)tabstat sc_mc2 sc_se_mc2, stat(mean)tabstat sc_mc3 sc_se_mc3, stat(mean)tabstat sc_mc4 sc_se_mc4, stat(mean)tabstat sc_mc5 sc_se_mc5, stat(mean)tabstat sc_mc6 sc_se_mc6, stat(mean)tabstat sc_mc7 sc_se_mc7, stat(mean)tabstat sc_mc8 sc_se_mc8, stat(mean)tabstat sc_mc9 sc_se_mc9, stat(mean)**tabulate coverage probability by modeltab cov1tab cov2tab cov3tab cov4tab cov5tab cov6tab cov7tab cov8tab cov9**create figure/*The figure will plot mean spillover coefficient estimates and95% CIs for each model. Save the mean spillover coefficient,standard errors, and CI bounds for each model.*/
S43 gen sc1=mean(sc_mc1)egen sc2=mean(sc_mc2)egen sc3=mean(sc_mc3)egen sc4=mean(sc_mc4)egen sc5=mean(sc_mc5)egen sc6=mean(sc_mc6)egen sc7=mean(sc_mc7)egen sc8=mean(sc_mc8)egen sc9=mean(sc_mc9)egen se1=mean(sc_se_mc1)egen se2=mean(sc_se_mc2)egen se3=mean(sc_se_mc3)egen se4=mean(sc_se_mc4)egen se5=mean(sc_se_mc5)egen se6=mean(sc_se_mc6)egen se7=mean(sc_se_mc7)egen se8=mean(sc_se_mc8)egen se9=mean(sc_se_mc9)egen ub1=mean(sc_ub_mc1)egen ub2=mean(sc_ub_mc2)egen ub3=mean(sc_ub_mc3)egen ub4=mean(sc_ub_mc4)egen ub5=mean(sc_ub_mc5)egen ub6=mean(sc_ub_mc6)egen ub7=mean(sc_ub_mc7)egen ub8=mean(sc_ub_mc8)egen ub9=mean(sc_ub_mc9)egen lb1=mean(sc_lb_mc1)egen lb2=mean(sc_lb_mc2)egen lb3=mean(sc_lb_mc3)egen lb4=mean(sc_lb_mc4)egen lb5=mean(sc_lb_mc5)egen lb6=mean(sc_lb_mc6)egen lb7=mean(sc_lb_mc7)egen lb8=mean(sc_lb_mc8)egen lb9=mean(sc_lb_mc9)
S44 *Keep the mean spillover coefficients, standard errors, andCI bounds only. Drop duplicates (one observation).*/keep sc1-sc9 se1-se9 ub1-ub9 lb1-lb9duplicates reportduplicates dropduplicates report/*Reshape data so that it’s vertical (one observation per model).*/gen id=1reshape long sc se ub lb, i(id)rename _j modellabel define model_lbl 1 "Figure 1A" 2 "Figure 1B" 3 "Figure 1C" ///4 "Figure 2A" 5 "Figure 2B" 6 "Figure 2C" ///7 "Figure 3A" 8 "Figure 3B" 9 "Figure 3C"label values model model_lbl/*Use a truncated label for models -- saves space on plot.*/label define model2_lbl 1 "Figure 1A" 2 "1B" 3 "1C" ///4 "Figure 2A" 5 "2B" 6 "2C" ///7 "Figure 3A" 8 "3B" 9 "3C"label values model model2_lbl/*Plot the figure.*/
S45 woway(scatter model sc, mcolor(black) ysc(reverse) ///graphregion(color(white)) ///xtitle(Mean Spillover Coefficient (95% CI)) ///ytitle(Sibling Spillover Model) ///xline(0.2, lcolor(black) lpattern(shortdash_dot)) ///xline(0.5, lcolor(black) lpattern(dash)) ///legend(off) xtick(woway(scatter model sc, mcolor(black) ysc(reverse) ///graphregion(color(white)) ///xtitle(Mean Spillover Coefficient (95% CI)) ///ytitle(Sibling Spillover Model) ///xline(0.2, lcolor(black) lpattern(shortdash_dot)) ///xline(0.5, lcolor(black) lpattern(dash)) ///legend(off) xtick(