[PDF] Accounting for selection bias due to death in estimating the effect of wealth shock on cognition for the Health and Retirement Study

Abstract

The Health and Retirement Study is a longitudinal study of US adults enrolled at age 50 and older. We were interested in investigating the effect of a sudden large decline in wealth on the cognitive score of subjects. Our analysis was complicated by the lack of randomization, confounding by indication, and a substantial fraction of the sample and population will die during follow-up leading to some of our outcomes being censored. Common methods to handle these problems for example marginal structural models, may not be appropriate because it upweights subjects who are more likely to die to obtain a population that over time resembles that would have been obtained in the absence of death. We propose a refined approach by comparing the treatment effect among subjects who would survive under both sets of treatment regimes being considered. We do so by viewing this as a large missing data problem and impute the survival status and outcomes of the counterfactual. To improve the robustness of our imputation, we used a modified version of the penalized spline of propensity methods in treatment comparisons approach. We found that our proposed method worked well in various simulation scenarios and our data analysis.

Full PDF

aa r X i v : . [ s t a t . A P ] D ec Accounting for selection bias due to death in estimatingthe eﬀect of wealth shock on cognition for the Healthand Retirement Study

Yaoyuan V. Tan ∗ Department of Biostatistics and Epidemiology, Rutgers UniversityandCarol A.C. FlannaganTransportation Research Institute, University of MichiganLindsay R. PoolDepartment of Preventive Medicine, Northwestern UniversityandMichael R. ElliottDepartment of Biostatistics, University of Michigan

Abstract

The Health and Retirement Study is a longitudinal study of US adults enrolledat age 50 and older. We were interested in investigating the eﬀect of a sudden largedecline in wealth on the cognitive score of subjects. Our analysis was complicated bythe lack of randomization, confounding by indication, and a substantial fraction of thesample and population will die during follow-up leading to some of our outcomes beingcensored. Common methods to handle these problems for example marginal structuralmodels, may not be appropriate because it upweights subjects who are more likely todie to obtain a population that over time resembles that would have been obtainedin the absence of death. We propose a reﬁned approach by comparing the treatmenteﬀect among subjects who would survive under both sets of treatment regimes being ∗ (e-mail: [email protected]) onsidered. We do so by viewing this as a large missing data problem and impute thesurvival status and outcomes of the counterfactual. To improve the robustness of ourimputation, we used a modiﬁed version of the penalized spline of propensity methodsin treatment comparisons approach. We found that our proposed method worked wellin various simulation scenarios and our data analysis. Keywords:

Bayesian additive regression trees; Causal inference; Confounding by in-dication; Longitudinal Study; Missing data; Penalized spline of propensity methods in treat-ment comparisons.

Late middle age adults commonly experience chronic health conditions like high blood pres-sure or diabetes as well as declining cognitive abilities. Factors known to be associatedwith accelerated decrease in cognitive abilities include smoking, high alcohol consumption,physical inactivity, high dietary intake of sodium and saturated fats, low dietary intakeof fruits and vegetables (Lee et al., 2010; Stuck et al., 1999); hypertension, elevated serumcholesterol, diabetes, obesity, cerebrovascular and cardiovascular disease (Plassman et al.,2010); depression, lower socioeconomic status, and exposure to acute stressful life eventsand chronic perceived stress (Krieger, 2001). In particular, the acute stress of a suddendecrease in wealth – “a negative wealth shock” – may have a negative impact on the cog-nitive ability of late middle aged adults. Because income typically exceeds consumptionat this stage in life, sudden decreases in wealth during this period not only decrease theamount of wealth saved for retirement, but there are fewer remaining years left to replenish2he lost wealth (Butrica et al., 2010). The stress of losing substantial wealth during thesavings period of the life cycle coupled with the pressure to replenish the lost wealth canlead to stress-related health conditions which in turn reduces the cognitive ability of an in-dividual (Shrira et al., 2011). In addition, individuals who have received a negative wealthshock may have to reduce consumption of health-enhancing goods and services which in turnleads to poor management of existing chronic conditions, further reducing cognitive abilities(Friedman, 1956).Three issues arise when trying to estimate the causal eﬀect of a negative wealth shockon cognitive ability. The ﬁrst of these is the lack of randomization: negative wealth shocksare not randomly distributed in the population, but rather are confounded by factors suchas gender and socio-economic status. The second issue is confounding by indication: the riskof the wealth shock at any point in time may depend on the prior cognitive ability up to thepoint. Finally, a suﬃciently large fraction of the sample and the population will die duringour follow-up, leading to “censoring by death”. Those observed to have survived a negativewealth shock include those who would survive under either condition together with thosethat would survive only if they experienced a negative wealth shock (if any), while thoseobserved to have survived in the absence of a negative wealth shock include those that wouldsurvive under either condition together with those that would survive only in the absenceof a negative wealth shock. These “missing values” associated with cognition among thedeceased are diﬀerent from the measure of cognition being “missing” due to dropout, wherethe cognitive ability measure exists but is unobserved. As with wealth shock, death is nota random occurrence, and is positively associated with demographic measures that increasethe risk of a negative wealth shock, increased cognitive ability decline, and the experience3f a negative wealth shock. Hence, the measure for cognitive ability may be confounded bydeath if not considered appropriately.Methods have been developed to deal with these barriers to causal inference. To dealwith the lack of randomization, we might hope that, conditional on available covariates,negative wealth shocks would truly be random. In this case, conditioning on the probabilityof receiving a negative wealth shock as a function of these covariates – the propensity scores(Rosenbaum and Rubin, 1983) – can be used to remove the eﬀect of confounding, either byregression, matching, or weighting (Imbens and Rubin, 2015). For the second issue – con-founding by indication – marginal structural models (MSM, Robins et al., 2000) and morerecently, penalized spline of propensity methods in treatment comparisons (PENCOMP,Zhou et al., 2018), have been used to account for confounding by the time-dependence as-sociation of the cognitive measures, either by weighting using the inverse probability oftreatment actually received based on the previous values of the time-varying covariates andoutcomes (MSM), or by imputation of the missing counterfactual values (PENCOMP). Forcensoring by death, MSMs have typically been extended by multiplying the treatment as-signment weights with the inverse of the predicted probability of death (Weuve et al., 2012).The issue with this approach – perhaps under appreciated – is that the resulting pseudo-population is not only balanced with respect to exposure “assignment”, but also “immortal”,in the sense that those more likely to die are upweighted so that the population over timeresembles that would have been obtained in the absence of death up till time t (Chaix et al.,2012). This is arguably not a sensible population for inference, at least from a policy andpublic health perspective.A more reﬁned approach would be to compare the diﬀerence in the eﬀect of negative4ealth shock on cognitive ability among subjects who would have survived whether theyexperienced a negative wealth shock or not. This approach is consistent with the potentialoutcomes approach of Neyman (1934) and Rubin (1974), which deﬁnes causal eﬀects as thewithin-subject diﬀerence of an outcome at a particular time under diﬀerent exposure ortreatment regimen, averaged over the population. This idea is not new (Elliott et al., 2006)and can be viewed as a speciﬁc example of the principal stratiﬁcation (PS) method discussedin Frangakis and Rubin (2002). Our innovation here is to embed this in a longitudinal settingwhere confounding by indication is present. We view this as a large missing data problemwhere survival status and, among survivors, unobserved outcomes under a given treatmentpattern, are imputed. We extend the method proposed in Example 3 of Elliott and Little(2015), which provides a Bayesian MSM approach to compare two treatments at two timepoints. This approach was further extended by PENCOMP in Zhou et al. (2018) which, likeaugmented inverse probability weighting (AIPWT, Robins et al., 1994), has a doubly-robustproperty in that if either the mean or propensity model is correctly speciﬁed, consistentestimates of the causal eﬀect will be obtained. We modiﬁed PENCOMP slightly usingBayesian additive regression trees (BART), a ﬂexible model to ease the burden of modelspeciﬁcation by the researcher, and apply this to our proposed method.We organize our paper as follows. We set up the framework for our problem, and pro-vide a brief review of of MSM, PENCOMP, and Bayesian additive regression trees (BART)in Section 2. We develop our proposed method in Section 3. We then explore some of theempirical properties of our proposed method compared to a na¨ıve method and MSM usinga simulation study in Section 4. Section 5 describes the HRS data and the results of ournegative wealth shock analysis. Section 6 concludes with a discussion of the implication of5ur results as well as future work. Let V = { V , V , . . . , V p } be p baseline covariates, Z t be the treatment allocation at time t = 1 , . . . , T where Z t = 1 indicates a subject receiving a negative wealth shock at t and Z t = 0 indicates no negative wealth shock, and W t = { W t , W t , . . . , W qt } be q covariatesthat may vary with time, but are unaﬀected by a given treatment regimen. For exam-ple, ﬁxed covariates by deﬁnition would belong to this class. Let Y Z ,...,Z t be the potentialoutcome under treatments Z , . . . , Z t and X Z ,...,Z t = { X Z ,...,Z t , , X Z ,...,Z t , , . . . , X Z ,...,Z t ,r } be the time-varying covariates aﬀected by treatments Z , . . . , Z t . Similarly, we deﬁne thepotential survival indicator S Z ,...,Z t − , for survival at time t . The survival outcome at t measures whether a subject would survive after being exposed to treatment Z , . . . , Z t − ;hence, the lagged notation for the potential survival outcome, S Z ,...,Z t − . v , z t , w t , y z ,...,z t , x z ,...,z t , and s z ,...,z t indicate the observed baseline, treatment allocation, time varying covari-ates unaﬀected by a given treatment regimen, outcome, time-varying covariates aﬀected bya given treatment regime, and survival status variables respectively. As in Pool et al. (2018),we assume that a negative wealth shock is an “absorbing state” so that once a subject re-ceives a negative wealth shock at time t , i.e. Z t = 1, the subject is “forever” shocked, i.e. Z t +1 = . . . = Z T = 1. Note that this need not be the case for a more general set up wherewe could have Z t = 0 when Z j = 1 for any j = 1 , . . . , t −

1. In our context, the potentialoutcomes for time t = 2 are then Y Z =0 ,Z =0 = Y , Y Z =0 ,Z =1 = Y , and Y Z =1 ,Z =1 = Y ;6imilarly, X Z =0 ,Z =0 = X , X Z =0 ,Z =1 = X , and X Z =1 ,Z =1 = X for time-varying co-variates under the various treatment regimes; and S Z =0 = S , S Z =1 = S for survival states.Subjects who die at time t have structurally missing data for outcomes and covariates i.e., S = 0 implies that Y = Y = N A and X = X = N A , while S = 0 implies that Y = N A and X = N A , where ‘NA’ indicates a structurally missing observation.

To estimate the causal eﬀect for confounding by indication and censoring by death problems,MSM makes the following assumptions. First, MSM assumes that P ( S z ,...,z t − | z , . . . , z t − , y z , . . . , y z ,...,z t − , x z , . . . , x z ,...,z t − , w , . . . , w t − , v ) > . (1)and P ( Z t | z , . . . , z t − , y z , . . . , y z ,...,z t − , x z , . . . , x z ,...,z t − , w , . . . , w t − , v ) > z t i.e. the probability of survival under treatment proﬁle z , . . . , z t − and the prob-ability of treatment allocation for time t is bounded away from 0. This is an extension ofthe standard positivity assumption to allow that at least some subjects will survive under agiven treatment regimen. Second, MSM assumes that there is no interference between sub-jects i.e. the potential outcome of subject i , Y i,Z ,...,Z t = Y i,z ,...,z t , is independent of whatevertreatment regimen subject j is allocated to i = j . Third, MSM assumes no unmeasuredconfounding and sequential randomization condition Y Z ,...,Z t ⊥ Z t | z , . . . , z t − , y z ,...,z t − , . . . , y z , x z ,...,z t − , . . . , x z , w , . . . , w t − , v. Finally, MSM assumes that the model speciﬁcations for Equations 1, 2, and Y z ,...,z t | z , . . . , z t , y z ,...,z t − , . . . , y z , x z ,...,z t − , . . . , x z , w , . . . , w t − , v E [ Y z ,...,z t − Y z ′ ,...,z ′ t ] (note that this estimand is notconditioned on the survival status) is obtained by maximizing the weighted likelihood of n Y i =1 f ( Y i ; z ,...,z t | θ it ) w it , (3)where i indexes the subjects and θ it are the parameters involved in the model for Y i ; z ,...,z t and w it = [ t Y j =1 P ( Z ij = z ij | z i , . . . , z i,j − , y i , . . . , y i,j − , x i , . . . , x i,j − , w i , . . . , w i,j − , v i ; τ j )] − . (4) By weighting using the inverse probability of receiving the observed treatment regime givenall covariates and previous treatments, the association between treatment and all observedconfounders, including confounding by indication, are broken. Under these four assump-tions, inference about the treatment eﬀects under a pseudo-population in which treatmentis randomized can then be obtained.Similarly, this weighting method can be used to remove bias due to dropout. Let R i = 1indicate that the subject’s cognitive score is observed and R i = 0 indicate that the subject’scognitive score is missing. The weight used to account for missing cognitive score is w rit = [ t Y j =1 P ( R ij = r ij | r i , . . . , r i,j − , z i , . . . , z i,j − , y i , . . . , y i,j − , x i , . . . , x i,j − , w i , . . . , w i,j − , v i ; γ j )] − . (5) Finally, death is typically treated as equivalent to dropout in MSM (Do et al., 2013;Pool et al., 2018). Let D it = 1 indicate that subject i is dead at time t and D it = 0 indicatethat the subject survived at time t (thus D it = 1 − S it ). The weight for death censoring isthen w dit = [ t Y j =1 P ( D ij = d ij | z i , . . . , z i,j − , y i , . . . , y i,j − , x i , . . . , x i,j − , w i , . . . , w i,j − , v i ; λ j )] − . (6) w fit = w it w dit w rit . To stabilize the weights, the numerators of Equations4, 5, and 6 are replaced by the marginal probabilities of treatment, dropout, and death atbaseline given by t Y j =1 P ( Z ij = z ij | z i , . . . , z i,j − , v i ; τ ′ j ) , t Y j =1 P ( R ij = r ij | r i , . . . , r i,j − , v i ; γ ′ j ) , and t Y j =1 P ( D ij = d ij | v i ; λ ′ j )respectively. We use the stabilized weights in our simulations and analysis. PENCOMP uses the same four assumptions made by MSM excluding Equation 1 for con-founding by indication problems. Full details of PENCOMP can be found in Zhou et al.(2018). We brieﬂy describe the algorithm for PENCOMP using multiple imputation (MI)with longitudinal treatment assignments here. Without loss of generality, we assume notime-varying covariates in the data.1. For b = 1 , . . . , B , generate a bootstrap sample S ( b ) from the original data S by samplingunits with replacement, stratiﬁed on treatment group. For each sample b , carry outsteps 2-7.2. Estimate a logistic regression model for the distribution of Z given baseline covariates9 with regression parameters γ z . Estimate the propensity to be assigned treatment Z = z as ˆ P z ( V ) = P r ( Z = z | V ; ˆ γ bz ), where ˆ γ bz is the maximum likelihood (ML)estimate of γ z . Deﬁne ˆ P ∗ z = log[ ˆ P z ( V )1 − ˆ P z ( V ) ].3. Using the cases assigned to treatment group Z = z , estimate a normal linear regres-sion of Y z on V , with mean E ( Y z | V, Z = z , θ z , β z ) = s ( ˆ P ∗ z | θ z ) + g z ( ˆ P ∗ z , V ; β z ) , (7)where s ( ˆ P ∗ z | θ z ) denotes a penalized spline with ﬁxed knots and parameters θ z and g z ( . ) represents a parametric function of other predictors of the outcome, indexed byparameters β z . One of the covariates might be omitted to avoid collinearity in thecovariates in Equation 7.4. For z = 0 ,

1, impute the values of Y z for subjects in treatment group 1 − z inthe original data with draws from the predictive distribution of Y z given V from theregression in Step 3, with the ML estimates ˆ θ ( b ) z , ˆ β ( b ) z substituted for the parameters θ ( b ) z , β ( b ) z .5. Estimate a logistic regression model for the distribution of Z given V, Z , ( Y , Y ),with regression parameters γ z and missing values of ( Y , Y ) imputed from Step 4.Estimate the propensity to be assigned treatment Z = z given Z , Y Z , and V asˆ P z ( Z , Y Z , V ) = P r ( Z = z | Z = z , Y z , V ; ˆ γ ( b ) z ), where ˆ γ ( b ) z is the ML estimate of γ z . The probability of treatment regimen ( Z = z , Z = z ) is denoted as ˆ P z z =ˆ P z ( V ) ˆ P z ( Z , Y Z , V ), and deﬁne ˆ P ∗ z ,z = log[ ˆ P z z − ˆ P z z ].6. Using the cases assigned to treatment group ( z , z ), estimate a normal linear regressionof Y z ,z on Z , Z , Y Z , and V with mean E ( Y z ,z | V, Y z , Z = z , Z = z , θ z ,z , β z ,z ) = s ( ˆ P ∗ z ,z | θ z ,z ) + g z ,z ( ˆ P ∗ z ,z , Z , Z , Y Z , V ; β z ,z ) . (8)

10. For each combination of ( z , z ) impute the values of Y z ,z for subjects not assignedthis treatment combination in the original data with draws from the predictive distri-bution of Y z ,z in Step 6, with ML estimates ˆ θ ( b ) z ,z , ˆ β ( b ) z ,z substituted for the parameters θ ( b ) z ,z , β ( b ) z ,z . Let ˆ∆ ( b )01 , = E [ Y − Y ], ˆ∆ ( b )11 , = E [ Y − Y ], and ˆ∆ ( b )11 , = E [ Y − Y ]denote the average treatment eﬀects, ˆ∆ ( b ) jk,lm , with associated pooled variance estimates W ( b ) jk,lm , based on the observed and imputed values of Y for each treatment regimen.8. The MI estimate of ∆ jk,lm is then ¯∆ jk,lm,B = P Bb =1 ˆ∆ ( b ) jk,lm , and the MI estimateof the variance of ¯∆ jk,lm is T B = ¯ W jk,lm,B + (1 + 1 /B ) D jk,lm,B , where ¯ W jk,lm,B = P Bb =1 W ( b ) jk,lm /B , D jk,lm,B = P Bb =1 ( ˆ∆ ( b ) jk,lm − ¯∆ jk,lm,B ) B − . The estimate ∆ jk,lm follows a t distribution with degree of freedom ν , ∆ jk,lm − ¯∆ jk,lm,B √ T B ∼ t ν , where ν = ( B − ¯ W jk,lm,B D jk,lm,B ( B +1) ) . BART (Chipman et al., 2010) is a ﬂexible estimation technique for any arbitrary function.Suppose we have a continuous outcome Y and corresponding p predictors X = ( X , . . . , X p ).Suppose Y is related to X via Y = f ( X ) + e (9)where f ( . ) is any arbitrary function which could involve complicated non-linear and multiple-way interactions and e ∼ N (0 , σ ). Formally, BART is written as Y = m X j =1 g ( X, T j , M j ) + e (10)where ( T j , M j ) is the joint distribution of the j th binary tree structure T j with its corre-sponding b j terminal node parameters M j = ( µ j , . . . , µ b j j ). m is the number of regression11rees used to estimate f ( X ) and it is usually ﬁxed at 200.BART is able to model multiple-way interactions by using regression trees. In essence,a binary regression tree in BART may be viewed as a penalized form of an Analysis ofVariance (ANOVA) model. When the binary regression tree only splits on one variable forthe whole tree, a main eﬀects model is obtained. When the regression tree involve splitson many diﬀerent variables, a multiple-way interaction model is obtained. BART combinesall m regression trees together in an additive manner to obtain non-linear estimates of themain and interaction eﬀects. This additive procedure is done by ﬁrst ‘breaking’ Y into m equal ‘pieces’ and ﬁtting a regression tree to each piece. Subsequently, the regression treein each m piece is then estimated by looking at the residual produced by the other m − f ( X ). When the default priors of BART suggested by Chipman et al. (2010)are assumed, the MCMC ensures that the eventual distribution of the the sum of regressiontrees is concentrated around the true distribution of the model (Rockova and van der Pas,2017).For binary outcomes, BART uses a probit link where P ( Y = 1 | X ) = Φ( m X j =1 g [ X, T j , M j ]) (11)where Φ( . ) is the cdf of a standard normal distribution. Estimation of the posterior dis-tribution is similar to that of continuous outcomes but with the use of data augmentationmethods, i.e. draw a continuous latent variable based on whether Y = 1 or Y = 0 and thenrun the BART algorithm on the drawn latent variables.Kapelner and Bleich (2015) suggested a procedure to allow the BART algorithm to12nclude covariates that might contain missing values. In brief, the missingness in the co-variates are not imputed but instead, viewed as a ‘value level’ in the MCMC algorithm.The MCMC algorithm then ‘sends’ missing data to terminal nodes in the regression treesthat would maximize the likelihood. This is termed as “Missing Incorporated in Attributes”(MIA, Twala et al., 2008, Section 2). Kapelner and Bleich (2015) showed using simulationexamples that incorporating MIA within BART allows the appropriate handling of diﬀerenttypes of missing mechanism, MCAR, MAR, and NMAR, for each covariate. We utilize thisapproach to accommodate the missingness in our covariates for the data analysis. To determine the principal strata deﬁnition, we ﬁrst investigated what the data for ourproblem could potentially look like. We constructed Table 1 for t = 3, p = 1, and no time-varying covariates without loss of generality. In this table, ‘x’ indicates an observed value, ‘?’represent a missing observation which needs to be imputed, and ‘NA’ indicates a structurallymissing observation. For the potential survival outcomes, we did not indicate whether theywere missing or observed because we wanted to use Table 1 to help us decide how we shouldbe stratifying our subjects once our proposed method imputes the counterfactual survivalstatus.From Table 1, we can see that the goal of our analysis is to provide inference about E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1], where Z l = Z ′ l for at least one l with l =13 , . . . , t i.e. we condition on subjects who would potentially survive under two diﬀerent treat-ment regimes Z , . . . , Z t − and Z ′ , . . . , Z ′ t − . Thus, the distribution of ( S Z ,...,Z t − , S Z ′ ,...,Z ′ t − )form our principal strata and meaningful contrasts are deﬁned only in the stratum where S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1 since the potential outcomes for the two diﬀerent treatmentregimes exist only in this stratum. For example, if we want to estimate the eﬀect for a nega-tive wealth shock at t = 2 versus no negative wealth shock by t = 2 that is E [ Y − Y | S = 1],we restrict to subjects who survive if they did not receive a negative wealth shock at t = 1 i.e. subjects with S = 1 (Subjects 1-12 in Table 1). Note that the deﬁnition, E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1], is diﬀerent from the parameter MSM esti-mates which is E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t ]. We make the same four assumptions used by MSM (See Section 2.2). We impose a furtherﬁfth assumption of strict monotonicity in thatIf Z ≤ Z ′ , Z ≤ Z ′ , . . . , Z t − ≤ Z ′ t − , Z i = Z ′ i for any i, then S Z ,...,Z t − ≥ S Z ′ ,...,Z ′ t − . (12)As a consequence, we have for example, when t = 2, if S = 0 then S = 0 and if S = 1then S = 1. This means that we rule out the possibility of a subject who does not receive anegative wealth shock and dies but would survive if having received a neagtive wealth shock.Conversely, if a subject survives after having received a negative wealth shock, we rule outthe possibility that this same subject would die if he or she did not receive a negative wealthshock.Our proposed method then estimates E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1]14y imputing the survival status of each subject at the current time t and then combine the im-puted counterfactual survival status together with the observed survival status to determinewhich principal stratum a subject belongs to. We then use a slightly modiﬁed PENCOMPto impute the counterfactual outcomes among the potentially surviving subjects to accountfor the bias due to confounding by indication. This approach is doubly robust and reducesthe burden of model speciﬁcation by the researcher. Subsequently, the average diﬀerence inthe treatment eﬀect within the desired principal strata is calculated. Variance is estimatedusing Rubin’s combine rule to account for the imputation uncertainty (Heitjan and Little,1991). Detailed steps for our method are given below.1. Generate a bootstrap sample b from the data by sampling the units with replacement.2. Estimate the model X ( b ) z ( b )1 | Z ( b )1 = z ( b )1 , W ( b )1 , V ( b ) . Use this model to compute the coun-terfactual of X ( b ) z ( b )1 for bootstrap sample b .3. Estimate the distribution of Z ( b )1 | W ( b )1 , V ( b ) . Use this model to estimate the propensityto be assigned treatment Z ( b )1 = z ( b )1 as P ∗ z ( b )1 = P r ( Z ( b )1 = z ( b )1 | W ( b )1 , V ( b ) ). Note that wedid not perform a logit transformation to obtain P ∗ z ( b )1 (See PENCOMP Steps 2 and 5).This is because by using PENCOMP modiﬁed with BART to predict the outcomes, thenon-linear eﬀect of the propensity of assigned treatment will be handled automatically.Hence, any non-linear transformation on the propensity of assigned treatment wouldnot be needed.4. Estimate the model Y ( b ) z ( b )1 | P ∗ z ( b )1 , Z ( b )1 = z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , V ( b ) . As mentioned, we usedPENCOMP modiﬁed with BART to estimate this model. The advantage of usingBART is the researcher no longer needs to specify the model. BART automatically15akes care of any linear or non-linear main eﬀects as well as linear or non-linear inter-actions. If we observe Equations 7 and 8, we can see that these two equations are con-structed using a non-linear spline speciﬁcation on the propensity of assigned treatmentcombined with possible linear interactions between the propensity of assigned treat-ment and remaining covariates. This ﬁts well with the type of estimation problemsthat BART was designed to solve. We then use the model produced by BART-modiﬁedPENCOMP to compute the counterfactual of Y ( b ) z ( b )1 for bootstrap sample b .5. Estimate the distribution for S ( b ) z ( b )1 | Z ( b )1 = z ( b )1 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , V ( b ) at t = 2. Use thismodel to generate a survival status for the counterfactual of S ( b ) z b taking into accountthe assumption of monotonicity in Equation (12) i.e. if S is observed and S = 0 then S = 0. Similarly, if S is observed and S = 1 then S = 1.6. Estimate the model X ( b ) z ( b )1 ,z ( b )2 | Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , W ( b )2 , V ( b ) . Usethe respective models to impute the counterfactual of X ( b ) z ( b )1 ,z ( b )2 , using any previouslyimputed values for the unobserved treatment regimes and restricting to the subjectsthat are observed and predicted to survive under the given treatment regimen of interestat t = 1.7. Estimate the distribution of Z ( b )2 | Z ( b )1 = z ( b )1 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , W ( b )2 , V ( b ) . Use this modelto estimate the propensity to be assigned treatment Z ( b )2 = z ( b )2 as P z ( b )2 = P r ( Z ( b )1 = z ( b )1 | X ( b ) z ( b )1 , Z ( b )1 = z ( b )1 , W ( b )1 , V ( b ) ). The probability of treatment regimen ( Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 ) is denoted as P ∗ z ( b )2 = P z ( b )2 P ∗ z ( b )1 .8. Estimate the model Y ( b ) z ( b )1 ,z ( b )2 | P ∗ z ( b )2 , Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , X ( b ) z ( b )1 ,z ( b )2 , W ( b )1 , W ( b )2 , V ( b ) t = 2. Use the respective models to impute thecounterfactual of Y ( b ) z ( b )1 ,z ( b )2 .9. Using a similar procedure for steps 5-8 with the restriction determined by S ( b ) z ( b )1 ,...,z ( b ) t − = S ( b ) z ′ ( b )1 ,...,z ′ ( b ) t − = 1 for time t where at least one z ( b ) t = z ′ ( b ) t , extend the estimation untilthe desired time point t = T .10. Repeat Steps 1-9 to obtain B bootstrap values forˆ∆ ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − = E [ Y ( b ) z ( b )1 ,...,z ( b ) t − − Y ( b ) z ′ ( b )1 ,...,z ′ ( b ) t − | S ( b ) z ( b )1 ,...,z ( b ) t − = S ( b ) z ′ ( b )1 ,...,z ′ ( b ) t − = 1] . with associated pooled variance W ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − .11. The estimate of∆ Z ,...,Z t ,Z ′ ,...,Z ′ t = E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1]is then ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B = B X b =1 ( ˆ∆ ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − ) /B, and the estimate of the variance of ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B is T B = ¯ W z ,...,z t ,z ′ ,...,z ′ t ,B + (1 + 1 /B ) D z ,...,z t ,z ′ ,...,z ′ t ,B , where ¯ W z ,...,z t ,z ′ ,...,z ′ t ,B = B X b =1 ( W ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − ) /B and D z ,...,z t ,z ′ ,...,z ′ t ,B = B X b =1 ( ˆ∆ ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − − ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B ) B − . Z ,...,Z t ,Z ′ ,...,Z ′ t follows a t distribution with degree of freedom ν ,∆ Z ,...,Z t ,Z ′ ,...,Z ′ t − ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B √ T B ∼ t ν , where ν = ( B − ¯ W z ,...,zt,z ′ ,...,z ′ t,B D z ,...,zt,z ′ ,...,z ′ t,B ( B +1) ) . Remark . The idea of including the BART estimated propensity score within BARTas a predictor in Steps 4 and 8 is not new. Hahn et al. (2018) showed that including aBART estimated propensity score as a predictor within BART improved the estimation ofheterogenous treatment eﬀects for observational studies. Tan et al. (2018) also reportedthat the inclusion of the BART estimated propensity score as a predictor within BART toimpute missing data, under the missing at random assumption, worked well in situationswhere the non-linear main and interaction eﬀects are complex for the mean and propensitymodel. For situations with simpler non-linear eﬀects like a quadratic relationship, usingBART to estimate the propensity score and imputing the missing values using penalizedsplines of propensity prediction (Zhang and Little, 2009, PENCOMP version for missingdata) worked better. Using PENCOMP with a BART estimated propensity score for Steps4 and 8 would be an interesting alternative. However, our aim of Steps 4 and 8 was to easethe implementation burden on the researcher. Hence, we suggest the use of PENCOMP witha BART estimated propensity score for Steps 4 and 8 only if the researcher is certain thatthe non-linear eﬀect has a simple form for example, a quadratic or cubic relationship.

We conducted a simulation study to determine how well our proposed method would per-form compared to the na¨ıve method and MSM in three scenarios: 1) where there is low18ssociation between treatment allocation and confounder as well as treatment and survivalstatus; 2) where there is a strong association between treatment and confounder as well astreatment and survival status; and ﬁnally 3) where there is a strong association betweentreatment and confounder, treatment and survival status, and an interaction between treat-ment, confounder, and survival status. We expect all three methods to perform well in theﬁrst scenario because there is little to no confounding. For the second scenario, we expectMSM and our proposed method to perform well because there is no diﬀerence in the treat-ment eﬀect between the principal strata, and other stratiﬁcation groups. The na¨ıve methodshould not perform well due to the strong association between treatment and confounderas well as treatment and survival status. Finally, for scenario three, we expect only ourproposed method to perform well because an association between the treatment eﬀect andprincipal strata, S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1, is induced by the stronger interaction eﬀectbetween treatment, confounder, and survival status. We ﬁt standard linear and logistic re-gression models rather than BART and PENCOMP with BART since our focus is not onmodel misspeciﬁcation but rather, the eﬀect of confounding by indication and censoring bydeath. To set up our simulation study, we set the size of our target population as 1 million. Wethen generate a single baseline variable V from a normal distribution. We set T = 3 andmodel our treatment allocation, Z , as logit [ P ( Z = 1 | V )] = γ + γ V. (13)19or the potential outcome at t = 1, Y Z , we model it as Y Z = β + β Z I { Z = 1 } + β V V + β V Z

V I { Z = 1 } + e, (14)where e ∼ N (0 , t = 2, S Z as logit ( P [ S Z = 1 | V, Y Z ]) = α + α Y Y I { Z = 1 } + α Y Y [1 − I { Z = 1 } ]+ α Z I { Z = 1 } + α V V + α V Z

V I { Z = 1 } . (15)Monotonicity is imposed by setting S = 1 if S = 1. Because a negative wealth shock is anabsorbing state, if Z = 1, then Z = 1. So when Z = 0, we have logit ( P [ Z = 1 | V, Y ]) = γ + γ Y , Y + γ V. (16)We model the potential outcome at t = 2, Y Z ,Z as Y Z ,Z = β + β Z I { Z = 0 , Z = 1 } + β Z I { Z = 1 , Z = 1 } + β Y Z Y I { Z = 0 , Z = 0 } + β Y Z Y I { Z = 0 , Z = 1 } + β Y Z Y I { Z = 1 , Z = 1 } + β V V + β V Z V I { Z = 0 , Z = 1 } + β V Z V I { Z = 1 , Z = 1 } + e, (17)where e ∼ N (0 , t = 3, S Z ,Z , if S Z = 0, then S Z ,Z = 0. When20 Z = 1, we have logit ( P [ S Z ,Z = 1 | X, Y Z ,Z , S Z = 1]) = α + α Z I { Z = 0 , Z = 1 } + α Z I { Z = 1 , Z = 1 } + α Y Z Y I { Z = 0 , Z = 0 } + α Y Z Y I { Z = 0 , Z = 1 } + α Y Z Y I { Z = 1 , Z = 1 } + α V V + α V Z V I { Z = 0 , Z = 1 } + α V Z V I { Z = 1 , Z = 1 } . (18)Again, we impose monotonicity by setting S = S = 1 if S = 1 and S = 1 if S = 1.For the treatment allocation at t = 3, Z , if Z = Z = 0, we have logit ( P [ Z = 1 | X, Y ]) = γ + γ Y Y + γ Y , Y + γ V. (19)For the potential outcome at t = 3, Y Z ,Z ,Z , we have Y Z ,Z ,Z = β + β Z I { Z = 0 , Z = 0 , Z = 1 } + β Z I { Z = 0 , Z = 1 , Z = 1 } + β Z I { Z = 1 , Z = 1 , Z = 1 } + β Y Z Y I { Z = 0 , Z = 0 , Z = 0 } + β Y Z Y I { Z = 0 , Z = 0 , Z = 1 } + β Y Z Y I { Z = 0 , Z = 1 , Z = 1 } + β Y Z Y I { Z = 1 , Z = 1 , Z = 1 } + β Y Z Y I { Z = 0 } + β Y Z Y I { Z = 1 } + β V V + β V Z

V I { Z = 0 , Z = 0 , Z = 1 } + β V Z

V I { Z = 0 , Z = 1 , Z = 1 } + β V Z

V I { Z = 1 , Z = 1 , Z = 1 } + e. (20)21able 2 shows the parameters we used to achieve the three diﬀerent simulation sce-narios. Scenario 1 is achieved by setting γ , α Z , γ , γ Y , , α Z , α Z , γ , γ Y , , and γ Y tobe about 10 times smaller than the values in Scenarios 2 and 3. The rest of the diﬀerencesbetween Scenario 1 versus 2 and 3 were to ensure the resulting simulated population wouldhave enough deaths and subjects in the various diﬀerent treatment regimes for the assump-tions used by MSM and our proposed method to be valid. The diﬀerence between Scenario2 versus 3 lie in β V Z , α Y , α Y , β Y Z , β Y Z , β Y Z , α Y Z , α Y Z , α Y Z , β Y Z , β Y Z , β Y Z , and β Y Z where the values for Scenario 2 is about 10 times smaller compared toScenario 3.To calculate the true parameters, we used the generated population data (size 1 mil-lion), and then took:1. ∆ , = ¯ Y − ¯ Y ;2. ∆ , = ¯ Y − ¯ Y given S = 1;3. ∆ , = ¯ Y − ¯ Y given S = S = 1;4. ∆ , = ¯ Y − ¯ Y given S = S = 1;5. ∆ , = ¯ Y − ¯ Y given S = 1;6. ∆ , = ¯ Y − ¯ Y given S = S = 1;7. ∆ , = ¯ Y − ¯ Y given S = S = 1;8. ∆ , = ¯ Y − ¯ Y given S = S = 1;9. ∆ , = ¯ Y − ¯ Y given S = S = 1; and220. ∆ , = ¯ Y − ¯ Y given S = S = 1.We measured performance using the empirical bias, root mean squared error (RMSE),95% coverage, and the average 95% Conﬁdence Interval (CI) length (AIL). 1000 simulationswere used to estimate these quantities. Under each simulation, a simple random sample of4,000 or 8,000 subjects was drawn from the target population data. All methods were thenimplemented on the sampled data to obtain the eﬀect estimates. For MSM and our proposedmethod, the models were speciﬁed using Equations 13 to 20 respectively. For our proposedmethod, because our focus is not on model misspeciﬁcation but rather, confounding by in-dication and censoring by death, we chose to implement a simpler version of our methodby skipping Steps 3 and 7 of our algorithm and using Y ( b ) z ( b )1 | Z ( b )1 = z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , V ( b ) and Y ( b ) z ( b )1 ,z ( b )2 | Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , X ( b ) z ( b )1 ,z ( b )2 , W ( b )1 , W ( b )2 , V ( b ) for Steps 4 and 8 respec-tively. We also simpliﬁed the prediction of the potential outcomes and survival status byusing linear and logistic regression instead of BART. Table 3 shows the simulation results for sample size of 4,000. As expected, under Scenario 1,all three methods were relatively unbiased with all three methods achieving similar RMSE.MSM and our proposed method reported slightly greater than nominal coverage due to thewider AIL. Under Scenario 2, the absolute bias of the na¨ıve method was always larger thanMSM and our proposed method. RMSE was larger as well in comparison and coveragewas often far below the nominal 95% value. For this scenario MSM produced the lessconservative coverage while our proposed method suggested better bias performance andreduced RMSE. Finally, under Scenario 3, the na¨ıve method was clearly biased with poor23MSE and coverage. MSM performed slightly better compared to the na¨ıve method butabsolute bias clearly increased compared to Scenario 2. Coverage for some treatment eﬀectswere poor as well. Our proposed method remained unbiased, produced a lower RMSEcompared to the other two methods, and reached nominal coverage under Scenario 3. Allmethods behaved as expected under these three scenarios.Table 4 shows the results with the sample size increased to 8,000, approximately thesample size in our application. The simulation results for all three methods under Scenario1 remained relatively similar. Under Scenario 2, an increase in sample size did not aﬀect theabsolute bias of all three methods but, the coverage of the na¨ıve method was clearly aﬀectedwith huge decreases in the coverage for all parameters. Coverage for MSM and our proposedmethod remained fairly similar. Finally, under Scenario 3, we observe once again that theamount of bias for the three methods remained the same but, coverage for the na¨ıve methodand MSM decreased for most of the treatment eﬀects when the sample size increased to8,000. Coverage for our proposed method remained relatively similar to the results observedfor the sample size of 4,000. In summary, bias for the three methods was rather stable whenthe sample size changed. However, if the method is poor in the estimation of the particulartreatment eﬀect, increasing the sample size can cause large decreases in coverage.24

Determining the eﬀect of a negative wealth shockon cognitive score for Health and Retirement Studysub jects

To investigate the association between negative wealth shock and cognitive ability in latemiddle aged US adults, we used data from the Health and Retirement Study (HRS). HRSis a longitudinal study of US adults, enrolled at age 50 and older. These individuals havebeen surveyed biennially since 1992 with detailed modules on ﬁnancial status and health(Sonnega et al., 2014).We use HRS data collected from 1996 to 2002 for our analysis. Subjects were obtainedfrom the original HRS cohort, born in the years 1931-1941. Although data collection beganin 1992, consistent collection of a subject’s cognitive ability only began in 1996. Hence, weexcluded the data collected before 1996 and treated the variables collected in 1996 as thebaseline for our analysis. We excluded subjects who did not have longitudinal measurementsfor net worth because we were unable to distinguish whether they have already experienceda negative wealth shock. Subjects with zero or negative net worth at baseline were excludedsince we did not know if these subjects have lifelong asset poverty or experienced a negativewealth shock prior to study entry. We also removed subjects who experienced a negativewealth shock and death between 1992 to 1996. These subjects were removed because theywere no longer at risk for a negative wealth shock or death. There were 9,750 participantsin the original HRS cohort, and of these, 7,106 participants (72.9%) were eligible for this25nalysis. These participants consists of a representative sample of the 1996 US populationaged 55 to 65 who had not experienced a negative wealth shock in the previous ﬁve years.

To determine whether a subject experienced a negative wealth shock from the previous follow-up period to the current follow-up period, we ﬁrst obtained data from the module assessingnet worth administered at every wave of HRS. Measured assets include housing value, netvalue of businesses, individual retirement accounts, checking/savings accounts, certiﬁcatesof deposits and savings bonds, investment holdings, net value of vehicles, and the value ofany other substantial assets. From this asset total, debts were subtracted, including homemortgages, other home equity loans, and unsecured debt values, like credit card balances,student loans, and medical debts. Missing values for wealth were imputed at the level ofeach asset or debt, using an unfolding bracket imputation method (Juster and Smith, 1997).Wealth data were not imputed for those who do not participate in a given wave. Negativewealth shock was measured and then dichotomized (yes or no) for each time point. Lossof 75% or more of total wealth between two consecutive waves was used as the cut-pointfor negative wealth shock (Pool et al., 2017). Subjects were considered at risk for negativewealth shock until they have experienced a negative wealth shock or reached age 65.

The cognitive ability of a subject is assessed in HRS using the Telephone Interview forCognitive Status (TICS). Unfortunately, the full HRS cognitive battery is not available forparticipants under 65. Hence, we used an abbreviated measure that included questions26bout episodic memory (Immediate Word recall [10 points] and Delayed Word recall [10points]) and mental status (Serial 7’s [5 points], backwards counting from 20 [2 points])(Crimmins et al., 2011). All responses were combined to create a composite score rangingfrom 0 to 27, with a higher score indicating higher cognitive ability. Some of these measuresmay be imputed implying that the cognitive summary score may include one or more imputedscores (Fisher et al., 2018). We treated this measure as continuous and normally distributed.

Tables 5 to 6 show the descriptive statistics of the subjects at baseline by whether or not theyexperienced a negative wealth shock over the next six years regardless of survival status. Atbaseline, aside from whether the subject eventually survived until 2002 and health conditionslike whether the subject ever had heart problems, high blood pressure, and stroke, all theother variables in Tables 5 to 6 were signiﬁcantly associated with experiencing a negativewealth shock. A typical subject who would eventually experience a wealth shock would have alower cognitive score at baseline; slightly higher BMI; lower opinion about his or her health;lower word recall score; likely still smoking; not insured; have depression; slightly lowerincome; either working, unemployed, or disabled; divorced or never married; lower wealthrank; have diabetes and/or psychological problems; younger; lesser years of education; andlikely non-White.Table 7 shows the change in unadjusted mean cognitive score between consecutivewaves for subjects who did not receive a wealth shock versus those who ever received anegative wealth shock. Follow-up surveys occurred at years 2, 4, and 6. We can see thatfor a subject who ever got shocked, the largest observed decline in cognitive score occurs27rom Baseline to Wave 1. Subsequently, the decline in cognitive score is no longer as largebetween waves. Similarly, the bulk of our subjects were shocked at Wave 1 (second year offollow up). In later waves, the proportion of new subjects who received a negative wealthshock decreases.

We were interested in how a negative wealth shock would aﬀect the cognitive ability of latemiddle aged adults in the HRS during the six years of follow-up as well as how the durationof a negative wealth shock aﬀects cognitive ability accounting for missingness in the cognitiveoutcome as well as censoring by death. We employed four diﬀerent methods to estimate thiseﬀect and make inference. The four methods were the na¨ıve method, where all subjects whodied under their observed negative wealth shock status were removed from analysis; baselineadjusted method, where similar to the na¨ıve method, all subjects who died were removedfrom analysis but the mean cognitive score was adjusted using a model that included allbaseline covariates; MSM, where negative wealth shock allocation, missingness, and censor-ing by death were accounted for by inverse probability weighting; and our proposed methodincluding the PENCOMP modiﬁcation described in Subsection 3.2. We assumed that de-pression was the time-varying covariate that depends on the negative wealth shock status( X Z ,...,Z t in Section 2) and the rest of the time-varying covariates are: self-reported healthstatus, whether subject was insured, labor force status of subject, income, level of alcoholconsumption, current smoking status, and number of health conditions ( W t in Section 3).We also assumed that the cognitive score is missing at random given the baseline variablespresented in Tables 5 to 6, past negative wealth shock status, time-varying covariates, and28ognitive score. For MSM, we accounted for this missingness by modeling the propensity ofresponse while for our proposed method, we imputed the missing cognitive score by usingthe modiﬁed version of PENCOMP discussed in Subsection 3.2. All our models (baseline ad-justed, MSM, and our proposed method) were speciﬁed using BART. For the na¨ıve, baselineadjusted, and MSM method, we employed 1,000 bootstrap samples to calculate the meanand the 95% Conﬁdence Interval (CI). The 95% CI was determined by taking the 2 . . Table 8 shows the adjusted eﬀect estimate of a negative wealth shock on cognitive scoredepending on the duration of the shock for late middle aged adults in the original HRScohort from 1996 to 2002. In general, the na¨ıve and baseline adjusted method suggests thatexperiencing a negative wealth shock has a much larger negative eﬀect on the cognitive scoreof subjects in our sample compared to the adjusted estimates reported by MSM and ourproposed method. The na¨ıve and baseline adjusted method produced very similar resultssuggesting low association between cognitive score and the baseline covariates. The eﬀectfor subjects who experienced a negative wealth shock within the ﬁrst 2 years of follow upversus no shock (6 years vs. no shock), subjects who experienced a negative wealth shockwithin the ﬁrst 2 years of follow up versus subjects who experienced a negative wealth shockbetween the second and fourth year of follow up (6 years vs. 2 years), and subjects whoexperienced a negative wealth shock within the ﬁrst 2 years of follow up versus subjects whoexperienced a negative wealth shock between the fourth and sixth year of follow up (6 years29s. no shock), were signiﬁcantly larger than 0 under the na¨ıve and baseline adjusted method.For MSM and our proposed method all eﬀects were reported to be not signiﬁcant.

In this paper, we were interested in how a negative wealth shock aﬀects the cognitive abilityof late middle aged Americans participating in the HRS from 1996 to 2002. The maindiﬃculty we faced was the presence of death in some subjects causing their cognitive scoreto be censored. Under situations where we believe death does not depend on the cognitiveability or whether a subject received a negative wealth shock, removing subjects who havedied from our analysis would yield an unbiased estimate of the eﬀect of negative wealth shockon cognitive ability as our simulation results suggest. Unfortunately, it is very possible thatsubjects with lower cognitive ability and/or have experienced a negative wealth shock wouldhave a higher risk of death. In this situation, accounting for the censoring by death would beneeded. This is because without randomization, there is a high likelihood that the proportionof deaths between subjects who did not receive a negative wealth shock versus those whoreceived a wealth shock, would be imbalanced. In addition, subjects who die are more likelyto have a lower cognition score. As a result, if we remove the subjects who died from ouranalysis, the eﬀect of the negative wealth shock on cognitive ability that we measure wouldbe confounded by death. Although MSM is commonly employed to weight the subjects whosurvived, this approach is arguably not sensible and would likely produce biased estimateswhen the eﬀect depends on the principal strata as well as when adjustments on the weightshave to be employed in order to stabilize the MSM estimate. To overcome these issues, wepropose a new method to estimate the eﬀect by imputing the counterfactual survival status30f each subject in order to compare outcomes among individuals who would survive onlyunder both sets of treatments being considered. Our method remained unbiased for all thesimulation scenarios we tried and produced reasonable coverage. When applied to the HRSdataset, our method suggested that the eﬀect of a negative wealth shock on the cognitiveability is close to null whereas the na¨ıve method and MSM suggested an estimate with aslightly larger eﬀect.One shortcoming of our approach is our failure to incorporate the HRS sample de-sign, in particular the sampling weights, in our inference. Given that a key use of weights inregression-type analysis is to reduce the eﬀect of model misspeciﬁcation (Korn and Graubard,1995), we hope that our use of BART will minimize the degree of model misspeciﬁcation.We leave the incorporation of such features in a general approach to future work. Anotheraspect of our method which could be improved is to allow our method to be applicable tostudies where the follow-up time is not ﬁxed. In such a situation, Cox based survival modelswould have to be employed and time would have to be included as a covariate in the survivaland outcome models. The diﬃculty in this extension would be how to develop a systematicway, applicable to all subjects, to determine the relation in time between the allocation ofthe treatment, measuring the outcome, and death.

References

Butrica, B., Smith, K., and Toder, E. (2010). What the 2008 stock market crash means forretirement security.

Journal of Aging & Social Policy

Epidemiology

The Annals of Applied Statistics The Journals of Gerontology, Series B: PsychologicalSciences and Social Sciences

SocialScience and Medicine

Biometrics

Biometrics http://hrsonline.isr.umich.edu/modules/meta/xyear/cogimp/desc/COGIMPdd.pdf

The University of Michigan Institute of Survey Research, Survey Research Center,AnnArbor, MI . 32rangakis, C. and Rubin, D. (2002). Principal Stratiﬁcation in Causal Inference.

Biometrics

A Theory of the Consumption Function . Princeton University Press,Princeton, NJ.Hahn, P., Murray, J., and Carvalho, C. (2018). Bayesian Regression Tree Models forCausal Inference: Regularization, Confounding, and Heterogeneous Eﬀects. arXiv page1706.09523v2.Heitjan, D. and Little, R. (1991). Multiple Imputation for the Fatal Accident ReportingSystem.

Applied Statistician

Causal Inference for Statistical, Social, and BiomedicalSciences: An Introduction . Cambridge University Press, New York, NY.Juster, F. and Smith, J. (1997). Improving the Quality of Economic Data: Lessons from theHRS and AHEAD.

Journal of the American Statistical Association

The Canadian Journal of Statistics

The American Statistician

International Journal of Epidemiology

InternationalPsychogeriatrics

Journal of theRoyal Statistical Society

Annals of Internal Medicine

Journal of the American Medical Association

Journal of Epidemiology and Community Health

Epidemiology

Journal of the American Statistical Association arXiv page 1708.08734.34osenbaum, P. and Rubin, D. (1983). The Central Role of the Propensity Score in Obser-vational Studies for Causal Eﬀects.

Biometrika

Journal of Educational Psychology

The Journals of Gerontology, Series B: Psychological Sciences andSocial Sciences

International Journal of Epidemiology

Social Science & Medicine arXiv page1801.03147.Twala, B., Jones, M., and Hand, D. (2008). Good Methods for Coping with Missing Datain Decision Trees.

Pattern Recognition Letters

Epidemiology

Biometrics

Journal of the American Statistical Association page In press.36able 1: Sample example of a censoring by death dataset until t = 3 where Z t = 1 indicates a subject having experienced a negativewealth shock and Z t = 0 indicates a subject have not experienced any negative wealth shock till time tV Z Y Y S S Z Y Y Y S S S Z Y Y Y Y Subject 1 x 1 x ? 1 1 1 ? ? x 1 1 1 1 ? ? ? xSubject 2 x 0 ? x 1 1 1 ? x ? 1 1 1 1 ? ? x ?Subject 3 x 1 x ? 1 1 1 ? ? x 1 1 0 NA ? ? ? NASubject 4 x 0 ? x 1 1 1 ? x ? 1 1 0 1 ? ? x NASubject 5 x 0 ? x 1 1 0 x ? ? 1 0 1 0 x ? NA ?Subject 6 x 0 ? x 1 1 0 x ? ? 0 1 1 NA NA NA ? ?Subject 7 x 0 ? x 1 1 0 x ? ? 0 1 1 NA NA NA ? ?Subject 8 x 0 ? x 1 1 0 x ? ? 1 0 0 0 x ? NA NASubject 9 x 1 x ? 0 1 NA ? ? NA 1 1 0 NA ? ? ? NASubject 10 x 1 x ? 0 1 NA ? ? NA 0 1 0 NA NA NA ? NASubject 11 x 0 ? x 0 1 1 ? x NA 0 1 0 1 NA NA x NASubject 12 x 0 ? x 0 1 0 x ? NA 0 1 0 NA NA NA ? NASubject 13 x 1 x ? 1 0 1 NA NA x 0 0 1 1 NA NA NA xSubject 14 x 0 ? x 1 0 NA NA NA ? 0 0 1 NA NA NA NA ? able 2: Table of parameters for simulation Scenario 1 Scenario 2 Scenario 3

V N (0 , ) N (17 , ) N (17 , ) γ γ -0.02 -0.2 -0.2 β β Z -1.5 -1.5 -1.5 β V β V Z -0.005 -0.11 -0.05 α α Y α Y α Z -0.01 -0.2 -0.2 α V α V Z -0.002 -0.02 -0.02 γ -0.002 -0.02 -0.02 γ Y , -0.02 -0.2 -0.2 β Z -1.5 -1.5 -1.5 β Z -1 -1 -1 β Y Z β Y Z β Y Z β V Z -0.00011 -0.011 -0.011 β V Z -0.00005 -0.005 -0.005 α Z -0.01 -0.2 -0.2 α Z -0.015 -0.1 -0.1 α Y Z α Y Z α Y Z α V Z -0.0001 -0.02 -0.02 α V Z -0.0005 -0.05 -0.05 γ -0.0002 -0.002 -0.002 γ Y , -0.002 -0.02 -0.02 γ Y -0.02 -0.2 -0.2 β Z -1.5 -1.5 -1.5 β Z -1 -1 -1 β Z -0.5 -0.5 -0.5 β Y Z β Y Z β Y Z β Y Z β Y Z β Y Z β V Z -0.00011 -0.011 -0.011 β V Z -0.00005 -0.005 -0.005 β V Z -0.00003 -0.003 -0.003

Scenario 1 Na¨ıve MSM ProposedParameter True value Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL∆ , -1.497 -0.001 0.032 95.4 0.123 -0.0002 0.032 95.1 0.123 -0.0001 0.032 97.0 0.143∆ , -1.499 -0.003 0.050 95.3 0.202 -0.003 0.050 95.3 0.202 -0.003 0.050 95.7 0.214∆ , -1.005 -0.003 0.049 95.0 0.189 -0.001 0.049 94.7 0.189 -0.001 0.049 99.2 0.262∆ , , -1.502 0.005 0.081 93.9 0.314 0.005 0.081 98.9 0.411 0.005 0.082 94.6 0.333∆ , -1.008 0.004 0.074 94.8 0.284 0.004 0.074 99.0 0.370 0.004 0.075 97.8 0.350∆ , -0.504 0.006 0.072 95.2 0.284 0.007 0.072 99.4 0.371 0.007 0.074 100.0 0.529∆ , , , , -3.367 -0.047 0.061 78.5 0.154 0.002 0.041 93.8 0.160 0.002 0.041 96.1 0.177∆ , -1.727 -0.037 0.054 83.2 0.149 -0.032 0.051 86.9 0.150 -0.002 0.037 96.4 0.161∆ , -1.199 -0.136 0.146 24.5 0.202 -0.020 0.057 92.5 0.204 -0.004 0.053 96.5 0.229∆ , , -1.727 -0.029 0.062 91.9 0.220 -0.023 0.060 94.8 0.240 0.001 0.053 96.1 0.227∆ , -1.183 -0.065 0.082 75.0 0.199 -0.047 0.069 87.5 0.217 0.0004 0.048 97.8 0.220∆ , -1.169 -0.167 0.181 33.8 0.273 -0.042 0.084 93.2 0.305 -0.004 0.071 98.7 0.350∆ , , , , -2.347 -0.123 0.130 14.5 0.160 0.002 0.042 94.0 0.160 0.002 0.042 95.9 0.177∆ , -2.559 -0.114 0.122 23.9 0.165 -0.060 0.074 70.2 0.164 -0.001 0.038 96.4 0.163∆ , -3.062 -0.231 0.239 2.8 0.232 -0.033 0.068 89.9 0.226 -0.004 0.058 97.0 0.260∆ , -0.502 -0.118 0.132 48.9 0.233 0.026 0.065 92.2 0.227 -0.003 0.059 96.7 0.260∆ , -2.820 -0.125 0.139 47.9 0.242 -0.062 0.087 88.7 0.273 -0.0004 0.054 95.9 0.224∆ , -3.605 -0.143 0.152 19.6 0.198 -0.087 0.101 69.2 0.225 -0.006 0.045 96.4 0.202∆ , -4.032 -0.290 0.301 5.2 0.319 -0.082 0.117 89.4 0.376 -0.009 0.080 98.6 0.400∆ , -0.785 -0.019 0.060 93.3 0.225 -0.026 0.063 95.0 0.256 -0.006 0.052 96.7 0.226∆ , -1.217 -0.160 0.181 54.4 0.336 -0.015 0.087 97.2 0.396 -0.009 0.083 99.3 0.442∆ , -0.432 -0.141 0.160 54.9 0.306 0.011 0.080 97.4 0.363 -0.006 0.075 98.7 0.373 Scenario 1 Na¨ıve MSM ProposedParameter True value Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL∆ , -1.497 -0.001 0.023 94.2 0.087 0.0003 0.023 94.0 0.087 0.0003 0.023 96.2 0.100∆ , -1.499 -0.002 0.036 95.3 0.143 -0.001 0.036 94.8 0.143 -0.001 0.037 95.5 0.151∆ , -1.005 -0.002 0.034 94.8 0.134 -0.0007 0.034 95.1 0.134 -0.001 0.035 98.7 0.183∆ , , -1.502 0.005 0.057 95.0 0.222 0.005 0.057 99.0 0.289 0.005 0.058 95.4 0.235∆ , -1.008 0.004 0.051 94.5 0.201 0.004 0.051 98.4 0.260 0.004 0.052 97.1 0.246∆ , -0.504 0.005 0.051 95.1 0.201 0.007 0.052 98.3 0.261 0.006 0.053 99.7 0.369∆ , , , , -3.367 -0.047 0.055 59.6 0.109 0.002 0.029 94.0 0.113 0.003 0.029 95.9 0.125∆ , -1.727 -0.036 0.045 73.4 0.105 -0.031 0.041 78.8 0.106 -0.001 0.026 96.1 0.113∆ , -1.199 -0.134 0.139 4.0 0.142 -0.018 0.041 92.4 0.144 -0.001 0.036 96.9 0.161∆ , , -1.727 -0.029 0.049 87.9 0.156 -0.024 0.047 93.2 0.170 0.0001 0.038 96.2 0.160∆ , -1.183 -0.066 0.075 54.3 0.141 -0.048 0.060 77.7 0.153 -0.001 0.036 97.7 0.156∆ , -1.169 -0.166 0.173 7.8 0.193 -0.040 0.065 90.4 0.215 -0.003 0.049 98.9 0.246∆ , , , , -2.347 -0.123 0.126 1.5 0.113 0.002 0.029 94.5 0.113 0.003 0.029 95.7 0.124∆ , -2.559 -0.114 0.118 3.4 0.117 -0.060 0.067 49.3 0.113 -0.001 0.028 96.1 0.115∆ , -3.062 -0.230 0.234 0.1 0.164 -0.032 0.052 86.5 0.159 -0.004 0.040 97.3 0.183∆ , -0.502 -0.118 0.125 19.4 0.164 0.026 0.048 91.0 0.160 -0.003 0.040 98.2 0.184∆ , -2.820 -0.125 0.133 16.6 0.171 -0.063 0.076 78.3 0.192 -0.002 0.039 95.6 0.157∆ , -3.605 -0.143 0.147 2.2 0.140 -0.087 0.093 40.8 0.159 -0.007 0.032 96.7 0.142∆ , -4.032 -0.290 0.296 0.1 0.225 -0.081 0.099 81.5 0.265 -0.010 0.057 98.7 0.282∆ , -0.785 -0.018 0.044 93.4 0.159 -0.024 0.047 94.7 0.181 -0.005 0.037 97.0 0.161∆ , -1.217 -0.160 0.171 22.4 0.238 -0.013 0.062 97.1 0.278 -0.008 0.059 98.7 0.311∆ , -0.432 -0.142 0.152 26.4 0.216 0.011 0.056 97.9 0.255 -0.006 0.052 98.7 0.264 No wealth shock Ever wealth shockVariables Mean/Frequency (S.E./%) Mean/Frequency (S.E./%) p -valueEventually survived?: 0.57Yes 6,207 (94.7) 516 (94.0)No 350 (5.3) 33 (6.0)Cognitive score 17.07 (4.07) 16.26 (4.35) < . < . < . < . < . < . < . < . < . < .

010 1,728 (26.4) 326 (59.4)1 2,360 (36.0) 124 (22.6)2 2,469 (37.7) 99 (18.0)Gender: 0.08Male 3,113 (47.5) 239 (43.5)Female 3,444 (52.5) 310 (56.5)