Accounting for selection bias due to death in estimating the effect of wealth shock on cognition for the Health and Retirement Study
Yaoyuan Vincent Tan, Carol A.C. Flannagan, Lindsay R. Pool, Michael R. Elliott
aa r X i v : . [ s t a t . A P ] D ec Accounting for selection bias due to death in estimatingthe effect of wealth shock on cognition for the Healthand Retirement Study
Yaoyuan V. Tan ∗ Department of Biostatistics and Epidemiology, Rutgers UniversityandCarol A.C. FlannaganTransportation Research Institute, University of MichiganLindsay R. PoolDepartment of Preventive Medicine, Northwestern UniversityandMichael R. ElliottDepartment of Biostatistics, University of Michigan
Abstract
The Health and Retirement Study is a longitudinal study of US adults enrolledat age 50 and older. We were interested in investigating the effect of a sudden largedecline in wealth on the cognitive score of subjects. Our analysis was complicated bythe lack of randomization, confounding by indication, and a substantial fraction of thesample and population will die during follow-up leading to some of our outcomes beingcensored. Common methods to handle these problems for example marginal structuralmodels, may not be appropriate because it upweights subjects who are more likely todie to obtain a population that over time resembles that would have been obtainedin the absence of death. We propose a refined approach by comparing the treatmenteffect among subjects who would survive under both sets of treatment regimes being ∗ (e-mail: [email protected]) onsidered. We do so by viewing this as a large missing data problem and impute thesurvival status and outcomes of the counterfactual. To improve the robustness of ourimputation, we used a modified version of the penalized spline of propensity methodsin treatment comparisons approach. We found that our proposed method worked wellin various simulation scenarios and our data analysis. Keywords:
Bayesian additive regression trees; Causal inference; Confounding by in-dication; Longitudinal Study; Missing data; Penalized spline of propensity methods in treat-ment comparisons.
Late middle age adults commonly experience chronic health conditions like high blood pres-sure or diabetes as well as declining cognitive abilities. Factors known to be associatedwith accelerated decrease in cognitive abilities include smoking, high alcohol consumption,physical inactivity, high dietary intake of sodium and saturated fats, low dietary intakeof fruits and vegetables (Lee et al., 2010; Stuck et al., 1999); hypertension, elevated serumcholesterol, diabetes, obesity, cerebrovascular and cardiovascular disease (Plassman et al.,2010); depression, lower socioeconomic status, and exposure to acute stressful life eventsand chronic perceived stress (Krieger, 2001). In particular, the acute stress of a suddendecrease in wealth – “a negative wealth shock” – may have a negative impact on the cog-nitive ability of late middle aged adults. Because income typically exceeds consumptionat this stage in life, sudden decreases in wealth during this period not only decrease theamount of wealth saved for retirement, but there are fewer remaining years left to replenish2he lost wealth (Butrica et al., 2010). The stress of losing substantial wealth during thesavings period of the life cycle coupled with the pressure to replenish the lost wealth canlead to stress-related health conditions which in turn reduces the cognitive ability of an in-dividual (Shrira et al., 2011). In addition, individuals who have received a negative wealthshock may have to reduce consumption of health-enhancing goods and services which in turnleads to poor management of existing chronic conditions, further reducing cognitive abilities(Friedman, 1956).Three issues arise when trying to estimate the causal effect of a negative wealth shockon cognitive ability. The first of these is the lack of randomization: negative wealth shocksare not randomly distributed in the population, but rather are confounded by factors suchas gender and socio-economic status. The second issue is confounding by indication: the riskof the wealth shock at any point in time may depend on the prior cognitive ability up to thepoint. Finally, a sufficiently large fraction of the sample and the population will die duringour follow-up, leading to “censoring by death”. Those observed to have survived a negativewealth shock include those who would survive under either condition together with thosethat would survive only if they experienced a negative wealth shock (if any), while thoseobserved to have survived in the absence of a negative wealth shock include those that wouldsurvive under either condition together with those that would survive only in the absenceof a negative wealth shock. These “missing values” associated with cognition among thedeceased are different from the measure of cognition being “missing” due to dropout, wherethe cognitive ability measure exists but is unobserved. As with wealth shock, death is nota random occurrence, and is positively associated with demographic measures that increasethe risk of a negative wealth shock, increased cognitive ability decline, and the experience3f a negative wealth shock. Hence, the measure for cognitive ability may be confounded bydeath if not considered appropriately.Methods have been developed to deal with these barriers to causal inference. To dealwith the lack of randomization, we might hope that, conditional on available covariates,negative wealth shocks would truly be random. In this case, conditioning on the probabilityof receiving a negative wealth shock as a function of these covariates – the propensity scores(Rosenbaum and Rubin, 1983) – can be used to remove the effect of confounding, either byregression, matching, or weighting (Imbens and Rubin, 2015). For the second issue – con-founding by indication – marginal structural models (MSM, Robins et al., 2000) and morerecently, penalized spline of propensity methods in treatment comparisons (PENCOMP,Zhou et al., 2018), have been used to account for confounding by the time-dependence as-sociation of the cognitive measures, either by weighting using the inverse probability oftreatment actually received based on the previous values of the time-varying covariates andoutcomes (MSM), or by imputation of the missing counterfactual values (PENCOMP). Forcensoring by death, MSMs have typically been extended by multiplying the treatment as-signment weights with the inverse of the predicted probability of death (Weuve et al., 2012).The issue with this approach – perhaps under appreciated – is that the resulting pseudo-population is not only balanced with respect to exposure “assignment”, but also “immortal”,in the sense that those more likely to die are upweighted so that the population over timeresembles that would have been obtained in the absence of death up till time t (Chaix et al.,2012). This is arguably not a sensible population for inference, at least from a policy andpublic health perspective.A more refined approach would be to compare the difference in the effect of negative4ealth shock on cognitive ability among subjects who would have survived whether theyexperienced a negative wealth shock or not. This approach is consistent with the potentialoutcomes approach of Neyman (1934) and Rubin (1974), which defines causal effects as thewithin-subject difference of an outcome at a particular time under different exposure ortreatment regimen, averaged over the population. This idea is not new (Elliott et al., 2006)and can be viewed as a specific example of the principal stratification (PS) method discussedin Frangakis and Rubin (2002). Our innovation here is to embed this in a longitudinal settingwhere confounding by indication is present. We view this as a large missing data problemwhere survival status and, among survivors, unobserved outcomes under a given treatmentpattern, are imputed. We extend the method proposed in Example 3 of Elliott and Little(2015), which provides a Bayesian MSM approach to compare two treatments at two timepoints. This approach was further extended by PENCOMP in Zhou et al. (2018) which, likeaugmented inverse probability weighting (AIPWT, Robins et al., 1994), has a doubly-robustproperty in that if either the mean or propensity model is correctly specified, consistentestimates of the causal effect will be obtained. We modified PENCOMP slightly usingBayesian additive regression trees (BART), a flexible model to ease the burden of modelspecification by the researcher, and apply this to our proposed method.We organize our paper as follows. We set up the framework for our problem, and pro-vide a brief review of of MSM, PENCOMP, and Bayesian additive regression trees (BART)in Section 2. We develop our proposed method in Section 3. We then explore some of theempirical properties of our proposed method compared to a na¨ıve method and MSM usinga simulation study in Section 4. Section 5 describes the HRS data and the results of ournegative wealth shock analysis. Section 6 concludes with a discussion of the implication of5ur results as well as future work. Let V = { V , V , . . . , V p } be p baseline covariates, Z t be the treatment allocation at time t = 1 , . . . , T where Z t = 1 indicates a subject receiving a negative wealth shock at t and Z t = 0 indicates no negative wealth shock, and W t = { W t , W t , . . . , W qt } be q covariatesthat may vary with time, but are unaffected by a given treatment regimen. For exam-ple, fixed covariates by definition would belong to this class. Let Y Z ,...,Z t be the potentialoutcome under treatments Z , . . . , Z t and X Z ,...,Z t = { X Z ,...,Z t , , X Z ,...,Z t , , . . . , X Z ,...,Z t ,r } be the time-varying covariates affected by treatments Z , . . . , Z t . Similarly, we define thepotential survival indicator S Z ,...,Z t − , for survival at time t . The survival outcome at t measures whether a subject would survive after being exposed to treatment Z , . . . , Z t − ;hence, the lagged notation for the potential survival outcome, S Z ,...,Z t − . v , z t , w t , y z ,...,z t , x z ,...,z t , and s z ,...,z t indicate the observed baseline, treatment allocation, time varying covari-ates unaffected by a given treatment regimen, outcome, time-varying covariates affected bya given treatment regime, and survival status variables respectively. As in Pool et al. (2018),we assume that a negative wealth shock is an “absorbing state” so that once a subject re-ceives a negative wealth shock at time t , i.e. Z t = 1, the subject is “forever” shocked, i.e. Z t +1 = . . . = Z T = 1. Note that this need not be the case for a more general set up wherewe could have Z t = 0 when Z j = 1 for any j = 1 , . . . , t −
1. In our context, the potentialoutcomes for time t = 2 are then Y Z =0 ,Z =0 = Y , Y Z =0 ,Z =1 = Y , and Y Z =1 ,Z =1 = Y ;6imilarly, X Z =0 ,Z =0 = X , X Z =0 ,Z =1 = X , and X Z =1 ,Z =1 = X for time-varying co-variates under the various treatment regimes; and S Z =0 = S , S Z =1 = S for survival states.Subjects who die at time t have structurally missing data for outcomes and covariates i.e., S = 0 implies that Y = Y = N A and X = X = N A , while S = 0 implies that Y = N A and X = N A , where ‘NA’ indicates a structurally missing observation.
To estimate the causal effect for confounding by indication and censoring by death problems,MSM makes the following assumptions. First, MSM assumes that P ( S z ,...,z t − | z , . . . , z t − , y z , . . . , y z ,...,z t − , x z , . . . , x z ,...,z t − , w , . . . , w t − , v ) > . (1)and P ( Z t | z , . . . , z t − , y z , . . . , y z ,...,z t − , x z , . . . , x z ,...,z t − , w , . . . , w t − , v ) > z t i.e. the probability of survival under treatment profile z , . . . , z t − and the prob-ability of treatment allocation for time t is bounded away from 0. This is an extension ofthe standard positivity assumption to allow that at least some subjects will survive under agiven treatment regimen. Second, MSM assumes that there is no interference between sub-jects i.e. the potential outcome of subject i , Y i,Z ,...,Z t = Y i,z ,...,z t , is independent of whatevertreatment regimen subject j is allocated to i = j . Third, MSM assumes no unmeasuredconfounding and sequential randomization condition Y Z ,...,Z t ⊥ Z t | z , . . . , z t − , y z ,...,z t − , . . . , y z , x z ,...,z t − , . . . , x z , w , . . . , w t − , v. Finally, MSM assumes that the model specifications for Equations 1, 2, and Y z ,...,z t | z , . . . , z t , y z ,...,z t − , . . . , y z , x z ,...,z t − , . . . , x z , w , . . . , w t − , v E [ Y z ,...,z t − Y z ′ ,...,z ′ t ] (note that this estimand is notconditioned on the survival status) is obtained by maximizing the weighted likelihood of n Y i =1 f ( Y i ; z ,...,z t | θ it ) w it , (3)where i indexes the subjects and θ it are the parameters involved in the model for Y i ; z ,...,z t and w it = [ t Y j =1 P ( Z ij = z ij | z i , . . . , z i,j − , y i , . . . , y i,j − , x i , . . . , x i,j − , w i , . . . , w i,j − , v i ; τ j )] − . (4) By weighting using the inverse probability of receiving the observed treatment regime givenall covariates and previous treatments, the association between treatment and all observedconfounders, including confounding by indication, are broken. Under these four assump-tions, inference about the treatment effects under a pseudo-population in which treatmentis randomized can then be obtained.Similarly, this weighting method can be used to remove bias due to dropout. Let R i = 1indicate that the subject’s cognitive score is observed and R i = 0 indicate that the subject’scognitive score is missing. The weight used to account for missing cognitive score is w rit = [ t Y j =1 P ( R ij = r ij | r i , . . . , r i,j − , z i , . . . , z i,j − , y i , . . . , y i,j − , x i , . . . , x i,j − , w i , . . . , w i,j − , v i ; γ j )] − . (5) Finally, death is typically treated as equivalent to dropout in MSM (Do et al., 2013;Pool et al., 2018). Let D it = 1 indicate that subject i is dead at time t and D it = 0 indicatethat the subject survived at time t (thus D it = 1 − S it ). The weight for death censoring isthen w dit = [ t Y j =1 P ( D ij = d ij | z i , . . . , z i,j − , y i , . . . , y i,j − , x i , . . . , x i,j − , w i , . . . , w i,j − , v i ; λ j )] − . (6) w fit = w it w dit w rit . To stabilize the weights, the numerators of Equations4, 5, and 6 are replaced by the marginal probabilities of treatment, dropout, and death atbaseline given by t Y j =1 P ( Z ij = z ij | z i , . . . , z i,j − , v i ; τ ′ j ) , t Y j =1 P ( R ij = r ij | r i , . . . , r i,j − , v i ; γ ′ j ) , and t Y j =1 P ( D ij = d ij | v i ; λ ′ j )respectively. We use the stabilized weights in our simulations and analysis. PENCOMP uses the same four assumptions made by MSM excluding Equation 1 for con-founding by indication problems. Full details of PENCOMP can be found in Zhou et al.(2018). We briefly describe the algorithm for PENCOMP using multiple imputation (MI)with longitudinal treatment assignments here. Without loss of generality, we assume notime-varying covariates in the data.1. For b = 1 , . . . , B , generate a bootstrap sample S ( b ) from the original data S by samplingunits with replacement, stratified on treatment group. For each sample b , carry outsteps 2-7.2. Estimate a logistic regression model for the distribution of Z given baseline covariates9 with regression parameters γ z . Estimate the propensity to be assigned treatment Z = z as ˆ P z ( V ) = P r ( Z = z | V ; ˆ γ bz ), where ˆ γ bz is the maximum likelihood (ML)estimate of γ z . Define ˆ P ∗ z = log[ ˆ P z ( V )1 − ˆ P z ( V ) ].3. Using the cases assigned to treatment group Z = z , estimate a normal linear regres-sion of Y z on V , with mean E ( Y z | V, Z = z , θ z , β z ) = s ( ˆ P ∗ z | θ z ) + g z ( ˆ P ∗ z , V ; β z ) , (7)where s ( ˆ P ∗ z | θ z ) denotes a penalized spline with fixed knots and parameters θ z and g z ( . ) represents a parametric function of other predictors of the outcome, indexed byparameters β z . One of the covariates might be omitted to avoid collinearity in thecovariates in Equation 7.4. For z = 0 ,
1, impute the values of Y z for subjects in treatment group 1 − z inthe original data with draws from the predictive distribution of Y z given V from theregression in Step 3, with the ML estimates ˆ θ ( b ) z , ˆ β ( b ) z substituted for the parameters θ ( b ) z , β ( b ) z .5. Estimate a logistic regression model for the distribution of Z given V, Z , ( Y , Y ),with regression parameters γ z and missing values of ( Y , Y ) imputed from Step 4.Estimate the propensity to be assigned treatment Z = z given Z , Y Z , and V asˆ P z ( Z , Y Z , V ) = P r ( Z = z | Z = z , Y z , V ; ˆ γ ( b ) z ), where ˆ γ ( b ) z is the ML estimate of γ z . The probability of treatment regimen ( Z = z , Z = z ) is denoted as ˆ P z z =ˆ P z ( V ) ˆ P z ( Z , Y Z , V ), and define ˆ P ∗ z ,z = log[ ˆ P z z − ˆ P z z ].6. Using the cases assigned to treatment group ( z , z ), estimate a normal linear regressionof Y z ,z on Z , Z , Y Z , and V with mean E ( Y z ,z | V, Y z , Z = z , Z = z , θ z ,z , β z ,z ) = s ( ˆ P ∗ z ,z | θ z ,z ) + g z ,z ( ˆ P ∗ z ,z , Z , Z , Y Z , V ; β z ,z ) . (8)
10. For each combination of ( z , z ) impute the values of Y z ,z for subjects not assignedthis treatment combination in the original data with draws from the predictive distri-bution of Y z ,z in Step 6, with ML estimates ˆ θ ( b ) z ,z , ˆ β ( b ) z ,z substituted for the parameters θ ( b ) z ,z , β ( b ) z ,z . Let ˆ∆ ( b )01 , = E [ Y − Y ], ˆ∆ ( b )11 , = E [ Y − Y ], and ˆ∆ ( b )11 , = E [ Y − Y ]denote the average treatment effects, ˆ∆ ( b ) jk,lm , with associated pooled variance estimates W ( b ) jk,lm , based on the observed and imputed values of Y for each treatment regimen.8. The MI estimate of ∆ jk,lm is then ¯∆ jk,lm,B = P Bb =1 ˆ∆ ( b ) jk,lm , and the MI estimateof the variance of ¯∆ jk,lm is T B = ¯ W jk,lm,B + (1 + 1 /B ) D jk,lm,B , where ¯ W jk,lm,B = P Bb =1 W ( b ) jk,lm /B , D jk,lm,B = P Bb =1 ( ˆ∆ ( b ) jk,lm − ¯∆ jk,lm,B ) B − . The estimate ∆ jk,lm follows a t distribution with degree of freedom ν , ∆ jk,lm − ¯∆ jk,lm,B √ T B ∼ t ν , where ν = ( B − ¯ W jk,lm,B D jk,lm,B ( B +1) ) . BART (Chipman et al., 2010) is a flexible estimation technique for any arbitrary function.Suppose we have a continuous outcome Y and corresponding p predictors X = ( X , . . . , X p ).Suppose Y is related to X via Y = f ( X ) + e (9)where f ( . ) is any arbitrary function which could involve complicated non-linear and multiple-way interactions and e ∼ N (0 , σ ). Formally, BART is written as Y = m X j =1 g ( X, T j , M j ) + e (10)where ( T j , M j ) is the joint distribution of the j th binary tree structure T j with its corre-sponding b j terminal node parameters M j = ( µ j , . . . , µ b j j ). m is the number of regression11rees used to estimate f ( X ) and it is usually fixed at 200.BART is able to model multiple-way interactions by using regression trees. In essence,a binary regression tree in BART may be viewed as a penalized form of an Analysis ofVariance (ANOVA) model. When the binary regression tree only splits on one variable forthe whole tree, a main effects model is obtained. When the regression tree involve splitson many different variables, a multiple-way interaction model is obtained. BART combinesall m regression trees together in an additive manner to obtain non-linear estimates of themain and interaction effects. This additive procedure is done by first ‘breaking’ Y into m equal ‘pieces’ and fitting a regression tree to each piece. Subsequently, the regression treein each m piece is then estimated by looking at the residual produced by the other m − f ( X ). When the default priors of BART suggested by Chipman et al. (2010)are assumed, the MCMC ensures that the eventual distribution of the the sum of regressiontrees is concentrated around the true distribution of the model (Rockova and van der Pas,2017).For binary outcomes, BART uses a probit link where P ( Y = 1 | X ) = Φ( m X j =1 g [ X, T j , M j ]) (11)where Φ( . ) is the cdf of a standard normal distribution. Estimation of the posterior dis-tribution is similar to that of continuous outcomes but with the use of data augmentationmethods, i.e. draw a continuous latent variable based on whether Y = 1 or Y = 0 and thenrun the BART algorithm on the drawn latent variables.Kapelner and Bleich (2015) suggested a procedure to allow the BART algorithm to12nclude covariates that might contain missing values. In brief, the missingness in the co-variates are not imputed but instead, viewed as a ‘value level’ in the MCMC algorithm.The MCMC algorithm then ‘sends’ missing data to terminal nodes in the regression treesthat would maximize the likelihood. This is termed as “Missing Incorporated in Attributes”(MIA, Twala et al., 2008, Section 2). Kapelner and Bleich (2015) showed using simulationexamples that incorporating MIA within BART allows the appropriate handling of differenttypes of missing mechanism, MCAR, MAR, and NMAR, for each covariate. We utilize thisapproach to accommodate the missingness in our covariates for the data analysis. To determine the principal strata definition, we first investigated what the data for ourproblem could potentially look like. We constructed Table 1 for t = 3, p = 1, and no time-varying covariates without loss of generality. In this table, ‘x’ indicates an observed value, ‘?’represent a missing observation which needs to be imputed, and ‘NA’ indicates a structurallymissing observation. For the potential survival outcomes, we did not indicate whether theywere missing or observed because we wanted to use Table 1 to help us decide how we shouldbe stratifying our subjects once our proposed method imputes the counterfactual survivalstatus.From Table 1, we can see that the goal of our analysis is to provide inference about E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1], where Z l = Z ′ l for at least one l with l =13 , . . . , t i.e. we condition on subjects who would potentially survive under two different treat-ment regimes Z , . . . , Z t − and Z ′ , . . . , Z ′ t − . Thus, the distribution of ( S Z ,...,Z t − , S Z ′ ,...,Z ′ t − )form our principal strata and meaningful contrasts are defined only in the stratum where S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1 since the potential outcomes for the two different treatmentregimes exist only in this stratum. For example, if we want to estimate the effect for a nega-tive wealth shock at t = 2 versus no negative wealth shock by t = 2 that is E [ Y − Y | S = 1],we restrict to subjects who survive if they did not receive a negative wealth shock at t = 1 i.e. subjects with S = 1 (Subjects 1-12 in Table 1). Note that the definition, E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1], is different from the parameter MSM esti-mates which is E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t ]. We make the same four assumptions used by MSM (See Section 2.2). We impose a furtherfifth assumption of strict monotonicity in thatIf Z ≤ Z ′ , Z ≤ Z ′ , . . . , Z t − ≤ Z ′ t − , Z i = Z ′ i for any i, then S Z ,...,Z t − ≥ S Z ′ ,...,Z ′ t − . (12)As a consequence, we have for example, when t = 2, if S = 0 then S = 0 and if S = 1then S = 1. This means that we rule out the possibility of a subject who does not receive anegative wealth shock and dies but would survive if having received a neagtive wealth shock.Conversely, if a subject survives after having received a negative wealth shock, we rule outthe possibility that this same subject would die if he or she did not receive a negative wealthshock.Our proposed method then estimates E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1]14y imputing the survival status of each subject at the current time t and then combine the im-puted counterfactual survival status together with the observed survival status to determinewhich principal stratum a subject belongs to. We then use a slightly modified PENCOMPto impute the counterfactual outcomes among the potentially surviving subjects to accountfor the bias due to confounding by indication. This approach is doubly robust and reducesthe burden of model specification by the researcher. Subsequently, the average difference inthe treatment effect within the desired principal strata is calculated. Variance is estimatedusing Rubin’s combine rule to account for the imputation uncertainty (Heitjan and Little,1991). Detailed steps for our method are given below.1. Generate a bootstrap sample b from the data by sampling the units with replacement.2. Estimate the model X ( b ) z ( b )1 | Z ( b )1 = z ( b )1 , W ( b )1 , V ( b ) . Use this model to compute the coun-terfactual of X ( b ) z ( b )1 for bootstrap sample b .3. Estimate the distribution of Z ( b )1 | W ( b )1 , V ( b ) . Use this model to estimate the propensityto be assigned treatment Z ( b )1 = z ( b )1 as P ∗ z ( b )1 = P r ( Z ( b )1 = z ( b )1 | W ( b )1 , V ( b ) ). Note that wedid not perform a logit transformation to obtain P ∗ z ( b )1 (See PENCOMP Steps 2 and 5).This is because by using PENCOMP modified with BART to predict the outcomes, thenon-linear effect of the propensity of assigned treatment will be handled automatically.Hence, any non-linear transformation on the propensity of assigned treatment wouldnot be needed.4. Estimate the model Y ( b ) z ( b )1 | P ∗ z ( b )1 , Z ( b )1 = z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , V ( b ) . As mentioned, we usedPENCOMP modified with BART to estimate this model. The advantage of usingBART is the researcher no longer needs to specify the model. BART automatically15akes care of any linear or non-linear main effects as well as linear or non-linear inter-actions. If we observe Equations 7 and 8, we can see that these two equations are con-structed using a non-linear spline specification on the propensity of assigned treatmentcombined with possible linear interactions between the propensity of assigned treat-ment and remaining covariates. This fits well with the type of estimation problemsthat BART was designed to solve. We then use the model produced by BART-modifiedPENCOMP to compute the counterfactual of Y ( b ) z ( b )1 for bootstrap sample b .5. Estimate the distribution for S ( b ) z ( b )1 | Z ( b )1 = z ( b )1 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , V ( b ) at t = 2. Use thismodel to generate a survival status for the counterfactual of S ( b ) z b taking into accountthe assumption of monotonicity in Equation (12) i.e. if S is observed and S = 0 then S = 0. Similarly, if S is observed and S = 1 then S = 1.6. Estimate the model X ( b ) z ( b )1 ,z ( b )2 | Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , W ( b )2 , V ( b ) . Usethe respective models to impute the counterfactual of X ( b ) z ( b )1 ,z ( b )2 , using any previouslyimputed values for the unobserved treatment regimes and restricting to the subjectsthat are observed and predicted to survive under the given treatment regimen of interestat t = 1.7. Estimate the distribution of Z ( b )2 | Z ( b )1 = z ( b )1 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , W ( b )2 , V ( b ) . Use this modelto estimate the propensity to be assigned treatment Z ( b )2 = z ( b )2 as P z ( b )2 = P r ( Z ( b )1 = z ( b )1 | X ( b ) z ( b )1 , Z ( b )1 = z ( b )1 , W ( b )1 , V ( b ) ). The probability of treatment regimen ( Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 ) is denoted as P ∗ z ( b )2 = P z ( b )2 P ∗ z ( b )1 .8. Estimate the model Y ( b ) z ( b )1 ,z ( b )2 | P ∗ z ( b )2 , Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , X ( b ) z ( b )1 ,z ( b )2 , W ( b )1 , W ( b )2 , V ( b ) t = 2. Use the respective models to impute thecounterfactual of Y ( b ) z ( b )1 ,z ( b )2 .9. Using a similar procedure for steps 5-8 with the restriction determined by S ( b ) z ( b )1 ,...,z ( b ) t − = S ( b ) z ′ ( b )1 ,...,z ′ ( b ) t − = 1 for time t where at least one z ( b ) t = z ′ ( b ) t , extend the estimation untilthe desired time point t = T .10. Repeat Steps 1-9 to obtain B bootstrap values forˆ∆ ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − = E [ Y ( b ) z ( b )1 ,...,z ( b ) t − − Y ( b ) z ′ ( b )1 ,...,z ′ ( b ) t − | S ( b ) z ( b )1 ,...,z ( b ) t − = S ( b ) z ′ ( b )1 ,...,z ′ ( b ) t − = 1] . with associated pooled variance W ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − .11. The estimate of∆ Z ,...,Z t ,Z ′ ,...,Z ′ t = E [ Y Z ,...,Z t − Y Z ′ ,...,Z ′ t | S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1]is then ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B = B X b =1 ( ˆ∆ ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − ) /B, and the estimate of the variance of ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B is T B = ¯ W z ,...,z t ,z ′ ,...,z ′ t ,B + (1 + 1 /B ) D z ,...,z t ,z ′ ,...,z ′ t ,B , where ¯ W z ,...,z t ,z ′ ,...,z ′ t ,B = B X b =1 ( W ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − ) /B and D z ,...,z t ,z ′ ,...,z ′ t ,B = B X b =1 ( ˆ∆ ( b ) z ( b )1 ,...,z ( b ) t − ,z ′ ( b )1 ,...,z ′ ( b ) t − − ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B ) B − . Z ,...,Z t ,Z ′ ,...,Z ′ t follows a t distribution with degree of freedom ν ,∆ Z ,...,Z t ,Z ′ ,...,Z ′ t − ¯∆ z ,...,z t ,z ′ ,...,z ′ t ,B √ T B ∼ t ν , where ν = ( B − ¯ W z ,...,zt,z ′ ,...,z ′ t,B D z ,...,zt,z ′ ,...,z ′ t,B ( B +1) ) . Remark . The idea of including the BART estimated propensity score within BARTas a predictor in Steps 4 and 8 is not new. Hahn et al. (2018) showed that including aBART estimated propensity score as a predictor within BART improved the estimation ofheterogenous treatment effects for observational studies. Tan et al. (2018) also reportedthat the inclusion of the BART estimated propensity score as a predictor within BART toimpute missing data, under the missing at random assumption, worked well in situationswhere the non-linear main and interaction effects are complex for the mean and propensitymodel. For situations with simpler non-linear effects like a quadratic relationship, usingBART to estimate the propensity score and imputing the missing values using penalizedsplines of propensity prediction (Zhang and Little, 2009, PENCOMP version for missingdata) worked better. Using PENCOMP with a BART estimated propensity score for Steps4 and 8 would be an interesting alternative. However, our aim of Steps 4 and 8 was to easethe implementation burden on the researcher. Hence, we suggest the use of PENCOMP witha BART estimated propensity score for Steps 4 and 8 only if the researcher is certain thatthe non-linear effect has a simple form for example, a quadratic or cubic relationship.
We conducted a simulation study to determine how well our proposed method would per-form compared to the na¨ıve method and MSM in three scenarios: 1) where there is low18ssociation between treatment allocation and confounder as well as treatment and survivalstatus; 2) where there is a strong association between treatment and confounder as well astreatment and survival status; and finally 3) where there is a strong association betweentreatment and confounder, treatment and survival status, and an interaction between treat-ment, confounder, and survival status. We expect all three methods to perform well in thefirst scenario because there is little to no confounding. For the second scenario, we expectMSM and our proposed method to perform well because there is no difference in the treat-ment effect between the principal strata, and other stratification groups. The na¨ıve methodshould not perform well due to the strong association between treatment and confounderas well as treatment and survival status. Finally, for scenario three, we expect only ourproposed method to perform well because an association between the treatment effect andprincipal strata, S Z ,...,Z t − = S Z ′ ,...,Z ′ t − = 1, is induced by the stronger interaction effectbetween treatment, confounder, and survival status. We fit standard linear and logistic re-gression models rather than BART and PENCOMP with BART since our focus is not onmodel misspecification but rather, the effect of confounding by indication and censoring bydeath. To set up our simulation study, we set the size of our target population as 1 million. Wethen generate a single baseline variable V from a normal distribution. We set T = 3 andmodel our treatment allocation, Z , as logit [ P ( Z = 1 | V )] = γ + γ V. (13)19or the potential outcome at t = 1, Y Z , we model it as Y Z = β + β Z I { Z = 1 } + β V V + β V Z
V I { Z = 1 } + e, (14)where e ∼ N (0 , t = 2, S Z as logit ( P [ S Z = 1 | V, Y Z ]) = α + α Y Y I { Z = 1 } + α Y Y [1 − I { Z = 1 } ]+ α Z I { Z = 1 } + α V V + α V Z
V I { Z = 1 } . (15)Monotonicity is imposed by setting S = 1 if S = 1. Because a negative wealth shock is anabsorbing state, if Z = 1, then Z = 1. So when Z = 0, we have logit ( P [ Z = 1 | V, Y ]) = γ + γ Y , Y + γ V. (16)We model the potential outcome at t = 2, Y Z ,Z as Y Z ,Z = β + β Z I { Z = 0 , Z = 1 } + β Z I { Z = 1 , Z = 1 } + β Y Z Y I { Z = 0 , Z = 0 } + β Y Z Y I { Z = 0 , Z = 1 } + β Y Z Y I { Z = 1 , Z = 1 } + β V V + β V Z V I { Z = 0 , Z = 1 } + β V Z V I { Z = 1 , Z = 1 } + e, (17)where e ∼ N (0 , t = 3, S Z ,Z , if S Z = 0, then S Z ,Z = 0. When20 Z = 1, we have logit ( P [ S Z ,Z = 1 | X, Y Z ,Z , S Z = 1]) = α + α Z I { Z = 0 , Z = 1 } + α Z I { Z = 1 , Z = 1 } + α Y Z Y I { Z = 0 , Z = 0 } + α Y Z Y I { Z = 0 , Z = 1 } + α Y Z Y I { Z = 1 , Z = 1 } + α V V + α V Z V I { Z = 0 , Z = 1 } + α V Z V I { Z = 1 , Z = 1 } . (18)Again, we impose monotonicity by setting S = S = 1 if S = 1 and S = 1 if S = 1.For the treatment allocation at t = 3, Z , if Z = Z = 0, we have logit ( P [ Z = 1 | X, Y ]) = γ + γ Y Y + γ Y , Y + γ V. (19)For the potential outcome at t = 3, Y Z ,Z ,Z , we have Y Z ,Z ,Z = β + β Z I { Z = 0 , Z = 0 , Z = 1 } + β Z I { Z = 0 , Z = 1 , Z = 1 } + β Z I { Z = 1 , Z = 1 , Z = 1 } + β Y Z Y I { Z = 0 , Z = 0 , Z = 0 } + β Y Z Y I { Z = 0 , Z = 0 , Z = 1 } + β Y Z Y I { Z = 0 , Z = 1 , Z = 1 } + β Y Z Y I { Z = 1 , Z = 1 , Z = 1 } + β Y Z Y I { Z = 0 } + β Y Z Y I { Z = 1 } + β V V + β V Z
V I { Z = 0 , Z = 0 , Z = 1 } + β V Z
V I { Z = 0 , Z = 1 , Z = 1 } + β V Z
V I { Z = 1 , Z = 1 , Z = 1 } + e. (20)21able 2 shows the parameters we used to achieve the three different simulation sce-narios. Scenario 1 is achieved by setting γ , α Z , γ , γ Y , , α Z , α Z , γ , γ Y , , and γ Y tobe about 10 times smaller than the values in Scenarios 2 and 3. The rest of the differencesbetween Scenario 1 versus 2 and 3 were to ensure the resulting simulated population wouldhave enough deaths and subjects in the various different treatment regimes for the assump-tions used by MSM and our proposed method to be valid. The difference between Scenario2 versus 3 lie in β V Z , α Y , α Y , β Y Z , β Y Z , β Y Z , α Y Z , α Y Z , α Y Z , β Y Z , β Y Z , β Y Z , and β Y Z where the values for Scenario 2 is about 10 times smaller compared toScenario 3.To calculate the true parameters, we used the generated population data (size 1 mil-lion), and then took:1. ∆ , = ¯ Y − ¯ Y ;2. ∆ , = ¯ Y − ¯ Y given S = 1;3. ∆ , = ¯ Y − ¯ Y given S = S = 1;4. ∆ , = ¯ Y − ¯ Y given S = S = 1;5. ∆ , = ¯ Y − ¯ Y given S = 1;6. ∆ , = ¯ Y − ¯ Y given S = S = 1;7. ∆ , = ¯ Y − ¯ Y given S = S = 1;8. ∆ , = ¯ Y − ¯ Y given S = S = 1;9. ∆ , = ¯ Y − ¯ Y given S = S = 1; and220. ∆ , = ¯ Y − ¯ Y given S = S = 1.We measured performance using the empirical bias, root mean squared error (RMSE),95% coverage, and the average 95% Confidence Interval (CI) length (AIL). 1000 simulationswere used to estimate these quantities. Under each simulation, a simple random sample of4,000 or 8,000 subjects was drawn from the target population data. All methods were thenimplemented on the sampled data to obtain the effect estimates. For MSM and our proposedmethod, the models were specified using Equations 13 to 20 respectively. For our proposedmethod, because our focus is not on model misspecification but rather, confounding by in-dication and censoring by death, we chose to implement a simpler version of our methodby skipping Steps 3 and 7 of our algorithm and using Y ( b ) z ( b )1 | Z ( b )1 = z ( b )1 , X ( b ) z ( b )1 , W ( b )1 , V ( b ) and Y ( b ) z ( b )1 ,z ( b )2 | Z ( b )1 = z ( b )1 , Z ( b )2 = z ( b )2 , Y ( b ) z ( b )1 , X ( b ) z ( b )1 , X ( b ) z ( b )1 ,z ( b )2 , W ( b )1 , W ( b )2 , V ( b ) for Steps 4 and 8 respec-tively. We also simplified the prediction of the potential outcomes and survival status byusing linear and logistic regression instead of BART. Table 3 shows the simulation results for sample size of 4,000. As expected, under Scenario 1,all three methods were relatively unbiased with all three methods achieving similar RMSE.MSM and our proposed method reported slightly greater than nominal coverage due to thewider AIL. Under Scenario 2, the absolute bias of the na¨ıve method was always larger thanMSM and our proposed method. RMSE was larger as well in comparison and coveragewas often far below the nominal 95% value. For this scenario MSM produced the lessconservative coverage while our proposed method suggested better bias performance andreduced RMSE. Finally, under Scenario 3, the na¨ıve method was clearly biased with poor23MSE and coverage. MSM performed slightly better compared to the na¨ıve method butabsolute bias clearly increased compared to Scenario 2. Coverage for some treatment effectswere poor as well. Our proposed method remained unbiased, produced a lower RMSEcompared to the other two methods, and reached nominal coverage under Scenario 3. Allmethods behaved as expected under these three scenarios.Table 4 shows the results with the sample size increased to 8,000, approximately thesample size in our application. The simulation results for all three methods under Scenario1 remained relatively similar. Under Scenario 2, an increase in sample size did not affect theabsolute bias of all three methods but, the coverage of the na¨ıve method was clearly affectedwith huge decreases in the coverage for all parameters. Coverage for MSM and our proposedmethod remained fairly similar. Finally, under Scenario 3, we observe once again that theamount of bias for the three methods remained the same but, coverage for the na¨ıve methodand MSM decreased for most of the treatment effects when the sample size increased to8,000. Coverage for our proposed method remained relatively similar to the results observedfor the sample size of 4,000. In summary, bias for the three methods was rather stable whenthe sample size changed. However, if the method is poor in the estimation of the particulartreatment effect, increasing the sample size can cause large decreases in coverage.24
Determining the effect of a negative wealth shockon cognitive score for Health and Retirement Studysub jects
To investigate the association between negative wealth shock and cognitive ability in latemiddle aged US adults, we used data from the Health and Retirement Study (HRS). HRSis a longitudinal study of US adults, enrolled at age 50 and older. These individuals havebeen surveyed biennially since 1992 with detailed modules on financial status and health(Sonnega et al., 2014).We use HRS data collected from 1996 to 2002 for our analysis. Subjects were obtainedfrom the original HRS cohort, born in the years 1931-1941. Although data collection beganin 1992, consistent collection of a subject’s cognitive ability only began in 1996. Hence, weexcluded the data collected before 1996 and treated the variables collected in 1996 as thebaseline for our analysis. We excluded subjects who did not have longitudinal measurementsfor net worth because we were unable to distinguish whether they have already experienceda negative wealth shock. Subjects with zero or negative net worth at baseline were excludedsince we did not know if these subjects have lifelong asset poverty or experienced a negativewealth shock prior to study entry. We also removed subjects who experienced a negativewealth shock and death between 1992 to 1996. These subjects were removed because theywere no longer at risk for a negative wealth shock or death. There were 9,750 participantsin the original HRS cohort, and of these, 7,106 participants (72.9%) were eligible for this25nalysis. These participants consists of a representative sample of the 1996 US populationaged 55 to 65 who had not experienced a negative wealth shock in the previous five years.
To determine whether a subject experienced a negative wealth shock from the previous follow-up period to the current follow-up period, we first obtained data from the module assessingnet worth administered at every wave of HRS. Measured assets include housing value, netvalue of businesses, individual retirement accounts, checking/savings accounts, certificatesof deposits and savings bonds, investment holdings, net value of vehicles, and the value ofany other substantial assets. From this asset total, debts were subtracted, including homemortgages, other home equity loans, and unsecured debt values, like credit card balances,student loans, and medical debts. Missing values for wealth were imputed at the level ofeach asset or debt, using an unfolding bracket imputation method (Juster and Smith, 1997).Wealth data were not imputed for those who do not participate in a given wave. Negativewealth shock was measured and then dichotomized (yes or no) for each time point. Lossof 75% or more of total wealth between two consecutive waves was used as the cut-pointfor negative wealth shock (Pool et al., 2017). Subjects were considered at risk for negativewealth shock until they have experienced a negative wealth shock or reached age 65.
The cognitive ability of a subject is assessed in HRS using the Telephone Interview forCognitive Status (TICS). Unfortunately, the full HRS cognitive battery is not available forparticipants under 65. Hence, we used an abbreviated measure that included questions26bout episodic memory (Immediate Word recall [10 points] and Delayed Word recall [10points]) and mental status (Serial 7’s [5 points], backwards counting from 20 [2 points])(Crimmins et al., 2011). All responses were combined to create a composite score rangingfrom 0 to 27, with a higher score indicating higher cognitive ability. Some of these measuresmay be imputed implying that the cognitive summary score may include one or more imputedscores (Fisher et al., 2018). We treated this measure as continuous and normally distributed.
Tables 5 to 6 show the descriptive statistics of the subjects at baseline by whether or not theyexperienced a negative wealth shock over the next six years regardless of survival status. Atbaseline, aside from whether the subject eventually survived until 2002 and health conditionslike whether the subject ever had heart problems, high blood pressure, and stroke, all theother variables in Tables 5 to 6 were significantly associated with experiencing a negativewealth shock. A typical subject who would eventually experience a wealth shock would have alower cognitive score at baseline; slightly higher BMI; lower opinion about his or her health;lower word recall score; likely still smoking; not insured; have depression; slightly lowerincome; either working, unemployed, or disabled; divorced or never married; lower wealthrank; have diabetes and/or psychological problems; younger; lesser years of education; andlikely non-White.Table 7 shows the change in unadjusted mean cognitive score between consecutivewaves for subjects who did not receive a wealth shock versus those who ever received anegative wealth shock. Follow-up surveys occurred at years 2, 4, and 6. We can see thatfor a subject who ever got shocked, the largest observed decline in cognitive score occurs27rom Baseline to Wave 1. Subsequently, the decline in cognitive score is no longer as largebetween waves. Similarly, the bulk of our subjects were shocked at Wave 1 (second year offollow up). In later waves, the proportion of new subjects who received a negative wealthshock decreases.
We were interested in how a negative wealth shock would affect the cognitive ability of latemiddle aged adults in the HRS during the six years of follow-up as well as how the durationof a negative wealth shock affects cognitive ability accounting for missingness in the cognitiveoutcome as well as censoring by death. We employed four different methods to estimate thiseffect and make inference. The four methods were the na¨ıve method, where all subjects whodied under their observed negative wealth shock status were removed from analysis; baselineadjusted method, where similar to the na¨ıve method, all subjects who died were removedfrom analysis but the mean cognitive score was adjusted using a model that included allbaseline covariates; MSM, where negative wealth shock allocation, missingness, and censor-ing by death were accounted for by inverse probability weighting; and our proposed methodincluding the PENCOMP modification described in Subsection 3.2. We assumed that de-pression was the time-varying covariate that depends on the negative wealth shock status( X Z ,...,Z t in Section 2) and the rest of the time-varying covariates are: self-reported healthstatus, whether subject was insured, labor force status of subject, income, level of alcoholconsumption, current smoking status, and number of health conditions ( W t in Section 3).We also assumed that the cognitive score is missing at random given the baseline variablespresented in Tables 5 to 6, past negative wealth shock status, time-varying covariates, and28ognitive score. For MSM, we accounted for this missingness by modeling the propensity ofresponse while for our proposed method, we imputed the missing cognitive score by usingthe modified version of PENCOMP discussed in Subsection 3.2. All our models (baseline ad-justed, MSM, and our proposed method) were specified using BART. For the na¨ıve, baselineadjusted, and MSM method, we employed 1,000 bootstrap samples to calculate the meanand the 95% Confidence Interval (CI). The 95% CI was determined by taking the 2 . . Table 8 shows the adjusted effect estimate of a negative wealth shock on cognitive scoredepending on the duration of the shock for late middle aged adults in the original HRScohort from 1996 to 2002. In general, the na¨ıve and baseline adjusted method suggests thatexperiencing a negative wealth shock has a much larger negative effect on the cognitive scoreof subjects in our sample compared to the adjusted estimates reported by MSM and ourproposed method. The na¨ıve and baseline adjusted method produced very similar resultssuggesting low association between cognitive score and the baseline covariates. The effectfor subjects who experienced a negative wealth shock within the first 2 years of follow upversus no shock (6 years vs. no shock), subjects who experienced a negative wealth shockwithin the first 2 years of follow up versus subjects who experienced a negative wealth shockbetween the second and fourth year of follow up (6 years vs. 2 years), and subjects whoexperienced a negative wealth shock within the first 2 years of follow up versus subjects whoexperienced a negative wealth shock between the fourth and sixth year of follow up (6 years29s. no shock), were significantly larger than 0 under the na¨ıve and baseline adjusted method.For MSM and our proposed method all effects were reported to be not significant.
In this paper, we were interested in how a negative wealth shock affects the cognitive abilityof late middle aged Americans participating in the HRS from 1996 to 2002. The maindifficulty we faced was the presence of death in some subjects causing their cognitive scoreto be censored. Under situations where we believe death does not depend on the cognitiveability or whether a subject received a negative wealth shock, removing subjects who havedied from our analysis would yield an unbiased estimate of the effect of negative wealth shockon cognitive ability as our simulation results suggest. Unfortunately, it is very possible thatsubjects with lower cognitive ability and/or have experienced a negative wealth shock wouldhave a higher risk of death. In this situation, accounting for the censoring by death would beneeded. This is because without randomization, there is a high likelihood that the proportionof deaths between subjects who did not receive a negative wealth shock versus those whoreceived a wealth shock, would be imbalanced. In addition, subjects who die are more likelyto have a lower cognition score. As a result, if we remove the subjects who died from ouranalysis, the effect of the negative wealth shock on cognitive ability that we measure wouldbe confounded by death. Although MSM is commonly employed to weight the subjects whosurvived, this approach is arguably not sensible and would likely produce biased estimateswhen the effect depends on the principal strata as well as when adjustments on the weightshave to be employed in order to stabilize the MSM estimate. To overcome these issues, wepropose a new method to estimate the effect by imputing the counterfactual survival status30f each subject in order to compare outcomes among individuals who would survive onlyunder both sets of treatments being considered. Our method remained unbiased for all thesimulation scenarios we tried and produced reasonable coverage. When applied to the HRSdataset, our method suggested that the effect of a negative wealth shock on the cognitiveability is close to null whereas the na¨ıve method and MSM suggested an estimate with aslightly larger effect.One shortcoming of our approach is our failure to incorporate the HRS sample de-sign, in particular the sampling weights, in our inference. Given that a key use of weights inregression-type analysis is to reduce the effect of model misspecification (Korn and Graubard,1995), we hope that our use of BART will minimize the degree of model misspecification.We leave the incorporation of such features in a general approach to future work. Anotheraspect of our method which could be improved is to allow our method to be applicable tostudies where the follow-up time is not fixed. In such a situation, Cox based survival modelswould have to be employed and time would have to be included as a covariate in the survivaland outcome models. The difficulty in this extension would be how to develop a systematicway, applicable to all subjects, to determine the relation in time between the allocation ofthe treatment, measuring the outcome, and death.
References
Butrica, B., Smith, K., and Toder, E. (2010). What the 2008 stock market crash means forretirement security.
Journal of Aging & Social Policy
Epidemiology
The Annals of Applied Statistics The Journals of Gerontology, Series B: PsychologicalSciences and Social Sciences
SocialScience and Medicine
Biometrics
Biometrics http://hrsonline.isr.umich.edu/modules/meta/xyear/cogimp/desc/COGIMPdd.pdf
The University of Michigan Institute of Survey Research, Survey Research Center,AnnArbor, MI . 32rangakis, C. and Rubin, D. (2002). Principal Stratification in Causal Inference.
Biometrics
A Theory of the Consumption Function . Princeton University Press,Princeton, NJ.Hahn, P., Murray, J., and Carvalho, C. (2018). Bayesian Regression Tree Models forCausal Inference: Regularization, Confounding, and Heterogeneous Effects. arXiv page1706.09523v2.Heitjan, D. and Little, R. (1991). Multiple Imputation for the Fatal Accident ReportingSystem.
Applied Statistician
Causal Inference for Statistical, Social, and BiomedicalSciences: An Introduction . Cambridge University Press, New York, NY.Juster, F. and Smith, J. (1997). Improving the Quality of Economic Data: Lessons from theHRS and AHEAD.
Journal of the American Statistical Association
The Canadian Journal of Statistics
The American Statistician
International Journal of Epidemiology
InternationalPsychogeriatrics
Journal of theRoyal Statistical Society
Annals of Internal Medicine
Journal of the American Medical Association
Journal of Epidemiology and Community Health
Epidemiology
Journal of the American Statistical Association arXiv page 1708.08734.34osenbaum, P. and Rubin, D. (1983). The Central Role of the Propensity Score in Obser-vational Studies for Causal Effects.
Biometrika
Journal of Educational Psychology
The Journals of Gerontology, Series B: Psychological Sciences andSocial Sciences
International Journal of Epidemiology
Social Science & Medicine arXiv page1801.03147.Twala, B., Jones, M., and Hand, D. (2008). Good Methods for Coping with Missing Datain Decision Trees.
Pattern Recognition Letters
Epidemiology
Biometrics
Journal of the American Statistical Association page In press.36able 1: Sample example of a censoring by death dataset until t = 3 where Z t = 1 indicates a subject having experienced a negativewealth shock and Z t = 0 indicates a subject have not experienced any negative wealth shock till time tV Z Y Y S S Z Y Y Y S S S Z Y Y Y Y Subject 1 x 1 x ? 1 1 1 ? ? x 1 1 1 1 ? ? ? xSubject 2 x 0 ? x 1 1 1 ? x ? 1 1 1 1 ? ? x ?Subject 3 x 1 x ? 1 1 1 ? ? x 1 1 0 NA ? ? ? NASubject 4 x 0 ? x 1 1 1 ? x ? 1 1 0 1 ? ? x NASubject 5 x 0 ? x 1 1 0 x ? ? 1 0 1 0 x ? NA ?Subject 6 x 0 ? x 1 1 0 x ? ? 0 1 1 NA NA NA ? ?Subject 7 x 0 ? x 1 1 0 x ? ? 0 1 1 NA NA NA ? ?Subject 8 x 0 ? x 1 1 0 x ? ? 1 0 0 0 x ? NA NASubject 9 x 1 x ? 0 1 NA ? ? NA 1 1 0 NA ? ? ? NASubject 10 x 1 x ? 0 1 NA ? ? NA 0 1 0 NA NA NA ? NASubject 11 x 0 ? x 0 1 1 ? x NA 0 1 0 1 NA NA x NASubject 12 x 0 ? x 0 1 0 x ? NA 0 1 0 NA NA NA ? NASubject 13 x 1 x ? 1 0 1 NA NA x 0 0 1 1 NA NA NA xSubject 14 x 0 ? x 1 0 NA NA NA ? 0 0 1 NA NA NA NA ? able 2: Table of parameters for simulation Scenario 1 Scenario 2 Scenario 3
V N (0 , ) N (17 , ) N (17 , ) γ γ -0.02 -0.2 -0.2 β β Z -1.5 -1.5 -1.5 β V β V Z -0.005 -0.11 -0.05 α α Y α Y α Z -0.01 -0.2 -0.2 α V α V Z -0.002 -0.02 -0.02 γ -0.002 -0.02 -0.02 γ Y , -0.02 -0.2 -0.2 β Z -1.5 -1.5 -1.5 β Z -1 -1 -1 β Y Z β Y Z β Y Z β V Z -0.00011 -0.011 -0.011 β V Z -0.00005 -0.005 -0.005 α Z -0.01 -0.2 -0.2 α Z -0.015 -0.1 -0.1 α Y Z α Y Z α Y Z α V Z -0.0001 -0.02 -0.02 α V Z -0.0005 -0.05 -0.05 γ -0.0002 -0.002 -0.002 γ Y , -0.002 -0.02 -0.02 γ Y -0.02 -0.2 -0.2 β Z -1.5 -1.5 -1.5 β Z -1 -1 -1 β Z -0.5 -0.5 -0.5 β Y Z β Y Z β Y Z β Y Z β Y Z β Y Z β V Z -0.00011 -0.011 -0.011 β V Z -0.00005 -0.005 -0.005 β V Z -0.00003 -0.003 -0.003
Scenario 1 Na¨ıve MSM ProposedParameter True value Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL∆ , -1.497 -0.001 0.032 95.4 0.123 -0.0002 0.032 95.1 0.123 -0.0001 0.032 97.0 0.143∆ , -1.499 -0.003 0.050 95.3 0.202 -0.003 0.050 95.3 0.202 -0.003 0.050 95.7 0.214∆ , -1.005 -0.003 0.049 95.0 0.189 -0.001 0.049 94.7 0.189 -0.001 0.049 99.2 0.262∆ , , -1.502 0.005 0.081 93.9 0.314 0.005 0.081 98.9 0.411 0.005 0.082 94.6 0.333∆ , -1.008 0.004 0.074 94.8 0.284 0.004 0.074 99.0 0.370 0.004 0.075 97.8 0.350∆ , -0.504 0.006 0.072 95.2 0.284 0.007 0.072 99.4 0.371 0.007 0.074 100.0 0.529∆ , , , , -3.367 -0.047 0.061 78.5 0.154 0.002 0.041 93.8 0.160 0.002 0.041 96.1 0.177∆ , -1.727 -0.037 0.054 83.2 0.149 -0.032 0.051 86.9 0.150 -0.002 0.037 96.4 0.161∆ , -1.199 -0.136 0.146 24.5 0.202 -0.020 0.057 92.5 0.204 -0.004 0.053 96.5 0.229∆ , , -1.727 -0.029 0.062 91.9 0.220 -0.023 0.060 94.8 0.240 0.001 0.053 96.1 0.227∆ , -1.183 -0.065 0.082 75.0 0.199 -0.047 0.069 87.5 0.217 0.0004 0.048 97.8 0.220∆ , -1.169 -0.167 0.181 33.8 0.273 -0.042 0.084 93.2 0.305 -0.004 0.071 98.7 0.350∆ , , , , -2.347 -0.123 0.130 14.5 0.160 0.002 0.042 94.0 0.160 0.002 0.042 95.9 0.177∆ , -2.559 -0.114 0.122 23.9 0.165 -0.060 0.074 70.2 0.164 -0.001 0.038 96.4 0.163∆ , -3.062 -0.231 0.239 2.8 0.232 -0.033 0.068 89.9 0.226 -0.004 0.058 97.0 0.260∆ , -0.502 -0.118 0.132 48.9 0.233 0.026 0.065 92.2 0.227 -0.003 0.059 96.7 0.260∆ , -2.820 -0.125 0.139 47.9 0.242 -0.062 0.087 88.7 0.273 -0.0004 0.054 95.9 0.224∆ , -3.605 -0.143 0.152 19.6 0.198 -0.087 0.101 69.2 0.225 -0.006 0.045 96.4 0.202∆ , -4.032 -0.290 0.301 5.2 0.319 -0.082 0.117 89.4 0.376 -0.009 0.080 98.6 0.400∆ , -0.785 -0.019 0.060 93.3 0.225 -0.026 0.063 95.0 0.256 -0.006 0.052 96.7 0.226∆ , -1.217 -0.160 0.181 54.4 0.336 -0.015 0.087 97.2 0.396 -0.009 0.083 99.3 0.442∆ , -0.432 -0.141 0.160 54.9 0.306 0.011 0.080 97.4 0.363 -0.006 0.075 98.7 0.373 Scenario 1 Na¨ıve MSM ProposedParameter True value Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL Bias RMSE 95% Coverage AIL∆ , -1.497 -0.001 0.023 94.2 0.087 0.0003 0.023 94.0 0.087 0.0003 0.023 96.2 0.100∆ , -1.499 -0.002 0.036 95.3 0.143 -0.001 0.036 94.8 0.143 -0.001 0.037 95.5 0.151∆ , -1.005 -0.002 0.034 94.8 0.134 -0.0007 0.034 95.1 0.134 -0.001 0.035 98.7 0.183∆ , , -1.502 0.005 0.057 95.0 0.222 0.005 0.057 99.0 0.289 0.005 0.058 95.4 0.235∆ , -1.008 0.004 0.051 94.5 0.201 0.004 0.051 98.4 0.260 0.004 0.052 97.1 0.246∆ , -0.504 0.005 0.051 95.1 0.201 0.007 0.052 98.3 0.261 0.006 0.053 99.7 0.369∆ , , , , -3.367 -0.047 0.055 59.6 0.109 0.002 0.029 94.0 0.113 0.003 0.029 95.9 0.125∆ , -1.727 -0.036 0.045 73.4 0.105 -0.031 0.041 78.8 0.106 -0.001 0.026 96.1 0.113∆ , -1.199 -0.134 0.139 4.0 0.142 -0.018 0.041 92.4 0.144 -0.001 0.036 96.9 0.161∆ , , -1.727 -0.029 0.049 87.9 0.156 -0.024 0.047 93.2 0.170 0.0001 0.038 96.2 0.160∆ , -1.183 -0.066 0.075 54.3 0.141 -0.048 0.060 77.7 0.153 -0.001 0.036 97.7 0.156∆ , -1.169 -0.166 0.173 7.8 0.193 -0.040 0.065 90.4 0.215 -0.003 0.049 98.9 0.246∆ , , , , -2.347 -0.123 0.126 1.5 0.113 0.002 0.029 94.5 0.113 0.003 0.029 95.7 0.124∆ , -2.559 -0.114 0.118 3.4 0.117 -0.060 0.067 49.3 0.113 -0.001 0.028 96.1 0.115∆ , -3.062 -0.230 0.234 0.1 0.164 -0.032 0.052 86.5 0.159 -0.004 0.040 97.3 0.183∆ , -0.502 -0.118 0.125 19.4 0.164 0.026 0.048 91.0 0.160 -0.003 0.040 98.2 0.184∆ , -2.820 -0.125 0.133 16.6 0.171 -0.063 0.076 78.3 0.192 -0.002 0.039 95.6 0.157∆ , -3.605 -0.143 0.147 2.2 0.140 -0.087 0.093 40.8 0.159 -0.007 0.032 96.7 0.142∆ , -4.032 -0.290 0.296 0.1 0.225 -0.081 0.099 81.5 0.265 -0.010 0.057 98.7 0.282∆ , -0.785 -0.018 0.044 93.4 0.159 -0.024 0.047 94.7 0.181 -0.005 0.037 97.0 0.161∆ , -1.217 -0.160 0.171 22.4 0.238 -0.013 0.062 97.1 0.278 -0.008 0.059 98.7 0.311∆ , -0.432 -0.142 0.152 26.4 0.216 0.011 0.056 97.9 0.255 -0.006 0.052 98.7 0.264 No wealth shock Ever wealth shockVariables Mean/Frequency (S.E./%) Mean/Frequency (S.E./%) p -valueEventually survived?: 0.57Yes 6,207 (94.7) 516 (94.0)No 350 (5.3) 33 (6.0)Cognitive score 17.07 (4.07) 16.26 (4.35) < . < . < . < . < . < . < . < . < . < .
010 1,728 (26.4) 326 (59.4)1 2,360 (36.0) 124 (22.6)2 2,469 (37.7) 99 (18.0)Gender: 0.08Male 3,113 (47.5) 239 (43.5)Female 3,444 (52.5) 310 (56.5)