[PDF] A Bayesian semi-parametric approach for inference on the population partly conditional mean from longitudinal data with dropout

Abstract

Studies of memory trajectories using longitudinal data often result in highly non-representative samples due to selective study enrollment and attrition. An additional bias comes from practice effects that result in improved or maintained performance due to familiarity with test content or context. These challenges may bias study findings and severely distort the ability to generalize to the target population. In this study we propose an approach for estimating the finite population mean of a longitudinal outcome conditioning on being alive at a specific time point. We develop a flexible Bayesian semi-parametric predictive estimator for population inference when longitudinal auxiliary information is known for the target population. We evaluate sensitivity of the results to untestable assumptions and further compare our approach to other methods used for population inference in a simulation study. The proposed approach is motivated by 15-year longitudinal data from the Betula longitudinal cohort study. We apply our approach to estimate lifespan trajectories in episodic memory, with the aim to generalize findings to a target population.

Full PDF

AA Bayesian semi-parametric approach for inference on thepopulation partly conditional mean from longitudinaldata with dropout

Josefsson, Maria , , , ∗ , Daniels, Michael J. , & Pudas, Sara , Department of Statistics, USBE, Ume˚a University, Sweden. Centre for Demographic & Ageing Research, Ume˚a University, Sweden. Ume˚a Center for Functional Brain Imaging, Ume˚a University, Sweden. Department of Statistics, University of Florida, USA. Department of Integrative Medical Biology, Ume˚a University, Sweden. ∗ Corresponding author, E-mail: [email protected]

Summary

Studies of memory trajectories using longitudinal data often result in highly non-representative samples due to selective study enrollment and attrition. An additional bias comesfrom practice eﬀects that result in improved or maintained performance due to familiaritywith test content or context. These challenges may bias study ﬁndings and severely distortthe ability to generalize to the target population. In this study we propose an approachfor estimating the ﬁnite population mean of a longitudinal outcome conditioning on beingalive at a speciﬁc time point. We develop a ﬂexible Bayesian semi-parametric predictiveestimator for population inference when longitudinal auxiliary information is known forthe target population. We evaluate sensitivity of the results to untestable assumptionsand further compare our approach to other methods used for population inference ina simulation study. The proposed approach is motivated by 15-year longitudinal datafrom the Betula longitudinal cohort study. We apply our approach to estimate lifespantrajectories in episodic memory, with the aim to generalize ﬁndings to a target population.1 a r X i v : . [ s t a t . M E ] N ov ey words: BART, Memory, MNAR, Non-ignorable dropout, Population inference, Sen-sitivity analysis, Truncation by death.

Studies of lifespan trajectories in memory using longitudinal data present numerousmethodological challenges including highly non-representative samples, due to selectivestudy enrollment and attrition (Weuve et al. 2015), and practice eﬀects, which results inimproved or maintained performance due to familiarity with test content or context (Salt-house 2010). These challenges may bias study ﬁndings and severely distort the ability togeneralize to the target population, even in well-designed studies.Attrition is inevitable especially if individuals in the studied population are followedover a long time period. Standard methods often rely on the assumption of missing atrandom (MAR) which is invalid if the missingness is missing not at random (MNAR) ordue to death. One natural way of dealing with MNAR missingness in any analysis ofincomplete data is to explore sensitivity to deviations from the MAR assumption. Thiscan be accomplished in a Bayesian setting, by introducing sensitivity parameters thatincorporate prior beliefs about the diﬀerences between respondents and non-respondentsand assigning an appropriate prior distribution (Daniels and Hogan 2008). However,attrition due to death must be treated diﬀerently if a participant’s death during follow-up truncates the outcome process (Kurland et al. 2009). For example, in research onaging patterns in memory, memory performance after death is not deﬁned and not ofinterest to study.Joint models have been proposed to address the combination of dropout and death(Rizopoulos 2012). This so called mortal cohort inference truncate participants outcomesafter death, either by conditioning on the sub-population who are still alive at that time2oint, i.e. partly conditional inference (Kurland 2005), or by conditioning on the subpop-ulation that would survive irrespective of exposure, i.e. principal stratiﬁcation (Frangakisand Rubin 2002; Frangakis et al. 2007). The latter approach has been proposed to handletruncation by death when interest is in estimating the survivors average causal eﬀect (e.g.Josefsson et al. 2016, McGuinness et al. 2019, Shardell and Ferrucci 2018). In contrast,partly conditional inference has been proposed for estimating non-causal associations. Forexample by using augmented inverse probability weighting for longitudinal cohort datawith truncation by death and MAR dropout (Wen, Terrera, and Seaman 2018) as wellas MNAR dropout (Wen and Seaman 2018). In addition, Li and Su (2018) proposed anapproach for longitudinal semicompeting risk data with MNAR missingness and death.This type of inference can for example be of interest, when the studied outcome is re-lated to health of the elderly and thereby the need for health promotion and/or diseaseprevention programs. Since this concerns all individuals who are alive. These methods,however, fails to account for the complex nature of the sampling design such as selectivestudy enrollment.Post-stratiﬁcation (PS) is a technique in survey analysis that uses auxiliary informa-tion on the ﬁnite population to adjust for known or expected discrepancies between thesample and the population. A classical weighting estimator for ﬁnite population inferenceis the Horvitz-Thompson (HT) estimator (Horvitz and Thompson 1952). Although theHT estimator is design unbiased, it can potentially be very ineﬃcient. In contrast, amodel-based (MB) approach speciﬁes a model for the study outcome, usually a regres-sion model, which is then used to make predictions for the population, and hence, ﬁnitepopulation quantities. Predictions are calculated by plugging in the auxiliary variablesfor all units in the population in the working model. Although, model-based inferencesgenerally will outperform design-based approaches, valid inference requires correct model3peciﬁcation. This can be diﬃcult when there is a large set of regressors, the relationshipis non-linear and/or includes interaction terms, and there are multiple observation times.Several approaches for poststratiﬁcation adjustment in a cross-sectional setting haveshown improved performance compared to the HT and the MB regression estimator.Multilevel regression combined with poststratiﬁcation (MRP; e.g. Gelman and Little1997, Park, Gelman, and Bafumi 2004) have shown improved performance comparedto standard weighting and MB regression estimators, and was used for producing ac-curate population estimates from a nonrepresentative sample of Xbox computer gamesusers (Wang et al. 2015). The general regression estimator (GREG; Deville and S¨arndal1992) is a dual-modelling strategy that combines prediction and weighting. The approachrequire both a model for the outcome and the participation mechanism, and is doublerobust in the sense that it remains consistent if either one of the models are correctly spec-iﬁed. Bisbee (2019) combined a semi-parametric machine learning approach (BayesianAdditive Regression Trees (BART); Chipman, George, McCulloch, et al. 2010) with post-stratiﬁcation for predicting opinions using cross-sectional data. Kern et al. (2016) showedin a simulation study that BART and inverse probability weighting using random forestsperformed better than approximately doubly robust estimators for estimating the targetpopulation average treatment eﬀect. Although showing improved performance comparedto MRP and GREG, previous semi-parametric approaches for population inference are not valid for longitudinal data with dropout and deaths .In this study we propose an approach for estimating the ﬁnite population mean ofa longitudinal continuous outcome conditioning on being alive at a speciﬁc time point,i.e. the population partly conditional mean (PPCM). Speciﬁcally, we develop a ﬂexibleBayesian semi-parametric predictive estimator, when longitudinal auxiliary informationis known for all units in the target population. The approach is to specify observed data4odels using BART and then to use assumptions with embedded sensitivity parametersto identify and estimate the PPCM. We evaluate sensitivity of the results to untestableassumptions on MNAR dropout and practice eﬀects, and further compare our approachto other methods used for population inference in a simulation study.We are motivated by the Betula study, a prospective cohort study on memory, healthand aging. The aim of the current paper is to extend previous results on cognitivelifespan trajectories (e.g. R¨onnlund et al. 2005, Gorbach et al. 2017) by consideringpopulation partly conditional inference with MNAR missingness and practice eﬀects. Byusing longitudinal micro-data from Statistics Sweden and the National Board of Healthand Welfare, for both the sample and the target population, we are able to adjust forpotential discrepancies in auxiliary variables and thereby improve generalizability of studyﬁndings.The remainder of the paper is organized as follows. In Section 2, we present a moti-vating example. In Section 3 we present a MB approach for estimating the PPCM usinglongitudinal data with dropout and deaths and in Section 4 we describe a Bayesian semi-parametric modeling approach. In Section 5 we provide results from a simulation studyand in Section 6 results from the empirical example using the Betula data. Conclusionsare given in Section 7.

The aim of the empirical study is to estimate lifespan trajectories in episodic memory,with the goal to generalize ﬁndings to a target population. Two separate sources ofdata are available for this study; a longitudinal cohort study and a longitudinal databasecovering the target population.The Betula study is a population-based cohort study with the objective to study how5emory functions change over time and to identify risk factors for dementia (Nilsson et al.1997). The participants were randomly recruited, stratiﬁed by age, from the populationregistry in the Ume˚a municipality of Sweden. We consider longitudinal data from the ﬁrstsample (S1) and four waves of data collection (T1-T4). There was ﬁve years in betweeneach wave and the ﬁrst wave of data collection was initiated in 1988-1990. A totalof n=1000 participants were included, 100 participants from each of the 10 age-cohorts:35 , , . . . ,

80. In order to obtain a total of 100 subjects in each of the 10 diﬀerent cohorts,1,976 persons had to be contacted. Of the 976 that never entered the study, 259 couldnot be reached, 130 had a illness to the extent that they could not participate, and 481declined to participate (Nilsson et al., 1997). Memory was assessed at each wave usinga composite of ﬁve episodic memory tasks, range: 0 - 76, where a higher score indicatebetter memory (Josefsson et al. 2012).The second data source is the Linneaus Database (Malmberg, Nilsson, and Weinehall2010), a longitudinal database covering every Swedish resident. The database includesannual data from Statistics Sweden and the National Board of Health and Welfare (similarinformation as for the Betula sample). In this study we consider micro data for every unitof the population in the Ume˚a municipality who were alive and non-demented in 1990 asthe target population ( N = 9203). Although longitudinal data is available annually werestrict data to the years: 1990, 1995, 2000, and 2005, approximately corresponding tothe years of testing in the Betula study.A set of continuous and categorical auxiliary variables, linked to both selective studyenrolment and memory, are included in both data sources. From the cause of deathregister we know death year for each deceased individual. In the Betula sample 29.1%died during the study period and 20.0% of the target population, and there were 128 and806 dementia cases (12.8% and 8.8%) in the sample and population respectively.6 A model-based approach for population inference

Consider a ﬁnite population U = { , , . . . , N } . For each individual i ∈ U , a [1 × K ]vector of auxiliary information x i is observed. A probability sample c of size n is drawnfrom U . For individual i , denote the participation probability Pr { i ∈ c | x i } by π i . Let y i be the continuous study variable which is observed if i ∈ c . Suppose interest is instudying the ﬁnite population mean, µ U = N − (cid:80) i ∈ U y i . If we assume the participationmechanism is ignorable in the sense that y i and i ∈ c are conditionally independentgiven x i (Rubin 1976), the joint distribution of the outcome, participation mechanismand auxiliary information can be factored into three conditional distributions (cid:81) i ∈ U p ( y i | x i ) p ( i ∈ c | x i ) p ( x i ).A model-based (MB) prediction estimator incorporate the relationship between x and y into the estimation of the population mean by introducing a model and considers theﬁnite population values { y i ; i ∈ U } to be realizations of the model. Conditional on theauxiliary variables x i , let y i = m ( x i ) + ε i , where ε i are uncorrelated and E ( ε i ) = 0 and V ar ( ε i ) = σ . The predictions for each individual in the population, ˆ y i , are calculatedby plugging in their auxiliary information x i for all i ∈ U in the working model m ( x i ). AMB estimator of the population mean is obtained by ˆ µ MB = N (cid:80) i ∈ U ˆ m ( x i ). We considera setting with two separate data sources. Thus, we can not separate out the sampleparticipants in the data covering the target population. Predictions must therefore bemade for all participants in the target population. We now consider the problem of ﬁnite population inference in the context of longitudinaldata with dropout and death. Death must be treated diﬀerently than non-response7ince post-death outcomes are truncated (and do not exist). Here we consider partlyconditional inference and, as such, are interested in estimating the ﬁnite populationmean given survival up to that time point, that is, the population partly conditionalmean (PPCM).First we need some additional notation. For individual i at time point t = 0 , , . . . T ,denote the outcome variable and the vector of auxiliary information by y it and x it re-spectively. The history of the time-varying variables are denoted with an overbar. Forexample, the outcome history for individual i up to and including time point t is denotedby ¯ y it = { y i , y i , . . . , y it } . Let s it denote survival, where s it = 1 if an individual is aliveat time t and 0 otherwise. Let r it be a response indicator, where r it = 1 if individual i participates in the study at time t and 0 otherwise. We have monotone missingness, so if r it = 0, r ik = 0 for k > t , and of course similarly for s it . Note that, s it is observed for all i ∈ U and r it is observed if i ∈ c . Furthermore, let T ri denote the number of time pointsa participant has completed the testing, observed for all i ∈ c . Similarly, T si denotes thenumber of time points an individual is alive during the studied period, observed for all i ∈ U . Note that, the number of individuals in the population decreases over time dueto deaths.Initially we further assume the non-response mechanism to be missing at randomconditional on being alive at time t (MARS). The working model for making predictionsunder MARS becomesˆ y it = (cid:90) ¯ y t − ˆ m t (¯ y it − , ¯ s it = 1 , ¯ x it ) × t (cid:89) k =0 p ( y ik − | y ik − , ¯ s ik − = 1 , ¯ x ik − ) d ¯ y it − , (1)where ˆ m t (¯ y it − , ¯ s it = 1 , ¯ x it ) is the estimated model for predicting y it from ¯ y it − and ¯ x it ,for all i ∈ U given ¯ s it = 1, and is obtained by integrating over the outcome history ¯ y t − .8he MB estimator of the PPCM at time t is given by PPCM MBt = (cid:80) i ∈ U s it (cid:80) i ∈ U ˆ y it s it .Note that, in context of the Betula study interest is not the PPCM t at a speciﬁc testwave, but rather the age speciﬁc PPCM aggregated over test waves. That is,PPCM MB ( age ) = (cid:80) i ∈ U aget s it (cid:80) Tt =1 (cid:80) i ∈ U aget s it T (cid:88) t =1 PPCM

MBt ( age ) , where i ∈ U age t is the subset of individuals in age-cohort age at test wave t , andPPCM MBt ( age ) = (cid:80) i ∈ Uaget s it (cid:80) i ∈ U aget ˆ y it s it . We introduce a set of sensitivity parameters to assess the impact of violations to theMARS assumption for the missingness mechanism. We additionally introduce a sensitiv-ity parameter that account for practice eﬀects, i.e., improved or maintained performanceat follow-up testing due to familiarity with test content or context. The general strategyis to model the observed data distribution, and to use priors on the sensitivity parametersto identify the full-data model.Previous studies of the Betula data suggests that individuals who drop out have lowerperformance and steeper decline in memory (Josefsson et al. 2012). Thus, we expectdropout to be missing not at random conditioning on survival (MNARS). To allow fordeviations from the MARS assumption we introduce a parameter γ it to identify theexpected outcome for dropouts among survivors. That is, for all t > j < t , E ( Y it | ¯ y it − , ¯ r ij − , r it = 0 , ¯ s it = 1 , ¯ x it ) = E ( Y it − γ it | ¯ y it − , ¯ r it = 1 , ¯ s it = 1 , ¯ x it ). If γ it = 0this implies a MARS assumption for the outcome, and if γ it > y ∗ it denote the observed memoryscore (in contrast to y it which denotes an individual’s actual memory function) and δ it denote a sensitivity parameter. Then, for t > E ( Y it | ¯ y it − , ¯ r it = 1 , ¯ s it = 1 , ¯ x it ) = E ( Y ∗ it − δ it | ¯ y it − , ¯ r it = 1 , ¯ s it = 1 , ¯ x it ), where δ it > γ i , . . . , γ iT , and δ i , . . . , δ iT . We spec-ify triangular distributed priors conditioning on auxiliary variables, γ it ∼ Tri( A t (¯ x it ) , , A t (¯ x it ))and δ it ∼ Tri(0 , B t (¯ x it ) , B t (¯ x it )), where the three parameters of the Triangular distribu-tion are the minimum, the mode and the maximum. We restrict the parameters to aplausible range of values, reﬂecting the analysts’ beliefs. In the Analysis of the Betuladata we specify values of A t (¯ x it ) and B t (¯ x it ) in context of the study.With the sensitivity parameters, identiﬁcation of the PPCM based on the workingmodel in (1), usesˆ y it = (cid:90) ¯ y ∗ t − (cid:88) ¯ r t (cid:110)(cid:2) m t (¯ y ∗ it − , ¯ r it , ¯ s it = 1 , ¯ x it ) − γ it I ( r it =0) − δ it (cid:3) × t (cid:89) k =0 p ( y ∗ ik − − γ ik I ( r ∗ ik =0) ) | y ∗ ik − , ¯ r ik − , ¯ s ik − = 1 , ¯ x ik − ) × p ( r ik | ¯ y ∗ ik − , ¯ r ik − , ¯ x ik − , ¯ s ik = 1) (cid:111) d ¯ y ∗ ik − . (2) We propose a Bayesian semi-parametric modelling approach based on Bayesian Additive10egression Trees (BART; Chipman, George, McCulloch, et al. 2010) for the workingmodel in (2) using the observed data and the sensitivity parameters. BART does notrely on strong modeling assumptions for the mean, and in contrast to other tree-basedalgorithms yields interval estimates for full posterior inference.

We specify BART models for the conditional distributions of the time varying variables y it and r it . The distribution of the continuous outcome, y it , conditioned on the historyof the joint process (¯ y it − , ¯ x it ) is speciﬁed as normal, y it ∼ N ( µ it (¯ y it − , ¯ x it ) , σ t ), for thesubset that satisﬁes ¯ r it = 1 and ¯ s it = 1. The mean function is given by the sum-of-trees,such that, µ it (¯ y it − , ¯ x it ) = (cid:80) K yit k =1 g y it (cid:0) (¯ y it − , ¯ x it ); T ky it , M ky it (cid:1) . The model consists of K y it distinct binary regression trees denoted by T ky it . Each tree consists of a set of interior nodedecision rules leading down to b ky it terminal nodes. For a given T ky it , M ky it = ( ρ k, y it , . . . , ρ k,b k y it )are the associated terminal node parameters.The BART models for the binary response indicators r it are speciﬁed as probit models, π it (¯ y it − , ¯ x it ) = Φ (cid:16)(cid:80) K rit k =1 g r it (cid:0) (¯ y it − , ¯ x it ); T kr it , M kr it (cid:1)(cid:17) , where Φ denotes the cumulativedensity function of the standard normal distribution and π it (¯ y it − , ¯ x it ) is the probabilityof being observed at wave t given (¯ y it − , ¯ x it ) for the subset that satisﬁes ¯ r it − = 1 and¯ s it = 1. Note that, r i = 1 and s i = 1 for all individuals, and that π it = 0 if r it − = 0. The estimator of the PPCM as described in Section 3 can be computed using the algorithmin Table 1. In practice, draws from the posterior distribution of the BART models aregenerated using Markov chain Monte Carlo (MCMC). The parameters of the sequentialconditional distributions for y it and r it are assumed independent and thus their posteriors11an be sampled simultaneously. We use the sparse Dirichlet splitting rule prior for BART(DART; Linero 2018) to encourage parsimony, implemented in the R package BART for continuous and binary responses. To simplify notation we denote the conditionalprobability π it (¯ y it − , ¯ x it ) by π it and the conditional expectation for the outcome at time t , µ it (¯ y it − , ¯ x it ) by µ it . In this simulation study we compare ﬁve estimators for population inference. The sampleand population size, the response rate, and the strength of association between ˆ y it and y it , are chosen to mirror the Betula study. We consider a ﬁnite population of size N = 10 , n = 1 , ,

000 simulated datasets were generated. The auxiliary variables are generated indepen-dently as follows, x , x ∼ Bernoulli (0 .

5) and x , x , . . . , x ∼ U nif orm ( − , x − x are uncorrelated with the outcome. We consider two time points ( t = 1 ,

2) andnon-response in the form of dropout (i.e. no deaths). The response rate was set to approx-imately 75%. Here, interest is in estimating the population mean, µ U = N − (cid:80) i ∈ U y ti ,for a continuous outcome variable at t = 2. The approaches are compared in terms ofbias, standard deviation (SD), mean squared error (MSE), and coverage of 95% credibleintervals. We consider 4 true outcome models:1. Generalized linear additive models for the sample selection, response mechanismand outcomes . Given the auxiliary variables x − x , the outcome values for i ∈ U were generated as y i = − − x i + x i + x i + x i + ε i and y i = − − x i + x i + x i + x i − . y i + ε i , where ε i ∼ N (0 , π ci ) = − . − . x i + 0 . x i + 0 . x i + 0 . x i . In eachselected sample, for i ∈ c non-response to the study variable y was generated fromthe following model logit( π ri ) = − . . x i + 1 . x i + 1 . x i + 1 . x i − . y i .2. Interaction and nonlinear dependencies for the response mechanism . y i , y i and π ci were generated as for Scenario 1. The response mechanism was generated from thefollowing model logit( π ri ) = − . − x i + x i + x i + x i + y i + x i x i + x i x i + y i x i .3. Interactions, nonlinear dependencies and skew normal error terms . y i , π ri and π ci were generated as for Scenario 2. However, y i was generated according to y i = − . − . x i + 0 . x i + 0 . x i + 0 . x i + 0 . x i + 0 . x i + 0 . y i − . x i y i + ε i .The error terms for the outcomes were generated from the skew normal distribution,such that ε it ∼ SN ( − . ∗ √ ∗ (cid:113) π , . ,

5) for t=1,2, i.e. a right skewed variablewith 0 mean, a variance of 1 and a skewness of 1.3.4.

Interactions, nonlinear dependencies and practice eﬀects . y i , π ri and π ci were gen-erated as for Scenario 3. For the outcome y i we added a moderate practice eﬀectof 0.1 to Scenario 3, such that y i = − .

87 + 0 . . x i + 0 . x i − . x i + 0 . x i +0 . x i + 0 . x i + 0 . y i − . x i y i + ε i . We compare ﬁve estimators for population inference. Our semi-parametric model basedapproach (MB-sp), was implemented as described in Section 4 and Table 1, but ﬁxing thesensitivity parameters to 0. However, for the fourth scenario we used the PE sensitivityparameter and speciﬁed a triangular prior reﬂecting practice eﬀects, δ i ∼ Tri(0 , b, b ) with b = 0 . , . , .

15. Note that for the other approaches the sensitivity parameters were setto 0 for all scenarios. Details of the other four estimators (MB-lm, HT, MRP, and GREG)13or longitudinal data are given in Appendix A of the Supplementary materials. Brieﬂy,MB-lm, is a parametric version of the MB-sp, specifying the working models as Bayesianadditive linear regression models instead of using DART. Default uninformative priorswere used. HT is an extension of the classic Horvitz-Thompson weighting estimator,where the inclusion weights were replaced by longitudinal probability of participationweights . The general regression estimator (GREG), combines prediction and longitudinalweighting. For GREG, the weights were computed as for HT and the prediction modelswere estimated using additive linear regression models. MRP is an extension of multilevelregression and poststratiﬁcation (Gelman and Little 1997), where the binary covariatesand the outcome at t = 1 were added to the model as ﬁxed eﬀects and the six continuousvariables were ﬁrst categorized into quartiles and added to the model as random eﬀects.In the empirical study using the Betula data, two separate datasets are used for ﬁnitepopulation inference and the inclusion probability is not known. Thus, the inclusionweights used in HT and GREG must be estimated using cell weight adjustment. Cellweight adjustment classiﬁes the sample and population into distinct post-stratiﬁcationcells based on the auxiliary variables recorded for both groups. Note that, continuousvariables have to be dichotomized, and further, with a large set of auxiliary variables,the cell sample sizes can be small. This may result in biased and unstable estimates.To overcome the latter, weight trimming and cell collapsing is recommended. In here,only x − x were considered for computing the cell adjustment weights, hence, theuncorrelated variables x − x were omitted. The continuous variables x and x wereﬁrst categorized into tertiles. Every unique combination of the (categorized) auxiliaryvariables constituted an adjustment cell. Sparse cells, n j <

20, were identiﬁed andcombined with their nearest, non-sparse neighbor, i.e. the cell that has the most similarcombination of auxiliary variables. Weights larger than 30 were trimmed.14he HT and GREG estimators and their 95% conﬁdence intervals were estimatedusing the mase package in R, the working models for MB-lm and MRP were ﬁtted usingthe

MCMCpack package and

MCMCglmm package in R (R Core Team 2018). MCMCconvergence and mixing was monitored using trace plots.

We present the bias, empirical standard deviation (SD; calculated as the standard devia-tion of the parameter estimates), mean-squared error (MSE), and coverage probabilitiesfor the 95% credible intervals (CP) from the 1,000 simulations in Table 2. The resultsshow that inferences using the sample are highly biased, have the highest MSEs and thelowest coverage probabilities under all four scenarios ( ≤ . .

9% - 42 .

0% and Scenario 4: 0 .

1% - 2 . . . .

05, the bias was 0.087 (SD=0.048). If the upper bound was set to the truePE, 0 .

1, the bias was further reduced to 0.045 (SD=0.043). Finally, if the upper bound15as set to 0 .

15 (with a mean equal to the true PE) the bias was 0.019 (SD = 0.047)and the CP was 90 . .

0% and85 .

2% for the upper bounds of 0 .

05 and 0 . We applied the proposed semi-parametric approach for estimating the PPCM( age ) tothe Betula data. Here, interest is in estimating the average memory performance acrossthe adult lifespan among non-demented individuals given survival up to a speciﬁc age.We consider population inference in the context of longitudinal cohort data in the pres-ence of practice eﬀects, non-ignorable dropout and death. Longitudinal auxiliary datafor both the sample and target population is available for four waves of data collection.These include the baseline variables age, sex, having children (Y/N), and highest level ofeducation, and additionally, several income variables, beneﬁts received from the govern-ment, and marital status were treated as time-varying. Details of baseline characteristicsare found in Table 3. Details of the International Classiﬁcation of Diseases (ICD) codesconsidered are given in Appendix B of the Supplementary materials.

The aim of a sensitivity analysis is to evaluate the degree to which inferences are inﬂuencedby departures from default assumptions, in our study MARS and no practice eﬀects. Weexplore sensitivity to these assumptions by specifying informative priors for the sensitivityparameters. Since there is generally little information about the distributional form forthe SPs, the prior distributions is often chosen by the analyst reﬂecting prior beliefs aboutthe departures from the default assumptions.16e assume triangular priors for δ it , reﬂecting the improvement in memory perfor-mance that occur at repeated testings, i.e. practice eﬀects (PEs). The priors were speci-ﬁed as δ i ∼ Tri(0 , U δ i , U δ i ) and δ i , δ i ∼ Tri(0 , U δ i , U δ i ), where U δ i = 4 . − . ∗ age i +5 . ∗ − ∗ age i and U δ i = 11 . − . ∗ age i + 1 . ∗ − ∗ age i . The upper bounds (andmodes), were derived from comparing the parameter estimates from regressing memoryon age using participants from two diﬀerent samples (S1 and S3) from the Betula studywith unequal testing experience. For t=1, this involved regressing memory at the ﬁrst testwave for S3 and the second test wave for S1, on age and age squared, for the subsamplesof participants who participate in at least two consecutive test waves. And further com-pute the age-speciﬁc diﬀerences between those with one versus no previous testing, i.e.,diﬀerences in memory performance between participants with previous testing experienceversus those taking the test for the ﬁrst time. A similar procedure was implemented fort=2 using data from the third respectively the second test wave of data collection in theBetula study. Note that, practice eﬀects were larger for younger participants and at thethird test wave compared to the second.We further assume γ it = γ i , and specify a triangular prior for γ i , reﬂecting a decline inmemory after dropping out of the study. The prior was speciﬁed as γ i ∼ Tri( L γ i , , L γ i ).The lower bound (and mode), L γ i = − . − . ∗ age i + 3 . ∗ − ∗ age i , was estimatedfrom the Betula data comparing change in memory performance between the ﬁrst andsecond wave for responders while adjusting for age and age squared. Note that, this implythat older participants decline more quickly than younger participants after drop out. We estimated the PPCM using our proposed Bayesian semi-parametric approach withthe sensitivity parameters. To reduce computation time of Step 3 of the Algorithm n=1417horter chains were run in parallel and then combined appropriately instead of a set withone long chain. For each chain the ﬁrst 1000 iterations were discarded as burn-in, and atotal of 1000 posterior samples of the PPCM were obtained. Convergence of the posteriorsamples was monitored using trace plots.In the main analysis the priors for the SPs were speciﬁed as described in the previoussection. For the sensitivity analysis we compare the results for diﬀerent values of theSPs. Speciﬁcally, the SPs were set to i) γ i = 0 while δ it was speciﬁed as in the mainanalysis (MARS and PE adjustment), ii) δ it = 0 for t = 2 , ,

4, while γ i was speciﬁedas in the main analysis (MNARS and no PE adjustment), iii) the SPs were speciﬁed astwice as large as in the main analysis (2 × δ it for t = 2 , ,

4, and similarly, 2 × γ i ), and iv) γ i = 0 and δ it = 0 for t = 2 , , γ i = 0), and noPE adjustment ( δ it = 0 for all t). In this scenario only baseline auxiliary variables wereconsidered since x it is missing when s it = 0.The results from the main analysis and the various sensitivity analyses are presented inFigure 1. The corresponding 95% credible intervals are also plotted for the main analysis,the immortal-cohort analysis and for the analysis assuming MARS and no PE adjustment(Figure 1a). For the main analysis, the age-speciﬁc PPCM revealed an initial decline inmemory performance between the ages 35 to 65, and accelerated decline after the age of65. Assuming an immortal cohort, no PEs and MAR non-response (both for dropout anddeath) resulted in a signiﬁcantly higher estimated memory performance across adulthood,with slightly greater decline at younger ages and less decline in memory after the age of 65.An analysis assuming MARS missingness and not adjusting for practice eﬀects resulted ina signiﬁcantly higher estimated memory performance between the ages 40 −

90 compared18o the main analysis.It is apparent in Figure 1b and 1c that adjusting for both PEs and MNARS dropoutresulted in lower estimated memory performance across adulthood, although the discrep-ancy varied in magnitude with respect to age. When the SPs were speciﬁed as twice aslarge as in the main analysis (1c), we see an increased reduction in memory performance,this is more pronounced at older ages. As expected, comparing a PE adjusted analysis toan analysis without PE adjustment (assuming MNARS dropout) revealed lower memoryperformance for the PE adjusted analysis, although the discrepancy was more pronouncedat younger ages. Comparing results from an analysis assuming MNARS and MARS whileadjusting for PEs, also revealed lower memory performance, and the diﬀerence was morepronounced at older ages.We compare the results for the Betula data using our approach (MB-sp) with the fourother estimators for population inference, MB-lm, HT, MRP, and GREG. The sensitivityparameters, γ i , δ i , δ i and δ i , were for simplicity all set to 0 when estimating the PPCM,i.e assuming MARS missingness and no PEs.Details of the estimation procedures for MB-lm, HT, MRP, and GREG are describedin Section 5.2 and in Appendix A of the Supplementary materials. However, some adjust-ments were made. For MRP, the continuous variables were ﬁrst categorized into quartilesplus an additional category if the variable was 0 (added to the model as random eﬀects).Due to excessive number of cells with a small or zero sample size we only used a subset ofthe baseline auxiliary variables for computing the cell adjustment weights π i in HT andGREG. These were age, sex, level of education, widowhood, and history of cardiovasculardisease.Results are found in Figure 2. Compared to our semi-parametric approach (MB-sp), the HT estimator overestimated memory performance across adulthood, and the19hree other parametric approaches revealed accelerated decline in memory performancefor individuals 70 - 95 years old. The large discrepancy in memory performance for HT,and for GREG at older ages, is likely a result from using only a subset of the auxiliaryvariables, dichotomizing continuous variables and collapsing zero or spars cells, since thismay introduce bias in the weights. Given the ﬁndings of the simulation study, we areinclined to believe the discrepancies for the parametric approaches (MB-lm and MRP)from our semi-parametric approach are likely due to model-misspeciﬁcation. Note that,MRP gives results most similar to our semi-parametric approach. This article proposes a ﬂexible Bayesian semi-parametric predictive estimator for esti-mating the population partly conditional mean, when a large set of longitudinal aux-iliary variables is known for all units in the target population. A key feature of theproposed approach is the ﬂexible modeling approach that eﬀectively addresses nonlinear-ity and complex interactions. Additionally, BART (using the sparse Dirichlet splittingrule prior) demonstrated excellent predictive performance when irrelevant regressors wereadded, diminishing the need to carry out formal variable or model selection.Our study is motivated by the fact that it is becoming increasingly diﬃcult to recruitstudy participants, which may severely distort the ability to generalize study ﬁndings.The increased availability of microdata covering the population in many countries how-ever, makes post-sampling adjustments an attractive tool. Although weighting is the mostpopular technique, a large set of auxiliary variables makes cell weight adjustment diﬃ-cult to implement. In this setting model based approaches are more attractive, but putstronger requirements on correct model speciﬁcation. As expected, the results of the sim-ulation study showed that the weighting approach (HT) performed poorly across a wide20ange of scenarios. In contrast, the model based approaches and GREG all performed wellunder correct speciﬁcation of the outcome model, although, our semi-parametric methodwas the only approach who gave unbiased results for the more realistic scenario withunknown nonlinearity and interactions. Furthermore, under the scenario with practiceeﬀects our approach performed relatively well compared to the other approaches, showingthe importance of adjusting for practice eﬀects.The goal of the empirical study was to estimate lifespan trajectories in memory for atarget population. The results revealed an initial decline in memory performance betweenthe ages 35 to 65, in contrast to previous studies showing stable performance up to the ageof 60 (R¨onnlund et al. 2005, Gorbach et al. 2017). Furthermore, the standard approachfor estimation in previous literature that assumes an immortal cohort, no PEs and MARnon-response revealed signiﬁcantly higher memory performance across the adult lifespancompared to our approach. This suggests that in previous studies the magnitude ofmemory performance across adulthood is likely overestimated while the rate of changeis likely underestimated, especially at older ages. This is due to both selective studyenrollment and attrition.Our approach allows for Bayesian inference under MNAR missingness and truncationby death, as well as the ability to characterize uncertainty about practice eﬀects. This wasaccomplished by introducing sensitivity parameters (SPs) that incorporated prior beliefs.A strength of the current approach is that inference with SPs and a mortal cohort isrelatively easy to implement and communicate to non-statisticians. However, specifyingan appropriate prior distribution can sometimes be diﬃcult; a alternative approach couldbe a tipping point analysis (Yan, Lee, and Li 2009). In a tipping point analysis subjectmatter experts can discuss whether the tipping point for the SPs are plausible, whichmay aid in making judgment based on study ﬁndings.21 upplementary online material

In Appendix A of the Supplementary materials we provide details for the alternativeestimators for the PPCM. Classiﬁcation of Diseases (ICD) codes considered are given inAppendix B. R code for the main analysis of the Betula data and the simulation studyusing our approach also accompany the paper as supplementary material.

Acknowledgments

This work was supported by The Swedish Foundation for Humanities and Social Sci-ences [P17-0196:1 to M.J.], and National Institutes of Health [R01 CA 183854 to M.J.D.,R01 GM 112327 to M.J.D.]. We would like to thank Prof. Rolf Adolfsson and projectcoordinator Annelie Nordin Adolfsson, Department of Clinical Sciences at Ume˚a Uni-versity, for providing data from the Betula project for analyses. The Betula Project issupported by Knut and Alice Wallenberg foundation and the Swedish Research Council(K2010-61X-21446-01).

References

Bisbee, J. (2019). “BARP: Improving Mister P Using Bayesian Additive RegressionTrees”. In:

American Political Science Review

The Annals of Applied Statistics

Missing data in longitudinal studies: Strategiesfor Bayesian modeling and sensitivity analysis . Boca Raton: Chapman and Hall/CRC.Deville, J.-C. and C.-E. S¨arndal (1992). “Calibration estimators in survey sampling”. In:

Journal of the American statistical Association

Biometrics

Neurobiology of aging

51, pp. 167–176.Horvitz, D. G. and D. J. Thompson (1952). “A generalization of sampling without re-placement from a ﬁnite universe”. In:

Journal of the American statistical Association

Journal of the Royal Statistical Society: Series C (AppliedStatistics)

Journalof the American Geriatrics Society

Journal ofresearch on educational eﬀectiveness tatistical science: a review journal of the Institute of Mathematical Statistics

Journal of theRoyal Statistical Society: Series C (Applied Statistics)

Journal of the American Statistical Association

Scandinavian journalof public health

BMC medical researchmethodology

Aging,Neuropsychology, and Cognition

Political Analysis

R: A Language and Environment for Statistical Computing . R Foun-dation for Statistical Computing. Vienna, Austria. url : . 24izopoulos, D. (2012). Joint models for longitudinal and time-to-event data: With appli-cations in R . Chapman and Hall/CRC.R¨onnlund, M., L. Nyberg, L. B¨ackman, and L.-G. Nilsson (2005). “Stability, growth,and decline in adult life span development of declarative memory: cross-sectional andlongitudinal data from a population-based study.” In:

Psychology and aging

Biometrika

Neuropsychology

Statistics in medicine

International Journal of Forecasting

Biometrics

Biostatistics

Alzheimer’s & Dementia

Journal of Biopharmaceutical Statistics ables and Figure

Table 1: Algorithm for estimation of the PPCM as described in Sections 3 and 4.1.

Models for the outcomes and response mechanisms :For t = 1 , . . . , T , sample from the observed data posteriors for the parameters of theconditional distribution of y t and r t using DART.2. Sensitivity parameters :For all i ∈ U and t = 1 , . . . , T , sample one set from the prior distributions for γ it and δ it .3. Compute predicted means :For all i ∈ U and t = 1 , . . . , T , compute ˆ y it in [2] as follows:Sequentially (in t ) sample response and outcome data from r ∗ it ∼ Ber (cid:0) π it (cid:1) and y ∗ it − = µ it − + γ t − I (ˆ r it − =0) + ε ∗ it . The expectation µ it − conditions on theauxiliary information ¯ x it − , the previously sampled outcome data ¯ y ∗ it − andresponse data ¯ r ∗ it − given the current posterior sample. Similarly, the conditio-nal probability π it conditions on ¯ x it , ¯ y ∗ it − , ¯ r ∗ it − . For all ¯ s it = 1, computeˆ y it = m t (¯ y ∗ it − , ¯ r ∗ it , ¯ s it = 1 , ¯ x it ) − γ it I ( r ∗ it =0) − δ it .4. Compute the PPCM :Compute one posterior sample of PPCM = (cid:80) Tt =1 (cid:80) i ∈ U s it (cid:80) Tt =1 (cid:80) i ∈ U ˆ y it s it .5. Repeat step 2 - 4 for each posterior sample.26 a b l e : R e s u l t s f r o m , s i m u l a t i o n s w i t h a ﬁn i t e p o pu l a t i o n o f s i ze , nd a s a m p l e s i ze o f , . B i a s , e m p i r i c a l s t a nd a r dd e v i a t i o n ( S D ; c a l c u l a t e d a s t h e s t a nd a r dd e v i a t i o n o f t h e p a r a m e t e r e s t i m a t e s ) , m e a n - s q u a r e d e rr o r( M S E ) , a nd c o v e r ag e p r o b a b ili t i e s i n % f o rt h e % c r e d i b l e i n t e r v a l s ( C P ) . B i a s a ndS D a r e m u l t i p li e db y f o r e a s e o f p r e s e n t a t i o n . M e t h o dS ce n a r i o1 S ce n a r i o2 S ce n a r i o3 S ce n a r i o4 B i a s S D M S E C P B i a s S D M S E C P B i a s S D M S E C P B i a s S D M S E C P S a m p l e . . . . . . . . . . . . . . . . M B - s p . . . . . . . . . . . . . . . . M B - l m . . . . - . . . . . . . . . . . . M R P . . . . . . . . . . . . . . . . G R E G . . . . - . . . . . . . . . . . . H T . . . . . . . . . . . . . . . . A bb r ev i a t i o n s : M B - s p : a s e m i - p a r a m e t r i c m o d e l b a s e d a pp r oa c hu s i n g D A R T ; M B - l m : a p a r a m e t r i c m o d e l b a s e d a pp r oa c hu s i n ga dd i t i v e li n e a rr e g r e ss i o n ; M R P : M u l t il e v e l r e g r e ss i o n a ndp o s t s t r a t i ﬁ c a t i o n ; G R E G : g e n e r a l r e g r e ss i o n e s t i m a t o r ; H T : t h e H o r v i t z - T h o m p s o n e s t i m a t o r . ≤ −

12 years of education 352 (35.4) 3288 (35.7) >

12 years of education 240 (24.2) 2516 (27.3)Education level unknown 100 (10.1) 894 (9.7)Married 628 (63.2) 5960 (64.8)Widow 149 (14.9) 715 (7.8)Disposable income* 1021.3 (412.8) 1058.6 (490.5)Earned income* 868.9 (936.2) 1045.2 (952.6)Retirement income* 275.1 (404.3) 165.4 (335.0)Early retirement income* 37.2 (172.6) 29.6 (151.1)Unemployment beneﬁts* 4.4 (48.3) 9.1 (64.5)Health beneﬁts* 68.7 (220.0) 86.7 (227.8)Diagnosed with diabetes bf 1990 14 (1.4) 185 (2.0)Diagnosed with CVD bf 1990 97 (9.7) 593 (6.4)

Abbreviations : bf: before; CVD: cardiovascular disease.* amounts in Swedish kronor rounded to the nearest thousand. ge M e m o r y ( % o f m a x i m u m )

35 45 55 65 75 85 95 a) PPCM; MNARS & PEPPCM; MARSSample; MAR & immortal cohortAge M e m o r y ( % o f m a x i m u m )

35 45 55 65 75 85 95 b) Age M e m o r y ( % o f m a x i m u m )

35 45 55 65 75 85 95 c) PPCM; MNARS & PE PPCM; MARS & PE PPCM; MNARS

Figure 1: Estimating lifespan trajectories of memory for the Betula data using our MB-spapproach. Panel a)

PPCM, MNARS & PE ; the main analysis,

PPCM, MARS ; the sen-sitivity parameters were all set to 0,

Sample, MAR & immortal cohort ; sample inferenceassuming MAR missingness for both dropouts and deaths and no practice eﬀects. Panelb) and c) shows a sensitivity analysis for k=1 and k=2 respectively, comparing the mainanalysis (

PPCM, MNARS & PE ) to an analysis where γ i = 0 and δ it were speciﬁed asfor the main analysis ( PPCM, MARS & PE ), and to an analysis where δ it = 0, while γ i was speciﬁed as as for the main analysis ( PPCM, MNARS ).29 ge M e m o r y ( % o f m a x i m u m )

35 45 55 65 75 85 95

MB−spMRPMB−lmGREGHT

Figure 2: Analysis of the Betula data. Results from estimating the population partly con-ditional mean using 1) MB-sp: our semi-parametric model-based approach using DART,2) MB-lm: a parametric model based approach using additive linear regression, 3) MRP:Multilevel regression and poststratiﬁcation, 4) GREG: general regression estimator, and,5) HT: the Horvitz-Thompson estimator. The sensitivity parameters γ i , δ i , δ i and δ i were all set to 0. 30 upplementary material - A Bayesian semi-parametricapproach for inference on the population partly conditionalmean from longitudinal data with dropout Josefsson, Maria , , , ∗ , Daniels, Michael J. , & Pudas, Sara , Department of Statistics, USBE, Ume˚a University, Sweden. Centre for Demographic & Ageing Research, Ume˚a University, Sweden. Ume˚a Center for Functional Brain Imaging, Ume˚a University, Sweden. Department of Statistics, University of Florida, USA. Department of Integrative Medical Biology, Ume˚a University, Sweden. ∗ Corresponding author, E-mail: [email protected]

Appendix A, Other strategies for estimating the PPCM

In the simulation study and the empirical data example we compare our proposed ap-proach to four other estimators for estimating the PPCM: MB-lm, MRP, HT and GREG.The ﬁrst estimator (MB-lm) is a parametric version of our semi-parametric approach,where the working models are speciﬁed as Bayesian additive linear regression modelsinstead of using DART.The second approach is multilevel regression and poststratiﬁcation (MRP). The MRPspeciﬁes a multilevel model for the outcome at time t , using random (or modeled) eﬀectsfor some of the predictors. That is, the working model is replaced by the followingmultilevel regression model y it = α + t − (cid:88) m =0 β m y im + t (cid:88) m =0 K (cid:88) k =1 b x mk + ε it , where α is the ﬁxed intercept, β , . . . , β t − are the ﬁxed eﬀects for the outcome history, and b x j xk , . . . , b x K j xk are the random eﬀects for the j x k th category of the auxiliary variable31 k , where k = 1 , . . . , K , and j x k = 1 , . . . , J x k . All random eﬀects are modeled usingindependent normal distributions, such that b x k ∼ N (0 , σ x k ), k = 1 , . . . , K , and, ε it ∼ N (0 , σ MRP t ). In the second step, the multilevel model is used for making predictionsˆ y it from ¯ y it − and ¯ x it , for all i ∈ U given ¯ s it = 1, and is (similarly to our approach)obtained by integrating over the outcome history ¯ y t − . The poststratiﬁcation estimatefor the population mean is given by PPCM MRPt = (cid:80) i ∈ U s it (cid:80) i ∈ U ˆ y it s it .The third approach is an extension of the classic Horvitz-Thompson weighting esti-mator (HT), ˆ µ HT = N (cid:80) i ∈ c y i π i . In a longitudinal setting with MAR missingness anddeath, the HT estimator is obtained by replacing π i with the probability of participationat time t given survival at that time point, here denoted by w it . The PPCM t is given by,PPCM HTt = 1 (cid:80) i ∈ U s it (cid:88) i ∈ c y it w it (cid:88) i ∈ c r it s it , where w it = π i t (cid:89) k =0 Pr( r ik = 1 | ¯ y ik − , ¯ r ik − , ¯ x ik , ¯ s ik = 1)for all i ∈ c . The response mechanism, the second term on the right hand side, canbe estimated from data using e.g. a logistic regression model. In the empirical studyusing the Betula data, two separate datasets are used for ﬁnite population inference andthe participation probability is not known. Then π i must be estimated from data usingcell weight adjustment. Cell weight adjustment classiﬁes the sample and population intodistinct post-stratiﬁcation cells based on the auxiliary variables recorded for both groups.The sampled participants in cell j are weighted by the inverse of the sampled rate in cell j . That is, for individual i ∈ j , ˆ π i = n j /N j , where n j is the number of individuals incell j in the sample and N j is the number of individuals in cell j in the population.32owever, diﬃculties may arise in this setting. For example, continuous variables have tobe dichotomized, and further, with a large set of auxiliary variables, the cell sample sizescan be small. This may result in biased and unstable estimates. To overcome the latter,weight trimming and cell collapsing is recommended.The fourth approach is a dual-modelling strategy that combines prediction and weight-ing. The general regression estimator (GREG) at time t is given by,PPCM GREGt = 1 (cid:80) i ∈ U s it (cid:34)(cid:88) i ∈ U ˆ y it + (cid:88) i ∈ c r it s it (cid:18) y it − ˆ y it ˆ w it (cid:19) (cid:88) i ∈ c r it s it (cid:35) . The approach require both a model for the outcome and the participation mechanism,and is double robust in the sense that it remains consistent if either one of the modelsare correctly speciﬁed.