A recipe for accurate estimation of lifespan brain trajectories, distinguishing longitudinal and cohort effects
AA Recipe for Accurate Estimation of Lifespan Brain Trajectories,Distinguishing Longitudinal and Cohort Effects
Øystein Sørensen a, ∗ , Kristine B. Walhovd a,b , Anders Fjell a,b a Center for Lifespan Changes in Brain and Cognition, Department of Psychology, University of Oslo, Norway b Department of Radiology and Nuclear Medicine, Oslo University Hospital, Norway ∗ Corresponding author: Øystein Sørensen, Department of Psychology, University of Oslo, Pb. 1094 Blindern, 0317 Oslo,Norway.
Email address: [email protected] (Øystein Sørensen) a r X i v : . [ s t a t . A P ] J u l bstract We address the problem of estimating how different parts of the brain develop and change throughout thelifespan, and how these trajectories are affected by genetic and environmental factors. Estimation of theselifespan trajectories is statistically challenging, since their shapes are typically highly nonlinear, and althoughtrue change can only be quantified by longitudinal examinations, as follow-up intervals in neuroimagingstudies typically cover less than 10 % of the lifespan, use of cross-sectional information is necessary. Linearmixed models (LMMs) and structural equation models (SEMs) commonly used in longitudinal analysisrely on assumptions which are typically not met with lifespan data, in particular when the data consistof observations combined from multiple studies. Generalized additive mixed models (GAMMs) offer anattractive alternative to LMMs and SEMs. In this paper, we propose various ways of formulating GAMMs foraccurate estimation of lifespan trajectories of 12 brain regions, using a large longitudinal dataset and realisticsimulation experiments. We show that GAMMs are able to accurately fit lifespan trajectories, distinguishlongitudinal and cross-sectional effects, and estimate effects of genetic and environmental exposures. Finally,we discuss and contrast questions related to lifespan research which strictly require longitudinal data andquestions which can be answered with purely cross-sectional data, and in the latter case, which simplifyingassumptions that need to be made. The examples are accompanied with R code, providing a tutorial forresearchers interested in using GAMMs.
Keywords: aging, cohort effects, generalized additive mixed models, lifespan brain research, longitudinalanalysis, MRI, R. 2 . IntroductionCohort effect : The effect of birth year (cohort) on the relationship between a set of explanatoryvariables and an outcome of interest.
Cross-sectional effect : The effect of age on an outcome of interest at a particular point in time, acrossparticipants with different birth dates.
Longitudinal effect : The effect of increasing age for participants belonging to a given birth cohort.
Linear mixed models (LMMs): Linear regression models used for data with hierarchical structure,e.g. longitudinal data with multiple measurements per individual.
Fixed effects : Regression parameters in mixed models which are common for all participants.
Random effects : Regression parameters in mixed models which are unique to each participant, usedto model correlation between repeated measurements.
Polynomial model : Linear regression model which includes the first n powers of an explanatoryvariable x as distinct variables. Quadratic model : A polynomial model containing the first two powers of an explanatory variable x ,on the form β + β x + β x . Cubic model : A polynomial model containing the first three powers of an explanatory variable x , onthe form β + β x + β x + β x . Generalized additive model (GAM): A linear regression model in which the outcome is modeled asan unknown smooth function of the explanatory variables.
Generalized additive mixed model (GAMM): An extension of GAMs to data with hierarchicalstructure, containing random effects.
Smoothing parameter : Parameter controlling the degree of nonlinearity (wiggliness) of a functionestimated by a GAM/GAMM.
Cubic regression splines : A set of cubic polynomials, each of which is defined over a small part ofthe x -axis and is zero elsewhere. Thin-plate regression splines : A set of functions, each of which represents a given nonlinear shapeover the full x -axis. Smooth function : A nonlinear function estimated by GAMs/GAMMs represented as a weighted sumof (e.g., cubic or thin-plate) regression splines.
Box 1: Key terms used in this paper, defined in the context of longitudinal data analysis.
Large datasets with cross-sectional and longitudinal structural magnetic resonance images (MRIs) ofparticipants whose ages span from early childhood to late adulthood provide ample opportunities to studylifespan brain trajectories. Important questions such data can contribute to answer include how brain struc-ture is related to aging, how the aging effect is modified by genetics and environmental exposures, and atwhich age critical events like maximum volume or maximum rate of change occur. Lifespan brain trajectoriesare nonlinear and differ between regions, as illustrated in Figure 1 for volumes of cerebral white matter,cortex, and hippocampus for 4,352 observations of 2,017 healthy participants from the Center for LifespanChanges in Brain and Cognition longitudinal studies (Fjell et al., 2017; Walhovd et al., 2016), henceforthreferred to as the LCBC data. Modeling the type of nonlinear effects shown in Figure 1 using structuralequation models (SEMs) (McArdle and Hamagami, 2001; Meredith and Tisak, 1990) or linear mixed models(LMMs) (Laird and Ware, 1982) with polynomials typically leads to poor fits at least over parts of the lifes-pan, and is highly dependent on manual selection of terms (Fjell et al., 2010). Generalized additive mixedmodels (GAMMs) (Lin and Zhang, 1999) offer an attractive alternative, typically yielding good fit over thefull lifespan in an automated and data-driven manner. This is illustrated in Figure 2, comparing a GAMMto LMMs with quadratic and cubic polynomials for the effect of age on cerebellum cortex volume. See Box1 for a definition of these and other key terms used in this paper. Similar to often researched structures like We will use the common abbreviation ”GAMM”, although strictly speaking only additive mixed models (AMMs) are usedin this paper. If necessary, all models described can be straightforwardly generalized, e.g. to logistic or Poisson regression. erebral White Matter Cortex Hippocampus0 25 50 75 0 25 50 75 0 25 50 75 4000 6000 800010000300000400000500000600000700000800000300000400000500000600000700000 Age (years) V o l u m e ( mm ) Birthyear
Figure 1:
Lifespan brain development is highly nonlinear . Cerebral white matter, cortex and hippocampal volumes from4,352 MRI scans of 2,017 participants in the LCBC data. The color scale indicates the birth cohort to which the participantbelongs. Dots represent observations and lines connecting the dots indicate repeated observations of the same individual.
Age (years) V o l u m e ( mm ) QuadraticCubicGAMM
Figure 2:
Comparison of LMMs and GAMMs for lifespan data . Comparison of LMMs with quadratic and cubic termsand a GAMM, fitted to lifespan cerebellum cortex volume. Black dots represent observations and black lines connecting thedots indicate repeated observations of the same individual. hippocampus and cerebral white matter, cerebellum cortex is characterized by a nonlinear age trajectory. Incontrast to the GAMM, neither of the LMMs capture the steep increase seen in early childhood, and the cubicLMM predicts an increase in cerebellum cortex volume in old age, whereas the GAMM adequately capturesthe decline seen in the data. In addition, both the quadratic and cubic model estimate cerebellum cortexvolume to reach its maximum at the age of around 25, while the GAMM instead estimates the maximum tooccur around 14 years of age, and the latter seems to be in better agreement with the data. Figures 1 and 2and all subsequent plots were created in R (R Core Team, 2019) with ggplot2 (Wickham, 2016).The goal of this paper is to provide clear recommendations for optimal estimation of lifespan trajectories ofbrain development and aging. To this end, several aspects need consideration. First, as has been emphasizedby a large number of authors, when analyzing data with repeated observations over time, care must betaken to distinguish within-individual and between-individual effects, which for the purpose of this paperare longitudinal and cross-sectional effects (Curran and Bauer, 2011; Hoffman, 2007; Hoffman and Stawski,2009; Morrell et al., 2009; Sliwinski et al., 2010). True change can only be measured by use of longitudinaldata, but how important are longitudinal data when the task is to estimate trajectories spanning many timesthe maximum follow-up interval realistically attainable in a neuroimaging study? Large datasets combinedfrom different studies, either conducted by the same group as for the LCBC data or by multiple groupsparticipating in a data-sharing consortium like Lifebrain (Walhovd et al., 2018), present further challengesfor longitudinal modeling as the number of measurements per participant and the time intervals betweenmeasurements are typically highly varying. All of these issues are illustrated for the LCBC data in Figure4 Age (years) C oho r t ( b i r t h y ea r) Date of initial measurement N u m be r o f s ub j e c t s Measurement interval (years) N u m be r o f s ub j e c t s Figure 3:
Characteristics of lifespan data . The plots show data from 4,352 MRI scans of 2,017 participants in the LCBCdata. Left: Scatter plot of age and cohort. Connected dots show repeated measurements of the same participant. Top right:Histogram of date of initial measurement for the same participants. Bottom right: Histogram of time (in years) betweenmeasurements. The peak at zero corresponds to participants scanned twice on the same day, with different scanners, and thehighest peak corresponds to participants with 10-11 weeks between measurements. o cohort effect Age−indep. cohort effect Age−cohort interaction25 50 75 25 50 75 25 50 750.10.20.30.40.50.20.30.40.50.60.70.250.500.75 Age (years) D ependen t v a r i ab l e Cohort
Figure 4:
Cohort effects . Illustration of the impact of cohort effects in a hypothetical dataset. Dashed lines show the cross-sectional age effect in 1990, colored dots show four cohorts of participants whose age in 1990 was 10, 30, 50, and 70 years,respectively, and the blue lines show longitudinal age effects for each cohort. In the left plot, there are no cohort effects, andhence longitudinal and cross-sectional effects coincide. In the center plot, the cohort effects are independent of age, and thelongitudinal effects differ by an offset but the effect of aging is identical across cohorts, as seen by the parallel blue lines. In theright plot, the cohort effects depend on age, and in this case also the slope of the longitudinal effect varies between cohorts.
3. While GAMMs flexibly handle data with varying follow-up intervals, the statistical literature on use ofGAMMs for longitudinal analysis has almost exclusively focused on cases in which each participant has beenfollowed over the full time range under consideration, from a common baseline (Brumback and Rice, 1998;Durb´an et al., 2005; Edwards et al., 2005; Gu and Ma, 2005; Ke and Wang, 2001; Lambert et al., 2001;Sullivan et al., 2015). There is hence a need for an understanding of how GAMMs should be optimally usedin lifespan brain research, with short follow-up intervals and varying dates of initial measurement as shownin Figure 3.The outline of this paper is as follows. In Section 2 we introduce GAMMs formally and define threedifferent candidate models for estimating lifespan brain trajectories. We also describe simulation experimentsconducted in order to compare these GAMMs in a realistic setting for estimating 12 different brain regions.In Section 3 the simulation results are presented, and next we show two example applications demonstratinghow GAMMs can be used for estimating lifespan brain trajectories and the effect of genetic variations on thetrajectories. Accompanying R code provides a tutorial for researchers interested in using GAMMs. In Section4 we discuss the results taking into regard currently available longitudinal studies. We contrast questionsthat strictly require longitudinal data to questions that under some simplifying assumptions may be answeredwith cross-sectional data. Finally, in Section 5 we conclude by presenting recommendations for how to useGAMMs in lifespan brain research.
2. Materials and Methods
The effect of age on an outcome in a population can be completely explained by longitudinal and cohorteffects, with the former representing the effect of aging for participants in a given birth cohort and the latterdetermining how the longitudinal effects differ between participants belonging to different birth cohorts(Diggle et al., 2002, Ch. 1.1). Cross-sectional effects are the effects of age across cohorts when considered ata particular point in time, as illustrated in Figure 4. In the absence of cohort effects the cross-sectional andlongitudinal effects are identical. Age-independent cohort effects result in different slopes for the longitudinaland cross-sectional effects, while age-cohort interactions lead to longitudinal effects whose slopes depend onage. Selective survival, by which life expectancy is correlated with the dependent variable, leads to populationchanges over time and hence are part of the longitudinal effects (Baltes, 1968). In contrast, with samplingbias, by which the probability of recruitment or the probability of dropout before the end of the study dependson the outcome variable, the sample is not representative of the population under study and biased estimatesmay result (Molenberghs and Fitzmaurice, 2009). 6 .2. Generalized Additive Models
Generalized additive models (GAMs) (Hastie and Tibshirani, 1986) model the effect of a variable x onan outcome y with smooth functions f ( x ), constructed as weighted sums of K basis functions b ( x ), b ( x ), . . . , b K ( x ) with weights β , β , . . . , β K , i.e., f ( x ) = (cid:80) Kk =1 β k b k ( x ). Commonly used basis functions arecubic regression splines and thin-plate regression splines (Wood, 2003), and the number of basis functions istypically chosen large enough to allow a wide range of nonlinear patterns to be estimated, while small enoughto allow computational efficiency. For a GAM with a single smooth term, y = f ( x ) + (cid:15) , the estimate given n observations is computed by finding the values of β , . . . , β K minimizing the criterion n (cid:88) i =1 (cid:34) y i − K (cid:88) k =1 β k b k ( x i ) (cid:35) + λ (cid:90) ba (cid:34) K (cid:88) k =1 β k b (cid:48)(cid:48) k ( x ) (cid:35) d x. The first term is the least squares criterion using the basis functions as explanatory variables, and the secondterm represents the wiggliness of f ( x ) as measured by its squared second derivative over some range [ a, b ],typically the minimum and maximum values of x in the sample. The smoothing parameter λ controls theextent to which wiggliness is penalized, striking a balance between overfitting (too low λ , too wiggly f ( x )) andunderfitting (too high λ , too smooth f ( x )). For data with repeated measurements, GAMs can be extended toGAMMs by the inclusion of random effects. A key insight allowing use of LMM software for efficient fitting ofGAMMs is that the penalized smooth terms may be converted to random effects with variance proportionalto 1 /λ (Lin and Zhang, 1999; Wood, 2004, 2010). Now consider a dataset of n participants indexed i = 1 , . . . , n , assume an outcome y ij has been measured m i times in participant i , with timepoints indexed by j = 1 , . . . , m i , and let a ij denote the age of participant i at her/his j th timepoint. The question of interest is how the outcome varies as a function of age, and thiscan be modeled with the GAMM y ij = β + f ( a ij ) + b i + (cid:15) ij , (1)where f ( a ij ) is the effect of age, β is the intercept, b i is the random intercept for participant i , and (cid:15) ij is a random noise term. Both b i and (cid:15) ij are assumed to be normally distributed, b i ∼ N (0 , σ b )and (cid:15) ij ∼ N (0 , σ ), with σ b representing the between-participant standard deviation and σ the within-participant residual standard deviation. We do not consider random slopes, due to the low number ofrepeated measurements in the typical applications considered in this paper, although this could be includedwith an additional term b i a ij in (1). With sufficient data, use of random slopes is recommended, as it relaxesthe assumptions on the covariance structure of repeated measurements (Fitzmaurice et al., 2011, Ch. 19).In the presence of cohort effects, the term f ( a ij ) represents some weighted combination of cross-sectionaland longitudinal effects, and hence cannot be interpreted as either. The typical method of correcting for thisin LMMs is by splitting the age term into a i representing age at first measurement, and t ij representingtime since baseline (Fitzmaurice et al., 2011; Zeger and Liang, 1992). Extending this to a GAMM yields y ij = β + f ( a i , t ij ) + b i + (cid:15) ij , (2)where f ( a i , t ij ) is a smooth two-dimensional surface representing the nonlinear interaction of baseline ageand time, constructed by multiplying basis functions for a i and t ij (Wood, 2006), thus containing the maineffect of both terms and their interaction. Considering the plots in Figure 1, including an interaction seemsnecessary for estimating lifespan trajectories, as the direction of change clearly depend on baseline age. Inmodel (2) the longitudinal effect of aging t from a baseline a i is given by f ( a i , t ) keeping a i constant, whilethe cross-sectional effect of varying baseline age a is given by f ( a, t ij vary between zero and some maximum which ismuch lower than the maximum age, which might make estimation of nonlinear longitudinal effects challenging.We hence introduce an alternative GAMM modeling cohort effects by including birth date c i , y ij = β i + f ( a ij ) + β ( a ij ) c i + b i + (cid:15) ij , (3)7 allidum Putamen Thalamus ProperCerebral White Matter Cortex HippocampusCaudate Cerebellum Cortex Cerebellum White MatterAccumbens Area Amygdala Brain Stem0 25 50 75 0 25 50 75 0 25 50 7517000180001900020000210002200026000280003000060007000800090001100012000130001400015000160002400280032003600 80000 90000100000110000120000500000600000 9000100001100012000 800100012001400160070007500800085009000360000390000420000450000480000360038004000 Age (years) V o l u m e ( mm ) Figure 5:
Lifespan curves . Characteristic curves of 12 brain regions, estimated from the LCBC data and used in simulationexperiments. in which f ( a ij ) is defined as for (1), while β ( a ij ) c i is a varying-coefficient term (Hastie and Tibshirani, 1993)representing the main effect of cohort (birth date) as a function of age. The longitudinal effect of aging a fora participant belonging to cohort c i is given by f ( a ) + β ( a ) c i keeping c i constant. The cross-sectional effectof age a at date d is given by f ( a ) + β ( a ) c , with c = d − a representing the birth date of participants of age a at date d , and hence both a and c are varying in this case. Model (3) is not identified if all participantshave identical measurement dates, since then birth date is a linear function of age, i.e., c i = d j − a ij where d j is the common date of the j th timepoint. However, as illustrated in Figure 3 (right), both the dates ofinitial measurements and the times between measurements may be highly varying in lifespan data, and thisvariability helps identifying the estimates of model (3).The effect of a time-invariant variable x i on the age trajectory can be estimated by adding an interactionterm f i ( a ij ) x i in models (1) and (3), or an interaction term f i ( t ij ) x i in model (2). If x i is a categoricalvariable, this amounts to estimating how the trajectories associated with a given category differ with respectto a reference category, while for continuous x i the interaction becomes a varying-coefficient term as wasintroduced in model (3). In order to compare the GAMMs (1)-(3), characteristic lifespan curves were estimated for 12 brain regionswith the LCBC data, using GAMMs on the form (1), with additional covariates sex, scanner, and totalintracranial volume (ICV). Volumes were estimated with FreeSurfer 6.0 (Dale et al., 1999; Fischl et al.,2002; Reuter et al., 2012), and detailed sample characteristics are presented in the Supplementary Material.The curves, shown in Figure 5, were used as ground truths from which measurements were sampled. For8 o cohort effect Age−indep. cohort effect Cohort−age interaction Cerebral WMCortexHippocampus0 25 50 75 0 25 50 75 0 25 50 753500004000004500004000005000006000005000600070008000
Age (years) V o l u m e ( mm ) Birth cohort
Figure 6:
Simulated cohort effects . Cohort effects used in simulation studies, illustrated for cerebral white matter, cortex,and hippocampus. (Cerebral WM = Cerebral White Matter) each region, three cases were considered: no cohort effects, age-independent cohort effects, and age-cohortinteractions. In the latter two cases, cohort effects were added to the characteristic curves as illustrated inFigure 6 for cerebral white matter, cortex, and hippocampus, and in the Supplementary Material for theremaining regions. Data were generated with n = 1 ,
000 participants, and the number of timepoints m i foreach was uniformly distributed in { , , } . The time between two measurements of a given participant wasuniformly distributed between 1 and 8 years, which combined with the maximum number of 3 timepoints setthe maximum possible follow-up interval to (3 − × b i and residuals (cid:15) ij were sampledfrom normal distributions with mean zero and standard deviations equal to 50 % and 20 % of the samplestandard deviation of the region’s volume, respectively, similar to what was observed in the LCBC data.Datasets for each of the 12 × Table 1: Models used in simulation experiments with GAMMs. The ’Identifier’ column describes the name used to identify themodel in the simulation results presented in Section 3.1 and in the Supplementary Material. C e r eb r a l W M C o r t e x H i ppo c a m pu s Time since baseline (years) R oo t − m ean − s qua r e e rr o r ( mm ) Baseline age
Figure 7:
Longitudinal estimates with no cohort effects . Simulation results in the case of no cohort effects, showing theRMSE of the predicted value after baseline ages 10, 35, and 60 years. For any given time t along the x -axis, the curves representthe RMSE of the predicted longitudinal effect of t years of increased age since baseline. Column headers specify the model fittedto the data, as defined in Table 1.
3. Results
Figure 7 shows root-mean-square error (RMSE) of longitudinal estimates for each of the first 25 yearsafter baseline ages 10, 35, and 60 years in the case of no cohort effects. Overall, model (1) with longitudinaldata had the most accurate fits, but both model (1) with only cross-sectional data and the two variantsof model (3) were close. The two variants of model (2), on the other hand, had much poorer fits thanthe other models, even for times very close after baseline, for which the data contained a large number ofobservations. Figure 8 shows the results in the presence of age-cohort interactions. Now model (1) with orwithout longitudinal data had almost as high RMSE as model (2). Model (3), on the other hand, was able toaccurately estimate the longitudinal effects. Model (3b), which allows cohort-age interactions, had slightlybetter overall performance than model (3a), which only contains the cohort effect as a single offset term.Results for age-independent cohort effects and for other regions are shown in the Supplementary Material.Figure 9 shows the RMSE of the longitudinal estimates 25 years ahead averaged over each year andover baseline ages of 10, 35, and 60 years, for 6 regions chosen by alphabetic ordering. In the absenceof cohort effects (left column), model (1) was most accurate, with variant (1b) utilizing longitudinal dataperforming slightly better than (1a) which only uses cross-sectional data. The two formulations of model(3) performed well also in the absence of cohort effect, although with higher variation than model (1), likelydue to the additional noise contributed by the cohort term which in this case had zero actual effect. Asexpected, with age-independent cohort effects, model (3a) which includes the cohort effect as a single offsetterm performed better than model (3b) which also estimates cohort-age interactions, while the opposite wastrue with cohort-age interactions. The R package gghalves (Tiedemann, 2020) was used for creating Figure9. Figure 10 shows estimated longitudinal effects of age on hippocampal volume from a baseline age 10 yearsfrom 100 model fits randomly selected among the 1,000 simulated samples. While the estimates of model (3)follow the true effect over the full 25-year period, the estimates of model (2) start diverging before 10 yearsafter baseline. This happens because most participants were not followed for more than about 10 years, andas a consequence of the model formulation of (2), there is not enough data to estimate the effect of timebeyond 10 years. In the absence of data smooth estimates tend toward straight lines, due to the second10 C e r eb r a l W M C o r t e x H i ppo c a m pu s Time since baseline (years) R oo t − m ean − s qua r e e rr o r ( mm ) Baseline age
Figure 8:
Longitudinal estimates with cohort-age interactions . Simulation results in the case of age-cohort interactions,showing the RMSE of the predicted value after baseline ages 10, 35, and 60 years. For any given time t along the x -axis, thecurves represent the RMSE of the predicted longitudinal effect of t years of increased age since baseline. Column headers specifythe model fitted to the data, as defined in Table 1. derivative penalty, as Figure 10 shows. The same effect can be seen by considering the right column of Figure9, with cohort-age interactions. In this case model (2) had accuracy close to model (3) for accumbens areaand caudate, while model (3) was far more accurate for the remaining regions. From Figure 5 it is clear thataccumbens area and caudate are characterized by close to linear trajectories, while the four other regionshave highly nonlinear trajectories, supporting the hypothesis that model (2) is well suited for estimatinglinear longitudinal effects, but not nonlinear effects.Simulation results with identical baseline dates shown in the Supplementary Material are practicallyidentical to those described in this section, suggesting that the issue of varying baseline dates is not criticalfor GAMMs. Instead, as Figure 10 shows, the main challenge with estimating longitudinal effects using theGAMM (2) is caused by the fact that time t ij spans a short period compared to the full lifespan, makingestimation of nonlinear effects beyond the maximum follow-up time impossible. For estimation of cross-sectional effects, the differences between models were much smaller. However, the two versions of model(2) still showed the poorest performance, while the two versions of model (3) showed the best performance.Detailed results for estimation of cross-sectional effects are shown in the Supplementary Material. In this section we show how GAMMs can be applied to the study of lifespan brain development, withexample R code using the packages mgcv (Wood, 2017) and gamm4 (Wood and Scheipl, 2017). Some parts ofthe code are omitted for ease of presentation, and can be found in the Supplementary Material. We first consider the hippocampal volumes shown in Figure 1 (right). The data are organized in longformat in the dataframe dat , with each row representing one timepoint of a single participant, containingthe following variables: • ID : Unique participant ID. • Age : Age in years at MRI session. • Hippocampus : Estimated hippocampal volume in mm . • Hippocampus_z : Estimated hippocampal volume, standardized to have zero mean and unit variance.11 o cohort effect Age−independent cohort effect Cohort−age interaction A cc u m ben s A r ea A m y gda l a B r a i n S t e m C auda t e C e r ebe ll u m C o r t e x C e r ebe ll u m W M ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) Model R oo t − m ean − s qua r e e rr o r ( mm ) Figure 9:
Estimation of longitudinal effects . RMSE of longitudinal estimates averaged over the next 25 years followingbaseline ages 10, 35, and 60 years. Colored regions encircled by black lines show distributions of simulation results, and cloudsof points show RMSE for each simulated sample. The vertical axes represent the average both over the RMSE of the estimatedlongitudinal effects the first 25 years after baseline and over the three baseline ages, i.e., each full row for a given model in Figures7 and 8 correspond to a single box in this plot. Models specified along the horizontal axes are defined in Table 1. (CerebellumWM = Cerebellum White Matter). b (Age)*Cohort0 5 10 15 20 25 0 5 10 15 20 25800085009000 Time since baseline (years) H i ppo c a m pa l v o l u m e ( mm ) Figure 10:
Sample fits . A random sample of 100 fits in the case of cohort-age interaction for hippocampal volume. Thin linesshow estimated longitudinal effects from baseline age 10, and the thick red lines show the true values. • ICV : Estimated total intracranial volume in mm . • ICV_z : Estimated total intracranial volume, standardized to have zero mean and unit variance. • Sex : Participant sex, coded as ”Female” and ”Male”. • Scanner : Factor variable indicating which scanner was used for MRI. • Age_bl : Age in years at initial MRI session. • Time : Time in years since initial MRI session. • Birth_Date_z : Decimal number of years between birth date and 1st January 1970.The transformed variables with suffix _z were created because the algorithms used in the models to befitted are most stable when the numbers are of similar magnitude. Models not Separating Longitudinal and Cross-Sectional Effects.
We start by fitting a GAM using only thefirst timepoint of each participant, using the gam() function from mgcv . By default, gam() uses generalizedcross-validation (Golub et al., 1979) for smoothing, but for comparison with mixed models we specify thatrestricted maximum likelihood (REML) should be used, with the argument method = "REML" . The smoothfunction corresponding to the term f ( a ij ) in model (1) is specified with s(Age, k = 20, bs = "cr") , wherewe use k = 20 cubic regression ( bs = "cr" ) splines. The default is thin-plate splines ( bs = "tp" ), but inour experience cubic regression splines typically require half the computing time, without yielding poorer fit.ICV, sex, and scanner are used as additional covariates. In addition to mgcv , we load the dplyr package(Wickham et al., 2020) for data manipulation. Throughout this section, the names of the fitted modelscorrespond to the model identifiers in Table 1. library(mgcv); library(dplyr) cross_sectional_data <- filter(dat, Time == 0) mod1a <- gam(Hippocampus_z ~ s(Age, k = 20, bs = "cr") + ICV_z + Sex + Scanner,data = cross_sectional_data, method = "REML") The estimated smooth function can be immediately visualized with the plot() function. The output isnot shown, but see the curve labeled mod1a in Figure 13 (left). plot(mod1a)
We can check that the number of splines is sufficiently high with k.check() , which implements a permu-tation algorithm described in Wood (2017, Ch. 5.9). A significant p -value and estimated degrees of freedom13edf) close to the maximum degrees of freedom k’, indicates that more splines are required. As shown in theoutput below, k seems sufficiently high. The maximum number of degrees freedom k’ = 19 is one less thanthe number of cubic regression splines, because one degree of freedom is used to center the smooth functionsuch that it has zero mean over the range of age values in the data. k.check(mod1a) Next, we fit model (1) using the complete data. This can be achieved both with the gam() and gamm() functions from mgcv , and the gamm4() function from gamm4 . For the type of longitudinal data consideredhere, with a low number of repeated measurements of a large number of participants, gam() is very slowcompared to gamm() and gamm4() . We opt for the latter, as it is typically faster and more numerically stable.The main difference from the model with only cross-sectional data, is that we now specify a random interceptwith the argument random = ~(1|ID) . library(gamm4)mod1b <- gamm4(Hippocampus_z ~ s(Age, k = 20, bs = "cr") + ICV_z + Sex + Scanner,data = dat, random = ~(1|ID)) On a MacBook Pro, fitting this GAMM took 1.7 seconds, while fitting the GAM above took less than0.08 seconds. The gamm4() function returns a list with two elements, named mer and gam . The element mer contains information from the lmer() function in the lme4 package (Bates et al., 2015) used in thenumerical computations, and is useful for studying the random effect distributions. The element gam containsinformation about the smooth functions, and is useful for studying smooth terms and parametric fixed effects. str(mod1b, max.level = 1)
The lme4 package is automatically loaded when gamm4 is loaded, and its accessor functions can be usedto study the random effect distributions. The summary below shows that the between-participant variation,ˆ σ b = 0 . σ = 0 . mod1a this information is notavailable. The second line in the output (ˆ σ λ = 0 . λ = ˆ σ / ˆ σ λ = 38 . VarCorr(mod1b$mer)
Modeling Cohort Effects.
Next, we take cohort effects into account by fitting two versions of model (3), mod3a which contains a linear cohort effect term, and mod3b which contains a varying-coefficient term β ( a ij )consisting of five cubic regression splines, allowing the cohort effect to depend on age.14 od3a <- gamm4(Hippocampus_z ~ s(Age, k = 20, bs = "cr") +Birth_Date_z + ICV_z + Sex + Scanner, data = dat, random = ~(1|ID))mod3b <- gamm4(Hippocampus_z ~ s(Age, k = 20, bs = "cr") +s(Age, by = Birth_Date_z, bs = "cr", k = 5) + ICV_z + Sex + Scanner,data = dat, random = ~(1|ID)) We use the tidy() function from the broom package (Robinson and Hayes, 2020) to study the cohorteffect estimated by mod3a . The argument parametric = TRUE is required to extract the parametric effects.Before printing, dplyr ’s mutate() function is used to convert the numbers to mm by multiplying with thesample standard deviation, and 95 % confidence intervals (CIs) are computed by adding the standard error ofthe estimate multiplied by the 2.5 % and 97.5 % quantiles of the standard normal distribution. As the outputshows, the estimated cohort effect from this model is a negative offset of 1.25 mm per birth year, with 95% CI [ − . , .
78] mm . For example, this implies that participants born in 1970 compared to participantsborn in 1920, have an estimated offset − . ×
50 = − . with 95 % CI [ − , .
0] mm . The upperand lower limits of the CI are small, but not negligible compared to the sample average of 8,065 mm . library(broom)tidy(mod3a$gam, parametric = TRUE) %>% filter(term == "Birth_Date_z") %>% mutate(across(c(estimate, std.error),function(x) x * sd(dat$Hippocampus))) %>% mutate(conf.low = estimate + qnorm(.025) * std.error, conf.high = estimate + qnorm(.975) * std.error) Next, the varying-coefficient term in mod3b is extracted as shown below. Its estimated degrees of freedomis 2, implying that the cohort effect is estimated as a straight line defined by an intercept and a slope. Its p -value of 0 . tidy(mod3b$gam) %>% filter(term == "s(Age):Birth_Date_z") The estimated cohort effect can be plotted with the code shown below, using the plot() function for gam objects. The argument scale = 0 ensures that the y-axis limits are adjusted to the term to be plotted,rather than also covering the full range of the term representing the main effect of age. The function suppliedto the trans argument is used to convert the estimates back to mm . plot(mod3b$gam, select = "s(Age):Birth_Date_z", scale = 0,trans = function(x) x * sd(dat$Hippocampus)) A slightly modified version of the resulting plot is shown by the solid and dashed lines in Figure 11. Thefact that the estimated cohort effect averaged over all ages is slightly negative is in agreement with mod3a ,15
Age (years) C oho r t e ff e c t ( mm / y ea r) Figure 11:
Estimated cohort effect . Estimated cohort effect on hippocampal volume as a function of age. Dashed lines show95 % across-the-function CIs, which have the property that the true function is expected to lie within the CI over 95 % of the x -axis. Dotted lines show 95 % simultaneous CIs, which have the property that the true function is expected to be completelyconfined within the CI 95 % of the time under repeated sampling from the population. which estimated a negative but non-significant cohort effect. The CIs shown by the plot() function have theproperty that under repeated sampling from the population, the true function will on average be confinedwithin the upper and lower limits over 95 % of the x -axis (Marra and Wood, 2012; Nychka, 1988). Theseacross-the-function CIs will contain the true function less than 95 % of the time under repeated samplingfrom the population, which explains why the upper limit in Figure 11 is well below zero despite the p -valuebeing larger than 0.05. Simultaneous CIs, on the other hand, would fully contain the complete function 95% of the time under repeated sampling, and can be constructed using a simulation-based approach (Ruppertet al., 2003; Simpson, 2016) shown in the Supplementary Material. These simultaneous CIs are shown as thedotted lines in Figure 11, and are wider than the across-the-function CIs. The fact that its upper limit isvery close to zero for high ages and its lower limit never is above zero, is in agreement with the p -value beingapproximately 0.05.The age-dependent cohort effects estimated by mod3b imply that a participant born in 1970 is expectedto have a 131 mm (CI: [ − , ) higher hippocampal volume at age 20 than a participant born in1920 had at age 20. Conversely, a participant born in 1970 is expected to have a 340 mm (CI: [ − . , ) lower hippocampal volume than a participant born in 1920, at age 70. As for mod3a , these resultssuggest that a cohort effect cannot be ruled out, despite the term not being significant, since cohort effectsof relatively large magnitude are contained within the 95 % CIs. Modeling Baseline Age and Time Since Baseline.
Finally, we fit model (2), using the t2() function to createa two-dimensional smooth term (Wood et al., 2012). The argument k = c(20, 5) specifies that 20 cubicregression splines are used for the effect of baseline age, while only 5 are used for the effect of time, as thisterm does not span more than 11 years. The construction of the two-dimensional function involves formingproducts of all combinations of splines for baseline age and time, implying that the total number of degreesof freedom used by the term equals 20 × − mod2b below took ≈
90 seconds on a MacBook Pro. mod2b <- gamm4(Hippocampus_z ~ t2(Age_bl, Time, k = c(20, 5), bs = "cr") +ICV_z + Sex + Scanner, data = dat, random = ~(1|ID))
Model (2) in Section 2.3 could alternatively be formulated with three smooth terms: the main effects of ageand time, and their interaction. This allows significance testing of each term separately, e.g. to investigate theextent of age-cohort interactions. Functionality for fitting such a model is provided by mgcv ’s ti() function,representing a two-dimensional tensor interaction term in which the main effects have been removed (Wood,2006). As it is not available in gamm4 , the gamm() function needs to be used. The syntax is very similar,except that the random intercept is specified with random = list(ID =~ 1) and the use of REML rather16han the default marginal maximum likelihood is specified with method = "REML" for comparability with themodels fitted with gamm4() . mod2b_ti <- gamm(Hippocampus_z ~ s(Age_bl, k = 20, bs = "cr") +s(Time, k = 5, bs = "cr") +ti(Age_bl, Time, k = c(20, 5), bs = "cr") +ICV_z + Sex + Scanner,data = dat, random = list(ID =~ 1), method = "REML") Information about the model components can again be obtained with tidy() , and shows that all termsare significant. Interestingly, the main effect of time is estimated to be linear, as can be seen by its singledegree of freedom, while the two-dimensional interaction term is highly nonlinear. tidy(mod2b_ti$gam)
Two-dimensional smooth terms can also be visualized with the plot() function, for which a perspectiveplot is produces by setting scheme = 1 . plot(mod2b$gam, scheme = 1) plot(mod2b_ti$gam, select = 3, scheme = 1) The resulting plots are shown in Figure 12. Considering the left part of the plot, the cross-sectional effectis visualized along the baseline age axis, with a trajectory similar to the lifespan hippocampal volume shownin Figure 5. The longitudinal effect, plotted along the time axis, is positive for low baseline ages and negativefor higher baseline ages. The tensor interaction term plotted in the right part of Figure 12 shows that theeffect of time is more positive in the youngest participants than in adults, while the effect of time is morenegative in the oldest participants compared to adults. Since this is an interaction term, the direction of theestimated total longitudinal effect cannot be evaluated based on Figure 12 (right) alone, but also needs totake the main effect into account.
Comparison of Model Fits.
Figure 13 shows estimated cross-sectional and longitudinal effects from the fivemodels estimated in this section. The cross-sectional effects are estimated for 1st January 2010, and areall quite similar. Model 1a, estimated with only cross-sectional data, indicates a less steep growth duringchildhood, and also exhibits some wiggliness between the age of 20 and the age of 50. The longitudinaleffects are estimated 15 years ahead from baseline ages of 10, 30, 50, and 70 years. Models 1a and 1b do notdistinguish longitudinal and cross-sectional effects, and hence have identical estimates in both plots. Theestimates from model 2b are quite different from those of the other models, except for a baseline age of 70.Given the simulation results of Section 3.1, we suspect that the estimates from 2b are not accurate. Thelongitudinal estimates of models 3a and 3b are close to the estimates of model 1b, again suggesting that thecohort effects in these data are moderate.
Estimating Age at Maximum Volume.
A question of interest when estimating lifespan curves, is the age atwhich critical points occur, e.g. the age of maximum volume, maximum growth, or maximum decline. Pointestimates of such critical ages can be read directly from the fits, if necessary after computing derivatives,but an assessment of their statistical uncertainty is not directly available. A Bayesian view of the smoothingintroduced by Kimeldorf and Wahba (1970) lets us achieve this. Letting ˆ β denote the estimated regressionparameters, including spline weights, and ˆΣ β their covariance matrix, the posterior distribution of the true17 a s e li ne age ( y ea r s )
20 40 60 80 T i m e s i n c e ba s e li ne ( y ea r s ) t ( A ge_b l , T i m e ) B a s e li ne age ( y ea r s )
20 40 60 80 T i m e s i n c e ba s e li ne ( y ea r s ) t i ( A ge_b l , T i m e ) −5000500 Figure 12:
Two-dimensional smooth functions . Left: tensor product term t2(Age_bl, Time) in mod2b , representing thetotal effect of baseline age and time. Right: tensor interaction term ti(Age_bl, Time) in mod2b_ti , representing the interactioneffect between baseline age and time. mod1amod1bmod2bmod3amod3bCross−sectional0 25 50 755000600070008000 Age (years) H i ppo c a m pa l v o l u m e ( mm ) Longitudinal20 40 60 80600070008000
Age (years) H i ppo c a m pa l v o l u m e ( mm ) Figure 13:
Model estimates . Estimated cross-sectional and longitudinal effects from each of the five models considered inSection 3.2.1. The cross-sectional estimates are computed for 1st January 2010. The longitudinal estimates are computed 15years ahead from baseline ages of 10, 30, 50, and 70 years. erebellum White Matter Hippocampus0 25 50 75 0 25 50 755000600070008000240002700030000 Age (years) V o l u m e ( mm ) Figure 14:
Posterior samples . The plots show 50 samples from the posterior distributions of curves for lifespan volumes ofcerebellum white matter and hippocampus. Red dots indicate the maximum of each curve. coefficients β is now a normal distribution with mean ˆ β and covariance ˆΣ β , β | y ∼ N ( ˆ β, ˆΣ β ) (Wood, 2017,Ch. 6.10). By sampling from this posterior distribution we can make confidence statements about anyquantity derived from the smooth functions. As an example, Figure 14 shows 50 samples from the posteriordistribution of volume curves for cerebellum white matter and hippocampus, with the maximum of eachmarked with a red dot. Even from these small samples it is evident that there is high uncertainty about theage at which cerebellum white matter volume is maximal, while there is less uncertainty about hippocampalvolume.By sampling 20,000 curves and locating the age at maximum for each, we obtained posterior distributionsof the age at maximum for each region. Figure 15 shows 95 % highest posterior density intervals computedfrom the posterior distributions for all 12 regions, using the HDInterval package (Meredith and Kruschke,2019). The plot shows that the uncertainty about the location of the maximum is highly variable betweenregions, with very narrow intervals for, e.g. caudate, cerebellum cortex, and thalamus proper, and wideintervals and high uncertainty, e.g. for the brain stem and cerebellum white matter.
We now demonstrate how factor-smooth interactions can be used to study how lifespan brain volumes areaffected by a categorical variable. As an example application we consider the apolipoprotein E (APOE) (cid:15) (cid:15) (cid:15) dat , with identical structure to the data used in Section3.2.1, except that variables representing hippocampal volume now are replaced by variables
Cerebellum and
Cerebellum_z , representing cerebellum cortex volume in mm and in sample standard deviations, respec-tively. In addition, a new variable Gene_APOEnE4 represents the total number of APOE (cid:15) dat contained 2,707 observations of 1,139participants. Of these, 764 (1,838 observations) had zero alleles, 341 (789 observations) had 1 allele, and 34(80 observations) had two alleles.
Factor Smooths.
In order to estimate the interaction effects, the variable
Gene_APOEnE4 needs to be codedas an ordered factor. This is done with the following code.19 ccumbens AreaAmygdalaBrain StemCaudateCerebellum CortexCerebellum White MatterCerebral White MatterCortexHippocampusPallidumPutamenThalamus Proper 10 20 30 40 50
Age at maximum volume (years)
Figure 15:
Age at maximum volume . 95 % highest posterior density intervals for the age at maximum volume of 12 brainregions. Red dots show posterior means. dat <- dat %>% mutate(Gene_APOEnE4 = ordered(Gene_APOEnE4))levels(dat$Gene_APOEnE4)
A factor-smooth interaction is defined by s(Age, by = Gene_APOEnE4, k = 10, bs = "cr") . For anordered factor variable with L levels, this term creates L − l th level and the trajectory associated with the baseline level.The difference between trajectories does not include pure offset effects, and hence the main effect of theordered factor must be added. In this case, two smooth factor interaction terms are created, associated with1 or 2 APOE (cid:15) Gene_APOEnE4 was a numeric variable, gamm4() would estimate avarying-coefficient term as used in Section 3.2.1 treating the number of alleles as a continuous variable, and if
Gene_APOEnE4 was a factor variable a single smooth term would be independently estimated for each factorlevel. mod <- gamm4(Cerebellum_z ~ s(Age, k = 10, bs = "cr") +s(Age, by = Gene_APOEnE4, k = 10, bs = "cr") +Gene_APOEnE4 + ICV_z + Sex + Scanner,data = dat, random = ~(1|ID))
Again, we use broom ’s tidy() function to study the estimated smooth terms. The str_detect() functionfrom stringr (Wickham, 2019) is used to detect terms containing the pattern "Gene_APOE" . The termsstarting with s(Age):Gene_APOEnE4 represent the difference between trajectories of carriers of one or twoalleles to carriers of zero alleles, respectively. From the p -values, it is clear that there is no evidence that theshape of the lifespan cerebellum white matter volume depends on APOE (cid:15) library(stringr)tidy(mod$gam) %>%filter(str_detect(term, "Gene_APOE")) The main effects of APOE (cid:15) parametric = TRUE to tidy() . The estimates, which are in units of sample standard deviations, are close to zero and not significant,indicating that there is no evidence for an offset effect of APOE (cid:15) .L (’linear’) and .Q (’quadratic’) are a consequence of how R treats ordered factors, and representthe offset effect of having one or two alleles, respectively, relative to having zero alleles. tidy(mod$gam, parametric = TRUE) %>%filter(str_detect(term, "Gene_APOE")) Prediction from GAMMs.
Creating predictions from GAMMs aids interpretation of the estimated effects,and we illustrate it here by comparing the estimated lifespan cerebellum cortex volumes for participants withzero, one, or two APOE (cid:15) tidyr ’s crossing() function (Wickham and Henry, 2020), all combinations of ages between 4 and 94 yearswith a spacing of 0.1 years, number of APOE (cid:15) predict() functionrequires all variables in the model to be defined, and we hence set ICV_z equal to the sample mean and
Scanner arbitrarily to "ousAvanto" , which is one of the scanners used in the LCBC data. Other values of
ICV_z and
Scanner would shift the resulting curves vertically, but the interpretation would not change. library(tidyr) grid <- crossing(Age = seq(from = 4, to = 94, by = .1),Gene_APOEnE4 = ordered(0:2),Sex = factor(c("Female", "Male"))) %>%mutate(ICV_z = 0, Scanner = "ousAvanto")
Next, predictions are computed at all values of the grid, converted mm , and stored in the dataframe plot_df , and ggplot2 (Wickham, 2016) is used to plot the predicted values. pred <- predict(mod$gam, newdata = grid) plot_df <- grid %>% mutate(fit = pred * sd(dat$Cerebellum) + mean(dat$Cerebellum))library(ggplot2) ggplot(plot_df, aes(x = Age, y = fit, group = interaction(Gene_APOEnE4, Sex),color = Gene_APOEnE4, linetype = Sex)) +geom_line() A slightly modified version of the resulting plot is shown in Figure 16. Note that since
Sex is a parametricterm, it merely shifts the curves vertically, without changing their shapes. In this example, none of theinteraction effects were significant, but the plots still show how smooth interaction terms create differentfunctional shapes depending on the number of APOE (cid:15)
Age (years) C e r ebe ll u m c o r t e x v o l u m e ( mm ) Sex
FemaleMale
APOE e Figure 16:
Factor smooth interactions . Estimated lifespan trajectories of cerebellum cortex volume for males and femaleswith 0, 1, or 2 APOE (cid:15)
4. Discussion
This paper has highlighted that GAMMs are well-suited for estimating lifespan brain trajectories. How-ever, the issue of potential cohort effects requires careful consideration, and direct translation of LMMformulations used for separating longitudinal and cross-sectional effects has potential pitfalls. In Section2.3 we defined model (1) which ignores cohort effects, model (2) which is a direct extension of LMMs com-monly used to separate longitudinal and cross-sectional effects, and the alternative model (3) which includesparticipant birth date as a model term. These models’ abilities to accurately estimate longitudinal and cross-sectional effects were compared in realistic simulation experiments reported in Section 3.1. Not surprisingly,in the absence of cohort effects model (1) was most accurate, with the version of (1) using longitudinal dataperforming slightly better than the equivalent model with only cross-sectional data. With cohort effects,on the other hand, model (3) was most accurate. More importantly, model (2) – which may be seen as a”classic” model used to separate longitudinal and cohort effects – consistently performed worse than (3), andas shown in Figure 9 it had considerably higher variation across the simulated samples. Figure 9 also showsthat model (2) was closest to model (3) for accumbens area and caudate volumes, whose lifespan trajectoriesare close to linear, while model (3) had the clearest advantage for regions with highly nonlinear trajectories,e.g. amygdala and cerebellum cortex. A natural interpretation of this, supported by Figures 7 and 8, is thatmodel (2) is not able to estimate longitudinal effects beyond the typical follow-up interval. In the extremecase of linear longitudinal effects, on the other hand, longitudinal effects for any time after baseline can inprinciple be accurately estimated with follow-up intervals of arbitrary length. Interpretation of the termsin model (2) as longitudinal and cross-sectional effects also requires that all participants have equal datesof initial measurement. However, simulations reported in the Supplementary Material suggest that varyingbaseline dates have a very small effect on the accuracy of model (2) in the settings considered here.While limitations will vary with regard to study specific characteristics, we find it important to emphasize,in light of the present findings, that ”classic” models will never be able to make accurate estimates of lifespantrajectories, or even trajectories for any substantial part of the lifespan. For technical reasons, if no other,available human cohorts with longitudinal imaging data do not span the desired intervals. Furthermore,reaching acceptable power is impossible if dates of initial measurement are to be contained within a smallfraction of time. A look at some of the most powerful and impressive combined cross-sectional and longitudinalstudies of brain changes with age, suggests that fractional follow-up interval and range of variation in initialmeasurement dates realistically need to be accommodated in all statistical models. For instance, even the22BCD study (Casey et al., 2018) which utilizes numerous scan sites to track development in thousands ofchildren at very similar age, need to allow for some variation in initial date of measurement, and follow-upintervals are so far limited to a couple of years. While there luckily are major and most impressive studiesthat contain information on participant samples for many decades, such as the Whitehall study (Filippiniet al., 2014), the Baltimore Longitudinal Study of Aging (Tian et al., 2015), the Betula (Gorbach et al., 2017),or the Lothian Birth Cohort study (Cox et al., 2018) these still await longitudinal imaging data (Filippiniet al., 2014), or typically have MRI data only for a small fraction of the time, less than a decade (Cox et al.,2018; Gorbach et al., 2017; Tian et al., 2015), sometimes with scan waves being completed across severalyears (Gorbach et al., 2017). We note this not as a critique of any study, but as a reminder that statisticalmodels at the very least need to accomodate the realistic situation for the best possible data.
The dangers of using cross-sectional data have repeatedly been pointed out in the quantitative psychologyliterature. For example, mediation analysis using purely cross-sectional data is likely to lead to biasedand misleading estimates under realistic conditions (Cole and Maxwell, 2003; Lindenberger et al., 2011;Lindenberger and P¨otter, 1998). In mediation analysis, the goal is to understand the causal paths throughwhich one or more variables x influence an outcome y , directly or through one or more mediating variables m . Since a cause precedes its effects, carefully designed longitudinal data collection as well as models capableof utilizing this information are then necessary (Collins et al., 1998). Longitudinal data is also required tounderstand how within-individual change differs from between-individual change and how within-individualchange is correlated across multiple processes (Lindenberger et al., 2011; Molenaar, 2004). Traditionally, suchstudies, for which SEMs are ideal analysis tools, have been conducted by following a group of participants ofsimilar age over a number of waves, e.g. Cox et al. (2020); Raz et al. (2005, 2010).While the above mentioned cautions about use of cross-sectional data are completely justified, they donot necessarily extrapolate to estimation of lifespan trajectories. If the goal is to estimate the populationeffect of aging on the volume of one or more brain regions, potentially including interaction effects of statictrait variables like genetic variations or education level (after completed education), a single measurementper participant may be sufficient. One example is when the strong assumption of no cohort effects is made.If it holds, cross-sectional and longitudinal effects are equal, and both can be accurately estimated by model(1) using purely cross-sectional data. However, the assumption of no cohort effects is not always necessary;with sufficient variation in baseline dates, model (3) is in principle able to estimate longitudinal and cross-sectional effects using a single measurement per participant. In practice, however, we have experiencedthat models of the form (3) become more stable and accurate with longitudinal data. In particular, boththe additional variation provided by repeated measurements with heterogeneous follow-up intervals and thecorrelation between repeated measurements of the same participant likely contribute to better separation ofage effects and cohort effects. Furthermore, as shown in Section 3.2.1, a GAMM using longitudinal dataalso estimates the between-individual variation and the within-individual variation, quantifying the extentto which differences between participants are due to systematic variation and noise, respectively. The GAMMs studied in this paper also have some limitations. Model (3) is not identified if all participantshave been measured at the exact same dates, since the cohort term c i then is a deterministic function ofage. This is also true with longitudinal data, and emphasizes the fact that these models are developed forheterogeneous data, typically combined from multiple studies.There is also a need for methodological development related to estimation of correlated change betweenregions. In principle, this could be done by fitting GAMMs separately for each region, and using the corre-lation of random slopes across regions as an estimate of correlated change. A more principled approach isoffered by joint modeling frameworks for LMMs (Fieuws and Verbeke, 2004, 2006), which in this case wouldamount to fitting a single hierarchical GAMM (Pedersen et al., 2019) for the lifespan trajectories of multipleregions, with interaction terms distinguish trajectories for each region and random effect structures modelingthe within-individual level and change correlation between region trajectories. However, the fact that theextent of correlated change between any pair of brain regions is likely to vary across the lifespan would alsoneed to be taken into account, e.g. by modeling correlations as functions of age. Combined with the need for23hree or more timepoints to accurately estimate random slopes, it may currently be challenging to obtain asufficient amount of longitudinal data for fitting such models.Furthermore, while we have considered GAMMs for estimating how time-invariant variables interactwith lifespan trajectories in Section 3.2.2, interaction with time-dependent variables may also be of interest.Although time-dependent interaction variables can be used within the framework considered here, the inter-pretation of the estimated effects becomes more challenging. If only the value of the time-dependent variableat the given timepoint affects the outcome, the effect can be interpreted in exactly the same way as for atime-invariant variable (Fitzmaurice et al., 2011, Ch. 13.5). In many applications, however, it may be moreplausible to assume that also the variable’s change since the previous timepoint contains relevant informa-tion, and in this case the models used in Section 3.2.2 will have biased estimates. Continuous-time SEMs(Driver and Voelkle, 2018; Oud and Jansen, 2000) may be useful for this purpose, as they allow regressing atime-dependent process (the outcome of interest) on the value of another time-dependent process (the inter-action variable) at earlier times, without the restrictive assumption of equally spaced time intervals imposedby ordinary SEMs. However, estimation of nonlinear smooth functions within continuous-time SEMs hasnot been reported in the literature, and is likely to be both computationally challenging and require largelongitudinal datasets.
5. Conclusion
GAMMs are attractive tools for estimating lifespan brain trajectories, which flexibly handle the nonlineareffects and variable follow-up intervals and measurement dates characteristic of lifespan data. If cohorteffects are negligible, GAMMs on the form (1) which directly model the effect of age yield the most accurateestimates, and in this case purely cross-sectional data may even be sufficient. More realistically, cohort effectsare likely to be present, and in this case the GAMM (3) which directly models the effect of birth cohort isable to accurately estimate longitudinal and cross-sectional effects. On the other hand, the GAMM (2) whichseparates the effect of age into a baseline term and a time term as is common with LMMs, yields poorestimates of longitudinal effects. With sufficient variation of measurement dates and follow-up intervals, wethus recommend model (3) for estimating lifespan brain trajectories. On the other hand, for time-structureddata containing little variation in measurement dates, model (3) is not identified and model (2) seems to bethe best option, with the caveat that estimated longitudinal effects will not be reliable for times larger thanthe average follow-up interval, as will also be apparent from the confidence intervals.The R packages mgcv and gamm4 provide efficient software for fitting GAMMs, and are complemented byadditional packages enabling easy visualization and interpretation of summary statistics. Declaration of Competing Interest
The authors report no competing interests.
Data and Code Availability Statement
Code for running simulation experiments is available in the Supplementary Material. The LCBC data areavailable from the corresponding author on reasonable request, given appropriate ethical and data protectionapprovals.
Ethics Statement
Collection of the LCBC was approved by a Norwegian Regional Committee for Medical and HealthResearch Ethics South. Written informed consent was obtained from all participants of age at majority, andthe parents or guardian for children below this age. 24 cknowledgement
The authors thank Andreas Brandmaier, Paolo Ghisletta, and Magne Thoresen for helpful discussions.The data collection was supported by the European Research Council under grant agreements 283634, 725025(to A.M.F.) and 313440 (to K.B.W.), the Norwegian Research Council (to A.M.F., K.B.W.), The NationalAssociation for Public Health’s dementia research program, Norway (to A.M.F) and center support from theUniversity of Oslo.
Supplementary Material
Supplementary Material is available at http://osorensen.rbind.io/pdf/supplementary.pdf . References
Baltes, P.B. Longitudinal and cross-sectional sequences in the study of age and generation effects.
HumanDevelopment , 11(3):145–171, 1968. doi: 10.1159/000270604. URL https://doi.org/10.1159/000270604 .Bates, Douglas; M¨achler, Martin; Bolker, Ben, and Walker, Steve. Fitting linear mixed-effects modelsusing lme4.
Journal of Statistical Software , 67(1):1–48, 2015. doi: 10.18637/jss.v067.i01. URL https://doi.org/10.18637/jss.v067.i01 .Brumback, Babette A. and Rice, John A. Smoothing spline models for the analysis of nested and crossedsamples of curves.
Journal of the American Statistical Association , 93(443):961–976, 1998. doi: 10.1080/01621459.1998.10473755. URL https://doi.org/10.1080/01621459.1998.10473755 .Casey, B.J.; Cannonier, Tariq; Conley, May I.; Cohen, Alexandra O.; Barch, Deanna M.; Heitzeg, Mary M.;Soules, Mary E.; Teslovich, Theresa; Dellarco, Danielle V.; Garavan, Hugh; Orr, Catherine A.; Wager,Tor D.; Banich, Marie T.; Speer, Nicole K.; Sutherland, Matthew T.; Riedel, Michael C.; Dick, Anthony S.;Bjork, James M.; Thomas, Kathleen M.; Chaarani, Bader; Mejia, Margie H.; Hagler, Donald J.; Cornejo,M. Daniela; Sicat, Chelsea S.; Harms, Michael P.; Dosenbach, Nico U.F.; Rosenberg, Monica; Earl, Eric;Bartsch, Hauke; Watts, Richard; Polimeni, Jonathan R.; Kuperman, Joshua M.; Fair, Damien A., andDale, Anders M. The adolescent brain cognitive development (ABCD) study: Imaging acquisition across21 sites.
Developmental Cognitive Neuroscience , 32:43–54, August 2018. doi: 10.1016/j.dcn.2018.03.001.URL https://doi.org/10.1016/j.dcn.2018.03.001 .Cole, David A. and Maxwell, Scott E. Testing mediational models with longitudinal data: Questions andtips in the use of structural equation modeling.
Journal of Abnormal Psychology , 112(4):558–577, 2003.doi: 10.1037/0021-843x.112.4.558. URL https://doi.org/10.1037/0021-843x.112.4.558 .Collins, Linda M.; Graham, John J., and Flaherty, Brian P. An alternative framework for defining mediation.
Multivariate Behavioral Research , 33(2):295–312, 1998. doi: 10.1207/s15327906mbr3302 5. URL https://doi.org/10.1207/s15327906mbr3302_5 .Corder, E.; Saunders, A.; Strittmatter, W.; Schmechel, D.; Gaskell, P.; Small, G.; Roses, A.; Haines, J.,and Pericak-Vance, M. Gene dose of apolipoprotein e type 4 allele and the risk of alzheimer’s diseasein late onset families.
Science , 261(5123):921–923, 1993. doi: 10.1126/science.8346443. URL https://doi.org/10.1126/science.8346443 .Cox, Simon R.; Allerhand, Mike; Ritchie, Stuart J.; Maniega, Susana Mu˜noz; Hern´andez, Maria Vald´es;Harris, Sarah E.; Dickie, David Alexander; Anblagan, Devasuda; Aribisala, Benjamin S.; Morris, Zoe;Sherwood, Roy; Abbott, N. Joan; Starr, John M.; Bastin, Mark E.; Wardlaw, Joanna M., and Deary,Ian J. Longitudinal serum s100 β and brain aging in the Lothian Birth Cohort 1936. Neurobiology ofAging , 69:274–282, September 2018. doi: 10.1016/j.neurobiolaging.2018.05.029. URL https://doi.org/10.1016/j.neurobiolaging.2018.05.029 . 25ox, SR; Harris, MA; Ritchie, SJ; Buchanan, CR; Hern´andez, Maria Vald´es; Corley, Janie; Taylor, Adele M;Madole, JW; Harris, SE; Whalley, HC; McIntosh, AM; Russ, TC; Bastin, ME; Wardlaw, JM; Deary, IJ,and Tucker-Drob, EM. Three major dimensions of human brain cortical ageing in relation to cognitivedecline across the 8th decade of life. bioRxiv , 2020. doi: 10.1101/2020.01.19.911420. URL .Curran, Patrick J. and Bauer, Daniel J. The disaggregation of within-person and between-person effects inlongitudinal models of change.
Annual Review of Psychology , 62(1):583–619, 2011. doi: 10.1146/annurev.psych.093008.100356. URL https://doi.org/10.1146/annurev.psych.093008.100356 .Dale, Anders M.; Fischl, Bruce, and Sereno, Martin I. Cortical surface-based analysis.
NeuroImage , 9(2):179–194, 1999. doi: 10.1006/nimg.1998.0395. URL https://doi.org/10.1006/nimg.1998.0395 .Diggle, Peter; Heagerty, Patrick; Liang, Kung-Yee, and Zeger, Scott.
Analysis of Longitudinal Data 2ndEdition . Oxford University Press, Oxford, U.K., 2002.Driver, Charles C. and Voelkle, Manuel C. Hierarchical bayesian continuous time dynamic modeling.
Psy-chological Methods , 23(4):774–799, 2018. doi: 10.1037/met0000168. URL https://doi.org/10.1037/met0000168 .Durb´an, M.; Harezlak, J.; Wand, M. P., and Carroll, R. J. Simple fitting of subject-specific curves forlongitudinal data.
Statistics in Medicine , 24(8):1153–1167, 2005. doi: 10.1002/sim.1991. URL https://doi.org/10.1002/sim.1991 .Edwards, Lloyd J.; Stewart, Paul W.; MacDougall, James E., and Helms, Ronald W. A method for fittingregression splines with varying polynomial order in the linear mixed model.
Statistics in Medicine , 25(3):513–527, 2005. doi: 10.1002/sim.2232. URL https://doi.org/10.1002/sim.2232 .Fieuws, Steffen and Verbeke, Geert. Joint modelling of multivariate longitudinal profiles: pitfalls of therandom-effects approach.
Statistics in Medicine , 23(20):3093–3104, 2004. doi: 10.1002/sim.1885. URL https://doi.org/10.1002/sim.1885 .Fieuws, Steffen and Verbeke, Geert. Pairwise fitting of mixed models for the joint modeling of multivariatelongitudinal profiles.
Biometrics , 62(2):424–431, 2006. doi: 10.1111/j.1541-0420.2006.00507.x. URL https://doi.org/10.1111/j.1541-0420.2006.00507.x .Filippini, Nicola; Zsoldos, Enik˝o; Haapakoski, Rita; Sexton, Claire E; Mahmood, Abda; Allan, Char-lotte L; Topiwala, Anya; Valkanova, Vyara; Brunner, Eric J; Shipley, Martin J; Auerbach, Edward;Moeller, Steen; U˘gurbil, Kˆamil; Xu, Junqian; Yacoub, Essa; Andersson, Jesper; Bijsterbosch, Janine;Clare, Stuart; Griffanti, Ludovica; Hess, Aaron T; Jenkinson, Mark; Miller, Karla L; Salimi-Khorshidi,Gholamreza; Sotiropoulos, Stamatios N; Voets, Natalie L; Smith, Stephen M; Geddes, John R; Singh-Manoux, Archana; Mackay, Clare E; Kivim¨aki, Mika, and Ebmeier, Klaus P. Study protocol: thewhitehall II imaging sub-study.
BMC Psychiatry , 14(1), 2014. doi: 10.1186/1471-244x-14-159. URL https://doi.org/10.1186/1471-244x-14-159 .Fischl, Bruce; Salat, David H.; Busa, Evelina; Albert, Marilyn; Dieterich, Megan; Haselgrove, Christian;van der Kouwe, Andre; Killiany, Ron; Kennedy, David; Klaveness, Shuna; Montillo, Albert; Makris,Nikos; Rosen, Bruce, and Dale, Anders M. Whole brain segmentation.
Neuron , 33(3):341–355, 2002. doi:10.1016/s0896-6273(02)00569-x. URL https://doi.org/10.1016/s0896-6273(02)00569-x .Fitzmaurice, Garrett M.; Laird, Nan M., and Ware, James H.
Applied Longitudinal Analysis 2nd Edition .John Wiley & Sons, Inc., Hoboken, New Jersey, USA, 2011.Fjell, Anders. M.; Walhovd, Kristine B.; Westlye, Lars T.; Østby, Ylva; Tamnes, Christian K.; Jerni-gan, Terry L.; Gamst, Anthony, and Dale, Anders M. When does brain aging accelerate? Dangersof quadratic fits in cross-sectional studies.
NeuroImage , 50(4):1376 – 1383, 2010. ISSN 1053-8119.doi: https://doi.org/10.1016/j.neuroimage.2010.01.061. URL . 26jell, Anders Martin; Idland, Ane-Victoria; Sala-Llonch, Roser; Watne, Leiv Otto; Borza, Tom; Brækhus,Anne; Lona, Tarjei; Zetterberg, Henrik; Blennow, Kaj; Wyller, Torgeir Bruun, and Walhovd, Kris-tine Beate. Neuroinflammation and tau interact with amyloid in predicting sleep problems in agingindependently of atrophy.
Cerebral Cortex , 28(8):2775–2785, 2017. doi: 10.1093/cercor/bhx157. URL https://doi.org/10.1093/cercor/bhx157 .Genin, E; Hannequin, D; Wallon, D; Sleegers, K; Hiltunen, M; Combarros, O; Bullido, M J; Engelborghs, S;Deyn, P De; Berr, C; Pasquier, F; Dubois, B; Tognoni, G; Fi´evet, N; Brouwers, N; Bettens, K; Arosio, B;Coto, E; Zompo, M Del; Mateo, I; Epelbaum, J; Frank-Garcia, A; Helisalmi, S; Porcellini, E; Pilotto, A;Forti, P; Ferri, R; Scarpini, E; Siciliano, G; Solfrizzi, V; Sorbi, S; Spalletta, G; Valdivieso, F; Veps¨al¨ainen,S; Alvarez, V; Bosco, P; Mancuso, M; Panza, F; Nacmias, B; Boss`u, P; Hanon, O; Piccardi, P; Annoni,G; Seripa, D; Galimberti, D; Licastro, F; Soininen, H; Dartigues, J-F; Kamboh, M I; Broeckhoven, C Van;Lambert, J C; Amouyel, P, and Campion, D. APOE and alzheimer disease: a major gene with semi-dominant inheritance.
Molecular Psychiatry , 16(9):903–907, 2011. doi: 10.1038/mp.2011.52. URL https://doi.org/10.1038/mp.2011.52 .Golub, Gene H.; Heath, Michael, and Wahba, Grace. Generalized cross-validation as a method for choosinga good ridge parameter.
Technometrics , 21(2):215–223, 1979. doi: 10.1080/00401706.1979.10489751. URL https://doi.org/10.1080/00401706.1979.10489751 .Gorbach, Tetiana; Pudas, Sara; Lundquist, Anders; Or¨add, Greger; Josefsson, Maria; Salami, Alireza;de Luna, Xavier, and Nyberg, Lars. Longitudinal association between hippocampus atrophy and episodic-memory decline.
Neurobiology of Aging , 51:167–176, March 2017. doi: 10.1016/j.neurobiolaging.2016.12.002. URL https://doi.org/10.1016/j.neurobiolaging.2016.12.002 .Gu, Chong and Ma, Ping. Generalized nonparametric mixed-effect models: Computation and smoothingparameter selection.
Journal of Computational and Graphical Statistics , 14(2):485–504, 2005. doi: 10.1198/106186005x47651. URL https://doi.org/10.1198/106186005x47651 .Hastie, Trevor and Tibshirani, Robert. Generalized additive models.
Statistical Science , 1(3):297–310, 081986. doi: 10.1214/ss/1177013604. URL https://doi.org/10.1214/ss/1177013604 .Hastie, Trevor and Tibshirani, Robert. Varying-coefficient models.
Journal of the Royal Statistical Society:Series B (Methodological) , 55(4):757–779, 1993. doi: 10.1111/j.2517-6161.1993.tb01939.x. URL https://doi.org/10.1111/j.2517-6161.1993.tb01939.x .Hoffman, Lesa. Multilevel models for examining individual differences in within-person variation and covari-ation over time.
Multivariate Behavioral Research , 42(4):609–629, 2007. doi: 10.1080/00273170701710072.URL https://doi.org/10.1080/00273170701710072 .Hoffman, Lesa and Stawski, Robert S. Persons as contexts: Evaluating between-person and within-personeffects in longitudinal analysis.
Research in Human Development , 6(2-3):97–120, 2009. doi: 10.1080/15427600902911189. URL https://doi.org/10.1080/15427600902911189 .Ke, Chunlei and Wang, Yuedong. Semiparametric nonlinear mixed-effects models and their applica-tions.
Journal of the American Statistical Association , 96(456):1272–1298, 2001. doi: 10.1198/016214501753381913. URL https://doi.org/10.1198/016214501753381913 .Kimeldorf, George S. and Wahba, Grace. A correspondence between Bayesian estimation on stochasticprocesses and smoothing by splines.
The Annals of Mathematical Statistics , 41(2):495–502, 1970. ISSN00034851. URL .Laird, Nan M. and Ware, James H. Random-effects models for longitudinal data.
Biometrics , 38(4):963–974,1982. ISSN 0006341X, 15410420. URL .Lambert, Paul C.; Abrams, Keith R.; Jones, David R.; Halligan, Aidan W. F., and Shennan, Andrew.Analysis of ambulatory blood pressure monitor data using a hierarchical model incorporating restrictedcubic splines and heterogeneous within-subject variances.
Statistics in Medicine , 20(24):3789–3805, 2001.doi: 10.1002/sim.1172. URL https://doi.org/10.1002/sim.1172 .27in, X. and Zhang, D. Inference in generalized additive mixed models by using smoothing splines.
Journalof the Royal Statistical Society: Series B (Statistical Methodology) , 61(2):381–400, 1999. doi: 10.1111/1467-9868.00183. URL https://doi.org/10.1111/1467-9868.00183 .Lindenberger, Ulman and P¨otter, Ulrich. The complex nature of unique and shared effects in hierarchicallinear regression: Implications for developmental psychology.
Psychological Methods , 3(2):218–230, 1998.doi: 10.1037/1082-989x.3.2.218. URL https://doi.org/10.1037/1082-989x.3.2.218 .Lindenberger, Ulman; von Oertzen, Timo; Ghisletta, Paolo, and Hertzog, Christopher. Cross-sectional agevariance extraction: What’s change got to do with it?
Psychology and Aging , 26(1):34–47, 2011. doi:10.1037/a0020525. URL https://doi.org/10.1037/a0020525 .Marra, Giamperio and Wood, Simon N. Coverage properties of confidence intervals for generalized additivemodel components.
Scandinavian Journal of Statistics , 39(1):53–74, 2012. doi: 10.1111/j.1467-9469.2011.00760.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9469.2011.00760.x .McArdle, John J. and Hamagami, Fumiaki. Latent difference score structural models for linear dynamicanalyses with incomplete longitudinal data. In
New methods for the analysis of change. , pages 139–175.American Psychological Association, 2001. doi: 10.1037/10409-005. URL https://doi.org/10.1037/10409-005 .Meredith, Mike and Kruschke, John.
HDInterval: Highest (Posterior) Density Intervals , 2019. URL https://CRAN.R-project.org/package=HDInterval . R package version 0.2.2.Meredith, William and Tisak, John. Latent curve analysis.
Psychometrika , 55(1):107–122, 1990. doi:10.1007/bf02294746. URL https://doi.org/10.1007/bf02294746 .Molenaar, Peter C. M. A manifesto on psychology as idiographic science: Bringing the person back intoscientific psychology, this time forever.
Measurement: Interdisciplinary Research & Perspective , 2(4):201–218, 2004. doi: 10.1207/s15366359mea0204 1. URL https://doi.org/10.1207/s15366359mea0204_1 .Molenberghs, G. and Fitzmaurice, G. Incomplete data: Introduction and overview. In Fitzmaurice, G.;Davidian, M.; Verbeke, G., and Molenberghs, G., editors,
Longitudinal Data Analysis , pages 395–408.Chapman and Hall, London, U.K., 2009.Morrell, C. H.; Brant, L. J., and Ferrucci, L. Model choice can obscure results in longitudinal studies.
TheJournals of Gerontology Series A: Biological Sciences and Medical Sciences , 64A(2):215–222, 2009. doi:10.1093/gerona/gln024. URL https://doi.org/10.1093/gerona/gln024 .Nychka, Douglas. Bayesian confidence intervals for smoothing splines.
Journal of the American StatisticalAssociation , 83(404):1134–1143, 1988. doi: 10.1080/01621459.1988.10478711. URL https://doi.org/10.1080/01621459.1988.10478711 .Oud, Johan H. L. and Jansen, Robert A. R. G. Continuous time state space modeling of panel data bymeans of SEM.
Psychometrika , 65(2):199–215, 2000. doi: 10.1007/bf02294374. URL https://doi.org/10.1007/bf02294374 .Pedersen, Eric J.; Miller, David L.; Simpson, Gavin L., and Ross, Noam. Hierarchical generalized additivemodels in ecology: an introduction with mgcv.
PeerJ , 7:e6876, 2019. doi: 10.7717/peerj.6876. URL https://doi.org/10.7717/peerj.6876 .R Core Team, .
R: A Language and Environment for Statistical Computing . R Foundation for StatisticalComputing, Vienna, Austria, 2019. URL .Raz, Naftali; Lindenberger, Ulman; Rodrigue, Karen M.; Kennedy, Kristen M.; Head, Denise; Williamson,Adrienne; Dahle, Cheryl; Gerstorf, Denis, and Acker, James D. Regional brain changes in aging healthyadults: General trends, individual differences and modifiers.
Cerebral Cortex , 15(11):1676–1689, 2005. doi:10.1093/cercor/bhi044. URL https://doi.org/10.1093/cercor/bhi044 .28az, Naftali; Ghisletta, Paolo; Rodrigue, Karen M.; Kennedy, Kristen M., and Lindenberger, Ulman. Tra-jectories of brain aging in middle-aged and older adults: Regional and individual differences.
NeuroIm-age , 51(2):501–511, 2010. doi: 10.1016/j.neuroimage.2010.03.020. URL https://doi.org/10.1016/j.neuroimage.2010.03.020 .Reuter, Martin; Schmansky, Nicholas J.; Rosas, H. Diana, and Fischl, Bruce. Within-subject templateestimation for unbiased longitudinal image analysis.
NeuroImage , 61(4):1402–1418, 2012. doi: 10.1016/j.neuroimage.2012.02.084. URL https://doi.org/10.1016/j.neuroimage.2012.02.084 .Robinson, David and Hayes, Alex. broom: Convert Statistical Analysis Objects into Tidy Tibbles , 2020. URL https://CRAN.R-project.org/package=broom . R package version 0.5.6.Ruppert, David; Wand, M. P., and Carroll, R. J.
Semiparametric Regression . Cambridge University Press,Cambridge, U.K., 2003.Simpson, Gavin.
Simultaneous intervals for smooths revisited , 2016.https://fromthebottomoftheheap.net/2016/12/15/simultaneous-interval-revisited/ (accessed July 10,2020).Sliwinski, Martin; Hoffman, Lesa, and Hofer, Scott M. Evaluating convergence of within-person changeand between-person age differences in age-heterogeneous longitudinal studies.
Research in HumanDevelopment , 7(1):45–60, 2010. doi: 10.1080/15427600903578169. URL https://doi.org/10.1080/15427600903578169 .Sullivan, Kristynn J.; Shadish, William R., and Steiner, Peter M. An introduction to modeling longitudinaldata with generalized additive models: Applications to single-case designs.
Psychological Methods , 20(1):26–42, 2015. doi: 10.1037/met0000020. URL https://doi.org/10.1037/met0000020 .Tian, Qu; Studenski, Stephanie A.; Resnick, Susan M.; Davatzikos, Christos, and Ferrucci, Luigi. Midlife andlate-life cardiorespiratory fitness and brain volume changes in late adulthood: Results from the baltimorelongitudinal study of aging.
The Journals of Gerontology Series A: Biological Sciences and Medical Sci-ences , 71(1):124–130, April 2015. doi: 10.1093/gerona/glv041. URL https://doi.org/10.1093/gerona/glv041 .Tiedemann, Frederik. gghalves: Compose Half-Half Plots Using Your Favourite Geoms , 2020. URL https://CRAN.R-project.org/package=gghalves . R package version 0.1.0.Walhovd, K.B.; Fjell, A.M.; Westerhausen, R.; Nyberg, L.; Ebmeier, K.P.; Lindenberger, U.; Bartres-Faz,D.; Baare, W.F.C.; Siebner, H.R.; Henson, R., and et al., . Healthy minds 0100 years: Optimising the useof European brain imaging cohorts (Lifebrain).
European Psychiatry , 47:7677, 2018. doi: 10.1016/j.eurpsy.2017.10.005.Walhovd, Kristine B.; Krogsrud, Stine K.; Amlien, Inge K.; Bartsch, Hauke; Bjørnerud, Atle; Due-Tønnessen,Paulina; Grydeland, H˚akon; Hagler, Donald J.; H˚aberg, Asta K.; Kremen, William S.; Ferschmann, Lia;Nyberg, Lars; Panizzon, Matthew S.; Rohani, Darius A.; Skranes, Jon; Storsve, Andreas B.; Sølsnes,Anne Elisabeth; Tamnes, Christian K.; Thompson, Wesley K.; Reuter, Chase; Dale, Anders M., andFjell, Anders M. Neurodevelopmental origins of lifespan changes in brain and cognition.
Proceedingsof the National Academy of Sciences , 113(33):9357–9362, 2016. doi: 10.1073/pnas.1524259113. URL https://doi.org/10.1073/pnas.1524259113 .Walhovd, Kristine B; Fjell, Anders M.; Sørensen, Øystein; Mowinckel, Athanasia Monica; Reinbold,C´eline Sonja; Idland, Ane-Victoria; Watne, Leiv Otto; Franke, Andre; Dobricic, Valerijia; Kilpert, Fabian;Bertram, Lars, and Wang, Yunpeng. Genetic risk for alzheimer’s disease predicts hippocampal volumethrough the lifespan. bioRxiv , 2019. doi: 10.1101/711689. URL .Wickham, Hadley. ggplot2: Elegant Graphics for Data Analysis . Springer-Verlag, New York, New York,USA, 2016. ISBN 978-3-319-24277-4. URL https://ggplot2.tidyverse.org .29ickham, Hadley. stringr: Simple, Consistent Wrappers for Common String Operations , 2019. URL https://CRAN.R-project.org/package=stringr . R package version 1.4.0.Wickham, Hadley and Henry, Lionel. tidyr: Tidy Messy Data , 2020. URL https://CRAN.R-project.org/package=tidyr . R package version 1.1.0.Wickham, Hadley; Franois, Romain; Henry, Lionel, and Mller, Kirill. dplyr: A Grammar of Data Manipula-tion , 2020. URL https://CRAN.R-project.org/package=dplyr . R package version 1.0.0.Wood, Simon and Scheipl, Fabian. gamm4: Generalized Additive Mixed Models using ’mgcv’ and ’lme4’ ,2017. URL https://CRAN.R-project.org/package=gamm4 . R package version 0.2-5.Wood, Simon N. Thin plate regression splines.
Journal of the Royal Statistical Society: Series B (StatisticalMethodology) , 65(1):95–114, 2003. doi: 10.1111/1467-9868.00374. URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9868.00374 .Wood, Simon N. Stable and efficient multiple smoothing parameter estimation for generalized additive models.
Journal of the American Statistical Association , 99(467):673–686, 2004. doi: 10.1198/016214504000000980.URL https://doi.org/10.1198/016214504000000980 .Wood, Simon N. Low-rank scale-invariant tensor product smooths for generalized additive mixed models.
Bio-metrics , 62(4):1025–1036, 2006. doi: 10.1111/j.1541-0420.2006.00574.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1541-0420.2006.00574.x .Wood, Simon N. Fast stable restricted maximum likelihood and marginal likelihood estimation of semipara-metric generalized linear models.
Journal of the Royal Statistical Society: Series B (Statistical Method-ology) , 73(1):3–36, 2010. doi: 10.1111/j.1467-9868.2010.00749.x. URL https://doi.org/10.1111/j.1467-9868.2010.00749.x .Wood, Simon N.; Scheipl, Fabian, and Faraway, Julian J. Straightforward intermediate rank tensorproduct smoothing in mixed models.
Statistics and Computing , 23(3):341–360, 2012. doi: 10.1007/s11222-012-9314-z. URL https://doi.org/10.1007/s11222-012-9314-z .Wood, S.N.
Generalized Additive Models: An Introduction with R . Chapman and Hall/CRC, Boca Raton,Florida, USA, 2nd edition, 2017.Zeger, Scott L. and Liang, Kung-Yee. An overview of methods for the analysis of longitudinal data.
Statisticsin Medicine , 11(14-15):1825–1839, 1992. doi: 10.1002/sim.4780111406. URL https://doi.org/10.1002/sim.4780111406https://doi.org/10.1002/sim.4780111406