[PDF] Distributional data analysis via quantile functions and its application to modelling digital biomarkers of gait in Alzheimer's Disease

Abstract

With the advent of continuous health monitoring via wearable devices, users now generate their unique streams of continuous data such as minute-level physical activity or heart rate. Aggregating these streams into scalar summaries ignores the distributional nature of data and often leads to the loss of critical information. We propose to capture the distributional properties of wearable data via user-specific quantile functions that are further used in functional regression and multi-modal distributional modelling. In addition, we propose to encode user-specific distributional information with user-specific L-moments, robust rank-based analogs of traditional moments. Importantly, this L-moment encoding results in mutually consistent functional and distributional interpretation of the results of scalar-on-function regression. We also demonstrate how L-moments can be flexibly employed for analyzing joint and individual sources of variation in multi-modal distributional data. The proposed methods are illustrated in a study of association of accelerometry-derived digital gait biomarkers with Alzheimer's disease (AD) and in people with normal cognitive function. Our analysis shows that the proposed quantile-based representation results in a much higher predictive performance compared to simple distributional summaries and attains much stronger associations with clinical cognitive scales.

Full PDF

11 Distributional data analysis via quantile functions and its application tomodelling digital biomarkers of gait in Alzheimer’s Disease

Rahul Ghosal , ∗ , Vijay R. Varma , Dmitri Volfson , Inbar Hillel , Jacek Urbanek ,Jeﬀrey M. Hausdorﬀ , , , Amber Watts and Vadim Zipunnikov Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland USA National Institute on Aging (NIA), National Institutes of Health (NIH), Baltimore, Maryland, USA Neuroscience Analytics, Computational Biology, Takeda, Cambridge, MA, USA Center for the Study of Movement, Cognition and Mobility, Neurological Institute,Tel Aviv Sourasky Medical Center, Tel Aviv, Israel Department of Medicine, Johns Hopkins University School of Medicine, Baltimore Maryland, USA Department of Physical Therapy, Sackler Faculty of Medicine, and Sagol School of Neuroscience,Tel Aviv University, Tel Aviv, Israel Rush Alzheimer’s Disease Center and Department of Orthopedic Surgery,Rush University Medical Center, Chicago, USA Department of Psychology, University of Kansas, Lawrence, KS, USA* email: [email protected]

Summary:

With the advent of continuous health monitoring via wearable devices, users now generate their uniquestreams of continuous data such as minute-level physical activity or heart rate. Aggregating these streams intoscalar summaries ignores the distributional nature of data and often leads to the loss of critical information. Wepropose to capture the distributional properties of wearable data via user-speciﬁc quantile functions that are furtherused in functional regression and multi-modal distributional modelling. In addition, we propose to encode user-speciﬁc distributional information with user-speciﬁc L-moments, robust rank-based analogs of traditional moments.Importantly, this L-moment encoding results in mutually consistent functional and distributional interpretation of theresults of scalar-on-function regression. We also demonstrate how L-moments can be ﬂexibly employed for analyzingjoint and individual sources of variation in multi-modal distributional data. The proposed methods are illustrated ina study of association of accelerometry-derived digital gait biomarkers with Alzheimer’s disease (AD) and in peoplewith normal cognitive function. Our analysis shows that the proposed quantile-based representation results in a muchhigher predictive performance compared to simple distributional summaries and attains much stronger associations a r X i v : . [ s t a t . M E ] F e b iometrics with clinical cognitive scales. Key words:

Wearable data; Quantile functions; L-Moments; Scalar-on-function regression; JIVE; Alzheimer’sdisease; Gait

1. Introduction

Wearables now generate continuous streams of data such as minute-level physical activityor heart rate. These streams are a rich source of information and can be used for deeperunderstanding of human behaviours and their inﬂuence on human health and disease. Com-mon analytical practice in many epidemiological and clinical studies employing wearablesis to use simple scalar, or average summaries such as total activity count (TAC), minutesof moderate-to-vigorous-intensity physical activity (MVPA) (Varma et al., 2017; Bakraniaet al., 2017) or total step count (TSC) (Reider et al., 2020; Mc Ardle et al., 2020). However,collapsing the entire stream of data into a single metric completely ignore temporal anddistributional nature of the data.Temporal aspect of wearable data can be accounted for with functional data analysis(FDA) that treats wearable data streams as functional observations recorded over 24 hours(Morris et al., 2006; Xiao et al., 2015; Goldsmith et al., 2016, 2015). Associations betweenaccelerometric functional proﬁles and variables of interest such as health outcomes, age, andothers can be studied within FDA using scalar-on-function (SOFR) or function-on-scalar(FOSR) regression (Morris, 2015) models. In addition to diurnal (over 24-hour) modelling,FDA approaches help to model weekly and seasonal variation in accelerometric signal (Huanget al., 2019; Xiao et al., 2015). Co-registration (or warping) is often a desirable pre-processingstep to make sure the amplitude and phase variations are properly separated during diurnalmodelling (Dryden and Mardia, 2016; Wrobel et al., 2019).Distributional aspect of wearable data can be captured via modelling user-speciﬁc dis-tributions. Augustin et al. (2017) proposed to summarize accelerometry activity countsrecorded over 24-hours with user-speciﬁc histograms. They proposed to use these histogramsas predictors in scalar-on-function regression. The main limitations of this approach includei) possibly unequal (eﬀective) support across user-speciﬁc histograms, ii) inability to model

Biometrics, 000 speciﬁc quantiles of the distribution, which, as it will be demonstrated below, could be of agreat practical interest, iii) scale-dependence.In this article, we put forward an alternative to Augustin et al. (2017) and propose touse user-speciﬁc quantile functions to capture the distributional nature of wearable data.Our approach overcomes the limitations i) and ii) above and provides a more ﬂexible wayto summarize distributional properties of wearable data. There is rich literature on quantilefunctions (Gilchrist, 2000; Powley, 2013) in statistical modelling and decision theoretic anal-ysis. Quantile functions enjoy many mathematical properties, which make them particularlyuseful for distributional modelling. Quantile representations have been used in symbolicdata literature for regression modelling of both quantile functions responses and predictors.The approach is based on the use of ‘ Wasserstein distance (Irpino and Verde, 2013;Verde and Irpino, 2010). Zhang and M¨uller (2011) proposed a method for density functionestimation using a quantile synchronization approach, instead of using the cross-sectionalaverage density which does not incorporate time-warping. Recently, (Yang et al., 2020)and (Yang, 2020) proposed quantile-function-on-scalar regression approaches for modellingquantile functions as outcomes. Having quantile functions as outcomes imposes constraintson the regression that requires a regression approximant to be a valid quantile function. Inour regression applications, we do not have these constraints because user-speciﬁc quantilefunctions are functional predictors.Additional motivation for the proposed quantile-function approach stems from multipleresearch eﬀorts aimed at discovering digital biomarkers based on distributional properties ofwearable data. Stride-velocity-q95, a 95-percentile of accelerometry-estimated stride velocity,is one of the very ﬁrst digital biomarkers approved as a secondary endpoint by the EuropeanMedicines Agency (EMA), a European counterpart of the Federal Drug Administration inthe US (Haberkamp et al., 2019). This measure has been demonstrated to be much more sensitive for tracking longitudinal decline in children with Duchenne muscular distrophy com-pared to the mean stride-velocity. Another example comes from accelerometry-estimated gaitwhich is often quantiﬁed via user-speciﬁc averages, measures of variability (such as standarddeviation), and asymmetry (such as skewness) of gait parameters (Hausdorﬀ et al., 2018;Shema-Shiratzky et al., 2020) estimated over a wear-period. Thus, wearable applicationsactively employ various distributional or quantile based user-speciﬁc summaries.As a motivating study, we will focus on continuous accelerometry data collected over oneweek in a sample of 86 community-dwelling older adults with mild Alzheimer’s disease (AD)and cognitively normal controls (CNC)(Varma and Watts, 2017; Varma et al., 2021). AD isthe most common form of dementia and cases are projected to more than double in the next40 years (Alzheimer’s Association, 2020; Hebert et al., 2013). Digital biomarkers have recentlybeen considered for early detection of AD as an alternative to more invasive and expensiveﬂuid and imaging markers (Kourtis et al., 2019). Speciﬁcally, digital biomarkers that reﬂectalterations in gait may help to predict AD due to the close relationship between complexcognitive functions and gait (Yogev-Seligmann et al., 2008; Varma et al., 2021). In our studysubjects wore an accelerometer on their hip in order to measure continuous, communityactivity over 7 days. Using a validated processing pipeline (Weiss et al., 2014), we measured52, domain-speciﬁc gait parameters during identiﬁed episodes of sustained walking (deﬁnedas walking longer than 60 seconds) over the course of the 7-day data collection period.Figure 1 shows the user-speciﬁc quantile functions of accelerometry-estimated step velocityfor mild-AD and CNC group. [Figure 1 about here.]Interestingly, for each group, the average quantile function is directly related to the Wasser-stein barycenter of their respective distribution (Bigot et al., 2018). The average quantilefunctions, therefore, can be highly informative and can identify parts of the distribution

Biometrics, 000 that are most discriminative between groups of interest. The largest diﬀerence between thetwo groups can be seen in the upper quantiles of step velocity. This supports the point thatquantile functions can be useful for distributional representations.Three main methodological contributions of this article are as follows. First, we propose tocapture the distributional properties of wearable data via user-speciﬁc quantile functions anduse them as predictors in scalar-on-function regression. This approach is further generalizedusing functional generalized additive model (McLean et al., 2014) which can capture possiblenonlinear eﬀects of the quantile functions. Such models have strong mathematical interpreta-tions as linear functional of quantile functions or its transformation is known to encode severalimportant characteristics of a continuous distribution (Powley, 2013). Second, we proposeto use L-moments to represent user-speciﬁc quantile functions via functional decompositionsthat also preserve distributional information. L-moments introduced in Hosking (1990) arerobust rank based analogs of traditional moments and they fully deﬁne the distribution.Quantile functions are representable via L-moments through their projections on shiftedLegendre polynomials (LP) which are orthogonal on [0 , framework, introduce the mathematical background for quantile functions, and illustrate theproposed approach of using subject-speciﬁc quantile function in additive scalar-on-functionregression. In Section 3, we introduce L-moments for distributional data and show howthey can be used for SOFR and JIVE. In Section 4, we demonstrate the applications ofthe proposed methods in the Alzheimer’s disease (AD) study. Section 5 concludes with adiscussion on the main contribution and some possible extensions of this work.

2. Modelling Framework

Suppose, we have densely observed repeated measures data X ij ( j = 1 , . . . , J ) for subjects i = 1 , . . . , n , where j = 1 , . . . , J is typically quite large (14 days or 50-60 walking bouts).For example X ij can be X i ( t ij ), where X i is observed across various time-points t ij . Assume X ij ( j = 1 , . . . , J ) ∼ F i ( x ), a subject-speciﬁc cumulative distribution function (cdf), where F i ( x ) = P ( X ik x ). Then, we can deﬁne subject-speciﬁc quantile function Q i ( p ) = inf { x : F i ( x ) > p } . The quantile function completely characterizes the distribution of the individualobservations. In this article, we restrict our attention to cases, where both F i ( x ) s and Q i ( p )are continuous, which ensures Q i = F − i , so F i ( Q i ( p )) = p, Q i ( F i ( x )) = x . This also ensuresboth quantile function and cdf are strictly increasing in their respective domains (Powley,2013). Using X ij ( j = 1 , . . . , J ) one can calculate ˆ F i ( x ), the empirical cdf and then obtainthe empirical quantile function ˆ q i ( p ) = ˆ F − i ( p ) ( p ∈ [0 , • A non-negative linear combination of ﬁnite number of quantile functions is a quantilefunction.

Biometrics, 000 • For a probability distribution deﬁned via a quantile function Q ( · ), all integer moments canbe represented as E ( X m ) = R Q m ( p ) dp (assuming that all moments exist). • The diﬀerential entropy of a continuous distribution function can be calculated as R log ( Q ( p )) dp . • The quantile density function and the p -probability density function are deﬁned as q ( p ) = Q ( p ) and f ( Q ( p )), respectively. Here, f is the density function corresponding to F . • Tail behaviour of the distributions can be evaluated via van-Zweet index function Q ( p ) Q ( p ) (Powley, 2013; vanZwet, 1964), which is a quantile function for log concave distributions. • The average of subject-speciﬁc quantile functions ¯ Q ( p ) = n P ni =1 Q i ( p ) can be mapped tothe Wasserstein barycenter of the measures induced by the respective distributions (Bigotet al., 2018). • Expansions of quantile functions via orthogonal polynomials are directly related to thecomponents of the Shapiro–Francia Statistic (Takemura, 1983) and L-moments (Hosking,1990) which will be discussed in greater details in Section 3. • In L [0 , W or the Mallow’s d ( F , F ) = R ( Q ( p ) − Q ( p )) dp .Next section illustrates the use of quantile functions as functional objects in scalar-on-function regression models.2.1 Supervised Learning with quantile functions via SOFR

In this section, we assume that Y i , i = 1 , . . . , n is an outcome of interest that can becontinuous or discrete that comes from an exponential family. We consider the followinggeneralized scalar-on-function regression model: E ( Y i | X i , X i , . . . , X iJ ) = µ i , g ( µ i ) = α + Z Ti γ + Z Q i ( p ) β ( p ) dp. (1)Here g is a known link function and Q i ( p )s are the subject-speciﬁc quantile function ofpredictor of interest X ij and scalar covariates Z i . The smooth coeﬃcient function β ( p ) represents the functional eﬀect of the quantile function at quantile level p . As pointed out inReiss et al. (2017) locations with largest | β ( p ) | are the most inﬂuential to the response andof practical interest. In the special case of β ( p ) = β , model (1) reduces to a usual GLM onthe mean summary ( ν i = R Q i ( p ) dp ) of repeated measures data g ( µ i ) = α + Z Ti γ + β Z Q i ( p ) dp = α + Z Ti γ + βν i . (2)Several methods exist in the literature for estimation of the smooth coeﬃcient function β ( p )(Goldsmith et al., 2011; Marx and Eilers, 1999) in scalar-on-function regression model. Inthis article, we follow a smoothing spline estimation method. The penalized negative loglikelihood criterion for estimation is given by R ( α, γ , β ( · )) = − logL ( α, γ , β ( · ); Y i , Z Ti , Q i ( p )) + λ Z { β ( p ) } dp. (3)The second derivative penalty on β ( p ) ensures the resulting coeﬃcient function is smooth.We model the unknown coeﬃcient functions β ( p ) using univariate basis function expan-sion as β ( p ) = P Kk =1 β k θ k ( p ) = θ ( p ) T β , where θ ( p ) = [ θ ( p ) , θ ( p ) , . . . , θ K ( p )] T and β =( β , β , . . . , β K ) T is the vector of unknown coeﬃcients. In this article, we use cubic B-splinebasis functions, however, other basis functions can be used as well. The linear functionaleﬀect then becomes R Q i ( p ) β ( p ) dp = P Kk =1 β k R Q i ( p ) θ k ( p ) = P Kk =1 β k Q ik = Q Ti θ . Theminimization criterion in (3) now can be reformulated as, R ( ψ ) = R ( α, γ , β ) = − logL ( α, γ , β ; Y i , Z i , Q i ) + λ β T P β , (4)where P is the penalty matrix given by P = { R θ ( p ) θ ( p ) T dp } . This minimization can becarried out using the Newton-Raphson algorithm implemented under generalized additivemodels (GAM) (Wood et al., 2016; Wood, 2017). The smoothing parameter λ can bechosen using REML, information criterion like AIC, BIC, or data driven methods such asGeneralized CV. We use the refund package (Goldsmith et al., 2018) in R (R Core Team,2018) for implementation of SOFR. Biometrics, 000

Extension of Quantile function based SOFR to FGAM

The traditional scalar-on-function regression (SOFR) model (1) assumes a linear associationbetween the quantile function and the outcome. We extend the quantile function based SOFRmodel to functional generalized additive model (FGAM) of McLean et al. (2014) which canbe used to capture nonlinear eﬀects of quantile function Q ( p ). Speciﬁcally, we model the linkfunction g ( · ) as g ( µ i ) = α + Z Ti γ + R F ( Q i ( p ) , p ) dp. The bivariate function F { Q ( p ) , p } issmooth function on R × [0 , Q ( p ) atquantile level p . In a special case of F { Q ( p ) , p } = Q ( p ) β ( p ), FGAM reduces to model (1).The estimation procedure for FGAM is discussed in Web Appendix 1. Remark 1:

It can be shown that SOFR model using quantile functions of a predictor is invariant undercontinuous monotone transformation of predictors using equivariance property of the quantilefunction. The FGAM approach using quantile function is more ﬂexible and remains invariantunder any continuous transformation of predictors. (McLean et al., 2014).

3. L-moments and modelling quantile functions

L-moments were deﬁned by (Hosking, 1990) as the expectation of a linear combination oforder statistics. In particular, the r -th order L-moment of a random variable X is deﬁned as L r = r − r − X k =0 ( − k (cid:18) r − k (cid:19) E ( X r − k : r ) r = 1 , , . . . , (5)where X n X n . . . X n : n denote order statistics of a random sample of size n drawn from the distribution of X . The ﬁrst order L-moment equals the traditional mean.The second order L-moment is equal to the Gini’s mean diﬀerence statistic divided by 2.Thus, L can be seen as a robust measure of scale. Higher order L-moments are related torobust distributional summaries that can be interpreted similarly to traditional higher-ordermoments such as skewness and kurtosis.In our approach, we leverage an alternative representation of L-moments as projections of quantile functions on Legendre polynomial basis. Speciﬁcally, L-moments of order r can berepresented as L r = Z Q ( p ) P r − ( p ) dp, (6)where P r ( p ) is the shifted Legendre polynomials (LP) of degree r deﬁned as follows P r ( p ) = r X k =0 s r,k p r , s r,k = ( − r − k (cid:18) rk (cid:19)(cid:18) r + kk (cid:19) = ( − r − k ( r + k )!( k !) ( r − k )! . (7)Legendre polynomials have standard orthogonality properties on [0 ,

1] as Z P s ( p ) P r ( p ) dp = δ rs ∗ (1 / r + 1) , (8)where δ rs = I ( r = s ). The quantile function Q ( p ) hence have this following decomposition Q k ( p ) = k X r =1 (2 r − P r ( p ) Z Q ( p ) P r − ( p ) dp = k X r =1 (2 r − L r P r − ( p ) → Q ( p ) , k → ∞ , . (9)Note that this approximation can be poor in the tails, if the distribution is heavy tailed(Hosking, 1990). For a non negative random variable X , regular moments can be expressedas µ k = EX k = µ k ( ¯ F ) = k R ∞ x k − ¯ F ( x ) dx = µ k ( Q ) = R Q k ( p ) dp , where ¯ F ( x ) = 1 − F ( x ),and F ( x ) is the cumulative distribution function for X . The main advantages of L-momentsover traditional moments are the following. First, all L-moments exist as long as E ( X ) < ∞ .Trimmed L-moments (Elamir and Seheult, 2003) can be used, if E ( X s ) < ∞ , for some s ∈ (0 , µ k ( ¯ F ), µ k ( Q ) and L k ( Q ) = R Q ( p ) P k − ( p ) dp for a log-normal and aBeta distribution. [Figure 2 about here.]Note that the integrals for µ k ( ¯ F ), kx k − ¯ F ( x ) can diverge over x . Similarly, the integrals for Biometrics, 000 µ k ( Q ), Q k ( p ) can diverge over p . However, functions Q ( p ) P k − ( p ) always lie between Q ( p )and − Q ( p ), which guarantees the existence of all moments as long as R Q ( p ) dp < ∞ .3.1 Scalar-on-function regression (SOFR) using L-moments

In this section, we develop a scalar-on-function regression method using subject-speciﬁc L-moments. This approach will allow us to simultaneously interpret the results of SOFR bothin terms of functional eﬀects of quantile functions as well as in terms of the eﬀects of speciﬁcL-moments.The representation of the quantile function in (9) allows us to deﬁne an inner productbetween two quantile functions for subjects i and j in terms of their L-moments as = P ∞ r =1 (2 r − L ir L jr . Since shifted Legendre polynomials form an orthogonalbasis of L [0 , β ( p ) in model (1) as β ( p ) = P Kk =1 β k P k − ( p ). Thus, thescalar-on-function regression model (1) reduces to a standard GLM on the subject-speciﬁcL-moments as g ( µ i ) = α + Z Ti γ + K X k =1 β k Z Q i ( p ) P k − ( p ) dp = α + Z Ti γ + K X k =1 β k L ik . (10)Unknown basis coeﬃcients β k in this representation capture a linear eﬀect of the L-momentsof the subject-speciﬁc distribution. Note that the ﬁrst order L-moment L equals mean of thesubject-speciﬁc distribution, therefore, model (10) is more general than using the subject-speciﬁc mean.This approach can be seen as somewhat analogous to functional principal componentregression (fPCR) method (Reiss et al., 2017), where fPC scores are used for supervised learn-ing. L-moments are projections on orthogonal basis functions and, hence, can be interpretedin similar additive manner while additionally providing moment-based characterization ofunderlying distributions. The number of L-moments K to be retained can be treated asa tuning parameter and can be chosen in a data driven way using cross-validation or information criteria such as AIC or BIC. It is important to note that the proposed approachallows mutually consistent interpretation of the results from the quantile-level perspectivevia β ( p ) and from the distributional L-moment level perspective via β k ’s.Nonlinear associations can be modelled using nonlinear scalar-on-function regression (Reisset al., 2017) that can be seen as a functional analogue of single index model(Stoker, 1986).In particular, the model is given by E ( Y i | X i , X i , . . . , X iJ ) = µ i , g ( µ i ) = α + Z Ti γ + h (cid:18)Z Q i ( p ) β ( p ) dp (cid:19) , (11)where h ( · ) is a smooth unknown function on real line. Expanding β ( p ) via Legendre poly-nomial basis, we get a traditional single index model where the L-moments play a role ofpredictors as g ( µ i ) = α + Z Ti γ + h ( L Ti β ). Traditional estimation methods for single indexmodel (Wang and Yang, 2009; Ichimura, 1991) can be applied to estimate both h ( · ) and β .An alternative way to capture nonlinear association between the outcome and quantilefunctional predictors is to use a generalized additive model (GAM) with L-moments as g ( µ i ) = α + Z Ti γ + P Kk =1 h k ( L ik ) . The GAM approach using L-moments is analogous to the“functional additive model” of M¨uller and Yao (2008), where fPC scores were used for scalar-on-function modelling and oﬀers additional interpretability in terms of nonlinear eﬀects ofrobust distributional summaries of data.3.2

Joint and Individual Variation Explained for multimodal distributional data viaL-moments

In this section, we demonstrate how L-moments can be used to identify joint and individualsources of variation in multi-modal distributional data. Suppose, we have repeated measuresdata from multiple domains d = 1 , , . . . , D each consisting of K d diﬀerent features on thesame subjects i = 1 , , . . . , n . Thus, we have subject-speciﬁc quantile functions Q ( d,r d ) i ( p ) for r d -th feature within domain d ( r d = 1 , , . . . , R d ). In many applications, it is important Biometrics, 000 to identify joint and individual sources of variation in these multi-modal distributionaldata. For scalar data, Lock et al. (2013) introduced joint and individual variation explained(JIVE) for integrative analysis of data coming from multiple domains. JIVE decomposesthe original block data matrix into three parts, a low rank approximation capturing thejoint structure and low rank approximations capturing domain-speciﬁc individual variationand noise. For multi-modal distributional data Q ( d,r d ) i ( p ), we propose to use L-moments foranalyzing joint and individual sources of variation. This enables us to decompose the dis-tributional variation captured via L-moments into joint and individual sources. Speciﬁcally,let L ( d,r d ) ik = R Q ( d,r d ) i ( p ) P k − ( p ) be the k -th L-moment ( k = 1 , , . . . , K ) for Q ( d,r d ) i . Foreach feature, we form the vector of L-moments and denoting it as L ( d,r d ) i . Then, for eachdomain d , we get the following vector of L-moments L ( d ) i = [ L ( d, T i , L ( d, T i , . . . , L ( d,R d ) T i ] T ,where L ( d ) i is a v d = KR d − dimensional vector consisting of all L-moments for all featuresin domain d . Next, we apply JIVE decomposition of these L-moments vectors as L ( d ) i = J di + A di + ε di = Φ dJ ξ J,i + Φ dA ξ dA,i + ε di . Here, J di and A di represent the low rank joint andindividual structures with rank s and s d , respectively, and ε di is the residual noise. Matricesof loadings for joint and individual structures are given by Φ dJ ∈ R ν d × S , Φ dA ∈ R ν d × S d , and ξ J,i , ξ dA,i are corresponding joint and individual scores for subject i . The summary of theproposed JIVE approach is presented as an Algorithm in Web Appendix 2. The ranks s , s d can be chosen using BIC or permutation tests (Lock et al., 2013). The number of L-moments K can be chosen beforehand in a data-driven way. The joint and individual scores ξ J,i , ξ dA,i can further be used for supervised learning purposes. We use the r.jive package (O’Connelland Lock, 2017) in R (R Core Team, 2018) for implementation of JIVE.

4. Digital gait biomarkers in Alzheimers’ Disease

Data for this study (Varma et al., 2021) were collected using a GT3x+ tri-axial accelerometerin a sample of older adults including mild-AD ( n = 38) and age-matched cognitively normal controls (CNC) ( n = 48). The accelerometer was placed on the dominant hip of the partici-pants via elastic belt. Activity was monitored continuously for seven days and, subsequently,gait parameters were obtained using a processing pipeline developed and validated in theParkinson’s Disease (PD) ﬁeld (Weiss et al., 2014). The pipeline outputs 52 gait metricscoming from 5 gait domains of Amplitude (8 metrics), Pace (3 metrics), Rhythm (13 metrics),Symmetry (9 metrics), and Variability (19 metrics). The complete list of gait metrics, alongwith their description and associated domains have been described in Varma et al. (2021)and is given in Web Table 4. Each gait metric is calculated every time a subject completes asustained bout of walking of at least 60 sec; this provides multiple observations per subjectacross each of the seven wear days. Figure 1 reveals distributional nature of this data for aparticular gait metric “step velocity” for AD and CNC group.Along with the gait metrics, we also have data on several baseline variables e.g., age, sex,BMI, years of education, V0 max (maximum rate of oxygen consumption during a treadmilltest) on each subject. Descriptive statistics of these variables for the complete, AD, and CNCsamples are displayed in Web Table 1. We will start with SOFR and FGAM approachesusing quantile functions of the digital gait biomarkers to model mild-AD status (Varmaet al., 2021). Subsequently, we illustrate SOFR via L-moment representation of quantilefunctions to study associations between digital gait biomarkers and cognitive functioning.Finally, we will perform JIVE on multi-modal digital gait biomarkers to identify their jointand individual sources of variation and then associate them with cognitive functioning.4.1 Discrimination of AD using Gait Biomarkers

One of the primary objectives of the AD study was to explore how well digital gait biomarkerscan discriminate between mild-AD and CNC. To do that, we perform SOFR and FGAMwith logit-link to model mild-AD vs CNC. We ﬁt multiple models using each gait metricsseparately and adjust for age and sex. For evaluation of the models, we use the “deviance Biometrics, 000 explained” criterion in GAM (Wood, 2017), which represents the proportion of null devianceexplained by the respective models. Table 1 displays the top ten gait metrics ranked accordingto deviance explained in SOFR and FGAM.[Table 1 about here.]The variables in FGAM consistently explain higher deviance which is expected. Particularlyinteresting are “Mean Stride Time s ” (mean stride time) and “Mean Step Time s ” (meanstep time). Used within FGAM, they explain a much higher proportion of deviance (0 . . F ( q, p ). FGAM captures this non-linearity and therefore produces a superiorperformance for this metric in terms of deviance explained (63%) compared to the SOFR model (37%). The estimated bivariate surface of the quantile eﬀect is more or less linear inquantile level ( q ) for step velocity and is highly negative in the upper tail (evident from thesliced eﬀect). Hence high maximal performance in this metric is associated with lower oddsof AD. Since the eﬀect of the quantile function for this metric is linear, the performance andinference from FGAM are similar to what we obtain from SOFR.4.2 SOFR via L-moments

In this section, we illustrate the application of L-moments in SOFR. In particular, we usestride regularity, step velocity, and cadence (among the top-performing metrics in SOFR)and derive the ﬁrst 4 L-moments of these metrics for each subject. Compared to traditional orcentral moments, L-moments are orthogonal projections on the shifted Legendre polynomials.As an illustration, we display the heatmap of sample correlation among the ﬁrst four L-moments, traditional and central moments of the gait metric cadence in Web Figure 3. Ascan be noticed the tradional and central moments are highly correlated among themselves,unlike the L-moments.Next, we ﬁt three separate logistic regression models for each of the gait metrics usingtheir L-moments to model mild-AD status. The results are reported in Table 2.[Table 2 about here.]Importantly, it is not the mean (equal to the ﬁrst order L-moment), but the second( L ) or third-order L-moments ( L ) which have a signiﬁcant eﬀect on odds of AD for allthe three gait metrics. Considering predictive performance, model A2 with step velocity isfound to be the best among the three models considered, in terms of the highest proportiondeviance explained (48 . . Biometrics, 000 orders 5, 6 and 8 to be signiﬁcant and this also improves the model performance (devianceexplained=55 .

81% using L , L , L , L ). We display the resulting coeﬃcient functions fromSOFR with L-moments in Web Figure 4. The coeﬃcient functions β ( p ) obtained withtraditional SOFR and SOFR with L-moments are very similar, but the latter approachprovides additional L-moments interpretability.For comparison, we provide results from a similar analysis using the ﬁrst four traditionaland central moments in Web Table 2 and 3, respectively. The eﬀect of the gait measuresare much weaker and not signiﬁcant (at nominal level α = 0 .

05) for traditional moments,possibly due to high correlation among themselves. Central moments perform similarly toL-moments. However, they do not provide a simple way to additionally interpret quantilelevel eﬀects as can be done in SOFR with L-moments.In addition to the cognitive status, this study used conﬁrmatory factor analysis (CFA) toderive cognitive scores for attention (ATTN), verbal memory (VM) and executive function(EF), which represent a continuous scale of cognitive functioning. We study association of theL-moments of stride regularity, step velocity and cadence with the cognitive scores of ATTN,VM and EF using a linear regression model, while adjusting for age, sex and education. Theresults are displayed in Table 3. [Table 3 about here.]We observe the cognitive scores of ATTN, VM and EF to be associated with sex, educationand primarily the second ( L ) order L-moments. For all the measures considered, thisassociation is found to be positive, indicating higher L moments are associated with highercognitive scores and lower odds of dementia, which matches with our earlier analysis. Thesignal for the cognitive scores of VM and EF are found be much stronger (adjusted R-squared 35% − − to a benchmark model on age, sex and education and competing models on mean of the gaitmetrics (around 30% −

40% improvement) in terms of adjusted R-squared criterion. Resultsfrom GAM using the L-moments of the gait measures discussed in Section 4.2 are reported inWeb Table 5. The eﬀect of the second order L-moment is again found to be most signiﬁcantfor the gait measures in all the models considered.Our analysis using L-moments demonstrates there is a signiﬁcant association betweenthe gait metrics and cognitive function, illustrates that robust higher order distributionalsummaries of the gait measures might be more informative and hence more important toconsider rather than using a single summary metric like mean (ﬁrst order L moment).4.3

JIVE with L-moments

So far in our analysis, we have considered each gait measure separately while performingSOFR with quantile functions or L-moments to study their association with cognitive per-formance. However, gait measures are correlated among themselves and conceptually can beplaced within one of the ﬁve unique gait domains mentioned earlier. The inter-dependenceof the metrics within and between the domains can be beneﬁcial in statistical modelling andcan be analyzed using JIVE approach with L-moments illustrated in Section 3.2. We focuson domains of Pace (3 features), Rhythm (13 features) and Variability (19 features) as thetop performing gait metrics in the SOFR analysis (see Table 1) belong to either of thesethree domains. Figure 4 displays a Venn diagram illustrating a conceptual overlap of jointand individual variation across the three domains.[Figure 4 about here.]First, we pre-normalize variable via z-score transformation z i = Φ − ( F n ( x i )), where x i s arethe original features (subject-speciﬁc L-moments in our case). Further, all the data blocksare again normalized (centered and scaled) so that data blocks from diﬀerent domains arecomparable as suggested by Lock et al. (2013). Applying JIVE (O’Connell and Lock, 2017) Biometrics, 000 and determining the optimal ranks via the permutation test, we get the following ranks:(Joint, Pace, Rhythm, Variability) = (2 , , , n = 86), we further performvariable selection and identify the important JIVE scores using LASSO (Tibshirani, 1996).The selected scores are joint-PC1, joint-PC2, Pace-PC1, Pace-PC2, Rhythm-PC1, Rhythm-PC2, Rhythm-PC5, Rhythm-PC7, Variability-PC4, Variability-PC7 and Variability-PC9.The results of logistic regression for cognitive status are reported in Web Table 6. To furtherunderstand the associations of JIVE PC scores with cognitive function relate them back tothe original gait metrics, we use the cross-correlation between the original L-moments andthe signiﬁcant JIVE scores. The top 10 gait metrics (L-moments) ranked according to theircorrelation with each score are displayed in Figure 6.[Figure 6 about here.]Joint-PC1, joint-PC2 are found to be positively associated with higher odds of AD. Forjoint PC-1, the ﬁrst order L-moments ( L ) from Pace (step velocity, distance) and Rhythm(stride regularity) are negatively loaded, indicating higher mean value for these variablese.g., step velocity lowers the odds of AD. Similarly, for joint PC-2, the second order L-moments ( L ) from Pace (step velocity, distance, mean step length) and Rhythm (cadence)are negatively loaded, indicating higher second order L-moments (representing scale) forthese variables are associated with a lower risk of AD which matches with our analysis in Section 4.2. The association between cognitive status and Rhythm-PC2, Rhythm-PC5 arefound to be negative, whereas Rhythm-PC7 is found to be positively associated. Primarily,higher order L-moments ( L , L , L ) gait metrics from Rhythm and Variability are loadedon these individual PCs outlining the importance of these domains.Our analysis in this section, using JIVE with L-moments of the gait measures provides acomprehensive summary of the joint and individual sources of variation of the gait metrics,helps in dimension reduction of the original problem, and provides better understanding ofthe underlying association between the gait measures and cognitive functions.

5. Discussion

Distributional data analysis is an emerging area of research that has a large number ofdiverse applications. There are many ways to represent distributional information includingcumulative distribution function, density function, quantile function, and others. Although,all these distributional representations can be re-expressed through each other via diﬀerential,integral, inverse, or other more involved transformation, a speciﬁc choice for statistical mod-elling may depend on desirable interpretation and require diﬀerent analytical machinery. Wehave proposed to capture distributional nature of data via subject-speciﬁc quantile functionsand use them or their L-moment representations in SOFR, FGAM, and JIVE methods. Aswe argue, our approach provide many advantages including intuitive interpretation of resultsand uniform support on [0 , Biometrics, 000 associated with cognitive domains of attention, verbal memory, and executive function. JIVEanalysis via L-moment representation of multi-modal distributions of gait measures revealedboth joint and individual sources of variation in the domains of Pace, Rhythm and Variability.These are many more areas that remain to be explored based on the current work. Forexample, it would of interest to conceptualize and accommodate distribution-level interac-tions. One way to do this would be via the inner product of quantile functions expressedin terms of the interactions of corresponding L-moments. Another direction of work isto extend the quantile function based SOFR model to capture possible temporal eﬀects.Finally, normalization of scales across multiple distributional predictors will need to be deeplystudied. For example, quantile function Q i ( p ) could be normalized within each subject usingthe transformation ( Q i ( p ) − M edian i ) /IQR i to make scales more comparable across subjectsand variables. References

Alzheimer’s Association (2020). 2020 alzheimer’s disease facts and ﬁgures.

Alzheimer’s &Dementia

Statistical methods in medical research

BMJ open e014456.Bigot, J., Gouet, R., Klein, T., Lopez, A., et al. (2018). Upper and lower risk bounds for estimating the wasserstein barycenter of random measures on the real line. ElectronicJournal of Statistics

Statistical shape analysis: with applications in R ,volume 995. John Wiley & Sons.Elamir, E. A. and Seheult, A. H. (2003). Trimmed l-moments.

Computational Statistics &Data Analysis

Statistical modelling with quantile functions . CRC Press.Goldsmith, J., Bobb, J., Crainiceanu, C. M., Caﬀo, B., and Reich, D. (2011). Penalizedfunctional regression.

Journal of computational and graphical statistics

Medicine and science insports and exercise refund: Regression withFunctional Data . R package version 0.1-17.Goldsmith, J., Zipunnikov, V., and Schrack, J. (2015). Generalized multilevel function-on-scalar regression and principal component analysis.

Biometrics

Neuromuscular Disorders

The Journals of Gerontology: Series A Biometrics, 000

Neurology

Journal of the Royal Statistical Society: Series B(Methodological)

Journal of the American Statistical Association

Statistical models for data analysis , pages 161–169.Springer.Kourtis, L. C., Regele, O. B., Wright, J. M., and Jones, G. B. (2019). Digital biomarkersfor alzheimer’s disease: the mobile/wearable devices opportunity.

NPJ digital medicine The annals ofapplied statistics Technometrics

Gait & Posture

Journal of Computational and Graphical Statistics

Annual Review of Statistics and Its Application Journal of the American Statistical Association

Journal of the AmericanStatistical Association r.jive: Perform JIVE Decomposition for Multi-Source Data . R package version 2.1.Parzen, E. et al. (2004). Quantile probability and statistical data modeling.

StatisticalScience

Quantile function methods for decision analysis . PhD thesis, StanfordUniversity.R Core Team (2018).

R: A Language and Environment for Statistical Computing . RFoundation for Statistical Computing, Vienna, Austria.Reider, L., Bai, J., Scharfstein, D. O., Zipunnikov, V., Investigators, M. O. S., et al. (2020).Methods for step count data: Determining “valid” days and quantifying fragmentationof walking bouts.

Gait & Posture

International Statistical Review Biometrics, 000

Shema-Shiratzky, S., Hillel, I., Mirelman, A., Regev, K., Hsieh, K. L., Karni, A., Devos, H.,Sosnoﬀ, J. J., and Hausdorﬀ, J. M. (2020). A wearable sensor identiﬁes alterations incommunity ambulation in multiple sclerosis: contributors to real-world gait quality andphysical activity.

Journal of Neurology pages 1–10.Stoker, T. M. (1986). Consistent estimation of scaled coeﬃcients.

Econometrica: Journal ofthe Econometric Society pages 1461–1481.Takemura, A. (1983). Orthogonal expansion of quantile function and components ofthe shapiro-francia statistic. Technical report, STANFORD UNIV CA DEPT OFSTATISTICS.Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.

Journal of the RoyalStatistical Society: Series B (Methodological)

Convex transformations of random variables . MathematischCentrum, Amsterdam.Varma, V. R., Dey, D., Leroux, A., Di, J., Urbanek, J., Xiao, L., and Zipunnikov, V. (2017).Re-evaluating the eﬀect of age on physical activity over the lifespan.

Preventive medicine

Alzheimer’s & Dementia:Translational Research & Clinical Interventions e12131.Varma, V. R. and Watts, A. (2017). Daily physical activity patterns during the early stageof alzheimer’s disease. Journal of Alzheimer’s Disease

Proceedings of COMPSTAT’2010 , pages 581–588. Springer.Wang, L. and Yang, L. (2009). Spline estimation of single-index models.

Statistica Sinica pages 765–783.Weiss, A., Herman, T., Giladi, N., and Hausdorﬀ, J. M. (2014). Objective assessment offall risk in parkinson’s disease using a body-ﬁxed sensor worn for 3 days. PloS one e96675.Wood, S. N. (2017). Generalized additive models: an introduction with R . CRC press.Wood, S. N., Pya, N., and S¨afken, B. (2016). Smoothing parameter and model selection forgeneral smooth models.

Journal of the American Statistical Association

Biometrics

Biostatistics

Journalof Statistical Planning and Inference

Journal of the American StatisticalAssociation

Movement disorders: oﬃcial journal of the Movement DisorderSociety

ComputationalStatistics & Data Analysis

Supporting Information

Web Appendix 1-2, Web Tables 1-6 and Web Figures 1-5 referenced in this article areavailable with supporting information at the Biometrics website on Wiley Online Library. Biometrics, 000

AD Y p s t ep v e l o c i t y quan t il e f un c t i on Q ( p ) AD N p s t ep v e l o c i t y quan t il e f un c t i on Q ( p ) p A v e r age quan t il e f un c t i on Q ( p ) , s t ep v e l o c i t y AD Y, n=38AD N, n=48

Figure 1 : Displayed are the individual (left two panel) and average (right panel) quantilefunctions of step velocity for AD and controls. . . . . . m k ( F ) x k=1k=2k=3k=4 . . . . . m k ( Q ) p k=1k=2k=3k=4 − − L k ( Q ) p k=1k=2k=3k=4−Q . . . . . . m k ( F ) x k=1k=2k=3k=4 . . . . . . m k ( Q ) p k=1k=2k=3k=4 − . − . . . . L k ( Q ) p k=1k=2k=3k=4−Q Figure 2 : Top: lognormal distribution with SD = 0.3. Bottom: beta distribution with α = β = 0 . Biometrics, 000

Step_Velocity__cm_sec_ p quan t il e f un c t i on q ( p ) AD Y, n=38AD N, n=48 − . − . . . . effect of quantile function p b ( p ) . . . . strRegAP p quan t il e f un c t i on q ( p ) AD Y, n=38AD N, n=48 − − effect of quantile function p b ( p ) Figure 3 : Linear functional eﬀects β ( p ) of quantile functions of Step Velocity cm sec (top)and strRegAP (bottom). Pace RhythmJoint VariationVariability

Domain1 Domain2Domain3

Figure 4 : Venn diagram highlighting two main sources of variation: joint (the shadedintersection region) and domain-speciﬁc (individual part of each circle). Biometrics, 000

Pace Rhythm Variability

Variation Explained . . . . . . JointIndividualResidual

Figure 5 : Joint and individual variation explained by each domain from JIVE. −0.9 −0.9 −0.8 0.8 0.8 0.8 0.8 −0.8 0.7 −0.7Joint PC 1 S t ep_ V e l o c i t y __ c m _ s e c _ D i s t an c e__ m _ s t r R eg V D i s t an c e__ m _ S t ep_ V e l o c i t y __ c m _ s e c _ s t r R eg V M ean_ S t ep_Leng t h__ c m _ s t r R eg AP s t r R eg AP M ean_ S t ep_Leng t h__ c m _ Var2 V a r −1.0−0.50.00.51.0 cross correlation −0.9 −0.9 −0.8 −0.8 −0.8 −0.7 −0.7 −0.7 −0.7 −0.7Joint PC 2 S t ep_ V e l o c i t y __ c m _ s e c _ D i s t an c e__ m _ M ean_ S t ep_Leng t h__ c m _ C aden c e_ V _ t i m e_do m a i n_ m p V s t r R eg V S T D A M PP ea ks M ean_ S t ep_ T i m e_ s _ M ean_ S t r i de_ T i m e__ s _ s l p V Var2 V a r −1.0−0.50.00.51.0 cross correlation −0.7 −0.7 −0.6 −0.6 −0.6 −0.6 −0.6 −0.6 −0.6 −0.6Rhythm PC 2 s t r i de T i m e M L s t r i de T i m e M L s t ep T i m e M L s t r i de T i m e M L s t ep T i m e M L s t r i de T i m e AP s t ep T i m e M L s t r i de T i m e M L s t ep T i m e AP s t ep T i m e M L Var2 V a r −1.0−0.50.00.51.0 cross correlation −0.5 −0.4 0.4 −0.4 −0.4 0.4 −0.4 −0.4 0.4 0.4Rhythm PC 5 C ad M L_ f r q_60___ s t p s _ m i n__ s t r i de T i m e V C ad AP _ f r q_60___ s t p s _ m i n__ s t ep T i m e V M ean P ea ks w d AP C VA M PP ea ks C aden c e_ V _ t i m e_do m a i n_ C ad V _ f r q_60___ s t p s _ m i n__ C ad M L_ f r q_60___ s t p s _ m i n__ Var2 V a r −1.0−0.50.00.51.0 cross correlation −0.6 0.3 −0.3 −0.3 −0.3 0.3 −0.3 −0.3 −0.3 −0.3Rhythm PC 7 C ad M L_ f r q_60___ s t p s _ m i n__ C ad M L_ f r q_60___ s t p s _ m i n__ s t r i de T i m e V s t r i de T i m e V C aden c e_ V _ t i m e_do m a i n_ C ad M L_ f r q_60___ s t p s _ m i n__ s t ep T i m e AP w d M L s t ep T i m e V C ad V _ f r q_60___ s t p s _ m i n__ Var2 V a r −1.0−0.50.00.51.0 cross correlation Figure 6 : Cross correlation between JIVE PC scores and L-moments. Shown are the top10 L-moments ranked according to absolute value of correlation with each PC score. “Gaitmeasure r ” represents ( r ) th L-moment of the particular gait metric. Biometrics, 000

Table 1: Displayed are the proportion of deviance explained by SOFR and FGAM modellingcognitive status using quantile functions of gait metrics in presence of age and sex. The top10 metrics have been reported for each of the method.

Variable(ranked for SOFR) Deviance explainedSOFR Variable(ranked for FGAM) Deviance explainedFGAMstrRegAP 0.58 Mean Stride Time s 0.63Step Velocity cm sec 0.50 Mean Step Time s 0.63Distance m 0.49 strRegAP 0.59Cadence V time domain 0.48 Step Velocity cm sec 0.50strideTimeV 0.40 Mean Step Length cm 0.50frqML 0.38 frqV 0.50CV Step time 0.38 stepTimeV 0.50frqV 0.37 strideTimeV 0.50Mean Stride Time s 0.37 Distance m 0.49Mean Step Time s 0.37 Cadence V time domain 0.48 Table 2: The results of logistic regressions for mild-AD status (adjusted for age and sex)on ﬁrst four L-moments of strRegAP (Model A1) , Step Velocity cm sec (Model A2) andCadence V time domain (Model A3) respectively. Deviance explained using only mean ofgait measures (adjusted for age and sex) are provided for comparison in the parenthesis.

Model A1 (strRegAP) Model A2 (Step Velocity) Model A3 (Cadence)Coef est p-value Coef est p-value Coef est p-value β β β . × − sex (M) 2.740 0.00155 L -0.604 0.85739 L -0.044 0.16441 L L -21.801 0.03457 L -0.480 0.00127 L -1.055 4 . × − L L -0.601 0.03620 L L L -0.212 0.61004 L -0.121 0.90079Devianceexplained 21 . . . . . . Biometrics, 000

Table 3: The results from linear regression models of cognitive scores (ATTN, VM and EF)on ﬁrst four L-moments of strRegAP , Step Velocity cm sec and Cadence V time domain ,adjusting for age, sex and education. Benchmark models using just age, sex, and educationproduce adjusted- R AT T N = 0 . R V M = 0 . R EF = 0 . Y Model B1 (strRegAP) Model B2 (Step Velocity) Model B3 (Cadence)ATTN Coef est p-value Coef est p-value Coef est p-value β -2.471 0.017 β -1.964 0.089 β L L L -0.025 0.035 L L L L L L -0.137 0.122 L -6.456 0.415 L -0.004 0.961 L β -4.469 0.036 β -3.283 0.159 β × − sex (M) -1.591 5 . × − sex (M) -1.583 3 . × − edu 0.171 0.001 edu 0.119 0.025 edu 0.123 0.022 L L . × − L -0.030 0.213 L L L L L L -0.158 0.389 L L L β -3.307 0.048 β -3.623 0.048 β . × − sex (M) -1.329 1 . × − sex (M) -1.368 3 . × − edu 0.134 0.001 edu 0.108 0.009 edu 0.103 0.012 L -0.402 0.804 L L -0.036 0.053 L L L . × − L -3.396 0.765 L L -0.264 0.061 L -1.960 0.879 L -0.064 0.664 L iometrics Supporting Information for Distributional data analysis via quantile functionsand its application to modelling digital biomarkers of gait in Alzheimer’sDisease

This paper has been submitted for consideration for publication in

Biometrics a r X i v : . [ s t a t . M E ] F e b

1. Web Appendix 1: Estimation in Qunatile function based FGAM

For identiﬁability of the FGAM (model (5) in the paper), the following constraint is im-posed: P ni =1 R F ( Q i ( p ) , p ) = 0 (McLean et al., 2014; Wood, 2017). Note that the proposedapproach of using F { Q ( p ) , p } capturing the smooth eﬀect of the subject-speciﬁc quantilefunction at each quantile level is diﬀerent from using the transformation approach in McLeanet al. (2014), which is focused on studying the smooth diurnal eﬀect F { G t ( X ( t )) , t } usingpopulation level distribution function G t ( x ). Nevertheless, the same model ﬁtting procedurecan be used. In particular, we model the bivariate function F ( · , · ) using a tensor product ofunivariate B-spline basis functions. Suppose, { B Q,k ( q ) } Kk =1 and { B P,‘ ( p ) } L‘ =1 be a set of knownbasis functions over q (where Q ( p ) = q ) and p , respectively. Then, F ( · , · ) is modelled usinga tensor product two basis functions as F { Q i ( p ) , p } = P Kk =1 P L‘ =1 θ k,‘ B Q,k { Q i ( p ) } B p,‘ ( p ).Using this expansion model (5) (in the paper) can be reformulated as g ( µ i ) = α + Z Ti γ + K X k =1 L X ‘ =1 θ k,‘ Z B Q,k { Q i ( p ) } B p,‘ ( p ) dp = α + Z Ti γ + W Ti θ , (1)where we denote the KL − dimensional vector of B Q,k { Q i ( p ) } B P,‘ ( p )’s as W i , and θ isthe vector of unknown basis coeﬃcients θ k,‘ ’s. Then, the penalized negative log likelihoodcriterion for estimation is given by S ( ψ ) = R ( α, γ , θ ) = − logL ( α, γ , θ ; Y i , Z i , W i ) + θ T P θ . (2)Here, P = λ q D Tq D q ⊗ I K + λ p D Tp D p ⊗ I L is a penalty matrix consisting of second order rowand column penalties imposing smoothness on F ( · , · ) in both directions (Marx and Eilers,1998). The parameters are estimated using penalized iteratively re-weighted least squares(P-IRLS) as in McLean et al. (2014). We use the refund package (Goldsmith et al., 2018)for implementation of FGAM. Biometrics, 000

2. Web Appendix 2, JIVE Algorithm using L-momentsAlgorithm 1: JIVE using L-Moments of Quantile functions Goal : To estimate joint and individual structure in multi-modal distributional data2.

Input : L , the block data matrix containing L-moments L ( d ) i

3. for iter = 1 to A do4. Determine ranks s and s d s.5. Estimate J by rank s SVD of L , set J = USV T

6. for d = 1 to D do7. ˜ L d = L d − J d

8. Estimate A d by rank s d SVD of ˜ L d ( I − VV T ), set L d = L d − A d

9. Set L =  L L ... L D  Return J , A d .

3. Supporting Tables

Web Tables 1, 2, 3, 4, 5, 6 referenced in the paper are given below.[Table 1 about here.][Table 2 about here.][Table 3 about here.][Table 4 about here.][Table 5 about here.][Table 6 about here.]

4. Supporting Figures

Web Figures 1, 2, 3, 4, 5 referenced in the paper are given below.[Figure 1 about here.][Figure 2 about here.][Figure 3 about here.][Figure 4 about here.][Figure 5 about here.]

References

Goldsmith, J., Scheipl, F., Huang, L., Wrobel, J., Gellar, J., Harezlak, J., McLean, M. W.,Swihart, B., Xiao, L., Crainiceanu, C., and Reiss, P. T. (2018). refund: Regression withFunctional Data . R package version 0.1-17.Marx, B. D. and Eilers, P. H. (1998). Direct generalized additive modeling with penalizedlikelihood.

Computational Statistics & Data Analysis

Journal of Computational and Graphical Statistics

Generalized additive models: an introduction with R . CRC press.

Biometrics, 000 . . . . . . Mean_Stride_Time__s_ p quan t il e f un c t i on q ( p ) AD Y, n=38AD N, n=48 m ean_ s t r i de_ t i m e p F^(mean_stride_time,p)

Step_Velocity__cm_sec_ p quan t il e f un c t i on q ( p ) AD Y, n=38AD N, n=48 S t ep_ v e l o c i t y

50 100150 200 p F^(Step_velocity,p)

Figure 1 : Displayed are additive eﬀects of quantile functions of Mean Stride Time s(top) and Step Velocity cm sec (bottom) obtained using FGAM and the estimated averagequantile functions of two groups (left column) and bivariate surfaces of quantile eﬀect ˆ F ( q, p )(right column). F^ ( mean_stride_time , p ) by mean_stride_time q F ^ ( m ean_ s t r i de_ t i m e , p ) p=0.1p=0.25p=0.5p=0.75p=0.9

50 100 150 200 250 − F^ ( Step_velocity , p ) by Step_velocity q F ^ ( S t ep_ v e l o c i t y , p ) p=0.1p=0.25p=0.5p=0.75p=0.9 Figure 2 : Displayed are sliced eﬀect of the FGAM surface of the two gait metrics ˆ F ( q, p ),for p = 0 . , . , . , . , . Biometrics, 000 c aden c e L_ c aden c e L_ c aden c e L_ c aden c e L_ c aden c e m u_ c aden c e m u_ c aden c e m u_ c aden c e m u_ c aden c e c _ c aden c e c _ c aden c e c _ Var2 V a r −1.0−0.50.00.51.0 correlation among moments Figure 3 : Displayd is the heatmap of sample correlation among the ﬁrst four L-moment (L)s, regular moments (mu) and central moments (c) of the gait metric Cadence V time-domain. − . − . − . − . . . . L moments Analysis p b ( p ) s t ep_ v e l o c i t y − . − . − . − . . . . SOFR analysis p b ( p ) s t ep_ v e l o c i t y − . − . − . . . . . L moments Analysis p b ( p ) C aden c e − . − . − . . . . . SOFR analysis p b ( p ) C aden c e − − − L moments Analysis p b ( p ) s t r R eg AP − − − SOFR analysis p b ( p ) s t r R eg AP Figure 4 : Displayed are linear functional eﬀects β ( p ) from L-moments analysis and SOFRanalysis of the gait measures Step Velocity cm sec (top), Cadence V time domain (mid-dle) and strRegAP (bottom). Biometrics, 000

IndividualData P a c e R h y t h m V a r i ab ili t y Joint Noise === +++ +++ P a c e R h y t h m V a r i ab ili t y Figure 5 : Heatmaps showing JIVE decomposition of the L-moments of gait measures.Columns represent subjects and rows represent features. Rows and columns are orderedby complete linkage clustering of the joint structure.

Table 1: Summary statistics for the complete, AD and CNC samples. No statistical diﬀerencebetween the AD and CNC groups are observed across age, BMI, or V0 max. However, ADgroup had a smaller percentage of females (26 . . . . Characteristic Complete sample AD CNC P valueMean/Freq SD Mean/Freq SD Mean/Freq SDAge 73.21 7.13 73.24 7.71 73.19 6.73 0.975% Female 50 N/A 26.32 N/A 68.75 N/A < Biometrics, 000

Table 2: Displayed are the results from logistic regression models of cognitive status(adjusted for age and sex) on ﬁrst four regular moments ( µ r ) of strRegAP (Model A11) ,Step Velocity cm sec (Model A21) and Cadence V time domain (Model A31) respectively. Model A11 (strRegAP) Model A21 (Step Velocity) Model A31 (Cadence)Coef est p-value Coef est p-value Coef est p-value β -4.098 0.439 β -95.18 0.094 β -713.7 0.202age -0.039 0.281 age -0.058 0.167 age -0.032 0.373sex (M) 2.2 7 . × − sex (M) 2.61 1 . × − sex (M) 1.727 0.002 µ µ µ µ -97.62 0.272 µ -0.059 0.0712 µ -0.381 0.203 µ µ . × − µ µ -10.35 0.450 µ − . × − µ − . × − .

01% Devianceexplained 30 .

29% Devianceexplained 20 . Table 3: Displayed are the results from logistic regression models of cognitive status (adjustedfor age and sex) on mean and three central moments ( µ , µ , µ ) of strRegAP (Model A12) ,Step Velocity cm sec (Model A22) and Cadence V time domain (Model A32) respectively. Model A12 (strRegAP) Model A22 (Step Velocity) Model A32 (Cadence)Coef est p-value Coef est p-value Coef est p-value β β β . × − sex (M) 4.01 4 . × − sex (M) 2.898 0.001 µ -1.097 0.692 µ -0.036 0.168 µ µ -65.056 0.003 µ -0.009 0.024 µ -0.045 0.003 µ -45.54 0.515 µ − . × − µ − × − µ µ . × − µ . × − .

1% Devianceexplained 48 .

14% Devianceexplained 47 . Biometrics, 000 T a b l e : C o m p l e t e li s t o f t h e ga i t m e a s u r e s , t h e i r d e s c r i p t i o n s a nd a ss o c i a t e dd o m a i n s . G a i t M e a s u r e D e s c r i p t i o n D o m a i n G a i t M e a s u r e D e s c r i p t i o n D o m a i n A c t i v i t y L e v e l[ g ] m e a n s i g n a l v ec t o r m ag n i t ud e A m p li t ud e C a d e n ce V (t i m e d o m a i n ) [ s t e p s / m i n ] nu m b e r o f s t e p s p e r m i nu t e R h y t h m r n g V , M L , A P [ g ] a cce l e r a t i o n r a n g e A m p li t ud e r m s V , M L , A P [ g ] a cce l e r a t i o n r oo t m e a n s q u a r e A m p li t ud e f r q V , M L , A P [ H z ] d o m i n a n t f r e q u e n c y o f p o w e r s p ec tr u m R h y t h m a m p V , M L , A P [ g2/ H z ] a m p li t ud e o f t h e d o m i n a n t f r e q u e n c y V a r i a b ili t y w d V , M L , A P [ H z ] w i d t h o f t h e d o m i n a n t f r e q u e n c y V a r i a b ili t y s l p V , M L , A P [ g2/ H z ] a m p / w d V a r i a b ili t y s t p R e g V , M L , A P [ un i t l e ss ] s t e p r e g u l a r i t y S y mm e tr y s tr R e g V , M L , A P [ un i t l e ss ] s tr i d e r e g u l a r i t y V a r i a b ili t y s t e pS y m V , M L , A P [ un i t l e ss ] s t e p s y mm e tr y ( = s t p R e g/ s tr R e g ) S y mm e tr y s t e p T i m e V , M L , A P [ s ] s t e p t i m e ( c a l c u l a t e d f r o m a u t o c o rr e l a t i o n f un c t i o n ) R h y t h m s tr i d e T i m e V , M L , A P [ s ] s tr i d e t i m e ( c a l c u l a t e d f r o m a u t o c o rr e l a t i o n f un c t i o n ) R h y t h m H R v , m l, a p [ un i t l e ss ] h a r m o n i c r a t i o S y mm e tr y C a d V , M L , A P ( f r e q u e n c y d o m a i n ) [ s t e p s / m i n ] f r q *60 R h y t h m M e a nS tr i d e T i m e [ s ] m e a n s tr i d e t i m e R h y t h m C V S tr i d e T i m e [ % ] ( s t a nd a r d D e v i a t i o n / m e a n ) V a r i a b ili t y M e a nS t e p T i m e [ s ] m e a n s t e p t i m e R h y t h m C V S t e p t i m e [ % ] ( s t a nd a r d D e v i a t i o n / m e a n ) V a r i a b ili t y M e a nS t e p L e n g t h [ c m ] m e a n s t e p l e n g t h P a ce C V S t e p l e n g t h ( s t a nd a r d D e v i a t i o n / m e a n ) V a r i a b ili t y S t e p V e l o c i t y [ c m / s ec ] m e a n s t e p l e n g t h / m e a n s t e p t i m e P a ce D i s t a n ce [ m ] s u m o f s t e p l e n g t h P a ce M e a n P e a k s m e a n o f p e a k s R h y t h m S T D P e a k ss d o f p e a k s V a r i a b ili t y C V P e a k s C V o f p e a k s V a r i a b ili t y M e a n A M PP e a k s m e a np e a k a m p li t ud e s (t i m e d o m a i n ) A m p li t ud e S T D A M PP e a k ss d o f p e a k a m p li t ud e s (t i m e d o m a i n ) V a r i a b ili t y C VA M PP e a k s ( s t a nd a r d D e v i a t i o n / m e a n ) V a r i a b ili t y Table 5: Displayed are the results from generalized additive models (GAM) of cognitive scores(ATTN, VM and EF) on ﬁrst four L-moments of strRegAP , Step Velocity cm sec andCadence V time domain after adjusting for age, sex and education. Benchmark models usingage, sex and education produce (adjusted) R AT T N = 0 . R V M = 0 . R EF = 0 . −

38% gain) is noticed in terms of adjusted R-squaredcompared to models B1-B3 (Table 3 in the paper) indicating potential non-linearity in theeﬀects of the subject-speciﬁc L-moments.

Y Model B1 (strRegAP) Model B2 (Step Velocity) Model B3 (Cadence)ATTN Coef p-value Coef p-value Coef p-value β β β h ( L ) 0.024 h ( L ) 0.876 h ( L ) 0.092 h ( L ) 0.002 h ( L ) 0.0471 h ( L ) 0.005 h ( L ) 0.179 h ( L ) 0.482 h ( L ) 0.422 h ( L ) 0.177 h ( L ) 0.636 h ( L ) 0.874adj-Rsq 0.395 adj-Rsq 0.232 adj-Rsq 0.259VM β β β . . × − sex (M) 0.00026edu 0.00344 edu 0.0126 edu 0.011 h ( L ) 0.265 h ( L ) 0.444 h ( L ) 0.678 h ( L ) 2 . × − h ( L ) 0.00875 h ( L ) 3 . × − h ( L ) 0.071 h ( L ) 0.170 h ( L ) 0.748 h ( L ) 0.0149 h ( L ) 0.567 h ( L ) 0.715adj-Rsq 0.472 adj-Rsq 0.435 adj-Rsq 0.444EF β β β . . × − sex (M) 9 . × − edu 0.00226 edu 0.007 edu 0.005 h ( L ) 0.273 h ( L ) 0.252 h ( L ) 0.138 h ( L ) 0.00051 h ( L ) 0.0164 h ( L ) 0 . h ( L ) 0.474 h ( L ) 0.866 h ( L ) 0.163 h ( L ) 0.413 h ( L ) 0.916 h ( L ) 0.058adj-Rsq 0.491 adj-Rsq 0.407 adj-Rsq 0.4554 Biometrics, 000