[PDF] Hierarchical Continuous Time Hidden Markov Model, with Application in Zero-Inflated Accelerometer Data

Abstract

Wearable devices including accelerometers are increasingly being used to collect high-frequency human activity data in situ. There is tremendous potential to use such data to inform medical decision making and public health policies. However, modeling such data is challenging as they are high-dimensional, heterogeneous, and subject to informative missingness, e.g., zero readings when the device is removed by the participant. We propose a flexible and extensible continuous-time hidden Markov model to extract meaningful activity patterns from human accelerometer data. To facilitate estimation with massive data we derive an efficient learning algorithm that exploits the hierarchical structure of the parameters indexing the proposed model. We also propose a bootstrap procedure for interval estimation. The proposed methods are illustrated using data from the 2003 - 2004 and 2005 - 2006 National Health and Nutrition Examination Survey.

Full PDF

HHierarchical Continuous Time Hidden Markov Model,with Application in Zero-Inﬂated Accelerometer Data

Zekun Xu ∗ Eric B. Laber † Ana-Maria Staicu ‡ Abstract

Wearable devices including accelerometers are increasingly being used to collect high-frequency human activity data in situ . There is tremendous potential to use such data to informmedical decision making and public health policies. However, modeling such data is chal-lenging as they are high-dimensional, heterogeneous, and subject to informative missingness,e.g., zero readings when the device is removed by the participant. We propose a ﬂexibleand extensible continuous-time hidden Markov model to extract meaningful activity patternsfrom human accelerometer data. To facilitate estimation with massive data we derive ane ﬃ cient learning algorithm that exploits the hierarchical structure of the parameters indexingthe proposed model. We also propose a bootstrap procedure for interval estimation. Theproposed methods are illustrated using data from the 2003 - 2004 and 2005 - 2006 NationalHealth and Nutrition Examination Survey. Keywords:

Continuous-time hidden Markov model; Consensus optimization; Accelerometer data. ∗ Department of Statistics, North Carolina State University (Email: [email protected] ) † Department of Statistics, North Carolina State University (Email: [email protected] ) ‡ Department of Statistics, North Carolina State University (Email: [email protected] ) a r X i v : . [ s t a t . C O ] D ec Introduction

The development of the wearable technology has given rise to a variety of sensing devices andmodalities. Some of these devices, e.g., Fitbit or Apple Watch, can be worn continuously andthereby produce huge volumes of high-frequency human activity data. Because these data presentlittle burden on the wearer to collect and provide rich information on the in situ behavior ofthe wearer, they have tremendous potential to inform decision making in healthcare. Examplesof remote sensing data in healthcare include elder care, remote monitoring of chronic disease,and addition management (Bartalesi et al., 2005; Marshall et al., 2007; Hansen et al., 2012).Accelerometers are among the most commonly used and most widely studied types of wearabledevices, they have been used both in randomized clinical trials to evaluate treatment e ﬀ ect onactivity-related impairment (Napolitano et al., 2010; Kanai et al., 2018) and in observationalstudies to characterize activity patterns in a free-living environment (Morris et al., 2006; Hansenet al., 2012; Xiao et al., 2014). However, despite rapidly growing interest and investment inwearable devices for the study of human activity data, a general and extensible class of models foranalysis of the resulting data is lacking.We propose a continuous-time hidden Markov model for the modeling human accelerometerdata that aligns with scientiﬁc (conceptual) models of human activity data; in the proposed model,latent states correspond to latent (unobserved) activities, e.g., resting, running, jumping, etc., thatare shared across the population yet the accelerometer signatures within these activities are allowedto vary across subjects. Furthemore, di ﬀ erences across subgroups, e.g., deﬁned by sex, age, or thepresence-absence of a comorbid condition, can be identiﬁed by aggregating individual-level e ﬀ ectsacross these groups. This work is motivated in part by the physical activity data set from the 2003 -2004 National Health and Nutrition Examination Survey (NHANES). In this study, human activitypatterns were measured at one-minute intervals for up to seven days using the ActiGraph Model7164 accelerometer (Troiano et al., 2008; Metzger et al., 2008; Schmid et al., 2015). Activity foreach minute was recorded as an integer-valued intensity-level commonly referred to as an activitycount. In the study, subjects were instructed to remove the device during sleep or while washing(to keep it dry). Therefore, the observed data comprise high-frequency, integer-valued activitycounts for each subject with intervals of missing values corresponding to when the device wasremoved.The goal of paper is to use the observed data to characterize activity patterns of each subject,subjects within pre-deﬁned subgroups, and the population as a whole. This is important becausethe estimated physical activity model can potentially serve the following three purposes. On thesubject level, the estimated activity model can be used both for the prediction of future activitiesand the imputation of missing activity readings. On the subgroup level, the estimated activitypatterns provide useful insights into clustering people based on their activity proﬁles. On thepopulation level, public policies can be designed based on the estimated activity model so as toencourage everyday exercise and healthy life style.Prior work on modeling activity counts has focused on aggregation and other smoothingtechniques. One common approach is to average the activity counts over time for each subjectand then compare the group means using two-sample t-tests (Troiano et al., 2008), analysis ofcovariance (Hansen et al., 2012), or linear mixed e ﬀ ects models (Cradock et al., 2004). However,in these approaches, averaging focuses on overall activity levels and may obscure trends in activity2ype, activity duration, and transitions between activities. Another approach is to use functionaldata analysis methods wherein the integer activity counts are ﬁrst log-transformed to ﬁx theright-skewedness in the distribution and the transformed activity modeled as a function of timeof day and other covariates (Morris et al., 2006; Xiao et al., 2014; Gruen et al., 2017). Theseapproaches are best-suited for the identiﬁcation of smooth, cyclical patterns in the data whereasthe observed accelerometer data are characterized by abrupt (i.e., non-smooth) changes in activitylevels.Discrete-time hidden Markov models are another common approach to the analysis of mobilitydata measured by wearable devices (He et al., 2007; Nickel et al., 2011; Ronao and Cho, 2014;Witowski et al., 2014). In these models, activity is partitioned into di ﬀ erent latent behavioral statesand the observed activity count is dependent on the unobserved latent activity. The latent statesevolve according to a discrete-time Markov process and a primary goal is the correct classiﬁcationof the latent activity. To construct and validate these models requires training data that are labeledby latent activity. However, the NHANES data, like many accelerometer studies, are not labeledby activity. Furthermore, our goal is to identify the dynamics of a patients evolution through theselatent activities including activity duration, activity intensity, and transitions between activities.Discrete-time hidden Markov models have been used to model latent health states and subsequentlyconduct inference for activity patterns within each state (Scott et al., 2005; Altman, 2007; Shirleyet al., 2010), but the time scales in these applications are rather coarse (daily or weekly). Bycontrast, the physical activity in the NHANES data is measured for each minute; this results in amuch larger data volume and the ability to provide are more complete picture of activity dynamics.A technical limitation of the discrete-time approach, is that it assumes that the observationsare equally spaced in time. Continuous-time hidden Markov models (CTHMM) have been used toanalyze the irregularly-sampled temporal measurements (Nodelman et al., 2012; Wang et al., 2014;Liu et al., 2015). The ﬂexibility of the CTHMM comes at the expense of increased computationalcost, which makes it infeasible for large datasets without modiﬁcation. Liu et al. (2015) developedan e ﬃ cient learning algorithm for parameter estimation in the CTHMMs. However, this algorithmis only suitable for either the completely pooled or unpooled cases wherein all subjects are assumedto be either completely homogeneous so that they share the same parameters, or completelyheterogeneous so that all parameters are subject-speciﬁc. Moreover, the algorithm cannot estimatethe e ﬀ ects of subject covariates and environmental factors on activity counts.We propose to model minute-by-minute accelerometer data using a hierarchical continuous-time hidden Markov model (HCTHMM). This model is aligned with scientiﬁc models of activityas the latent states represent di ﬀ erent types of unobserved physical activities. The continuous-time Markov process for the latent states evolution avoids having to perform imputation formissing yet allows for the possibility that the temporal measurements are irregularly spaced.Furthermore, the proposed model can incorporate both baseline subject covariates and time-varying environmental factors. The proposed model is hierarchical in that it is parameterized by:(i) subject-speciﬁc parameters to account for variability between subjects; (ii) subgroup-speciﬁcparameters parameters to account for similarity in activity patterns within groups; (iii) populationparameters that are common across all subjects. This speciﬁcation allows us to pool informationon some parameters while retaining between-group and between-subject variability. We proposedan estimator of these parameters that is based on consensus optimization using the alternatingdirection method of multipliers (ADMM). There is a vast literature on the convergence properties3f ADMM (Boyd et al., 2011; Shi et al., 2014; Hong and Luo, 2017) which can be readily portedto the proposed algorithm. Finally, we use the nonparametric bootstrap (Efron, 1992) to estimatethe sampling distributions of parameter estimators and to conduct statistical inference. We assume that the observed data are of the form { W i , Y i ( T i ) , X i ( T i ) } ni = , which comprise n independent copies of the trajectory { W , Y ( T ) , X ( T ) } , where: W ∈ R p are baseline subjectcharacteristics; Y ( T ) = { Y ( T ) , . . . , Y ( T K ) } are the non-negative integer activity counts at times T = ( T , . . . , T K ) ∈ [0 , K ; and X ( T ) = { X ( T ) , . . . , X ( T K ) } are concurrent environmental factorssuch that X ( · ) ∈ R q . Both T and K are treated as random variables as the number and timing ofobservations vary across subjects. We model the evolution of the observed data using a hierarchicalcontinuous-time hidden Markov model (HCTHMM), which we will develop over the remainderof this section.Let S i ( t ) ∈ { , . . . , M } denote the unobserved latent state for subject i = , . . . , n at time t ∈ [0 , π i ( m ) (cid:44) P { S i (0) = m ) } for m = , . . . , M such that (cid:80) Mm = π i ( m ) =

1; (ii) atransition rate matrix Q i = { q i ( m , (cid:96) ) } m ,(cid:96) = ,..., M such that q i ( m , m ) = − (cid:80) (cid:96) (cid:44) m q i ( m , (cid:96) ). The transitionrate matrix, also known as the inﬁnitesimal generator matrix, describes the rate of movementsbetween states in a continuous-time Markov chain (Pyke, 1961a,b; Albert, 1962); the transitionprobabilities are derived from the transition rates through a matrix exponential operation such thatfor k = , . . . , K − t > u , P t − ui ( m , (cid:96) ) (cid:44) P { S i ( t ) = (cid:96) | S i ( u ) = m } = { e ( t − u ) Q i } m ,(cid:96) .We assume that the conditional distribution of the activity counts is homogeneous in time giventhe current latent state and environmental factors (to streamline notation, we include baselinecharacteristics in the time-varying environmental factors). For i = , . . . , n and m = , . . . , M deﬁne g i , m ( y ; x ) (cid:44) P { Y i ( t ) = y | S i ( t ) = m , X i ( t ) = x } . Because longitudinal activity count dataare zero-inﬂated, we set g i , ( y ; x ) to be the probability mass function for a zero-inﬂated Poissondistribution with structural zero proportion δ i and mean λ i , for state 1 such that g i , ( y ; x ) = δ i ( x )1 y = + { − δ i ( x ) } (cid:8) λ i , ( x ) (cid:9) y exp (cid:8) λ i , ( x ) (cid:9) y ! . For m = , . . . , M we model the activity counts using a Poisson regression model so that g i , m ( y ; x ) = (cid:8) λ i , m ( x ) (cid:9) y exp (cid:8) λ i , m ( x ) (cid:9) y ! . For each subject i , and latent state m , we assume that functions δ i ( x ) and λ i , m ( x )are of the formlog (cid:40) δ i ( x )1 − δ i ( x ) (cid:41) = b i , , + x (cid:48) b i , , , log (cid:8) λ i , m ( x ) (cid:9) = b i , m , + x (cid:48) b i , m , , where b i , , , . . . , b i , M , and b i , , , . . . , b i , M , are unknown coe ﬃ cients.In the foregoing model description, all parameters are subject-speciﬁc so that each subject’strajectory can be modeled separately. However, in the HCTHMM, some of the parameters are4hared among pre-deﬁned subgroups of the subjects. We assume that subjects are partitioned into J such subgroups based on their baseline characteristics W . For example, these groups might bedetermined by age and sex. Subjects within the same group are though to behave more similarlyto each other than across groups. Let G i ∈ { , . . . , J } be the subgroup to which subject i belongs,and let n j denote the number of subjects in group j = , . . . , J .The HCTHMM is a ﬂexible multilevel model in that it allows for three levels of of parameters:(i) subject-speciﬁc, (ii) subgroup-speciﬁc, (iii) population-level. For example, one might let theintercepts in the generalized linear models for state-dependent parameters be subject-speciﬁc toaccount for the between-subject variability; let the initial state probabilities and the transitionrate parameters depend on group-membership, i.e. π i = π i and Q i = Q i for all i , i suchthat G i = G i ; and let the slope parameters in the generalized linear models for state-dependentparameters be common across all subjects.If all the observed time points are equally spaced and all the parameters are subject-speciﬁc,the HCTHMM reduces to the subject-speciﬁc zero-inﬂated Poisson hidden Markov model. Ifthere are no covariates and all parameters are common for all subjects, the HCTHMM reducesto a zero-inﬂated variant of the continuous-time hidden Markov model (Liu et al., 2015). Theextension from the previous models to the HCTHMM better matches the scientiﬁc goals associatedwith analyzing the NHANES data but also requires new methods for estimation. Because of thehierarchical structure in the parameters, joint parameter estimation is no longer embarrassinglyparallelizable as it would be in the case of its completely pooled or unpooled counterparts. For subject i = , . . . , n , let a i ∈ R M − be the M − π i , and let c i ∈ R M ( M − be the M ( M −

1) free parameters in the transition matrix Q i . To simplifynotation, write b i , (cid:44) [ b i , , , . . . , b i , M , ] ∈ R M + and b i , (cid:44) [ b (cid:48) i , , , . . . , b (cid:48) i , M , ] ∈ R q ( M + to denote theparameters indexing the generalized linear models for the activity counts in each state. Deﬁnethe entire vector of parameters for subject i to be θ i (cid:44) [ a (cid:48) i , c (cid:48) i , b (cid:48) i , , b (cid:48) i , ] ∈ R ( M + M + q ) . Thelikelihood function for subject i is computed using the forward-backward algorithm (Rabiner,1989) as follows. For subject i = , . . . , n , deﬁne the forward variables for k = , . . . , K −

1, and m = , . . . , M , α T k i ( m ; θ i ) (cid:44) P θ i (cid:26) Y i ( T ) , . . . , Y i ( T k ) , S i ( T k ) = m (cid:12)(cid:12)(cid:12) XXX i ( T ) = x T ,. . . , XXX i ( T k ) = x T k (cid:27) . The initialization and recursion formulas are deﬁned as α T i ( m ; θ i ) = π i ( m ; θ i ) g i , m { Y i ( T ); x T , θ i } ,α T k + i ( m ; θ i ) =  M (cid:88) (cid:96) = α T k i ( (cid:96) ; θ i ) { e ( T k + − T k ) Q i } (cid:96), m  g i , m { Y i ( T k + ); x T k + , θ i } , m = , . . . , M and k = , . . . , K −

1. The negative log-likelihood for θ i is therefore f i ( θ i ) = − log (cid:110)(cid:80) Mm = α T K i ( m ; θ i ) (cid:111) . Deﬁne the joint likelihood for θ = ( θ , . . . , θ n ) to be f ( θ ) = (cid:80) ni = f i ( θ i ).To compute the conditional state probabilities in the HCTHMM, we need to generate a set ofauxiliary backward variables analogous to the forward variables deﬁned previously. For subject i = , . . . , n , deﬁne the backward variables for k = , . . . , K , and m = , . . . , M , β T k i ( m ; θ i ) (cid:44) P θ i (cid:26) Y i ( T k + ) , . . . , Y i ( T K ) (cid:12)(cid:12)(cid:12) S i ( T k ) = m , XXX i ( T k + ) = x T k + , . . . , XXX i ( T K ) = x T K (cid:27) . The initialization and recursion formulas are β T K i ( m ; θ i ) = ,β T k i ( m ; θ i ) = M (cid:88) (cid:96) = { e ( T k + − T k ) Q i } m ,(cid:96) g i ,(cid:96) { Y i ( T k + ); x T k + , θ i } β T k + i ( (cid:96) ; θ i ) , where m = , . . . , M . The probability of state m for subject i at time t is γ ti ( m ; θ i ) (cid:44) P θ i { S i ( t ) = m | Y i ( T ) , . . . , Y i ( T K ) } = α ti ( m ; θ i ) β ti ( m ; θ i ) (cid:80) Mm = α ti ( m ; θ i ) β ti ( m ; θ i ) , where t = T , . . . , T k , m = , . . . , M , and i = , . . . , n . The mean probability of state m amongsubjects in group j is thus φ j ( m ; θ ) = n j (cid:88) { i : G i = j } η i ( m ; θ i ) , j = , . . . , J , where η i ( m ; θ i ) = K K (cid:88) k = γ T k i ( m ; θ i ) , m = , . . . , M . The mean state probabilities φ j ( m ; θ ) can be interpreted as the mean proportion of time spent inlatent state m for subjects in group j , whereas η i ( m ; θ i ) represents the mean proportion of timespent in state m for subject i . If all sets of parameters are subject-speciﬁc, then the maximum likelihood estimates for theparameters can be obtained by minimizing f ( θ ) using the gradient-based methods which can beparallelized across subjects. However, in the general setting where parameters are shared acrosssubgroups of subjects, such paralellization is no longer possible. Instead, we use the consensusoptimization approach to obtain the maximum likelihood estimates in the HCTHMM, which isperformed via the alternating direction method of multipliers (ADMM) (Boyd et al., 2011). Weuse the Bayesian Information Criterion (BIC) to select the number of latent states M .6et D denote a contrast matrix such that D θ = θ f ( θ ) s . t . D θ = . For the purpose of illustration, suppose that: (i) the intercepts in the generalized linear models forstate-dependent parameters are subject-speciﬁc; (ii) the initial state probabilities and the transitionrate parameters are subgroup-speciﬁc; (iii) the slope parameters in the generalized linear modelsfor state-dependent parameters are common across all subjects. Then D θ = a i = a i for all G i = G i ; (ii) c i = c i for all G i = G i ; (iii) b i , = b .In our illustrative example, the maximum likelihood estimator solvesmin θ , z f ( θ ) s . t . A i θ i = B i z , i = , . . . , n , where z represents the set of all subgroup-speciﬁc and common parameters in θ so the linearconstraint A i θ i = B i z is equivalent to D θ =

0. The corresponding augmented Lagrangian is L ρ ( θ , z , ξ ) = f ( θ ) + ξ T ( A θ − Bz ) + ρ (cid:107) A θ − Bz (cid:107) = n (cid:88) i = { f i ( θ i ) + ξ Ti ( A i θ i − B i z ) + ρ (cid:107) A i θ i − B i z (cid:107) } , where ξ = [ ξ (cid:48) , . . . , ξ (cid:48) n ] , A =  A · · · A · · · ... . . .

00 0 · · · A n  , B =  B B ... B n  . Here ξ are the Lagrange multipliers and ρ is a pre-speciﬁed positive penalty parameter. Let˜ θ ( v ) n , ˜ z ( v ) n , ˜ ξ ( v ) n be the v th iterates of θ , z , ξ . Then, at the iteration v +

1, the ADMM algorithm updatesare θ − update: ∇ f (˜ θ ( v + n ) + A T ˜ ξ ( v ) n + ρ A T ( A ˜ θ ( v + n − B ˜ z ( v + n ) = , z − update: B T ˜ ξ ( v ) n + ρ B T ( A ˜ θ ( v + n − B ˜ z ( v ) n ) = , ξ − update: ˜ ξ ( v + n − ˜ ξ ( v ) n − ρ ( A ˜ θ ( v + n − B ˜ z ( v + n ) = , where the most computationally expensive θ -update can be programmed in parallel across each i = , . . . , n as θ i − update: ∇ f i (˜ θ ( v + i , n ) + A Ti ˜ ξ ( v ) i , n + ρ A Ti ( A i ˜ θ ( v + i − B i ˜ z ( v ) n ) = . ∇ f i ( · ) for subject i ’s HMM parameters can be computed using Fisher’s identity(Cappé et al., 2005) based on the e ﬃ cient EM algorithm proposed in Liu et al. (2015). The detailsare included in the Supplementary Materials. The use of gradients is needed in our model bothdue to the ADMM update and the covariate structure. Even when there are no covariates and allparameters are shared across subjects, the gradient method is still faster than the EM algorithmbecause M-step is expensive in the zero-inﬂated Poisson distribution. Deﬁne ˆ θ n to be the maximum likelihood estimator for θ and let θ (cid:63) denote its population-levelanalog. The following are the su ﬃ cient conditions to ensure: (i) almost sure convergence of ˆ θ n to θ (cid:63) as n → ∞ , and (ii) numerical convergence of ˜ θ ( v ) n to ˆ θ n as v → ∞ .(A0) The true parameter vector θ (cid:63) for the unconstrained optimization problem min θ f ( θ ) is aninterior point of Θ , where Θ is a compact subset of R dim θ .(A1) The constraint set C (cid:44) { θ ∈ Θ ; D θ = } is nonempty and for some r ∈ R , the set { θ ∈ C ; f ( θ ) ≤ r } is nonempty and compact.(A2) The observed time process ( T k : k ∈ N ) is independent of the generative hidden Markovprocess: the likelihood for the observed times do not share parameters with θ .(A3) There exist positive real numbers 0 < κ − ≤ κ + < i = , . . . , n , κ − ≤ P T k + − T k i ( m , (cid:96) ) ≤ κ + for k = , . . . , K − m , (cid:96) = , . . . , M , almost surely and g i , m ( y ; x ) > y ∈ supp Y for some m = , . . . , M .(A4) For each θ ∈ Θ , the transition kernel indexed by θ is Harris recurrent and aperiodic. Thetransition kernel is continuous in θθθ in an open neighborhood of θθθ (cid:63) .(A5) The hidden Markov model is identiﬁable up to label switching of the latent states.(A6) The negative log likelihood function f ( θ ) is twice di ﬀ erentiable with respect to θ withbounded, continuous derivatives. Denote by e min ( θ ) and e max ( θ ) to be the smallest and largesteigenvalues of ∇ f ( θ ) then there exists positive real numbers 0 < (cid:37) − ≤ (cid:37) + < ∞ such that e min ( θ ) ≥ (cid:37) − > e max ( θ ) ≤ (cid:37) + < ∞ for all θ ∈ Θ .Assumption (A0)-(A2) are mild regularity conditions whereas (A3) - (A5) are standard in latentstate space models; together they ensures that the model is well-deﬁned. Assumption (A4) is aavoids non-standard asymptotic behavior associated with non-smooth functionals. Assumption(A6) can be used to show the Lipschitz continuity in the gradient and strong local convexity whichare used to establish the numerical convergence in the ADMM algorithm Shi et al. (2014). Theorem 3.1.

Under assumptions (A0) - (A6), as T i → ∞ for i = , . . . , n,(i) ˆ θ n converges to θ (cid:63) almost surely as n → ∞ ,(ii) ˜ θ ( v ) n converges numerically to ˆ θ n in an open neighborhood of θ (cid:63) as v → ∞ . θ n to the true parameter value θ (cid:63) . This can be shown using the uniformconvergence results of the log likelihood (Douc et al., 2004) for each subject-speciﬁc hiddenMarkov model, along with the feasibility assumption (A1) and identiﬁability assumption (A5).The second part of Theorem states the numerical convergence of the ADMM algorithm. This isanticipated by Boyd et al. (2011) which identiﬁes general conditions for the numerical convergenceof the residual, the dual variable, and the objective function. Shi et al. (2014) extended theconvergence to the primal variable by adding the Lipschitz continuity and strong convexityassumptions. The details for those assumptions, as well as the proof for Theorem 1, are includedin the Supplementary Materials.For i = , . . . , n , deﬁne the estimator for the mean proportion of time in state m in group k asˆ φ j , n ( m ) , j = , . . . , J , m = , . . . , M , and the estimator for the mean proportion of time in state m asˆ η i , k i where k i is the number of observed time points for subject i . The following result characterizesthe limiting behavior of the estimated time in each state. Theorem 3.2.

Under (A0) - (A6), as k i → ∞ for i = , . . . , n, n j → ∞ for j = , . . . , J,(i) ˆ φ j , n ( m ; ˆ θ n ) converges to µ j ( m ; θ (cid:63) ) almost surely,(ii) ˆ φ j , n ( m ;ˆ θ n ) − µ j ( m ;ˆ θ n ) (cid:113) σ j ( m ;ˆ θ n ) / n j converges in distribution to a standard normal random variable, where µ j ( m ; ˆ θ n ) = E [ ˆ η i , k i ( m ; ˆ θ n )] , σ j ( m ; ˆ θ n ) = Var [ ˆ η i , k i ( m ; ˆ θ n )] for all i such that G i = j . A proof of the preceding result is given in the Supplementary Materials, which follows fromthe almost sure convergence of a bounded continuous function and the central limit theorem. Inprinciple, µ j ( m ; ˆ θ n ) can be obtained from the limiting distribution of a stationary continuous-timeMarkov chain, which is determined by the transition as a function of ˆ θ n . However, it is generallynot easy to compute the standard error analytically for the estimated mean state probabilities.Instead, we use a stratiﬁed nonparametric bootstrap (Efron, 1992) in which we resample subjectswith replacement from each subgroup. We study the ﬁnite sample performance of the proposed estimator for the state probabilitiesusing a suite of simulation experiments. We simulate minute-by-minute activity counts of length T ∼ Uniform(500 , n =

20 and n =

200 subjects, where half of the subjects are male(Group 1) and the other half female (Group 2). The intervals between consecutive time points areindependently drawn from { , , . . . , } with equal probabilities. For each subject, we assume 2 / / (cid:40) δ i ( x )1 − δ i ( x ) (cid:41) = b i , , + . × I { Weekend } ti , log (cid:8) λ i , ( x ) (cid:9) = b i , , − . × I { Weekend } ti , log (cid:8) λ i , ( x ) (cid:9) = b i , , − . × I { Weekend } ti , log (cid:8) λ i , ( x ) (cid:9) = b i , , − . × I { Weekend } ti , where b i , , iid ∼ N ( − , . ) , b i , , iid ∼ N (cid:110) log(50) , . (cid:111) , b i , , iid ∼ N (cid:110) log(300) , . (cid:111) , b i , , iid ∼ N (cid:110) log(700) , . (cid:111) are subject-speciﬁc intercepts; the weekend e ﬀ ect is assumed to be common across all subjects.The initial probabilities for male are ( U , U , − U − U ), where U , U iid ∼ Uniform(0 . , . U , U , − U − U ), where U iid ∼ Uniform(0 . , . U iid ∼ Uniform(0 . , .  − U − U U U U − U − U U U U − U − U  , where U , . . . , U iid ∼ Uniform(0 . , .  − U − U U U U − U − U U U U − U − U  , where U , U iid ∼ Uniform(0 . , . U , U iid ∼ Uniform(0 . , . U , U iid ∼ Uniform(0 . , . Parameter n =

20 n = Table 1: The bias and standard error for the estimated population and subgroup-speciﬁc parametersin HCTHMM. The metric is Euclidean norm for population parameters, and Frobenius norm forsubgroup-speciﬁc parameter vectors. 10igure 1: Average runtime (seconds) for each ADMM iteration in di ﬀ erent conﬁgurations (numberof subjects, length of each series). State Subject-speciﬁc HMM HCTHMM

Male Female Male Female

95% C.I.:1 .942 .942 .954 .9522 .944 .938 .960 .9503 .946 .936 .944 .96299% C.I. :1 .982 .988 .986 .9942 .986 .984 .986 .9883 .978 .980 .994 .992

Table 2: Comparisons on the mean coverage probability for the 95% and 99% bootstrap conﬁdenceintervals for the mean proportion of time in each latent state between subject-speciﬁc HMM andHCTHMM (n = ﬀ erent hierarchies of pa-rameters in the HCTHMM via 500 simulations. In both cases, the biases are small due to thefact that the length of each individual series is large. As the sample size increases, the standarderrors become smaller which is expected. Figure 1 shows the average runtime (seconds) for eachADMM iteration scales linearly with the number of subjects. It generally takes some 30 to 100iterations for the algorithm to converge. Table 2 compares the mean coverage probability ofthe 95% and 99% bootstrap conﬁdence intervals for the mean proportion of time in each latentstate bewteen a baseline subject-speciﬁc HMM and the proposed HCTHMM when the samplesize is 200. As we can see, the baseline subject-speciﬁc HMM su ﬀ ers undercoverage (coverageprobability smaller than nominal level) in some of the latent states, while the proposed HCTHMMrecovers the nominal level well in both the 95% and 99% cases.11 Application

The motivating application is a human physical activity data set from the 2003 - 2004 Na-tional Health and Nutrition Examination Survey (NHANES), which is publicly available at theNational Center for Disease Control (CDC) website . There are 7,176 participants in the study, and for each participantwe have minute-by-minute activity counts for up to seven days. As the subjects were supposedto remove the accelerometer when washing, there are prolonged intervals during the day whenaccelerometer readings are zeros. We further impose the following two inclusion / exclusioncriteria, • Subjects whose age is between 20 and 60 are included. • Subjects with very few measurements are excluded.The ﬁrst criterion speciﬁes the scope of inference. The second criterion exclude subjects withvery few non-missing data available ( <

500 minutes out of 7 days). There are 2,467 subjects whosatisfy both conditions, which constitute more than 95% of those whose age is between 20 and 60.Further, we split those subjects by their baseline characteristics (gender, age) into 4 subgroups.Subgroup 1 consists of 608 male subjects with age from 20 to 40; subgroup 2 consists of 557 malesubjects with age from 40 to 60; subgroup 3 consists of 712 female subjects with age from 20 to40; and subroup 4 consists of 590 female subjects with age from 40 to 60.

Literature Deﬁnition of missingCradock et al. (2004) 30 minutesCatellier et al. (2005) 20 minutesTroiano et al. (2008) 60 minutesRobertson et al. (2010) 20 minutesEvenson (2011) 20 minutesSchmid et al. (2015) 60 minutesLee and Gill (2016) 20 minutes

Table 3: The deﬁnition of missing interval in terms of consecutive minutes of zeros in the literatureon human activity.Table 3 summarizes the related work on the length of an extended period of zero activity countsto be deﬁned as missingness. In this paper, we choose to deﬁne missingness as a sustained intervalof greater than or equal to 20 consecutive zero activity counts, which is the most commonly usedcriterion in the literature. Most missingness occurs between 10 pm to 8 am, which is the sleeptime for most of the subjects. There is still sporadic missingness during other periods of time inthe day, which may correspond to activities like swimming or bathing. The missingness periodsare removed during the data preprocessing. The average proportion of zeros after removing themissingness is around 25%, so that zero-inﬂation is still an issue to be considered in the modeling.In the data preprocessing, activity counts greater than 1,500 ( < odel speciﬁcations BIC5 states, type I 248,009,0826 states, type I 202,804,0817 states, type I 203,261,1506 states, type II 200,457,2176 states, type III 198,808,7386 states, type IV 198,807,080 Table 4: Summary of BIC from model selection. In type I models, all parameters are subject-speciﬁc. In type II models, all parameters are subject-speciﬁc except the slopes, which arepopulation parameters. In type III models, the intercepts are subject-speciﬁc; the slopes arepopulation; the initial probabilities and transitions are subgroup-speciﬁc. In type IV models, allparameters are subgroup-speciﬁc except the intercepts, which are subject-speciﬁc.To apply the HCTHMM model on the activity counts data, we need to select the number oflatent states as well as the hierarchy for di ﬀ erent sets of the parameters. The weekend e ﬀ ect isadjusted for in the Poisson and zero-inﬂated Poisson regression on the activity counts in eachlatent state. By the minimum BIC criterion as shown in Table 4, we select the type IV HCTHMMwith 6 latent states, where the intercepts in the state-dependent generalized linear models forlogit zero proportion in state 1 and log Poisson means in the all states are subject-speciﬁc, whilethe initial probabilities, transition rates, and the slopes in the state-dependent generalized linearmodels are subgroup-speciﬁc. This ﬁnal model indicates the baseline zero proportion and meanactivity counts in each latent state vary across subjects. For all the other parameters, the between-subgroup variability is more prominent than the within-subgroup variability. Figure 2 shows the99% conﬁdence interval for the estimated proportion of time spent in latent activity states foreach subgroup in 03 - 04 NHANES. There are several interesting ﬁndings. First, younger menspend less time in the low intensity activity states (state 1, 2) than older men and women. Second,men spend less time than women in the medium intensity activity states (state 3, 4). Third, menspend more time than women in the high intensity activity states (state 5, 6). Figure 3 plots theestimated quantiles versus the observed quantiles for the accelerometer data, which aligns wellalong the 45-degree line, indicating no lack of ﬁt. To validate the results, we apply the HCTHMMmethodology to 05 - 06 NHANES, which has the same study setup and data structure as the 03 -04 NHANES. Figure 4 shows the 99% conﬁdence interval for the estimated proportion of timespent in latent activity states for each subgroup in 05 - 06 NHANES, which has a similar patternas seen in 03 - 04 NHANES. 13igure 2: The 99% bootstrap conﬁdence intervals for the estimated proportion of time spent inlatent activity states by subgroup in 03 - 04 NHANES.Figure 3: The estimated quantiles versus the observed quantiles of the accelerometer data in 03 -04 NHANES, which shows no evidence of lack of ﬁt.14igure 4: The 99% bootstrap conﬁdence intervals for the estimated proportion of time spent inlatent activity states for each subgroup in 05 - 06 NHANES. We propose HCTHMM to be valid inference strategy for the longitudinal activity data. Within thisframework, we can estimate the mean state probabilities for di ﬀ erent subgroups of subjects as wellas quantify the uncertainty. Our ﬁndings are consistent with previous literature on human physicalactivity (Metzger et al., 2008; Troiano et al., 2008; Hansen et al., 2012; Xiao et al., 2014), whichindicated that the physical activity can be classiﬁed into di ﬀ erent categories by intensity, and thatthe activity level decreases as a result of aging. Moreover, women tend to spend more time inlighter intensity activity, whereas younger men tend to have periods of higher intensity activities.In the future, this HCTHMM framework can be extended to the controlled clinical studiesto estimate certain treatment e ﬀ ects in a speciﬁc cohort of patients. We can also allow for time-varying covariates in the transition rates. Moreover, when some model parameters are trulysubject-speciﬁc or subgroup-speciﬁc, it may be more powerful to model them as random so thattests based on variance components can be constructed to test their e ﬀ ects. Another modiﬁcationis to extend the latent continuous-time Markov process to a semi-Markov process. This will bescientiﬁcally interesting because it is reasonable to assume that the current latent state not onlydepends on the most recent past state but also on the history of the state trajectory. However, allthese changes are computationally expensive, especially on such large-scale high-frequency data.Corresponding estimation methods have to be developed before the application becomes feasible. References

Albert, A. (1962). Estimating the inﬁnitesimal generator of a continuous time, ﬁnite state markovprocess.

The Annals of Mathematical Statistics , 727–753.15ltman, R. M. (2007). Mixed hidden markov models: an extension of the hidden markov modelto the longitudinal data setting.

Journal of the American Statistical Association 102 (477),201–210.Bartalesi, R., F. Lorussi, M. Tesconi, A. Tognetti, G. Zupone, and D. De Rossi (2005). Wearablekinesthetic system for capturing and classifying upper limb gesture. In

Eurohaptics Conference,2005 and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems,2005. World Haptics 2005. First Joint , pp. 535–536. IEEE.Boyd, S., N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2011). Distributed optimization andstatistical learning via the alternating direction method of multipliers.

Foundations and Trends R (cid:13) in Machine Learning 3 (1), 1–122.Cappé, O., E. Moulines, and T. Rydén (2005). Inference in hidden markov models. In SpringerSeries in Statistics .Catellier, D. J., P. J. Hannan, D. M. Murray, C. L. Addy, T. L. Conway, S. Yang, and J. C.Rice (2005). Imputation of missing data when measuring physical activity by accelerometry.

Medicine and science in sports and exercise 37 (11 Suppl), S555.Cradock, A. L., J. L. Wiecha, K. E. Peterson, A. M. Sobol, G. A. Colditz, and S. L. Gortmaker(2004). Youth recall and tritrac accelerometer estimates of physical activity levels.

Medicineand science in sports and exercise 36 (3), 525–532.Douc, R., E. Moulines, T. Rydén, et al. (2004). Asymptotic properties of the maximum likelihoodestimator in autoregressive models with markov regime.

The Annals of statistics 32 (5), 2254–2304.Efron, B. (1992). Bootstrap methods: another look at the jackknife. In

Breakthroughs in statistics ,pp. 569–593. Springer.Evenson, K. R. (2011). Towards an understanding of change in physical activity from pregnancythrough postpartum.

Psychology of sport and exercise 12 (1), 36–45.Gruen, M. E., M. Alfaro-Córdoba, A. E. Thomson, A. C. Worth, A.-M. Staicu, and B. D. X.Lascelles (2017). The use of functional data analysis to evaluate activity in a spontaneous modelof degenerative joint disease associated pain in cats.

PloS one 12 (1), e0169576.Hansen, B. H., E. Kolle, S. M. Dyrstad, I. Holme, and S. A. Anderssen (2012). Accelerometer-determined physical activity in adults and older people.

Medicine and science in sports andexercise 44 (2), 266–272.He, J., H. Li, and J. Tan (2007). Real-time daily activity classiﬁcation with wireless sensornetworks using hidden markov model. In

Engineering in Medicine and Biology Society, 2007.EMBS 2007. 29th Annual International Conference of the IEEE , pp. 3192–3195. IEEE.Hong, M. and Z.-Q. Luo (2017). On the linear convergence of the alternating direction method ofmultipliers.

Mathematical Programming 162 (1-2), 165–199.16anai, M., K. P. Izawa, M. Kobayashi, A. Onishi, H. Kubo, M. Nozoe, K. Mase, and S. Shimada(2018). E ﬀ ect of accelerometer-based feedback on physical activity in hospitalized patients withischemic stroke: a randomized controlled trial. Clinical rehabilitation , 0269215518755841.Lee, J. A. and J. Gill (2016). Missing value imputation for physical activity data measured byaccelerometer.

Statistical methods in medical research , 0962280216633248.Liu, Y.-Y., S. Li, F. Li, L. Song, and J. M. Rehg (2015). E ﬃ cient learning of continuous-timehidden markov models for disease progression. In Advances in neural information processingsystems , pp. 3600–3608.Marshall, A., O. Medvedev, and G. Markarian (2007). Self management of chronic disease usingmobile devices and bluetooth monitors. In

Proceedings of the ICST 2nd international conferenceon Body area networks , pp. 22. ICST (Institute for Computer Sciences, Social-Informatics andTelecommunications Engineering).Metzger, J. S., D. J. Catellier, K. R. Evenson, M. S. Treuth, W. D. Rosamond, and A. M. Siega-Riz(2008). Patterns of objectively measured physical activity in the united states.

Medicine andscience in sports and exercise 40 (4), 630–638.Morris, J. S., C. Arroyo, B. A. Coull, L. M. Ryan, R. Herrick, and S. L. Gortmaker (2006).Using wavelet-based functional mixed models to characterize population heterogeneity inaccelerometer proﬁles: a case study.

Journal of the American Statistical Association 101 (476),1352–1364.Napolitano, M. A., K. E. Borradaile, B. A. Lewis, J. A. Whiteley, J. L. Longval, A. F. Parisi, A. E.Albrecht, C. N. Sciamanna, J. M. Jakicic, G. D. Papandonatos, et al. (2010). Accelerometer usein a physical activity intervention trial.

Contemporary clinical trials 31 (6), 514–523.Nickel, C., C. Busch, S. Rangarajan, and M. Möbius (2011). Using hidden markov models foraccelerometer-based biometric gait recognition. In

Signal Processing and its Applications(CSPA), 2011 IEEE 7th International Colloquium on , pp. 58–63. IEEE.Nodelman, U., C. R. Shelton, and D. Koller (2012). Expectation maximization and complexduration distributions for continuous time bayesian networks. arXiv preprint arXiv:1207.1402 .Pyke, R. (1961a). Markov renewal processes: deﬁnitions and preliminary properties.

The Annalsof Mathematical Statistics , 1231–1242.Pyke, R. (1961b). Markov renewal processes with ﬁnitely many states.

The Annals of MathematicalStatistics , 1243–1259.Rabiner, L. R. (1989). A tutorial on hidden markov models and selected applications in speechrecognition.

Proceedings of the IEEE 77 (2), 257–286.Robertson, W., S. Stewart-Brown, E. Wilcock, M. Oldﬁeld, and M. Thorogood (2010). Utilityof accelerometers to measure physical activity in children attending an obesity treatmentintervention.

Journal of obesity 2011 . 17onao, C. A. and S.-B. Cho (2014). Human activity recognition using smartphone sensors withtwo-stage continuous hidden markov models. In

Natural Computation (ICNC), 2014 10thInternational Conference on , pp. 681–686. IEEE.Schmid, D., C. Ricci, and M. F. Leitzmann (2015). Associations of objectively assessed physicalactivity and sedentary time with all-cause mortality in us adults: the nhanes study.

PLoSOne 10 (3), e0119591.Scott, S. L., G. M. James, and C. A. Sugar (2005). Hidden markov models for longitudinalcomparisons.

Journal of the American Statistical Association 100 (470), 359–369.Shi, W., Q. Ling, K. Yuan, G. Wu, and W. Yin (2014). On the linear convergence of the admm indecentralized consensus optimization.

IEEE Trans. Signal Processing 62 (7), 1750–1761.Shirley, K. E., D. S. Small, K. G. Lynch, S. A. Maisto, and D. W. Oslin (2010). Hidden markovmodels for alcoholism treatment trial data.

The Annals of Applied Statistics , 366–395.Troiano, R. P., D. Berrigan, K. W. Dodd, L. C. Mâsse, T. Tilert, M. McDowell, et al. (2008).Physical activity in the united states measured by accelerometer.

Medicine and science in sportsand exercise 40 (1), 181.Wang, X., D. Sontag, and F. Wang (2014). Unsupervised learning of disease progression models.In

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discoveryand data mining , pp. 85–94. ACM.Witowski, V., R. Foraita, Y. Pitsiladis, I. Pigeot, and N. Wirsik (2014). Using hidden markovmodels to improve quantifying physical activity in accelerometer data–a simulation study.

PloSone 9 (12), e114089.Xiao, L., L. Huang, J. A. Schrack, L. Ferrucci, V. Zipunnikov, and C. M. Crainiceanu (2014).Quantifying the lifetime circadian rhythm of physical activity: a covariate-dependent functionalapproach.