Classification of Categorical Time Series Using the Spectral Envelope and Optimal Scalings
Zeda Li* (Baruch College, The City University of New York), Scott A. Bruce* (Department of Statistics, George Mason University), and Tian Cai (Graduate Center, The City University of New York)
Abstract
This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or spectral envelope, obtained by assigning numerical values, or scalings, to categories that optimally emphasize oscillations at each frequency. Our procedure combines these two quantities to produce an interpretable and parsimonious feature-based classifier that can be used to accurately determine group membership for categorical time series. Classification consistency of the proposed method is investigated, and simulation studies are used to demonstrate accuracy in classifying categorical time series with various underlying group structures. Finally, we use the proposed method to explore key differences in oscillatory patterns of sleep stage time series for patients with different sleep disorders and accurately classify patients accordingly.
Keywords: Categorical Time Series; Classification; Optimal Scaling; Replicated Time Series; Spectral Envelope

* Both authors contributed equally to this work.

1 Introduction
Categorical time series are frequently observed in a variety of fields, including sleep medicine, genetic engineering, rehabilitation science, and sports analytics (Stoffer et al., 2000). In many applications, multiple realizations of categorical time series from different underlying groups are collected in order to construct a classifier that can accurately identify group membership. As a motivating example, we consider a sleep study in which participants with different types of sleep disorders are monitored during a night of sleep via polysomnography in order to understand important clinical and behavioral differences among these sleep disorders. During sleep, the body cycles through different sleep stages: movement/wakefulness, rapid eye movement (REM) sleep, and non-rapid eye movement (NREM) sleep, which is further divided into light sleep (S1, S2) and deep sleep (S3, S4). Our analysis focuses on two particular sleep disorders, nocturnal frontal lobe epilepsy (NFLE) and REM behavior disorder (RBD), for which differential diagnosis is especially challenging due to a significant overlap in their associated clinical and behavioral characteristics (Tinuper and Bisulli, 2017). For example, NFLE and RBD patients both exhibit complex, bizarre motor behavior and vocalizations during sleep. However, we posit that differences in sleep cycling behavior may still exist due to fundamental differences in the sleep disruption mechanisms of NFLE and RBD. The goal of our analysis is to investigate potential differences in sleep cycling behavior for NFLE and RBD patients and use this information to accurately classify patients accordingly. This data-driven classification can potentially improve accuracy in differential diagnoses of NFLE and RBD in patients presenting clinical and behavioral characteristics common to both conditions.
Figure 1 displays examples of study participants' full-night sleep stage series from the two different groups.

In the statistical literature, classification methods for multiple real-valued time series have been well studied; see Shumway and Stoffer (2016) for a review. However, classification of categorical time series has not received much attention. The majority of statistical methods for categorical time series analysis have been developed for analyzing a single categorical time series. Some examples include the Markov chain model of Billingsley (1961), the link function approach of Fahrmeir and Kaufmann (1987), the likelihood-based method
Figure 1: Sleep stage time series from six sleep study participants: three NFLE patients (top row) and three RBD patients (bottom row).

of Fokianos and Kedem (1998), and the spectral envelope approach for analyzing a single time series introduced in Stoffer et al. (1993). A comprehensive discussion of this research direction can be found in Fokianos and Kedem (2003). More recently, Krafty et al. (2012) introduced the spectral envelope surface for quantifying the association between the oscillatory patterns of a collection of categorical time series and continuous covariates. However, it is not immediately useful for classification. To the best of our knowledge, this article presents the first statistical approach for supervised classification of multiple categorical time series.

In the computer science literature, however, many methods have been developed to classify string-valued time series, which can also be used for classification of categorical time series. These include the minimum edit distance classifier with sequence alignment (Navarro, 2001; Jurafsky and Martin, 2009), Markov chain-based classifiers (Deshpande and Karypis, 2002), the Haar wavelet classifier (Aggarwal, 2002), and the state-of-the-art sequence learner that uses a gradient-bounded coordinate-descent algorithm for efficiently selecting discriminative subsequences and then uses logistic regression for classification (Ifrim and Wiuf, 2011). These methods are black-box in nature and offer little help in understanding key differences among groups.
On the other hand, the proposed method addresses the classification problem using the spectral envelope and optimal scalings, which provide low-dimensional, interpretable summary measures of oscillatory patterns and traversals through categories. These patterns are often associated with scientific mechanisms that distinguish different groups and also produce lower classification error compared to state-of-the-art computer science methods like sequence learner.

Many classifiers for real-valued time series rely on feature extraction, a process in which low-dimensional summary quantities are constructed that capture essential features of the underlying groups. These quantities are then used to develop feature-based distance measures, such as the Kullback-Leibler distance and squared quadratic distance, which can be used to measure differences between groups and time series of unknown group membership. Training data can then be used to estimate group-level quantities and construct a classifier that minimizes the distance between time series and their predicted group (Huang et al., 2004; Shumway and Stoffer, 2016). This type of approach cannot be easily extended to the classification of categorical time series due to the difficulty in obtaining low-dimensional features. To this end, we propose using the spectral envelope and its corresponding set of optimal scalings (Stoffer et al., 1993) as low-dimensional, interpretable features for differentiating groups of categorical time series. Use of these features is motivated by noticing that most categorical time series can be represented in terms of their prominent oscillatory patterns, characterized by the spectral envelope, and by the set of mappings from categories to numeric values that accentuate specific oscillatory patterns, characterized by the optimal scalings.

For example, Figures 2(a) and 2(b) display two categorical time series with similar traversals through categories, but different oscillatory patterns.
More specifically, the time series in Figure 2(b) cycles between categories faster than the time series in Figure 2(a). On the other hand, Figures 2(c) and 2(d) display two categorical time series with similar oscillatory patterns, but different traversals through categories. More specifically, the time series in Figure 2(c) spends approximately equal amounts of time in each category, while the time series in Figure 2(d) spends more time in categories 2 and 3. Moreover, Figure 3 displays the estimated spectral envelope for the two series in Figures 2(a) and 2(b) and the optimal scalings for the two series in Figures 2(c) and 2(d). The spectral envelope and optimal scalings clearly reflect the corresponding differences between these series. In particular, the spectral envelope indicates more high-frequency power for the time series in Figure 2(b) since it cycles between categories faster relative to the time series in Figure 2(a). Also, the optimal scalings for the time series in Figure 2(c) and Figure 2(d) are quite different, reflecting the different traversals over categories resulting in different distributions of time spent in categories.
Figure 2: Four simulated categorical time series: (a) and (b) have the same dominating categories but different cyclical patterns; (c) and (d) have the same frequency patterns but different dominating categories.
The proposed method is briefly described as follows. For each time series to be classified, we represent it as a vector-valued time series through the use of indicator variables. The smoothed spectral density matrix of this vector-valued time series is then obtained, and the spectral envelope and optimal scalings at each frequency are computed from the estimated spectral matrix. Then, the spectral envelope and optimal scalings for each group are estimated respectively via training data. The proposed feature, which is used to estimate the distance from each group, is obtained by adaptively summing the differences in the spectral envelope and optimal scalings. Finally, time series with unknown group membership are assigned to groups with the most similar features (i.e., minimum distance). Under the proposed framework, we show that the misclassification probability is bounded as long as the spectral density matrix estimator is consistent. The procedure is demonstrated to perform well in simulation studies and a real data analysis.

The remainder of the paper is organized as follows. Section 2 provides definitions of the spectral envelope and optimal scalings and corresponding estimators. Section 3 introduces the proposed classification procedure and its theoretical properties. Section 4 provides detailed simulation studies, which explore the empirical properties of the proposed method and compare with the state-of-the-art sequence learner classifier. Section 5 details the application of the proposed classifier to the analysis of sleep stage time series to better understand and accurately classify sleep disorders. Section 6 provides some closing discussions and impactful extensions of this work.
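To fix ideas before the formal development, the minimum-distance assignment step described above can be sketched in code. This is a minimal illustration assuming numpy; the function name, weighting interface, and toy inputs are ours, not from the paper:

```python
import numpy as np

def classify_envsca(lam_test, gam_test, group_feats, kappa):
    """Assign one testing series to the group minimizing an adaptive
    distance that combines a normalized spectral-envelope discrepancy
    and a normalized optimal-scaling discrepancy; `kappa` in [0, 1]
    weights the envelope term."""
    dists = []
    for Lam_j, Gam_j in group_feats:
        d_env = np.linalg.norm(lam_test - Lam_j) / np.linalg.norm(lam_test)
        d_sca = np.linalg.norm(gam_test - Gam_j) / np.linalg.norm(gam_test)
        dists.append(kappa * d_env + (1.0 - kappa) * d_sca)
    return int(np.argmin(dists))  # index of the closest group

# Toy features: group 0 matches the test series exactly, group 1 does not.
lam = np.ones(4)            # K = 4 spectral envelope values (hypothetical)
gam = np.ones((4, 2))       # K x (m-1) optimal scalings (hypothetical)
groups = [(lam.copy(), gam.copy()), (2 * lam, -gam)]
g_hat = classify_envsca(lam, gam, groups, kappa=0.5)  # -> 0
```

Here `np.linalg.norm` applied to a matrix defaults to the Frobenius norm, matching the scaling term used later in the paper's adaptive distance.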
2 The Spectral Envelope and Optimal Scalings

Let $X_t$, for $t = 0, \pm 1, \pm 2, \ldots$, be a categorical time series with finite state-space $\mathcal{C} = \{c_1, c_2, \ldots, c_m\}$. We assume that $X_t$ is stationary, such that $\{X_1, X_2, \ldots, X_t\} \stackrel{d}{=} \{X_{1+h}, X_{2+h}, \ldots, X_{t+h}\}$ for $h \geq 1$, and that $\min_{\ell = 1, 2, \ldots, m} P(X_t = c_\ell) > 0$. Consider the real-valued scaled time series $X_t(\beta)$, obtained by assigning numerical values, or scalings, to categories such that $\beta = (\beta_1, \beta_2, \ldots, \beta_m)' \in \mathbb{R}^m$ and $X_t(\beta) = \beta_\ell$ when $X_t = c_\ell$. We assume that $X_t(\beta)$ has a continuous and bounded spectral density

$$f_x(\omega; \beta) = \sum_{h=-\infty}^{\infty} \mathrm{Cov}\left[X_t(\beta), X_{t+h}(\beta)\right] \exp(-2\pi i \omega h).$$

Let $V_x(\beta)$ be the variance of the scaled time series $X_t(\beta)$; the spectral envelope is then defined as the maximal normalized spectral density, $f_x(\omega; \beta)/V_x(\beta)$, among all possible scalings not proportional to $1_m$ at frequency $\omega$, where $1_m$ is the $m$-dimensional vector of ones. Scalings that assign the same value to each category are excluded since $V_x(\beta)$ is zero and the normalized power spectrum is not well defined. Formally, we define the spectral envelope and set of optimal scalings for frequency $\omega$ as

$$\lambda(\omega) = \max_{\beta \in \mathbb{R}^m \setminus \{\mathbf{1}\}} \frac{f_x(\omega; \beta)}{V_x(\beta)}, \qquad B(\omega) = \operatorname*{arg\,max}_{\beta \in \mathbb{R}^m \setminus \{\mathbf{1}\}} \frac{f_x(\omega; \beta)}{V_x(\beta)},$$

respectively, where $\{\mathbf{1}\}$ is the subspace of $\mathbb{R}^m$ that is proportional to $1_m$. The spectral envelope, $\lambda(\omega)$, is the largest proportion of the variance that can be obtained at frequency $\omega$ over all possible scalings, so that $f_x(\omega; \beta)/V_x(\beta) \leq \lambda(\omega)$ for all $\beta \in \mathbb{R}^m \setminus \{\mathbf{1}\}$. The spectral envelope characterizes important oscillatory patterns in categorical time series.

For illustration, Figures 3(a) and 3(b) display the estimated spectral envelopes for the time series displayed in Figures 2(a) and 2(b), respectively. It can be seen that the time series in Figure 2(a), which oscillates more slowly than the time series in Figure 2(b), has more power in the estimated spectral envelope at lower frequencies.
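The scaled-series construction can be illustrated numerically. The following sketch (assuming numpy; the helper name and toy data are ours) maps categories to a candidate scaling $\beta$ and computes the variance-normalized periodogram of $X_t(\beta)$, the quantity that the spectral envelope bounds from above at each frequency:

```python
import numpy as np

def scaled_spectrum(x, beta, categories):
    """Variance-normalized periodogram of the scaled series X_t(beta).
    `x` is a sequence of category labels; `beta` assigns one numeric
    scaling per category."""
    scaling = dict(zip(categories, beta))
    xb = np.array([scaling[c] for c in x], dtype=float)
    xb = xb - xb.mean()                 # demean before taking the FFT
    T = len(xb)
    var = xb.var()                      # V_x(beta)
    d = np.fft.rfft(xb)[1:T // 2 + 1]   # Fourier frequencies s = 1..T/2
    pgram = np.abs(d) ** 2 / T          # periodogram ordinates
    return pgram / var                  # normalized power at each frequency

# Toy example (hypothetical data): a series alternating between 'a' and
# 'b' concentrates its normalized power at frequency 1/2 under a scaling
# that separates 'a' from 'b'.
x = ['a', 'b'] * 64
p = scaled_spectrum(x, beta=[1.0, -1.0, 0.0], categories=['a', 'b', 'c'])
```

Trying different `beta` vectors shows how the choice of scaling emphasizes or suppresses oscillations at particular frequencies; the envelope is the pointwise maximum over all such choices.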
The set of optimal scalings that maximize the normalized spectral density at frequency $\omega$, $B(\omega)$, provides important information about the traversals through categories associated with prominent oscillatory patterns at frequency $\omega$. For further illustration, Figures 3(c) and 3(d) display the estimated optimal scalings for the time series displayed in Figures 2(c) and 2(d), respectively. The optimal scalings in Figure 3(d) for categories 2 and 3 are similar at lower frequencies.

Figure 3: (a) and (b): The spectral envelopes of the time series shown in panels (a) and (b) of Figure 2; (c) and (d): The scalings of the time series presented in panels (c) and (d) of Figure 2.

A common approach to the analysis of any type of categorical data is to represent it in terms of random vectors of indicator variables. Similar to the formulations used in Stoffer et al. (1993) and Krafty et al. (2012), we define the $(m-1)$-dimensional vector $Y_t$, which has a one in the $\ell$th element if $X_t = c_\ell$ for $\ell = 1, \ldots, m-1$, and zeros elsewhere. This is equivalent to treating $c_m$ as the reference category and restricting the set of optimal scalings to a lower-dimensional space. The assumption that $f_x(\omega; \beta)$ is continuous is necessary and sufficient for ensuring that $Y_t$ has a continuous spectral density, which is defined as

$$f_y(\omega) = \sum_{h=-\infty}^{\infty} \mathrm{Cov}\left[Y_t, Y_{t+h}\right] \exp(-2\pi i \omega h).$$

The spectral density $f_y(\omega)$ is a positive definite Hermitian $(m-1) \times (m-1)$ matrix. We assume $f_y(\omega)$ and the variance of $Y_t$, $V_y = \mathrm{Var}(Y_t)$, are non-singular for all $\omega \in \mathbb{R}$ (Brillinger, 2002). Formally, we define the spectral envelope and the corresponding set of optimal scalings used in our proposed classification algorithm as follows.

Definition 1
For $\omega \in \mathbb{R}$, the spectral envelope, $\lambda(\omega)$, is defined as the largest eigenvalue of

$$h(\omega) = V_y^{-1/2} f_y(\omega) V_y^{-1/2}.$$

The $(m-1)$-variate vector of optimal scalings, $\gamma(\omega)$, is defined as the eigenvector associated with $\lambda(\omega)$.

Several remarks about this definition are in order. First, for any $a \in \mathbb{R}^{m-1}$, we have $a' f_y(\omega) a = a' f_y^{\mathrm{re}}(\omega) a$, where $f_y^{\mathrm{re}}(\omega)$ is the real part of $f_y(\omega)$. Thus, the spectral envelope is equivalent to the largest eigenvalue of $h^{\mathrm{re}}(\omega) = V_y^{-1/2} f_y^{\mathrm{re}}(\omega) V_y^{-1/2}$. Second, a connection between the optimal scalings derived from this formulation and those defined in Section 2.1 can be established (Krafty et al., 2012). If $V_y^{1/2} \gamma(\omega)$ is an eigenvector of $h^{\mathrm{re}}(\omega)$ associated with $\lambda(\omega)$, then

$$\left(\gamma(\omega)', 0\right)' = \operatorname*{arg\,max}_{\beta \in \mathbb{R}^m \setminus \{\mathbf{1}\}} \frac{f_x(\omega; \beta)}{V_x(\beta)}.$$

When the multiplicity of $\lambda(\omega)$ as an eigenvalue of $h^{\mathrm{re}}(\omega)$ is one, there exists a unique $\gamma(\omega)$ such that $V_y^{1/2} \gamma(\omega)$ is an eigenvector of $h^{\mathrm{re}}(\omega)$ associated with $\lambda(\omega)$, where $\gamma(\omega)' V_y \gamma(\omega) = 1$ and the first nonzero entry of $V_y^{1/2} \gamma(\omega)$ is positive. Third, if there is a significant frequency component near $\omega$, then $\lambda(\omega)$ will be large, and the values of $\gamma(\omega)$ depend on the particular cyclical traversal of the series through categories that produces the value of $\lambda(\omega)$ at frequency $\omega$.

Consider a realization of a categorical time series, $X_t$, $t = 1, \ldots, T$, and its corresponding multivariate process realization $Y_t$, $t = 1, \ldots, T$, defined in Section 2.2. Let $\hat{f}_y(\omega)$ represent the estimate of the spectral matrix $f_y(\omega)$. To allow for asymptotic development, we assume $Y_t$ is strictly stationary and that all cumulant spectra, of all orders, exist (Brillinger, 2002, Assumption 2.6.1). There is an extensive literature on estimation of the power spectral matrix. We use periodograms, or sample analogues of the spectrum,

$$I(s) = T^{-1} \left\{ \sum_{t=1}^{T} Y_t \exp(-2\pi i s t / T) \right\} \left\{ \sum_{t=1}^{T} Y_t \exp(-2\pi i s t / T) \right\}^{*}, \quad s = 1, \ldots, T,$$

where $*$ denotes the conjugate transpose.
It is well known that the periodogram is an asymptotically unbiased but inconsistent estimator of the true spectral matrix. A common way to obtain a consistent estimator of the spectral matrix is to smooth periodogram ordinates over frequencies using kernels (Brillinger, 2002). In this paper, we consider the smoothed periodogram estimator

$$\hat{f}_y(\omega_s) = \sum_{j=-B_T}^{B_T} W_{B_T, j}\, I(s + j),$$

where $\omega_s = s/T$ for $s = 1, \ldots, K = \lfloor (T-1)/2 \rfloor$ are the Fourier frequencies, $2B_T + 1$ is the smoothing span, and $W_{B_T, j}$ are nonnegative weights that satisfy the following conditions:

$$W_{B_T, j} = W_{B_T, -j}, \qquad \sum_{j=-B_T}^{B_T} W_{B_T, j} = 1.$$

Generally, the weights are chosen such that $W_{B_T, 0}$ is a decreasing function of $B_T$. It is known that $\hat{f}_y(\omega_s)$ is consistent if $B_T \to \infty$ and $B_T T^{-1} \to 0$ as $T \to \infty$ (Brillinger, 2002). Given the sample spectral matrix $\hat{f}_y(\omega)$ and sample variance $\hat{V}_y$, the estimate of the spectral envelope, $\hat{\lambda}(\omega)$, is the largest eigenvalue of $\hat{h}^{\mathrm{re}}(\omega) = \hat{V}_y^{-1/2} \hat{f}_y^{\mathrm{re}}(\omega) \hat{V}_y^{-1/2}$, and the optimal scaling, $\hat{\gamma}(\omega)$, is the eigenvector of $\hat{h}^{\mathrm{re}}(\omega)$ associated with $\hat{\lambda}(\omega)$. It should be noted that other approaches for nonparametric estimation of the spectral matrix, such as those in Dai and Guo (2004), Rosen and Stoffer (2007), and Krafty and Collinge (2013), can also be used. We use the kernel smoothing approach for computational efficiency and ease of theoretical exposition.

3 Proposed Classification Method

Consider a population of categorical time series composed of $J \geq 2$ groups, $\Pi_1, \ldots, \Pi_J$. Denote the $j$th group-level spectral envelope and $(m-1)$-variate optimal scalings as $\Lambda^{(j)}(\omega)$ and $\Gamma^{(j)}(\omega)$ for $j = 1, \ldots, J$, respectively. Suppose we observe $N = \sum_{j=1}^{J} N_j$ independent training time series of length $T$ and $R$ independent testing time series of length $T$, $X_r = \{X_{r1}, \ldots, X_{rT}\}$, $r = 1, \ldots, R$, with unknown group membership. In this section, we introduce an adaptive algorithm for consistent classification.
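The feature-extraction step of Section 2.3, which all classifiers below share, can be sketched as follows. This is a simplified illustration assuming numpy: it uses uniform smoothing weights and a Cholesky factor in place of the symmetric square root $V_y^{-1/2}$ (the eigenvalues, and hence the envelope, are unchanged); the sign and normalization conventions for the scalings are omitted.

```python
import numpy as np

def envelope_and_scalings(x, categories, B):
    """Estimate the spectral envelope and optimal scalings of one
    categorical series: indicator representation, smoothed periodogram
    with uniform weights over 2B+1 ordinates, then the eigenproblem of
    Definition 1 at each Fourier frequency s/T, s = 1..floor((T-1)/2)."""
    m, T = len(categories), len(x)
    # (m-1)-dimensional indicator vectors Y_t (last category is reference)
    Y = np.zeros((T, m - 1))
    for t, c in enumerate(x):
        l = categories.index(c)
        if l < m - 1:
            Y[t, l] = 1.0
    Yc = Y - Y.mean(axis=0)
    # cross-periodogram matrices I(s) = T^{-1} d(s) d(s)^*
    d = np.fft.fft(Yc, axis=0)
    I = np.einsum('sj,sk->sjk', d, d.conj()) / T
    K = (T - 1) // 2
    W = np.ones(2 * B + 1) / (2 * B + 1)       # uniform smoothing weights
    lam, gam = np.zeros(K), np.zeros((K, m - 1))
    V = np.cov(Y, rowvar=False)                # sample Var(Y_t)
    L_inv = np.linalg.inv(np.linalg.cholesky(V))  # plays the role of V^{-1/2}
    for s in range(1, K + 1):
        idx = [(s + j) % T for j in range(-B, B + 1)]
        fhat = np.tensordot(W, I[idx], axes=1)    # smoothed spectral matrix
        h = L_inv @ fhat.real @ L_inv.T           # same eigenvalues as h_re
        w, v = np.linalg.eigh((h + h.T) / 2)
        lam[s - 1] = w[-1]                        # spectral envelope
        gam[s - 1] = L_inv.T @ v[:, -1]           # optimal scalings
    return lam, gam

# Toy check (hypothetical data): a period-3 series should concentrate
# its envelope near frequency 1/3.
x = ['a', 'b', 'c'] * 50
lam, gam = envelope_and_scalings(x, categories=['a', 'b', 'c'], B=3)
```

Because `h = L_inv @ fhat.real @ L_inv.T` is congruent to the generalized eigenproblem $f_y^{\mathrm{re}}(\omega)\gamma = \lambda V_y \gamma$, the returned envelope agrees with Definition 1 even though a Cholesky factor replaces the symmetric square root.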
3.1 Classification via the Spectral Envelope

As shown in Figures 2 and 3, groups of categorical time series may exhibit distinct oscillatory patterns. In this case, the spectral envelope, which characterizes dominant oscillatory patterns, can be used as a signature for each group and an important feature for categorical time series classification. We outline a classification procedure based on the spectral envelope below.

1. For each testing time series, compute the sample spectral envelope, $\hat{\lambda}^{(r)}(\omega_s)$, for $r = 1, \ldots, R$, where $\omega_s = s/T$ are the Fourier frequencies with $s = 1, \ldots, K$ and $K = \lfloor (T-1)/2 \rfloor$. Denote $\hat{\lambda}^{(r)} = \{\hat{\lambda}^{(r)}(\omega_1), \ldots, \hat{\lambda}^{(r)}(\omega_K)\}'$ as a $K$-dimensional vector.

2. Compute

$$D^{(r)}_{j,\mathrm{ENV}} = \|\hat{\lambda}^{(r)} - \Lambda^{(j)}\|, \qquad (1)$$

where $\|\cdot\|$ is the $L_2$ norm.

3. Classify time series $X_r$ to the group $\Pi_j$ with the most similar spectral envelope such that

$$\hat{g}_r = \operatorname*{arg\,min}_{j} D^{(r)}_{j,\mathrm{ENV}}, \quad j = 1, \ldots, J.$$

Classification consistency can be established under the following assumptions. To aid the presentation, we consider the case of $J = 2$ groups, $\Pi_1$ and $\Pi_2$; similar results can be derived for $J > 2$.

Assumption 1
Each element of the $(m-1) \times (m-1)$ spectral density matrix $f_y(\omega)$ has bounded and continuous first derivatives.

Assumption 2 $\|\Lambda^{(1)} - \Lambda^{(2)}\| \geq CT$ for a positive constant $C$.

Under Assumption 1, asymptotic consistency of the estimates $\hat{\lambda}(\omega)$ and $\hat{\gamma}(\omega)$ discussed in Section 2.3 can be established, and the largest eigenvalue of the spectral density matrix is continuous and bounded from above. Assumption 2 implies that the spectral envelopes of the two groups are well separated. The following theorem states the classification consistency of using the spectral envelope as a classifier.

Theorem 1
Under Assumptions 1-2, the probability of misclassifying $X_r$, a testing time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows:

$$P\left(D^{(r)}_{1,\mathrm{ENV}} > D^{(r)}_{2,\mathrm{ENV}}\right) = O(B_T T^{-1}),$$

where $D^{(r)}_{1,\mathrm{ENV}}$ and $D^{(r)}_{2,\mathrm{ENV}}$ are defined in (1).

3.2 Classification via Optimal Scalings

While the spectral envelope adequately characterizes dominant oscillatory patterns, it does not account for the traversals through categories responsible for such oscillatory patterns. Differences among groups may also be due to different traversals through categories that produce particular oscillatory patterns, which are characterized by the optimal scalings for each frequency component. Similarly, we present a categorical time series classifier using optimal scalings below.

1. Compute the $(m-1)$-variate sample optimal scalings, $\hat{\gamma}^{(r)}(\omega_s)$, of the testing time series $X_r$ for $r = 1, \ldots, R$, where $\omega_s = s/T$ are the Fourier frequencies with $s = 1, \ldots, K$ and $K = \lfloor (T-1)/2 \rfloor$. Denote $\hat{\gamma}^{(r)} = \{\hat{\gamma}^{(r)}(\omega_1)', \ldots, \hat{\gamma}^{(r)}(\omega_K)'\}'$ as a $K \times (m-1)$ matrix.

2. Compute

$$D^{(r)}_{j,\mathrm{SCA}} = \|\hat{\gamma}^{(r)} - \Gamma^{(j)}\|_F, \qquad (2)$$

where $\|\cdot\|_F$ is the Frobenius norm.

3. Classify time series $X_r$ to the group $\Pi_j$ with the most similar set of optimal scalings such that

$$\hat{g}_r = \operatorname*{arg\,min}_{j} D^{(r)}_{j,\mathrm{SCA}}, \quad j = 1, \ldots, J.$$

In addition to Assumption 1, the following assumption, which requires that the optimal scalings be well separated, is necessary to establish the classification consistency of the scaling classifier.
Assumption 3 For fixed $m$ categories, $\|\Gamma^{(1)} - \Gamma^{(2)}\|_F \geq CT$ for a positive constant $C$.

Theorem 2 states the consistency of classification based on the scalings.
Theorem 2
Under Assumptions 1 and 3, the probability of misclassifying $X_r$, a testing time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows:

$$P\left(D^{(r)}_{1,\mathrm{SCA}} > D^{(r)}_{2,\mathrm{SCA}}\right) = O(B_T T^{-1}),$$

where $D^{(r)}_{1,\mathrm{SCA}}$ and $D^{(r)}_{2,\mathrm{SCA}}$ are defined in (2).

3.3 Proposed Adaptive Envelope and Scaling Classifier

The envelope classifier (Section 3.1) works well in situations where oscillatory patterns are different among groups, while the scaling classifier (Section 3.2) is effective when traversals through categories are distinct among groups. However, in practice, different groups are likely to exhibit different oscillatory patterns and traversals through categories to some extent. Thus, it is desirable to construct an adaptive classifier that can automatically identify the extent to which groups differ with respect to their oscillatory patterns, traversals through categories, or both, and optimally classify time series accordingly. To this end, we propose a general-purpose, flexible classifier that adaptively weights differences in the spectral envelope and optimal scalings in order to determine the characteristics that best distinguish groups and provide accurate classification. Specifically, we consider the following distance of the $r$th testing time series to the $j$th group:

$$D^{(r)}_{j,\mathrm{EnvSca}} = \kappa \frac{\|\hat{\lambda}^{(r)} - \Lambda^{(j)}\|}{\|\hat{\lambda}^{(r)}\|} + (1 - \kappa) \frac{\|\hat{\gamma}^{(r)} - \Gamma^{(j)}\|_F}{\|\hat{\gamma}^{(r)}\|_F}, \qquad (3)$$

for $j = 1, \ldots, J$ and $r = 1, \ldots, R$. Since the spectral envelope $\hat{\lambda}^{(r)}$ is a $K$-dimensional vector and the scaling $\hat{\gamma}^{(r)}$ is a $K \times (m-1)$ matrix, we rescale these distances by their corresponding norms. The unknown tuning parameter $\kappa$ controls the relative importance of the spectral envelope and optimal scalings in classifying time series. Our proposed adaptive classification algorithm is presented in Algorithm 1.

Several remarks on the algorithm should be noted. First, the group-level spectral envelopes $\Lambda^{(j)}$ and optimal scalings $\Gamma^{(j)}$ are unknown in practice.
We obtain $\Lambda^{(j)}$ and $\Gamma^{(j)}$ by averaging the sample spectral envelopes and sample optimal scalings across training time series replicates within the $j$th group, respectively. In particular, we replace $\Lambda^{(j)}$ and $\Gamma^{(j)}$ by their sample estimates

$$\hat{\Lambda}^{(j)} = \frac{1}{N_j} \sum_{k=1}^{N_j} \hat{\lambda}^{(j,k)}, \qquad \hat{\Gamma}^{(j)} = \frac{1}{N_j} \sum_{k=1}^{N_j} \hat{\gamma}^{(j,k)},$$

for $j = 1, \ldots, J$, where $\hat{\lambda}^{(j,k)}$ and $\hat{\gamma}^{(j,k)}$ are the estimated spectral envelope and optimal scalings of the $k$th training time series in group $j$, respectively. Second, we select the tuning parameter $\kappa$ by using a grid search through leave-one-out (LOO) cross-validation.

Data: $R$ independent testing time series, $X_r = \{X_{r1}, \ldots, X_{rT}\}$ for $r = 1, \ldots, R$.
Result: Estimated group assignment for each testing time series, $\{\hat{g}_1, \ldots, \hat{g}_R\}$, where $\hat{g}_r \in (1, \ldots, J)$ for $r = 1, \ldots, R$.
Step 1: Use leave-one-out cross-validation to select the tuning parameter $\kappa$.
Step 2: for $r = 1, \ldots, R$ do
    Convert the testing time series $X_r$ with $m$ categories into the $(m-1)$-dimensional indicator series $Y_r$ defined in Section 2.2 and compute the $(m-1) \times (m-1)$ matrix $\hat{h}(\omega)$ in Definition 1;
    Compute the sample spectral envelope, $\hat{\lambda}^{(r)}(\omega_s)$, of the testing time series $X_r$, where $\omega_s = s/T$ are the Fourier frequencies with $s = 1, \ldots, K$ and $K = \lfloor (T-1)/2 \rfloor$. Denote $\hat{\lambda}^{(r)} = \{\hat{\lambda}^{(r)}(\omega_1), \ldots, \hat{\lambda}^{(r)}(\omega_K)\}'$ as a $K$-dimensional vector;
    Compute the $(m-1)$-variate sample optimal scalings, $\hat{\gamma}^{(r)}(\omega_s)$, of the testing time series $X_r$. Denote $\hat{\gamma}^{(r)} = \{\hat{\gamma}^{(r)}(\omega_1)', \ldots, \hat{\gamma}^{(r)}(\omega_K)'\}'$ as a $K \times (m-1)$ matrix;
    for $j = 1, \ldots, J$ do
        Compute $D^{(r)}_{j,\mathrm{EnvSca}} = \kappa \|\hat{\lambda}^{(r)} - \Lambda^{(j)}\| / \|\hat{\lambda}^{(r)}\| + (1 - \kappa) \|\hat{\gamma}^{(r)} - \Gamma^{(j)}\|_F / \|\hat{\gamma}^{(r)}\|_F$.
    end
    Classify the time series $X_r$ to group $\Pi_j$ if $D^{(r)}_{j,\mathrm{EnvSca}}$ is the smallest among $D^{(r)}_{1,\mathrm{EnvSca}}, \ldots, D^{(r)}_{J,\mathrm{EnvSca}}$, that is, $\hat{g}_r = \operatorname*{arg\,min}_j D^{(r)}_{j,\mathrm{EnvSca}}$.
end
return $\{\hat{g}_1, \ldots, \hat{g}_R\}$;

Algorithm 1: Envelope and Scaling Classifier (EnvSca)

In particular, let $\kappa \in \{0, 0.1, 0.2, \ldots, 1\}$; the selected $\kappa$ corresponds to the value that produces the highest leave-one-out classification rate via Algorithm 1. Although a finer grid could be used as well, in our experience, using $\kappa \in \{0, 0.1, 0.2, \ldots, 1\}$ performs well without sacrificing computational efficiency. Third, to obtain more parsimonious measures that still can discriminate among different groups, we may select a subset of elements in the spectral envelope and optimal scalings that are most different among groups. This strategy has been used in Fryzlewicz and Ombao (2009) for classifying nonstationary quantitative time series. For example, we compute

$$\Delta(s) = \sum_{j=1}^{J} \sum_{h=j+1}^{J} \left[\Lambda^{(j)}(\omega_s) - \Lambda^{(h)}(\omega_s)\right]^2, \quad s = 1, \ldots, K,$$

sort $\Delta(s)$ in decreasing order, and then choose the top proportion of the elements in $\Delta(s)$. A leave-one-out cross-validation approach that minimizes the classification error is then used to select an appropriate proportion.

Under Assumptions 1 and 4, classification consistency is established in Theorem 3.

Assumption 4
For fixed $m$ categories, $\|\Lambda^{(1)} - \Lambda^{(2)}\| + \|\Gamma^{(1)} - \Gamma^{(2)}\|_F \geq CT$ for a positive constant $C$.

Theorem 3
Under Assumptions 1 and 4, the probability of misclassifying $X_r$, a time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows:

$$P\left(D^{(r)}_{1,\mathrm{EnvSca}} > D^{(r)}_{2,\mathrm{EnvSca}}\right) = O(B_T T^{-1}),$$

where $D^{(r)}_{1,\mathrm{EnvSca}}$ and $D^{(r)}_{2,\mathrm{EnvSca}}$ are defined in Equation (3).

4 Simulation Studies

We conduct simulation studies to evaluate the performance of the proposed classification procedure. Following Fokianos and Kedem (2003), categorical time series $X_t$ are generated from the multinomial logit model

$$p_{t\ell}(\alpha) = \frac{\exp(\alpha_\ell' Y_{t-1})}{1 + \sum_{\ell=1}^{m-1} \exp(\alpha_\ell' Y_{t-1})}, \quad \ell = 1, \ldots, m-1, \qquad p_{tm}(\alpha) = \frac{1}{1 + \sum_{\ell=1}^{m-1} \exp(\alpha_\ell' Y_{t-1})},$$

where $Y_{t-1}$ is the $(m-1)$-dimensional vector with a one in the $\ell$th element if $X_{t-1} = c_\ell$ for $\ell = 1, \ldots, m-1$, $p_{t\ell}$ for $\ell = 1, \ldots, m$ are the probabilities of $X_t = c_\ell$ at time $t$ and satisfy $\sum_{\ell=1}^{m} p_{t\ell} = 1$, and $\alpha_\ell$ for $\ell = 1, \ldots, m-1$ are the regression parameters. The simulated model incorporates a lagged value of order one of $Y_t$, or equivalently of $X_t$. We consider three different cases under the multinomial model. For the first two cases, we let the number of categories $m = 4$ and the number of groups $J = 2$. For Case 1, we consider the following regression parameters:

$\alpha_1 = (1.\;,\;,\;)'$, $\alpha_2 = (1,\;.\;,\;)'$, $\alpha_3 = (1,\;,\;.\;)'$ if $Y_t \in \Pi_1$,
$\alpha_1 = (0.\;,\;,\;)'$, $\alpha_2 = (1,\;.\;,\;)'$, $\alpha_3 = (1,\;,\;.\;)'$ if $Y_t \in \Pi_2$.

Figures 2(a) and 2(b) display realizations of time series from groups $\Pi_1$ and $\Pi_2$ in Case 1, respectively. For Case 2, the regression parameters are set to be

$\alpha_1 = (1.\;,\;,\;)'$, $\alpha_2 = (1,\;.\;,\;)'$, $\alpha_3 = (1,\;,\;.\;)'$ if $Y_t \in \Pi_1$,
$\alpha_1 = (0.\;,\;,\;)'$, $\alpha_2 = (1,\;.\;,\;)'$, $\alpha_3 = (1,\;,\;.\;)'$ if $Y_t \in \Pi_2$.

Figures 2(c) and 2(d) present realizations of time series from groups $\Pi_1$ and $\Pi_2$ in Case 2, respectively. For Case 3, we consider $J = 3$ different groups with the following regression parameters:

$\alpha_1 = (0.\;,\;,\;)'$, $\alpha_2 = (1,\;.\;,\;)'$, $\alpha_3 = (1,\;,\;.\;)'$ if $Y_t \in \Pi_1$,
$\alpha_1 = (1.\;,\;,\;)'$, $\alpha_2 = (1,\;.\;,\;)'$, $\alpha_3 = (1,\;,\;.\;)'$ if $Y_t \in \Pi_2$,
$\alpha_1 = (1.\;,\;.\;,\;)'$, $\alpha_2 = (-\;,\;-.\;,\;-\;)'$, $\alpha_3 = (2,\;.\;,\;-\;)'$ if $Y_t \in \Pi_3$.
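The data-generating mechanism above can be sketched as follows (assuming numpy). The $\alpha$ values in the sketch are arbitrary illustrative choices, not the parameter values used in Cases 1-3:

```python
import numpy as np

def simulate_mlogit_series(T, alphas, m, rng):
    """Generate a categorical series X_t in {0, ..., m-1} from the
    order-one multinomial logit model: for l = 0, ..., m-2,
    p_tl is proportional to exp(alphas[l] @ y_prev), and the reference
    category m-1 has unnormalized probability 1, where y_prev is the
    (m-1)-dimensional indicator vector of X_{t-1}."""
    x = np.empty(T, dtype=int)
    y_prev = np.zeros(m - 1)
    for t in range(T):
        expl = np.exp(alphas @ y_prev)   # one term per non-reference category
        p = np.append(expl, 1.0)
        p /= p.sum()                     # probabilities of the m categories
        x[t] = rng.choice(m, p=p)
        y_prev = np.zeros(m - 1)         # update the lagged indicator vector
        if x[t] < m - 1:
            y_prev[x[t]] = 1.0
    return x

rng = np.random.default_rng(0)
# Illustrative parameters only (not the values from Cases 1-3)
alphas = np.array([[1.5, 0.0, 0.0],
                   [1.0, 0.5, 0.0],
                   [1.0, 0.0, 0.5]])
x = simulate_mlogit_series(500, alphas, m=4, rng=rng)
```

Varying `alphas` across groups, as in the three cases above, changes both the oscillatory behavior and the time spent in each category of the simulated series.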
One hundred replications are generated for each of the 27 combinations of the 3 cases, 3 numbers of time series per group in the training data, $N_j = 20, 50, 100$ for all $j$, and 3 time series lengths, $T = 100, 200, 500$. Classification results are summarized in Table 1. In Case 1, groups $\Pi_1$ and $\Pi_2$ have different oscillatory patterns but similar traversals through categories, resulting in a poor classification rate if we use only the optimal scalings for classification. For Case 2, where the two groups are distinct mainly in the optimal scalings, the envelope classifier produces the lowest correct classification rate (around 50%) among all methods considered. The proposed classifier and the scaling classifier perform similarly. They have slightly lower classification rates than sequence learner, which is designed to select and use all subsequences that are important in classifying responses and thus is well suited for the setting in Case 2. In Case 3, we consider three groups, and the groups differ in both cyclical patterns and scalings. The proposed classifier has higher mean classification rates than the envelope and scaling classifiers. This is because the groups are different in both oscillatory patterns and traversals through categories. The proposed classifier, by incorporating both the spectral envelope and optimal scalings, can produce better classification rates in this case. It should be noted that sequence learner is developed under the framework of logistic regression and cannot classify a population of time series with more than two groups in its current form. One could extend sequence learner to multinomial logistic regression, but extensive programming efforts are needed and no prior results are available. Thus, we do not have simulation results for sequence learner in Case 3.

In addition to classification, estimates of the tuning parameter $\kappa$ in the proposed algorithm allow for interpretable inference.
For example, the averages of the estimated tuning parameters $\hat{\kappa}$ in our simulations for Cases 1, 2, and 3 are 1.00, 0.24, and 0.66, respectively. This suggests that $\kappa$ can help us to identify whether groups are different in oscillatory patterns only, traversals through categories only, or a mixture of the two.

Table 1: Mean (standard deviation) of the percent of correctly classified time series across methods.
Case  N    T    EnvSca         SCA            ENV            SEQ
1     20   100  92.21 (3.41)   49.42 (4.72)   93.32 (2.39)   87.13 (3.32)
1     20   200  96.91 (1.99)   49.84 (4.60)   98.16 (1.39)   93.24 (2.70)
1     20   500  98.78 (1.66)   50.04 (4.71)   99.98 (0.14)   98.44 (1.40)
1     50   100  92.99 (2.68)   49.92 (4.79)   93.54 (2.28)   90.40 (2.97)
1     50   200  97.64 (1.99)   50.10 (4.31)   98.47 (1.19)   96.46 (2.04)
1     50   500  99.56 (0.64)   49.63 (4.48)   99.98 (0.14)   99.56 (0.76)
1     100  100  93.68 (2.67)   50.67 (5.00)   93.76 (2.37)   91.55 (2.71)
1     100  200  98.26 (1.30)   49.73 (4.58)   98.49 (1.19)   96.73 (4.96)
1     100  500  99.80 (0.45)   50.22 (4.72)   99.97 (0.17)   99.68 (0.60)
2     20   100  71.13 (6.23)   71.66 (6.00)   50.42 (5.02)   75.16 (4.45)
2     20   200  78.69 (5.76)   79.30 (5.03)   49.85 (5.29)   83.32 (4.21)
2     20   500  88.27 (3.89)   88.65 (3.96)   49.94 (4.51)   93.14 (2.58)
2     50   100  76.01 (5.36)   76.25 (5.34)   50.71 (4.39)   77.94 (4.11)
2     50   200  84.14 (4.03)   84.22 (4.10)   50.17 (4.92)   86.71 (3.43)
2     50   500  94.20 (2.47)   99.40 (2.34)   50.93 (5.22)   95.95 (2.23)
2     100  100  79.19 (4.60)   79.48 (4.51)   50.58 (4.83)   78.56 (4.45)
2     100  200  87.59 (3.73)   87.65 (3.67)   39.61 (5.05)   88.46 (3.32)
2     100  500  96.29 (1.83)   96.31 (1.89)   50.38 (5.04)   96.68 (1.87)
3     20   100  81.02 (4.69)   70.43 (4.67)   70.88 (3.97)   NA
3     20   200  89.64 (3.58)   75.17 (3.48)   80.61 (3.62)   NA
3     20   500  97.39 (1.80)   81.80 (3.12)   93.04 (2.27)   NA
3     50   100  83.79 (3.30)   72.91 (3.67)   71.08 (3.38)   NA
3     50   200  92.28 (2.62)   78.18 (2.90)   81.82 (2.91)   NA
3     50   500  98.42 (1.28)   84.51 (3.02)   94.32 (2.12)   NA
3     100  100  84.97 (3.34)   73.07 (3.29)   71.37 (3.48)   NA
3     100  200  93.04 (2.09)   79.99 (3.05)   82.69 (2.87)   NA
3     100  500  98.67 (1.00)   87.01 (2.59)   94.29 (1.96)   NA

Analysis of Sleep Stage Time Series
During a full night of sleep, the body cycles through different sleep stages, including rapid eye movement (REM) sleep, in which dreaming typically occurs, and non-rapid eye movement (NREM) sleep, which consists of four stages representing light sleep (S1, S2) and deep sleep (S3, S4). These sleep stages are associated with specific physiological behaviors that are essential to the rejuvenating properties of sleep. Disruptions to typical cyclical behavior and changes in the amount of time spent in each sleep stage have been found to be associated with many sleep disorders (Zepelin et al., 2005; Institute of Medicine, 2006). Particular sleep disorders, such as nocturnal frontal lobe epilepsy (NFLE), are also difficult to accurately diagnose since clinical, behavioral, and electroencephalography (EEG) patterns for NFLE patients are often similar to those of patients with other sleep disorders, such as REM behavior disorder (RBD) (D'Cruz and Vaughn, 1997; Tinuper and Bisulli, 2017). Accordingly, there is a need for statistical procedures that can automatically identify cyclical patterns in sleep stage time series associated with specific sleep disorders and accurately classify patients with different sleep disorders.

The data for this analysis were collected through a study of various sleep-related disorders (Terzano et al., 2001) and are publicly available via PhysioNet (Goldberger et al., 2000). All participants were monitored during a full night of sleep, and their sleep stages were annotated by experienced technicians every 20 seconds according to well-established sleep staging criteria (Rechtschaffen and Kales, 1968). We consider classifying sleep stage time series data collected from NFLE and RBD patients, for which differential diagnosis is particularly challenging (Tinuper and Bisulli, 2017). NFLE and RBD patients both experience significant sleep disruptions associated with complex, often bizarre motor behavior (e.g.,
violent movements of arms or legs, dystonic posturing) and vocalization (e.g., screaming, shouting, laughing), which are due to nocturnal seizures for NFLE patients (Tinuper and Bisulli, 2017) and to dream-enacting behavior in REM sleep for RBD patients (Schenck et al., 1986). This makes differentiating RBD and NFLE patients particularly challenging. An objective, data-driven classification procedure that can automatically distinguish patients and aid differential diagnosis is needed.

The current analysis considers 8 hours of sleep stage time series from N = 46 participants: 34 NFLE patients and 12 RBD patients. This results in categorical time series of length T = 1440 with m = 6 sleep stages (REM, S1, S2, S3, S4, and Wake/Movement). Examples are provided in Figure 1. In order to estimate the spectral envelope and optimal scalings, Wake/Movement is used as the reference category. Leave-one-out (LOO) cross-validation is then used to empirically evaluate the effectiveness of the classification rule. For these data, the overall correct classification rate is 82.61%, with 29 of the 34 NFLE patients correctly classified and 9 of the 12 RBD patients correctly classified. The tuning parameter estimated via LOO cross-validation is κ̂ = 0.852. Figure 4 displays the estimated spectral envelopes and optimal scalings for low frequencies (ω ≤ 0.05), representing cycles lasting longer than 6.67 minutes.

Figure 4:
Left: Estimated spectral envelope for NFLE patients (solid red) and RBD patients (dashed blue) for low frequencies (below 0.05). Group-level estimated spectral envelopes are represented by the two thicker lines. Right: Estimated optimal scalings for NFLE patients (top) and RBD patients (bottom) for low frequencies (below 0.05).
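The quantities displayed in Figure 4 can be computed directly from a categorical series. Below is a minimal, illustrative sketch (not the authors' implementation) of spectral envelope and optimal scaling estimation in the spirit of Stoffer et al. (1993): categories are one-hot encoded with the last category as reference, the spectral matrix is estimated with a smoothed periodogram, and the leading generalized eigenvalue/eigenvector of its real part (standardized by the sample covariance) gives the envelope and scalings at each frequency. The rectangular smoother and the variance standardization used here are simplifying assumptions.

```python
import numpy as np

def spectral_envelope(x, categories, bandwidth=5):
    """Sketch of spectral envelope / optimal scaling estimation.

    x          : sequence of category labels
    categories : ordered labels; the LAST one is the reference and is dropped
    bandwidth  : half-width of a rectangular periodogram smoother (assumption)
    """
    T, m = len(x), len(categories)
    # One-hot indicators of the non-reference categories -> T x (m-1)
    Y = np.zeros((T, m - 1))
    for j, c in enumerate(categories[:-1]):
        Y[:, j] = [xi == c for xi in x]
    Y = Y - Y.mean(axis=0)                       # center each indicator series
    V = np.atleast_2d(np.cov(Y, rowvar=False))   # sample variance matrix
    Linv = np.linalg.inv(np.linalg.cholesky(V))  # for the generalized eigenproblem
    d = np.fft.fft(Y, axis=0) / np.sqrt(T)       # Fourier transforms of indicators
    K = (T - 1) // 2
    freqs = np.arange(1, K + 1) / T
    env, scal = np.zeros(K), np.zeros((K, m - 1))
    for s in range(1, K + 1):
        lo, hi = max(1, s - bandwidth), min(K, s + bandwidth)
        # Smoothed periodogram estimate of the spectral matrix at frequency s/T
        F = np.zeros((m - 1, m - 1), dtype=complex)
        for u in range(lo, hi + 1):
            F += np.outer(d[u], np.conj(d[u]))
        F /= (hi - lo + 1)
        # Largest eigenvalue/vector of Re(F) standardized by V:
        # solves Re(F) b = lambda V b  via  Linv Re(F) Linv' u = lambda u
        w, v = np.linalg.eigh(Linv @ F.real @ Linv.T)
        env[s - 1] = w[-1]                 # spectral envelope at this frequency
        scal[s - 1] = Linv.T @ v[:, -1]    # corresponding optimal scalings
    return freqs, env, scal
```

For a series cycling through categories with period 10, for instance, the estimated envelope peaks near frequency 0.1.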
Second, differences in optimal scalings (see Figure 4) are more subtle, with noticeable differences over some categories (e.g., S3, S4), but not all. More specifically, scalings for frequencies below 0.025 indicate low-frequency behavior in NFLE patients due to cycling among three broader sleep stage groupings: 1) light sleep (S2), 2) deep sleep (S4), and 3) a combination of transitional sleep stages (S1, S3), REM, and Wake/Movement. On the other hand, RBD patients exhibit low-frequency power primarily due to cycling in and out of light sleep (S2). This can be attributed to the more regular and prolonged periods of deep sleep (S4) observed in NFLE patients, lasting 14 minutes per onset and covering 20.9% of total sleep on average, compared to RBD patients, for whom deep sleep lasts only 10.8 minutes per onset and covers 13.1% of total sleep on average. To better illustrate the differences in the optimal scalings, Figure 5 provides a sample series from each group along with the scaled time series obtained by averaging optimal scalings over frequencies below 0.025. Given the propensity for RBD patients to experience immediate sleep disruptions more so than NFLE patients, it is not surprising that RBD patients experience less deep sleep than NFLE patients.
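The construction behind the scaled series in Figure 5 is simple: each observed category is replaced by its estimated scaling, with the reference category mapped to zero. A small sketch (the scaling values below are hypothetical, not the estimates from this analysis):

```python
import numpy as np

def scale_series(x, scalings, reference=0.0):
    """Map a categorical series to a numeric one using optimal scalings.

    x         : sequence of category labels
    scalings  : dict {category: scaling}, e.g. obtained by averaging the
                estimated optimal scalings over frequencies below 0.025
    reference : value for the reference category (0 by construction)
    """
    return np.array([scalings.get(c, reference) for c in x])

# Hypothetical low-frequency average scalings (illustrative values only)
scalings = {"S1": 0.1, "S2": -0.3, "S3": 0.2, "S4": 0.6, "REM": 0.1}
x = ["W/MT", "S1", "S2", "S2", "S3", "S4", "S4", "REM"]
scaled = scale_series(x, scalings)   # W/MT is the reference -> 0.0
```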
Figure 5:
Top: Sample time series from the NFLE and RBD groups. Bottom: Corresponding scaled time series based on the mean scaling for frequencies below 0.025 (i.e., cycles lasting more than 13 minutes). Colors corresponding to NREM (purple), REM (blue), and W/MT (yellow) sleep stages are also provided.
It is important to note that the proposed classification rule automatically adapts to these particular features of the spectral envelopes and optimal scalings through the data-driven estimate κ̂ = 0.852 obtained via LOO cross-validation, which assigns more weight to differences in spectral envelopes in distinguishing between the two groups. This is an important feature of the proposed classification procedure, as it allows the classification rule to adapt to differences between groups in the spectral envelope, the optimal scalings, or both.
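A schematic of this adaptive rule: assign a series to the group minimizing a κ-weighted combination of envelope and scaling distances, with κ chosen by leave-one-out cross-validation. The sketch below is illustrative only; function names are ours, and the normalization of each distance used in the paper is omitted for brevity.

```python
import numpy as np

def classify(env_new, sca_new, env_groups, sca_groups, kappa):
    """Assign a new series to the group j minimizing
    D_j = kappa * ||env_new - Env_j||^2 + (1 - kappa) * ||sca_new - Sca_j||^2,
    where Env_j / Sca_j are group-mean envelope / scaling features."""
    d = [kappa * np.sum((env_new - e) ** 2) +
         (1 - kappa) * np.sum((sca_new - s) ** 2)
         for e, s in zip(env_groups, sca_groups)]
    return int(np.argmin(d))

def choose_kappa(env_feats, sca_feats, labels, grid=np.linspace(0, 1, 21)):
    """Select kappa by leave-one-out cross-validated accuracy."""
    n = len(labels)
    best, best_acc = grid[0], -1.0
    for kappa in grid:
        correct = 0
        for i in range(n):                      # hold out series i
            keep = [j for j in range(n) if j != i]
            groups = sorted(set(labels[j] for j in keep))
            env_g = [np.mean([env_feats[j] for j in keep if labels[j] == g],
                             axis=0) for g in groups]
            sca_g = [np.mean([sca_feats[j] for j in keep if labels[j] == g],
                             axis=0) for g in groups]
            pred = groups[classify(env_feats[i], sca_feats[i],
                                   env_g, sca_g, kappa)]
            correct += (pred == labels[i])
        acc = correct / n
        if acc > best_acc:
            best, best_acc = kappa, acc
    return best, best_acc
```

On features where only the envelopes separate the groups, the selected κ is pushed toward the envelope term, mirroring the behavior of κ̂ reported above.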
Discussion

This article presents a novel approach to classifying categorical time series. An adaptive algorithm that utilizes both the spectral envelope and its corresponding set of optimal scalings for classification of categorical time series is developed, and classification consistency is established. We conclude by discussing some limitations and related future extensions. First, the proposed method assumes that the collection of time series is stationary. However, in some applications the time series could be nonstationary, which would require time-varying extensions of the spectral envelope and optimal scalings for proper characterization; incorporating nonstationarity in this way may also further improve classification accuracy. Second, our method requires that all time series have the same length and that all categories are observed. In practice, time series may have different lengths and not all categories may be observed. For example, in the sleep study application, participants may have different lengths of full-night sleep, and some participants may not experience any movement during sleep. Future research will focus on developing methods that can accommodate these kinds of time series observations. Third, our algorithm assumes that time series within the same group have the same cyclical patterns, while extra variability may be present in some applications (Krafty, 2016). A topic of future research would be to incorporate within-group variability into the classification framework.
Supplementary Material
Supplementary material available online includes code for implementing the proposed clas-sifier on the three cases of simulated data.
Appendix: Proofs
To prove Theorems 1, 2, and 3, we make use of the following lemmas.
Lemma 1. Under Assumption 1, and assuming that h(ω)^re has distinct eigenvalues, let λ(ω) and γ(ω) be the largest eigenvalue and corresponding eigenvector of h(ω)^re. If B_T → ∞ and T → ∞ with B_T T^{-1} → 0, then

  |E{λ̂(ω)} − λ(ω)| = O(B_T T^{-1}),   |E{γ̂(ω)} − γ(ω)| = O(B_T T^{-1}).

Lemma 2. Under the conditions of Lemma 1,

  |λ̂(ω) − E{λ̂(ω)}| = O(B_T T^{-1}),   |γ̂(ω) − E{γ̂(ω)}| = O(B_T T^{-1}).

The proofs of Lemmas 1 and 2 follow directly from Brillinger (2002, Theorems 9.4.1 and 9.4.3) and are thus omitted.
Proof of Theorem 1
Recall that λ̂ = {λ̂(ω_1), . . . , λ̂(ω_K)}′, where K = ⌊(T − 1)/2⌋, D_{1,ENV} = ‖λ̂ − Λ^(1)‖², and D_{2,ENV} = ‖λ̂ − Λ^(2)‖². Let λ̂_s = λ̂(ω_s). It can be shown that

  D_{1,ENV} − D_{2,ENV} = −2 Σ_{s=1}^K (λ̂_s − λ^(1)_s)(λ^(1)_s − λ^(2)_s) − Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)².

It remains to show that

  P(D_{1,ENV} − D_{2,ENV} > 0) = P( [ −2 Σ_{s=1}^K (λ̂_s − λ^(1)_s)(λ^(1)_s − λ^(2)_s) − Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)² ] > 0 )   (4)

is bounded. From the Chebyshev inequality, we have

  P(D_{1,ENV} − D_{2,ENV} > 0) ≤ E{ [ −2 Σ_{s=1}^K (λ̂_s − λ^(1)_s)(λ^(1)_s − λ^(2)_s) ]² } / [ Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)² ]².   (5)

Consider the numerator:

  E{ −2 Σ_{s=1}^K (λ̂_s − λ^(1)_s)(λ^(1)_s − λ^(2)_s) }²
   = 4 E{ Σ_{s=1}^K (λ̂_s − λ^(1)_s)(λ^(1)_s − λ^(2)_s) }²
   = 4 E{ Σ_{s=1}^K (λ̂_s − E(λ̂_s) + E(λ̂_s) − λ^(1)_s)(λ^(1)_s − λ^(2)_s) }²
   ≤ 8 E{ Σ_{s=1}^K (λ̂_s − E(λ̂_s))(λ^(1)_s − λ^(2)_s) }² + 8 E{ Σ_{s=1}^K (E(λ̂_s) − λ^(1)_s)(λ^(1)_s − λ^(2)_s) }².   (6)

Combining (5) and (6), we have P(D_{1,ENV} − D_{2,ENV} > 0) ≤ I + II, where

  I = 8 E{ Σ_{s=1}^K (λ̂_s − E(λ̂_s))(λ^(1)_s − λ^(2)_s) }² / [ Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)² ]²,
  II = 8 E{ Σ_{s=1}^K (E(λ̂_s) − λ^(1)_s)(λ^(1)_s − λ^(2)_s) }² / [ Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)² ]².

We analyze these two terms separately. For the first term I, the numerator

  8 E{ Σ_{s=1}^K (λ̂_s − E(λ̂_s))(λ^(1)_s − λ^(2)_s) }² = O(B_T)

by Lemma 2, and, by Assumption 2, the sum Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)² in the denominator is of order T. Combining these results, we have I = O(B_T T^{-1}). Similarly, using Lemma 1 and Assumption 2, we have II = O(B_T T^{-1}). This completes the proof.

Proof of Theorem 2
Recall that γ̂ = {γ̂(ω_1)′, . . . , γ̂(ω_K)′}′, a K × (m − 1) matrix, D_{1,SCA} = ‖γ̂ − Γ^(1)‖², and D_{2,SCA} = ‖γ̂ − Γ^(2)‖². It can be shown that

  D_{1,SCA} − D_{2,SCA} = −2 Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ̂_{ℓ,s} − γ^(1)_{ℓ,s})(γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s}) − Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s})².

We aim to show that P(D_{1,SCA} − D_{2,SCA} > 0) is bounded. By the same arguments as in the proof of Theorem 1, P(D_{1,SCA} − D_{2,SCA} > 0) ≤ I + II, where

  I = 8 E{ Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ̂_{ℓ,s} − E(γ̂_{ℓ,s}))(γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s}) }² / [ Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s})² ]²,
  II = 8 E{ Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (E(γ̂_{ℓ,s}) − γ^(1)_{ℓ,s})(γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s}) }² / [ Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s})² ]².

Combining Lemmas 1 and 2 with Assumption 3, we have P(D_{1,SCA} − D_{2,SCA} > 0) = O(B_T T^{-1}).

Proof of Theorem 3

We would like to show that P(D_{1,EnvSca} − D_{2,EnvSca} > 0) is bounded. It can be shown that D_{1,EnvSca} − D_{2,EnvSca} = A + B, where

  A = κ [ −2 Σ_{s=1}^K (λ̂_s − λ^(1)_s)(λ^(1)_s − λ^(2)_s) / Σ_{s=1}^K λ̂_s − Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)² / Σ_{s=1}^K λ̂_s ],

and

  B = (1 − κ) [ −2 Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ̂_{ℓ,s} − γ^(1)_{ℓ,s})(γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s}) / Σ_{ℓ=1}^{m−1} Σ_{s=1}^K γ̂_{ℓ,s} − Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s})² / Σ_{ℓ=1}^{m−1} Σ_{s=1}^K γ̂_{ℓ,s} ].

Using the results in the proofs of Theorems 1 and 2, together with Assumption 4, we have

  P(A > 0) = P( [ −2 Σ_{s=1}^K (λ̂_s − λ^(1)_s)(λ^(1)_s − λ^(2)_s) − Σ_{s=1}^K (λ^(1)_s − λ^(2)_s)² ] > 0 ) = O(B_T T^{-1})

and

  P(B > 0) = P( [ −2 Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ̂_{ℓ,s} − γ^(1)_{ℓ,s})(γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s}) − Σ_{ℓ=1}^{m−1} Σ_{s=1}^K (γ^(1)_{ℓ,s} − γ^(2)_{ℓ,s})² ] > 0 ) = O(B_T T^{-1}).

Since P(D_{1,EnvSca} − D_{2,EnvSca} > 0) ≤ P(A > 0) + P(B > 0), we have the desired result.

References
Aggarwal, C. C. (2002), "On Effective Classification of Strings with Wavelets," in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, New York, NY: Association for Computing Machinery, pp. 163–172.
Billingsley, P. (1961), Statistical Inference for Markov Processes, University of Chicago Press.
Brillinger, D. R. (2002), Time Series: Data Analysis and Theory, Philadelphia: SIAM.
Dai, M. and Guo, W. (2004), "Multivariate spectral analysis using Cholesky decomposition," Biometrika, 91, 629–643.
D'Cruz, O. F. and Vaughn, B. V. (1997), "Nocturnal seizures mimic REM behavior disorder," American Journal of Electroneurodiagnostic Technology, 37, 258–264.
Deshpande, M. and Karypis, G. (2002), "Evaluation of Techniques for Classifying Biological Sequences," in Advances in Knowledge Discovery and Data Mining, eds. Chen, M.-S., Yu, P. S., and Liu, B., Berlin, Heidelberg: Springer, pp. 417–431.
Fahrmeir, L. and Kaufmann, H. (1987), "Regression models for nonstationary categorical time series," Journal of Time Series Analysis, 8, 147–160.
Fokianos, K. and Kedem, B. (1998), "Prediction and classification of nonstationary categorical time series," Journal of Multivariate Analysis, 67, 277–296.
— (2003), "Regression theory for categorical time series," Statistical Science, 18, 357–376.
Foldvary-Schaefer, N. and Alsheikhtaha, Z. (2013), "Complex nocturnal behaviors: Nocturnal seizures and parasomnias," Continuum: Lifelong Learning in Neurology, 19, 104–131.
Fryzlewicz, P. and Ombao, H. (2009), "Consistent classification of nonstationary time series using stochastic wavelet representations," Journal of the American Statistical Association, 104, 299–312.
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P., Mark, R., Mietus, J., Moody, G., Peng, C.-K., and Stanley, H. (2000), "PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals," Circulation, 101, e215–e220.
Huang, H., Ombao, H., and Stoffer, D. (2004), "Discrimination and classification of nonstationary time series using the SLEX model," Journal of the American Statistical Association, 99, 763–774.
Ifrim, G. and Wiuf, C. (2011), "Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, pp. 708–716.
Institute of Medicine (2006), Sleep Disorders and Sleep Deprivation: An Unmet Public Health Problem, Washington, DC: The National Academies Press.
Jurafsky, D. and Martin, J. (2009), Speech and Language Processing, Pearson Education International, 2nd ed.
Krafty, R. T. (2016), "Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability," Journal of Time Series Analysis, 37, 435–450.
Krafty, R. T. and Collinge, W. O. (2013), "Penalized multivariate Whittle likelihood for power spectrum estimation," Biometrika, 100, 447–458.
Krafty, R. T., Xiong, S., Stoffer, D. S., Buysse, D. J., and Hall, M. (2012), "Enveloping spectral surfaces: covariate dependent spectral analysis of categorical time series," Journal of Time Series Analysis, 33, 797–806.
Navarro, G. (2001), "A Guided Tour to Approximate String Matching," ACM Computing Surveys, 33, 31–88.
Rechtschaffen, A. and Kales, A. (1968), A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, Washington, DC: US Government Printing Office.
Rosen, O. and Stoffer, D. (2007), "Automatic estimation of multivariate spectra via smoothing splines," Biometrika, 94, 335–345.
Schenck, C. H., Bundlie, S. R., Ettinger, M. G., and Mahowald, M. W. (1986), "Chronic Behavioral Disorders of Human REM Sleep: A New Category of Parasomnia," Sleep, 9, 293–308.
Shumway, R. and Stoffer, D. (2016), Time Series Analysis and Its Applications, New York: Springer, 4th ed.
Stoffer, D., Tyler, D., and McDougall, A. (1993), "Spectral analysis for categorical time series: scaling and the spectral envelope," Biometrika, 80, 611–632.
Stoffer, D. S., Tyler, D. E., and Wendt, D. A. (2000), "The spectral envelope and its applications," Statistical Science, 15, 224–253.
Terzano, M., Parrino, L., Sherieri, A., Chervin, R., Chokroverty, S., Guilleminault, C., Hirshkowitz, M., Mahowald, M., Moldofsky, H., Rosa, A., Thomas, R., and Walters, A. (2001), "Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep," Sleep Medicine, 2, 537–553.
Tinuper, P. and Bisulli, F. (2017), "From nocturnal frontal lobe epilepsy to sleep-related hypermotor epilepsy: a 35-year diagnostic challenge," Seizure, 44, 87–92.
Zepelin, H., Siegel, J., and Tobler, I. (2005),