# Classification of Categorical Time Series Using the Spectral Envelope and Optimal Scalings


Zeda Li∗ (Baruch College, The City University of New York), Scott A. Bruce∗ (Department of Statistics, George Mason University), Tian Cai (Graduate Center, The City University of New York)

Abstract

This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or spectral envelope, obtained by assigning numerical values, or scalings, to categories that optimally emphasize oscillations at each frequency. Our procedure combines these two quantities to produce an interpretable and parsimonious feature-based classifier that can be used to accurately determine group membership for categorical time series. Classification consistency of the proposed method is investigated, and simulation studies are used to demonstrate accuracy in classifying categorical time series with various underlying group structures. Finally, we use the proposed method to explore key differences in oscillatory patterns of sleep stage time series for patients with different sleep disorders and accurately classify patients accordingly.

Keywords: Categorical Time Series; Classification; Optimal Scaling; Replicated Time Series; Spectral Envelope

∗ Both authors contributed equally to this work.

## 1 Introduction

Categorical time series are frequently observed in a variety of fields, including sleep medicine, genetic engineering, rehabilitation science, and sports analytics (Stoffer et al., 2000). In many applications, multiple realizations of categorical time series from different underlying groups are collected in order to construct a classifier that can accurately identify group membership. As a motivating example, we consider a sleep study in which participants with different types of sleep disorders are monitored during a night of sleep via polysomnography in order to understand important clinical and behavioral differences among these sleep disorders. During sleep, the body cycles through different sleep stages: movement/wakefulness, rapid eye movement (REM) sleep, and non-rapid eye movement (NREM) sleep, which is further divided into light sleep (S1, S2) and deep sleep (S3, S4). Our analysis focuses on two particular sleep disorders, nocturnal frontal lobe epilepsy (NFLE) and REM behavior disorder (RBD), for which differential diagnosis is especially challenging due to a significant overlap in their associated clinical and behavioral characteristics (Tinuper and Bisulli, 2017). For example, NFLE and RBD patients both exhibit complex, bizarre motor behavior and vocalizations during sleep. However, we posit that differences in sleep cycling behavior may still exist due to fundamental differences in the sleep disruption mechanisms of NFLE and RBD. The goal of our analysis is to investigate potential differences in sleep cycling behavior for NFLE and RBD patients and use this information to accurately classify patients accordingly. This data-driven classification can potentially improve accuracy in differential diagnoses of NFLE and RBD in patients presenting clinical and behavioral characteristics common to both conditions.
Figure 1 displays examples of study participants' full night sleep stage series from the two different groups.

In the statistical literature, classification methods for multiple real-valued time series have been well studied; see Shumway and Stoffer (2016) for a review. However, classification of categorical time series has not received much attention. The majority of statistical methods for categorical time series analysis have been developed for analyzing a single categorical time series. Some examples include the Markov chain model of Billingsley (1961), the link function approach of Fahrmeir and Kaufmann (1987), the likelihood-based method

Figure 1:

Sleep stage time series from six sleep study participants: three NFLE patients (top row) and three RBD patients (bottom row).

of Fokianos and Kedem (1998), and the spectral envelope approach for analyzing a single time series introduced in Stoffer et al. (1993). A comprehensive discussion of this research direction can be found in Fokianos and Kedem (2003). More recently, Krafty et al. (2012) introduced the spectral envelope surface for quantifying the association between the oscillatory patterns of a collection of categorical time series and continuous covariates. However, it is not immediately useful for classification. To the best of our knowledge, this article presents the first statistical approach for supervised classification of multiple categorical time series.

In the computer science literature, however, many methods have been developed to classify string-valued time series, which can also be used for classification of categorical time series. These include the minimum edit distance classifier with sequence alignment (Navarro, 2001; Jurafsky and Martin, 2009), Markov chain-based classifiers (Deshpande and Karypis, 2002), the Haar wavelet classifier (Aggarwal, 2002), and the state-of-the-art sequence learner that uses a gradient-bounded coordinate-descent algorithm for efficiently selecting discriminative subsequences and then uses logistic regression for classification (Ifrim and Wiuf, 2011). These methods are black-box in nature and offer little help in understanding key differences among groups.
On the other hand, the proposed method addresses the classification problem using the spectral envelope and optimal scalings, which provide low-dimensional, interpretable summary measures of oscillatory patterns and traversals through categories. These patterns are often associated with scientific mechanisms that distinguish different groups, and the proposed method also produces lower classification error compared to state-of-the-art computer science methods like sequence learner.

Many classifiers for real-valued time series rely on feature extraction, a process in which low-dimensional summary quantities are constructed that capture essential features of the underlying groups. These quantities are then used to develop feature-based distance measures, such as the Kullback-Leibler distance and squared quadratic distance, which can be used to measure differences between groups and time series of unknown group membership. Training data can then be used to estimate group-level quantities and construct a classifier that minimizes the distance between time series and their predicted group (Huang et al., 2004; Shumway and Stoffer, 2016). This type of approach cannot be easily extended to the classification of categorical time series due to the difficulty in obtaining low-dimensional features. To this end, we propose using the spectral envelope and its corresponding set of optimal scalings (Stoffer et al., 1993) as low-dimensional, interpretable features for differentiating groups of categorical time series. Use of these features is motivated by noticing that most categorical time series can be represented in terms of their prominent oscillatory patterns, characterized by the spectral envelope, and by the set of mappings from categories to numeric values that accentuate specific oscillatory patterns, characterized by the optimal scalings.

For example, Figures 2(a) and 2(b) display two categorical time series with similar traversals through categories, but different oscillatory patterns. More specifically, the time series in Figure 2(b) cycles between categories faster than the time series in Figure 2(a). On the other hand, Figures 2(c) and 2(d) display two categorical time series with similar oscillatory patterns, but different traversals through categories. More specifically, the time series in Figure 2(c) spends approximately equal amounts of time in each category, while the time series in Figure 2(d) spends more time in categories 2 and 3. Moreover, Figure 3 displays the estimated spectral envelope for the two series in Figures 2(a) and 2(b) and the optimal scalings for the two series in Figures 2(c) and 2(d). The spectral envelope and optimal scalings clearly reflect the corresponding differences between these series. In particular, the spectral envelope indicates more high-frequency power for the time series in Figure 2(b), since it cycles between categories faster relative to the time series in Figure 2(a). Also, the optimal scalings for the time series in Figures 2(c) and 2(d) are quite different, reflecting the different traversals over categories resulting in different distributions of time spent in categories.

Figure 2:

Four simulated categorical time series: (a) and (b) have the same dominating categories but different cyclical patterns; (c) and (d) have the same frequency patterns but different dominating categories.

The proposed method is briefly described as follows. For each time series to be classified, we represent it as a vector-valued time series through the use of indicator variables. The smoothed spectral density matrix of this vector-valued time series is then obtained, and the spectral envelope and optimal scalings at each frequency are computed from the estimated spectral matrix. Then, the spectral envelope and optimal scalings for each group are estimated respectively via training data. The proposed feature, which is used to estimate the distance from each group, is obtained by adaptively summing the differences in the spectral envelope and optimal scalings. Finally, time series with unknown group membership are assigned to groups with the most similar features (i.e., minimum distance). Under the proposed framework, we show that the misclassification probability is bounded as long as the spectral density matrix estimator is consistent. The procedure is demonstrated to perform well in simulation studies and a real data analysis.

The remainder of the paper is organized as follows. Section 2 provides definitions of the spectral envelope and optimal scalings and corresponding estimators. Section 3 introduces the proposed classification procedure and its theoretical properties. Section 4 provides detailed simulation studies, which explore the empirical properties of the proposed method and compare with the state-of-the-art sequence learner classifier. Section 5 details the application of the proposed classifier to the analysis of sleep stage time series to better understand and accurately classify sleep disorders. Section 6 provides some closing discussions and impactful extensions of this work.
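In implementation terms, the final classification step described above is a minimum-distance rule over precomputed features. A minimal Python sketch, under assumed names (none of these identifiers come from the paper), with `kappa` weighting envelope versus scaling differences:

```python
import numpy as np

def classify(lam_test, gam_test, groups, kappa=0.5):
    """Assign one test series to the group with the smallest feature distance.

    lam_test: K-vector spectral envelope of the test series (assumed precomputed).
    gam_test: K x (m-1) matrix of optimal scalings.
    groups:   dict {label: (Lambda_j, Gamma_j)} of group-level features.
    kappa:    weight on envelope vs. scaling differences (0 <= kappa <= 1).
    """
    best_label, best_dist = None, np.inf
    for label, (lam_g, gam_g) in groups.items():
        # Each term is rescaled by the norm of the test feature so the two
        # pieces are comparable in magnitude.
        d = (kappa * np.linalg.norm(lam_test - lam_g) / np.linalg.norm(lam_test)
             + (1 - kappa) * np.linalg.norm(gam_test - gam_g) / np.linalg.norm(gam_test))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

With `kappa = 1` the rule uses only the spectral envelope; with `kappa = 0`, only the scalings.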

## 2 The Spectral Envelope and Optimal Scalings

### 2.1 Definition

Let $X_t$, for $t = 0, \pm 1, \pm 2, \ldots$, be a categorical time series with finite state-space $C = \{c_1, c_2, \ldots, c_m\}$. We assume that $X_t$ is stationary, such that $\{X_1, X_2, \ldots, X_t\} \stackrel{d}{=} \{X_{1+h}, X_{2+h}, \ldots, X_{t+h}\}$ for $h \geq 1$, and that $\min_{\ell = 1, 2, \ldots, m} P(X_t = c_\ell) > 0$. Consider the real-valued time series $X_t(\beta)$, obtained by assigning numerical values, or scalings, to categories such that $\beta = (\beta_1, \beta_2, \ldots, \beta_m)' \in \mathbb{R}^m$ and $X_t(\beta) = \beta_\ell$ when $X_t = c_\ell$. We assume that $X_t(\beta)$ has a continuous and bounded spectral density

$$f_x(\omega; \beta) = \sum_{h=-\infty}^{\infty} \operatorname{Cov}[X_t(\beta), X_{t+h}(\beta)] \exp(-2\pi i \omega h).$$

Let $V_x(\beta)$ be the variance of the scaled time series $X_t(\beta)$. The spectral envelope is then defined as the maximal normalized spectral density, $f_x(\omega; \beta)/V_x(\beta)$, among all possible scalings not proportional to $1_m$ at frequency $\omega$, where $1_m$ is the $m$-dimensional vector of ones. Scalings that assign the same value to each category are excluded since $V_x(\beta)$ is zero and the normalized power spectrum is not well defined. Formally, we define the spectral envelope and set of optimal scalings for frequency $\omega$ as

$$\lambda(\omega) = \max_{\beta \in \mathbb{R}^m \setminus \{\widetilde{0}\}} \frac{f_x(\omega; \beta)}{V_x(\beta)}, \qquad B(\omega) = \operatorname*{arg\,max}_{\beta \in \mathbb{R}^m \setminus \{\widetilde{0}\}} \frac{f_x(\omega; \beta)}{V_x(\beta)},$$

respectively, where $\{\widetilde{0}\}$ is the subspace of $\mathbb{R}^m$ that is proportional to $1_m$. The spectral envelope, $\lambda(\omega)$, is the largest proportion of the variance that can be obtained at frequency $\omega$ for different possible scalings, such that $f_x(\omega; \beta)/V_x(\beta) \leq \lambda(\omega)$ for all $\beta \in \mathbb{R}^m \setminus \{\widetilde{0}\}$. The spectral envelope characterizes important oscillatory patterns in categorical time series.

For illustration, Figures 3(a) and 3(b) display the estimated spectral envelopes for the time series displayed in Figures 2(a) and 2(b), respectively. It can be seen that the time series in Figure 2(a), which oscillates more slowly than the time series in Figure 2(b), has more power in the estimated spectral envelope at lower frequencies.
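To make the scaling operation concrete, here is a small sketch (Python); the series, category labels, and `beta` values are invented for illustration:

```python
import numpy as np

# Hypothetical categorical series over m = 4 states labeled 1..4.
rng = np.random.default_rng(0)
x = rng.integers(1, 5, size=200)

# A scaling beta assigns one numeric value per category;
# X_t(beta) = beta_l when X_t = c_l.
beta = np.array([0.5, -1.0, 2.0, 0.0])
x_beta = beta[x - 1]

# V_x(beta): variance of the scaled series; it is zero iff beta is
# proportional to the vector of ones, which is why such scalings are
# excluded from the spectral envelope optimization.
v_x = x_beta.var()
```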
The set of optimal scalings that maximize the normalized spectral density at frequency $\omega$, $B(\omega)$, provides important information about the traversals through categories associated with prominent oscillatory patterns at frequency $\omega$. For further illustration, Figures 3(c) and 3(d) display the estimated optimal scalings for the time series displayed in Figures 2(c) and 2(d), respectively. The optimal scalings in Figure 3(d) for categories 2 and 3 are similar at lower frequencies.

### 2.2 Representation via Indicator Variables

A common approach to the analysis of any type of categorical data is to represent it in terms of random vectors of indicator variables. Similar to the formulations used in Stoffer et al. (1993) and Krafty et al. (2012), we define the $(m-1)$-dimensional vector $Y_t$, which has a one in the $\ell$th element if $X_t = c_\ell$ for $\ell = 1, \ldots, m-1$, treating $c_m$ as the reference category and restricting the set of optimal scalings to a lower-dimensional space. The assumption that $f_x(\omega; \beta)$ is continuous is necessary and sufficient for ensuring that $Y_t$ has a continuous spectral density, which is defined as

$$f_y(\omega) = \sum_{h=-\infty}^{\infty} \operatorname{Cov}[Y_t, Y_{t+h}] \exp(-2\pi i \omega h).$$

Figure 3: (a) and (b): The spectral envelopes of the time series shown in panels (a) and (b) of Figure 2; (c) and (d): The scalings of the time series presented in panels (c) and (d) of Figure 2.

The spectral density $f_y(\omega)$ is a positive definite Hermitian $(m-1) \times (m-1)$ matrix. We assume $f_y(\omega)$ and the variance of $Y_t$, $V_y = \operatorname{Var}(Y_t)$, are non-singular for all $\omega \in \mathbb{R}$ (Brillinger, 2002). Formally, we define the spectral envelope and the corresponding set of optimal scalings used in our proposed classification algorithm as follows.

**Definition 1** For $\omega \in \mathbb{R}$, the spectral envelope, $\lambda(\omega)$, is defined as the largest eigenvalue of

$$h(\omega) = V_y^{-1/2} f_y(\omega) V_y^{-1/2}.$$

The $(m-1)$-variate vector of optimal scalings, $\gamma(\omega)$, is defined as the eigenvector associated with $\lambda(\omega)$.

Several remarks are in order. First, for any $a \in \mathbb{R}^{m-1}$, we have $a' f_y(\omega) a = a' f_y^{re}(\omega) a$, where $f_y^{re}(\omega)$ is the real part of $f_y(\omega)$. Thus, the spectral envelope is equivalent to the largest eigenvalue of $h^{re}(\omega) = V_y^{-1/2} f_y^{re}(\omega) V_y^{-1/2}$. Second, a connection between the optimal scalings derived from this formulation and those defined in Section 2.1 can be established (Krafty et al., 2012): if $V_y^{1/2} \gamma(\omega)$ is an eigenvector of $h^{re}(\omega)$ associated with $\lambda(\omega)$, then the optimal scaling from Section 2.1 is recovered by assigning the reference category $c_m$ a value of zero. When the multiplicity of $\lambda(\omega)$ as an eigenvalue of $h^{re}(\omega)$ is one, there exists a unique $\gamma(\omega)$ such that $V_y^{1/2} \gamma(\omega)$ is an eigenvector of $h^{re}(\omega)$ associated with $\lambda(\omega)$, where $\gamma(\omega)' V_y \gamma(\omega) = 1$ and the first nonzero entry of $V_y^{1/2} \gamma(\omega)$ is positive. Third, if there is a significant frequency component near $\omega$, then $\lambda(\omega)$ will be large, and the values of $\gamma(\omega)$ depend on the particular cyclical traversal of the series through categories that produces the value of $\lambda(\omega)$ at frequency $\omega$.

### 2.3 Estimation

Consider a realization of a categorical time series, $X_t$, $t = 1, \ldots, T$, and its corresponding multivariate process realization $Y_t$, $t = 1, \ldots, T$, defined in Section 2.2. Let $\hat{f}_y(\omega)$ represent the estimate of the spectral matrix $f_y(\omega)$. To allow for asymptotic development, we assume $Y_t$ is strictly stationary and that all cumulant spectra, of all orders, exist (Brillinger, 2002, Assumption 2.6.1). There is an extensive literature on estimation of the power spectral matrix. We use periodograms, or sample analogues of the spectrum,

$$I(s) = T^{-1} \left| \sum_{t=1}^{T} Y_t \exp(-2\pi i s t / T) \right|^2, \quad s = 1, \ldots, T.$$
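The periodogram above is cheap to compute with an FFT; here $I(s)$ is formed as the outer product of the discrete Fourier transform of the centered indicator series with its conjugate. A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def periodogram_matrices(y):
    """Periodogram matrices I(s) = T^{-1} d(s) d(s)* for a (T, m-1) array y.

    d(s) is the DFT of the mean-centered indicator series Y_t.
    Returns a complex array of shape (T, m-1, m-1).
    """
    t_len = y.shape[0]
    d = np.fft.fft(y - y.mean(axis=0), axis=0)        # DFT along time
    return np.einsum('si,sj->sij', d, d.conj()) / t_len
```

Each `I(s)` is Hermitian and positive semi-definite by construction.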
It is well known that the periodogram is an asymptotically unbiased but inconsistent estimator of the true spectral matrix. A common way to obtain a consistent estimator of the spectral matrix is to smooth periodogram ordinates over frequencies using kernels (Brillinger, 2002). In this paper, we consider the smoothed periodogram estimator

$$\hat{f}_y(\omega_s) = \sum_{j=-B_T}^{B_T} W_{B_T, j} I(s + j),$$

where $\omega_s = s/T$ for $s = 1, \ldots, K = \lfloor (T-1)/2 \rfloor$ are the Fourier frequencies, $2B_T + 1$ is the smoothing span, and $W_{B_T, j}$ are nonnegative weights that satisfy the following conditions:

$$W_{B_T, j} = W_{B_T, -j}, \qquad \sum_{j=-B_T}^{B_T} W_{B_T, j} = 1.$$

Generally, the weights are chosen such that $W_{B_T, j}$ is a decreasing function of $|j|$. It is known that $\hat{f}_y(\omega_s)$ is consistent if $B_T \to \infty$ and $B_T T^{-1} \to 0$ as $T \to \infty$ (Brillinger, 2002). Given the sample spectral matrix $\hat{f}_y(\omega)$ and sample variance $\hat{V}_y$, the estimate of the spectral envelope, $\hat{\lambda}(\omega)$, is the largest eigenvalue of $\hat{h}^{re}(\omega) = \hat{V}_y^{-1/2} \hat{f}_y^{re}(\omega) \hat{V}_y^{-1/2}$, and the optimal scaling, $\hat{\gamma}(\omega)$, is the eigenvector of $\hat{h}^{re}(\omega)$ associated with $\hat{\lambda}(\omega)$. It should be noted that other approaches for nonparametric estimation of the spectral matrix, such as those in Dai and Guo (2004), Rosen and Stoffer (2007), and Krafty and Collinge (2013), can also be used. We use the kernel smoothing approach for computational efficiency and ease of theoretical exposition.

## 3 Classification Method

Consider a population of categorical time series composed of $J \geq 2$ groups, $\Pi_1, \ldots, \Pi_J$. Denote the $j$th group-level spectral envelope and $(m-1)$-variate optimal scalings as $\Lambda^{(j)}(\omega)$ and $\Gamma^{(j)}(\omega)$ for $j = 1, \ldots, J$, respectively. Suppose we observe $N = \sum_{j=1}^{J} N_j$ independent training time series of length $T$ and $R$ independent testing time series of length $T$, $X_r = \{X_{r1}, \ldots, X_{rT}\}$, $r = 1, \ldots, R$, with unknown group membership. In this section, we introduce an adaptive algorithm for consistent classification.
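The feature extraction used throughout this section — a smoothed periodogram followed by the leading eigenpair of $\hat{V}_y^{-1/2} \hat{f}_y^{re}(\omega) \hat{V}_y^{-1/2}$ — can be sketched as follows. This is a minimal illustration with a flat (Daniell) kernel; the paper only requires symmetric nonnegative weights summing to one, and the function name is ours:

```python
import numpy as np

def envelope_features(y, b):
    """Estimate spectral envelope and optimal scalings from an indicator
    series y of shape (T, m-1), using a flat smoothing kernel of span 2b+1."""
    t_len, dim = y.shape
    yc = y - y.mean(axis=0)
    d = np.fft.fft(yc, axis=0)
    per = np.einsum('si,sj->sij', d, d.conj()) / t_len   # periodogram matrices
    k = (t_len - 1) // 2
    # Smooth the real part over 2b+1 neighboring Fourier frequencies.
    f_re = np.stack([
        np.mean([per[(s + j) % t_len].real for j in range(-b, b + 1)], axis=0)
        for s in range(1, k + 1)
    ])
    # V_y^{-1/2} via eigendecomposition of the sample covariance.
    w, u = np.linalg.eigh(np.atleast_2d(np.cov(yc.T)))
    v_inv_half = u @ np.diag(w ** -0.5) @ u.T
    lam = np.empty(k)
    gam = np.empty((k, dim))
    for s in range(k):
        h = v_inv_half @ f_re[s] @ v_inv_half
        evals, evecs = np.linalg.eigh(h)                 # ascending order
        lam[s] = evals[-1]                   # spectral envelope at omega_s
        gam[s] = v_inv_half @ evecs[:, -1]   # optimal scalings at omega_s
    return lam, gam
```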
### 3.1 Classification via the Spectral Envelope

As shown in Figures 2 and 3, groups of categorical time series may exhibit distinct oscillatory patterns. In this case, the spectral envelope, which characterizes dominant oscillatory patterns, can be used as a signature for each group and an important feature for categorical time series classification. We outline a classification procedure based on the spectral envelope below.

1. For each testing time series, compute the sample spectral envelope, $\hat{\lambda}^{(r)}(\omega_s)$, for $r = 1, \ldots, R$, where $\omega_s = s/T$ are the Fourier frequencies with $s = 1, \ldots, K$ and $K = \lfloor (T-1)/2 \rfloor$. Denote $\hat{\lambda}^{(r)} = \{\hat{\lambda}^{(r)}(\omega_1), \ldots, \hat{\lambda}^{(r)}(\omega_K)\}'$ as a $K$-dimensional vector.
2. Compute
   $$D^{(r)}_{j,\mathrm{ENV}} = \| \hat{\lambda}^{(r)} - \Lambda^{(j)} \|, \tag{1}$$
   where $\| \cdot \|$ is the $L_2$ norm.
3. Classify time series $X_r$ to the group $\Pi_j$ with the most similar spectral envelope, such that $\hat{g}_r = \operatorname*{arg\,min}_j D^{(r)}_{j,\mathrm{ENV}}$, $j = 1, \ldots, J$.

Classification consistency can be established under the following assumptions. To aid the presentation, we consider the case of $J = 2$ groups, $\Pi_1$ and $\Pi_2$, while similar results can be derived for $J > 2$.

**Assumption 1**

Each element of the $(m-1) \times (m-1)$ spectral density matrix $f_y(\omega)$ has bounded and continuous first derivatives.

**Assumption 2** $\| \Lambda^{(1)} - \Lambda^{(2)} \| \geq CT$ for a positive constant $C$.

Under Assumption 1, asymptotic consistency of the estimates $\hat{\lambda}(\omega)$ and $\hat{\gamma}(\omega)$ discussed in Section 2.3 can be established, and the largest eigenvalue of the spectral density matrix is continuous and bounded from above. Assumption 2 implies that the spectral envelopes of the two groups are well separated. The following theorem states the classification consistency of using the spectral envelope as a classifier.

**Theorem 1**

Under Assumptions 1–2, the probability of misclassifying $X_r$, a testing time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows:

$$P(D^{(r)}_{1,\mathrm{ENV}} > D^{(r)}_{2,\mathrm{ENV}}) = O(B_T T^{-1}),$$

where $D^{(r)}_{1,\mathrm{ENV}}$ and $D^{(r)}_{2,\mathrm{ENV}}$ are defined in (1).

### 3.2 Classification via Optimal Scalings

While the spectral envelope adequately characterizes dominant oscillatory patterns, it does not account for the traversals through categories responsible for such oscillatory patterns. Differences among groups may also be due to different traversals through categories that produce particular oscillatory patterns, which are characterized by optimal scalings for each frequency component. Similarly, we present a categorical time series classifier using optimal scalings below.

1. Compute the $(m-1)$-variate sample optimal scalings, $\hat{\gamma}^{(r)}(\omega_s)$, of the testing time series $X_r$ for $r = 1, \ldots, R$, where $\omega_s = s/T$ are the Fourier frequencies with $s = 1, \ldots, K$ and $K = \lfloor (T-1)/2 \rfloor$. Denote $\hat{\gamma}^{(r)} = \{\hat{\gamma}^{(r)}(\omega_1)', \ldots, \hat{\gamma}^{(r)}(\omega_K)'\}'$ as a $K \times (m-1)$ matrix.
2. Compute
   $$D^{(r)}_{j,\mathrm{SCA}} = \| \hat{\gamma}^{(r)} - \Gamma^{(j)} \|_F, \tag{2}$$
   where $\| \cdot \|_F$ is the Frobenius norm.
3. Classify time series $X_r$ to the group $\Pi_j$ with the most similar set of optimal scalings, such that $\hat{g}_r = \operatorname*{arg\,min}_j D^{(r)}_{j,\mathrm{SCA}}$, $j = 1, \ldots, J$.

In addition to Assumption 1, the following assumption, which indicates that the optimal scalings are well separated, is necessary to establish the classification consistency of the scaling classifier.

**Assumption 3**

For fixed $m$ categories, $\| \Gamma^{(1)} - \Gamma^{(2)} \|_F \geq CT$ for a positive constant $C$.

Theorem 2 states the consistency of classification based on the scalings.

**Theorem 2**

Under Assumptions 1 and 3, the probability of misclassifying $X_r$, a testing time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows:

$$P(D^{(r)}_{1,\mathrm{SCA}} > D^{(r)}_{2,\mathrm{SCA}}) = O(B_T T^{-1}),$$

where $D^{(r)}_{1,\mathrm{SCA}}$ and $D^{(r)}_{2,\mathrm{SCA}}$ are defined in (2).

### 3.3 Proposed Adaptive Envelope and Scaling Classifier

The envelope classifier (Section 3.1) works well in situations where oscillatory patterns are different among groups, while the scaling classifier (Section 3.2) is effective when traversals through categories are distinct among groups. However, in practice, different groups are likely to exhibit different oscillatory patterns and traversals through categories to some extent. Thus, it is desirable to construct an adaptive classifier that can automatically identify the extent to which groups are different with respect to their oscillatory patterns, traversals through categories, or both, and optimally classify time series accordingly. To this end, we propose a general purpose, flexible classifier that adaptively weights differences in the spectral envelope and optimal scalings in order to determine the characteristics that best distinguish groups and provide accurate classification. Specifically, we consider the following distance of the $r$th testing time series to the $j$th group:

$$D^{(r)}_{j,\mathrm{EnvSca}} = \kappa \frac{\| \hat{\lambda}^{(r)} - \Lambda^{(j)} \|}{\| \hat{\lambda}^{(r)} \|} + (1 - \kappa) \frac{\| \hat{\gamma}^{(r)} - \Gamma^{(j)} \|_F}{\| \hat{\gamma}^{(r)} \|_F}, \tag{3}$$

for $j = 1, \ldots, J$ and $r = 1, \ldots, R$. Since the spectral envelope $\hat{\lambda}^{(r)}$ is a $K$-dimensional vector and the scaling $\hat{\gamma}^{(r)}$ is a $K \times (m-1)$ matrix, we rescale these distances by their corresponding norms. The unknown tuning parameter $\kappa$ controls the relative importance of the spectral envelope and optimal scalings in classifying time series. Our proposed adaptive classification algorithm is presented in Algorithm 1.

Several remarks on the algorithm should be noted. First, the group-level spectral envelopes $\Lambda^{(j)}$ and optimal scalings $\Gamma^{(j)}$ are unknown in practice.
We obtain $\Lambda^{(j)}$ and $\Gamma^{(j)}$ by averaging the sample spectral envelopes and sample optimal scalings across training time series replicates within the $j$th group, respectively. In particular, we replace $\Lambda^{(j)}$ and $\Gamma^{(j)}$ by their sample estimates

$$\hat{\Lambda}^{(j)} = \frac{1}{N_j} \sum_{k=1}^{N_j} \hat{\lambda}^{(j,k)}, \qquad \hat{\Gamma}^{(j)} = \frac{1}{N_j} \sum_{k=1}^{N_j} \hat{\gamma}^{(j,k)},$$

for $j = 1, \ldots, J$, where $\hat{\lambda}^{(j,k)}$ and $\hat{\gamma}^{(j,k)}$ are the estimated spectral envelope and optimal scalings of the $k$th training time series in group $j$, respectively. Second, we select the tuning parameter $\kappa$ by using a grid search through leave-one-out (LOO) cross-validation.

    Algorithm 1: Envelope and Scaling Classifier (EnvSca)

    Data:   R independent testing time series, X_r = {X_r1, ..., X_rT} for r = 1, ..., R.
    Result: Estimated group assignment for each testing time series, {g_1, ..., g_R},
            where g_r ∈ (1, ..., J) for r = 1, ..., R.

    Step 1: Use leave-one-out cross-validation to select the tuning parameter κ.
    Step 2: for r = 1, ..., R do
        Convert the testing time series X_r with m categories into the (m-1)-dimensional
          indicator series Y_r defined in Section 2.2 and compute the (m-1) x (m-1)
          matrix h(ω) in Definition 1;
        Compute the sample spectral envelope λ^(r)(ω_s) of X_r, where ω_s = s/T are the
          Fourier frequencies with s = 1, ..., K and K = floor((T-1)/2); denote
          λ^(r) = {λ^(r)(ω_1), ..., λ^(r)(ω_K)}' as a K-dimensional vector;
        Compute the (m-1)-variate sample optimal scalings γ^(r)(ω_s) of X_r; denote
          γ^(r) = {γ^(r)(ω_1)', ..., γ^(r)(ω_K)'}' as a K x (m-1) matrix;
        for j = 1, ..., J do
            Compute D^(r)_{j,EnvSca} = κ ||λ^(r) - Λ^(j)|| / ||λ^(r)||
                                     + (1-κ) ||γ^(r) - Γ^(j)||_F / ||γ^(r)||_F.
        end
        Classify X_r to group Π_j if D^(r)_{j,EnvSca} is the smallest among all
          D^(r)_{j,EnvSca} for j = 1, ..., J, that is, g_r = argmin_j D^(r)_{j,EnvSca}.
    end
    return {g_1, ..., g_R};

In particular, let $\kappa \in (0, 0.1, 0.2, \ldots, 1)$, and select the value of $\kappa$ that produces the highest leave-one-out classification rate via Algorithm 1. Although a finer grid could be used as well, in our experience, using $\kappa \in (0, 0.1, 0.2, \ldots, 1)$ performs well without sacrificing computational efficiency. Third, to obtain more parsimonious measures that still can discriminate among different groups, we may select a subset of elements in the spectral envelope and optimal scalings that are most different among groups. This strategy has been used in Fryzlewicz and Ombao (2009) for classifying nonstationary quantitative time series. For example, we compute

$$\Delta(s) = \sum_{j=1}^{J} \sum_{h=j+1}^{J} \left[ \Lambda^{(j)}(\omega_s) - \Lambda^{(h)}(\omega_s) \right]^2, \quad s = 1, \ldots, K,$$

order $\Delta(s)$ decreasingly, and then choose the top proportion of the elements in $\Delta(s)$. A leave-one-out cross-validation approach that minimizes the classification error is then used to select an appropriate proportion.

Under Assumptions 1 and 4, classification consistency is established in Theorem 3.

**Assumption 4**

For fixed $m$ categories, $\| \Lambda^{(1)} - \Lambda^{(2)} \| + \| \Gamma^{(1)} - \Gamma^{(2)} \|_F \geq CT$ for a positive constant $C$.

**Theorem 3**

Under Assumptions 1 and 4, the probability of misclassifying $X_r$, a time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows:

$$P(D^{(r)}_{1,\mathrm{EnvSca}} > D^{(r)}_{2,\mathrm{EnvSca}}) = O(B_T T^{-1}),$$

where $D^{(r)}_{1,\mathrm{EnvSca}}$ and $D^{(r)}_{2,\mathrm{EnvSca}}$ are defined in Equation (3).

## 4 Simulation Studies

We conduct simulation studies to evaluate the performance of the proposed classification procedure. Following Fokianos and Kedem (2003), categorical time series $X_t$ are generated from the multinomial logit model as follows:

$$p_{t\ell}(\alpha) = \frac{\exp(\alpha_\ell' Y_{t-1})}{1 + \sum_{\ell=1}^{m-1} \exp(\alpha_\ell' Y_{t-1})}, \quad \ell = 1, \ldots, m-1,$$

and

$$p_{tm}(\alpha) = \frac{1}{1 + \sum_{\ell=1}^{m-1} \exp(\alpha_\ell' Y_{t-1})},$$

where $Y_t$ is the $(m-1)$-dimensional indicator vector with a one in the $\ell$th element if $X_t = c_\ell$ for $\ell = 1, \ldots, m-1$, $p_{t\ell}$ for $\ell = 1, \ldots, m$ are the probabilities of $X_t = c_\ell$ at time $t$ and satisfy $\sum_{\ell=1}^{m} p_{t\ell} = 1$, and $\alpha_\ell$ for $\ell = 1, \ldots, m-1$ are the regression parameters. The simulated model incorporates a lagged value of order one of $Y_t$, or equivalently $X_t$. We consider three different cases under the multinomial model. For the first two cases, we let the number of categories $m = 4$ and the number of groups $J = 2$. For Case 1, we consider the following regression parameters (some digits are missing in the source text and are left blank):

α₁ = (1. , , )′, α₂ = (1, . , )′, α₃ = (1, , . )′ if $Y_t \in \Pi_1$; α₁ = (0. , , )′, α₂ = (1, . , )′, α₃ = (1, , . )′ if $Y_t \in \Pi_2$.

Figures 2(a) and 2(b) display realizations of time series from groups $\Pi_1$ and $\Pi_2$ in Case 1, respectively. For Case 2, the regression parameters are set to be

α₁ = (1. , , )′, α₂ = (1, . , )′, α₃ = (1, , . )′ if $Y_t \in \Pi_1$; α₁ = (0. , , )′, α₂ = (1, . , )′, α₃ = (1, , . )′ if $Y_t \in \Pi_2$.

Figures 2(c) and 2(d) present realizations of time series from groups $\Pi_1$ and $\Pi_2$ in Case 2, respectively. For Case 3, we consider $J = 3$ different groups with the following regression parameters:

α₁ = (0. , , )′, α₂ = (1, . , )′, α₃ = (1, , . )′ if $Y_t \in \Pi_1$; α₁ = (1. , , )′, α₂ = (1, . , )′, α₃ = (1, , . )′ if $Y_t \in \Pi_2$; α₁ = (1. , . , )′, α₂ = (−, −. , −)′, α₃ = (2, . , −)′ if $Y_t \in \Pi_3$.
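A sampler for this multinomial logit model can be sketched as follows. The coefficient matrix used at the bottom is an illustrative placeholder (several digits of the paper's parameter settings are lost in this copy), not the paper's actual setting:

```python
import numpy as np

def simulate_categorical(alphas, t_len, rng):
    """Draw a categorical series from the lag-one multinomial logit model.

    alphas: (m-1) x (m-1) array; row l holds alpha_l, so the logit for
            category l at time t is alpha_l' Y_{t-1}.
    """
    m = alphas.shape[0] + 1
    x = np.empty(t_len, dtype=int)
    y_prev = np.zeros(m - 1)                      # indicator vector Y_{t-1}
    for t in range(t_len):
        logits = np.append(alphas @ y_prev, 0.0)  # reference category has logit 0
        p = np.exp(logits)
        p /= p.sum()                              # p_{t1}, ..., p_{tm}
        x[t] = rng.choice(m, p=p) + 1             # categories labeled 1..m
        y_prev = np.zeros(m - 1)
        if x[t] < m:
            y_prev[x[t] - 1] = 1.0
    return x

# Placeholder coefficients (not the paper's values): mild self-persistence.
rng = np.random.default_rng(1)
series = simulate_categorical(1.5 * np.eye(3), 500, rng)
```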

100 replications are generated for each of the 27 combinations of 3 cases, 3 numbers of time series per group in the training data ($N_j = 20$, $50$, $100$ for all $j$), and 3 time series lengths ($T = 100$, $200$, $500$). In Case 1, groups $\Pi_1$ and $\Pi_2$ have different oscillatory patterns but similar traversals through categories, resulting in a poor classification rate if we use only the optimal scalings for classification. For Case 2, where the two groups are distinct mainly in the optimal scalings, the envelope classifier produces the lowest correct classification rate (around 50%) among all methods considered. The proposed classifier and the scaling classifier perform similarly. They have slightly lower classification rates than sequence learner, which is designed to select and use all subsequences that are important in classifying responses and thus is well suited for the setting in Case 2. In Case 3, we consider three groups, and groups differ in cyclical patterns and scalings. The proposed classifier has higher mean classification rates than the envelope and scaling classifiers. This is because groups are different in both oscillatory patterns and traversals through categories. The proposed classifier, by incorporating both the spectral envelope and optimal scalings, can produce better classification rates in this case. It should be noted that sequence learner is developed under the framework of logistic regression and cannot classify a population of time series with more than two groups in its current form. One could extend sequence learner to multinomial logistic regression, but extensive programming efforts are needed and no prior results are available. Thus, we do not have simulation results for sequence learner in Case 3.

In addition to classification, estimates of the tuning parameter $\kappa$ in the proposed algorithm allow for interpretable inference.
For example, the averages of the estimated tuning parameters $\hat{\kappa}$ in our simulations for Cases 1, 2, and 3 are 1.00, 0.24, and 0.66, respectively. This suggests that $\kappa$ can help us to identify whether groups are different in oscillatory patterns only, traversals through categories only, or a mixture of the two.

Table 1: Mean (standard deviation) of the percent of correctly classified time series across methods.

| Case | N_j | T | EnvSca | SCA | ENV | SEQ |
|---|---|---|---|---|---|---|
| 1 | 20 | 100 | 92.21 (3.41) | 49.42 (4.72) | 93.32 (2.39) | 87.13 (3.32) |
| 1 | 20 | 200 | 96.91 (1.99) | 49.84 (4.60) | 98.16 (1.39) | 93.24 (2.70) |
| 1 | 20 | 500 | 98.78 (1.66) | 50.04 (4.71) | 99.98 (0.14) | 98.44 (1.40) |
| 1 | 50 | 100 | 92.99 (2.68) | 49.92 (4.79) | 93.54 (2.28) | 90.40 (2.97) |
| 1 | 50 | 200 | 97.64 (1.99) | 50.10 (4.31) | 98.47 (1.19) | 96.46 (2.04) |
| 1 | 50 | 500 | 99.56 (0.64) | 49.63 (4.48) | 99.98 (0.14) | 99.56 (0.76) |
| 1 | 100 | 100 | 93.68 (2.67) | 50.67 (5.00) | 93.76 (2.37) | 91.55 (2.71) |
| 1 | 100 | 200 | 98.26 (1.30) | 49.73 (4.58) | 98.49 (1.19) | 96.73 (4.96) |
| 1 | 100 | 500 | 99.80 (0.45) | 50.22 (4.72) | 99.97 (0.17) | 99.68 (0.60) |
| 2 | 20 | 100 | 71.13 (6.23) | 71.66 (6.00) | 50.42 (5.02) | 75.16 (4.45) |
| 2 | 20 | 200 | 78.69 (5.76) | 79.30 (5.03) | 49.85 (5.29) | 83.32 (4.21) |
| 2 | 20 | 500 | 88.27 (3.89) | 88.65 (3.96) | 49.94 (4.51) | 93.14 (2.58) |
| 2 | 50 | 100 | 76.01 (5.36) | 76.25 (5.34) | 50.71 (4.39) | 77.94 (4.11) |
| 2 | 50 | 200 | 84.14 (4.03) | 84.22 (4.10) | 50.17 (4.92) | 86.71 (3.43) |
| 2 | 50 | 500 | 94.20 (2.47) | 99.40 (2.34) | 50.93 (5.22) | 95.95 (2.23) |
| 2 | 100 | 100 | 79.19 (4.60) | 79.48 (4.51) | 50.58 (4.83) | 78.56 (4.45) |
| 2 | 100 | 200 | 87.59 (3.73) | 87.65 (3.67) | 39.61 (5.05) | 88.46 (3.32) |
| 2 | 100 | 500 | 96.29 (1.83) | 96.31 (1.89) | 50.38 (5.04) | 96.68 (1.87) |
| 3 | 20 | 100 | 81.02 (4.69) | 70.43 (4.67) | 70.88 (3.97) | NA |
| 3 | 20 | 200 | 89.64 (3.58) | 75.17 (3.48) | 80.61 (3.62) | NA |
| 3 | 20 | 500 | 97.39 (1.80) | 81.80 (3.12) | 93.04 (2.27) | NA |
| 3 | 50 | 100 | 83.79 (3.30) | 72.91 (3.67) | 71.08 (3.38) | NA |
| 3 | 50 | 200 | 92.28 (2.62) | 78.18 (2.90) | 81.82 (2.91) | NA |
| 3 | 50 | 500 | 98.42 (1.28) | 84.51 (3.02) | 94.32 (2.12) | NA |
| 3 | 100 | 100 | 84.97 (3.34) | 73.07 (3.29) | 71.37 (3.48) | NA |
| 3 | 100 | 200 | 93.04 (2.09) | 79.99 (3.05) | 82.69 (2.87) | NA |
| 3 | 100 | 500 | 98.67 (1.00) | 87.01 (2.59) | 94.29 (1.96) | NA |

Analysis of Sleep Stage Time Series

During a full night of sleep, the body cycles through different sleep stages, including rapid eye movement (REM) sleep, in which dreaming typically occurs, and non-rapid eye movement (NREM) sleep, which consists of four stages representing light sleep (S1, S2) and deep sleep (S3, S4). These sleep stages are associated with specific physiological behaviors that are essential to the rejuvenating properties of sleep. Disruptions to typical cyclical behavior and changes in the amount of time spent in each sleep stage have been found to be associated with many sleep disorders (Zepelin et al., 2005; Institute of Medicine, 2006). Particular sleep disorders, such as nocturnal frontal lobe epilepsy (NFLE), are also difficult to accurately diagnose since clinical, behavioral, and electroencephalography (EEG) patterns for NFLE patients are often similar to those of patients with other sleep disorders, such as REM behavior disorder (RBD) (D'Cruz and Vaughn, 1997; Tinuper and Bisulli, 2017). Accordingly, there is a need for statistical procedures that can automatically identify cyclical patterns in sleep stage time series associated with specific sleep disorders and accurately classify patients with different sleep disorders.

The data for this analysis was collected through a study of various sleep-related disorders (Terzano et al., 2001) and is publicly available via PhysioNet (Goldberger et al., 2000). All participants were monitored during a full night of sleep, and their sleep stages were annotated by experienced technicians every 20 seconds according to well-established sleep staging criteria (Rechtschaffen and Kales, 1968). We consider classifying sleep stage time series data collected from NFLE and RBD patients, for which differential diagnosis is particularly challenging (Tinuper and Bisulli, 2017). NFLE and RBD patients both experience significant sleep disruptions associated with complex, often bizarre motor behavior (e.g.
violent movements of arms or legs, dystonic posturing) and vocalization (e.g. screaming, shouting, laughing), which is due to nocturnal seizures for NFLE patients (Tinuper and Bisulli, 2017) and to dream-enacting behavior during REM sleep for RBD patients (Schenck et al., 1986). This makes differentiating RBD and NFLE patients particularly challenging. An objective, data-driven classification procedure that can automatically distinguish patients and aid differential diagnosis is needed.

The current analysis considers 8 hours of sleep stage time series from N = 46 participants: 34 NFLE patients and 12 RBD patients. This results in categorical time series of length T = 1440 with m = 6 sleep stages (REM, S1, S2, S3, S4, and Wake/Movement). Examples are provided in Figure 1. In order to estimate the spectral envelope and optimal scalings, Wake/Movement is used as the reference category. Leave-one-out (LOO) cross-validation is then used to empirically evaluate the effectiveness of the classification rule. For this data, the overall correct classification rate is 82.61%, with 29 of the 34 NFLE patients correctly classified and 9 of the 12 RBD patients correctly classified. The tuning parameter estimated via LOO cross-validation is κ̂ = 0.852.
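The LOO evaluation described above can be sketched in a few lines. The following is a minimal illustration using nearest-group-mean classification with a κ-weighted distance on precomputed envelope and scaling features; the function names and the sum-of-squares normalization are our assumptions, not the authors' implementation.

```python
import numpy as np

def loo_classification_rate(env, sca, labels, kappa):
    """Leave-one-out evaluation of a kappa-weighted nearest-group-mean rule.
    env: (n, K) estimated spectral envelopes; sca: (n, p) flattened optimal
    scalings; labels: (n,) group labels."""
    n = len(labels)
    groups = np.unique(labels)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i  # hold out series i
        dists = []
        for g in groups:
            idx = keep & (labels == g)
            # normalized squared distances to the held-out group means
            d_env = np.sum((env[i] - env[idx].mean(axis=0)) ** 2) / np.sum(env[i] ** 2)
            d_sca = np.sum((sca[i] - sca[idx].mean(axis=0)) ** 2) / np.sum(sca[i] ** 2)
            dists.append(kappa * d_env + (1.0 - kappa) * d_sca)
        if groups[int(np.argmin(dists))] == labels[i]:
            correct += 1
    return correct / n

def select_kappa(env, sca, labels, grid=np.linspace(0.0, 1.0, 21)):
    """Choose kappa by maximizing the LOO classification rate over a grid."""
    rates = [loo_classification_rate(env, sca, labels, k) for k in grid]
    best = int(np.argmax(rates))
    return grid[best], rates[best]
```

The same grid search over κ yields the data-driven estimate (κ̂ = 0.852 for the sleep data) discussed below.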

Figure 4: Left: Estimated spectral envelope for NFLE patients (solid red) and RBD patients (dashed blue) for low frequencies (below 0.05). Group-level estimated spectral envelopes are represented by the two thicker lines. Right: Estimated optimal scalings for NFLE patients (top) and RBD patients (bottom) for low frequencies (below 0.05).

Second, differences in optimal scalings (see Figure 4) are more subtle, with noticeable differences over some categories (e.g. S3, S4), but not all. More specifically, scalings for frequencies below 0.025 indicate low frequency behavior in NFLE patients due to cycling among three broader sleep stage groupings: 1) light sleep (S2), 2) deep sleep (S4), and 3) a combination of transitional sleep stages (S1, S3), REM, and Wake/Movement. On the other hand, RBD patients exhibit low frequency power primarily due to cycling in and out of light sleep (S2). This can be attributed to the more regular and prolonged periods of deep sleep (S4) observed in NFLE patients, lasting 14 minutes per onset and covering 20.9% of total sleep on average, compared to RBD patients, lasting only 10.8 minutes per onset and covering 13.1% of total sleep on average. To better illustrate the differences in the optimal scalings, Figure 5 provides a sample series from each group along with the scaled time series obtained by averaging optimal scalings over frequencies below 0.025. Given the propensity for RBD patients to experience immediate sleep disruptions more so than NFLE patients, it is not surprising that RBD patients experience less deep sleep than NFLE patients.
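The construction of a scaled time series from a categorical one is a simple lookup: each observed category is replaced by its mean low-frequency scaling. A minimal sketch follows; the scaling values here are hypothetical stand-ins for illustration, not the estimates shown in Figure 5.

```python
import numpy as np

# Hypothetical mean scalings averaged over frequencies below 0.025, with the
# reference category Wake/Movement fixed at 0. Illustrative values only.
mean_scalings = {"REM": 0.10, "S1": 0.05, "S2": -0.30, "S3": 0.15,
                 "S4": 0.45, "W/MT": 0.00}

def scale_series(categories, scalings):
    """Map each categorical observation to its mean optimal scaling,
    producing a real-valued series whose low-frequency oscillations can be
    plotted and compared directly across groups."""
    return np.array([scalings[c] for c in categories])
```

For example, `scale_series(["S2", "S4", "W/MT"], mean_scalings)` returns the real-valued sequence `[-0.3, 0.45, 0.0]`, which is what is plotted in the bottom panels of Figure 5 (with the estimated, rather than hypothetical, scalings).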

Figure 5: Top: Sample time series from the NFLE and RBD groups. Bottom: Corresponding scaled time series based on the mean scaling for frequencies below 0.025 (i.e. cycles lasting more than 13 minutes). Color corresponding to NREM (purple), REM (blue), and W/MT (yellow) sleep stages is also provided.

It is important to note that the proposed classification rule automatically adapts to these particular features of the spectral envelopes and optimal scalings through the data-driven estimate κ̂ = 0.852 obtained via LOO cross-validation, which assigns more weight to differences in spectral envelopes in distinguishing between the two groups. This is an important feature of the proposed classification procedure, as it allows the classification rule to adapt to differences between groups in the spectral envelope, the optimal scalings, or both.
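The adaptive rule itself reduces to a κ-weighted combination of two normalized distances. The sketch below illustrates the idea; the function names and the sum-of-squares normalization are our assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def envsca_distance(lam_hat, gam_hat, lam_group, gam_group, kappa):
    """kappa-weighted distance between a series' estimated spectral envelope
    (lam_hat, length K) and optimal scalings (gam_hat, K x (m-1)) and a
    group's mean envelope and scalings. The sum-of-squares normalization
    (an assumed detail) keeps the two components on comparable scales."""
    d_env = np.sum((lam_hat - lam_group) ** 2) / np.sum(lam_hat ** 2)
    d_sca = np.sum((gam_hat - gam_group) ** 2) / np.sum(gam_hat ** 2)
    return kappa * d_env + (1.0 - kappa) * d_sca

def classify(lam_hat, gam_hat, group_feats, kappa):
    """Assign the series to the group with the smallest combined distance.
    group_feats is a list of (group mean envelope, group mean scalings)."""
    dists = [envsca_distance(lam_hat, gam_hat, L, G, kappa)
             for L, G in group_feats]
    return int(np.argmin(dists))
```

Setting κ = 1 recovers an envelope-only rule (ENV) and κ = 0 a scaling-only rule (SCA), the two limiting classifiers compared in Table 1.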

Discussion

This article presents a novel approach to classifying categorical time series. An adaptive algorithm that utilizes both the spectral envelope and its corresponding set of optimal scalings for classification of categorical time series is developed, and classification consistency is established. We conclude by discussing some limitations and related future extensions. First, the proposed method assumes that the collection of time series is stationary. However, in some applications the time series could be nonstationary, which would require time-varying extensions of the spectral envelope and optimal scalings for proper characterization; incorporating nonstationarity in this way may also further improve classification accuracy. Second, our method requires that all time series have the same length and that all categories are observed. In practice, however, time series may have different lengths and not all categories may be observed. For example, in the sleep study application, participants may have different lengths of full night sleep, and some participants may not experience any movement during sleep. Future research will focus on developing methods that can accommodate these kinds of time series observations. Third, our algorithm assumes that time series within the same group have the same cyclical patterns, while extra variability may be present in some applications (Krafty, 2016). A topic of future research would be to incorporate within-group variability into the classification framework.

Supplementary Material

Supplementary material available online includes code for implementing the proposed classifier on the three cases of simulated data.

Appendix: Proofs

To prove Theorems 1, 2, and 3, we make use of the following lemmas.

Lemma 1

Under Assumption 1, and assuming that $h(\omega)^{re}$ has distinct eigenvalues, let $\lambda(\omega)$ and $\gamma(\omega)$ be the largest eigenvalue and corresponding eigenvector of $h(\omega)^{re}$. If $B_T \to \infty$ and $T \to \infty$ with $B_T T^{-1} \to 0$, then
$$\left| E\{\hat{\lambda}(\omega)\} - \lambda(\omega) \right| = O(B_T T^{-1}), \qquad \left\| E\{\hat{\gamma}(\omega)\} - \gamma(\omega) \right\| = O(B_T T^{-1}).$$

Lemma 2

Under the conditions of Lemma 1,
$$E\left| \hat{\lambda}(\omega) - E\{\hat{\lambda}(\omega)\} \right|^2 = O(B_T T^{-1}), \qquad E\left\| \hat{\gamma}(\omega) - E\{\hat{\gamma}(\omega)\} \right\|^2 = O(B_T T^{-1}).$$

Proofs of Lemmas 1 and 2 follow directly from Brillinger (2002, Theorems 9.4.1 and 9.4.3) and are thus omitted.

Proof of Theorem 1

Recall that $\hat{\boldsymbol{\lambda}} = \{\hat{\lambda}(\omega_1), \ldots, \hat{\lambda}(\omega_K)\}'$, where $K = \lfloor (T-1)/2 \rfloor$, $D_{1,ENV} = \|\hat{\boldsymbol{\lambda}} - \Lambda^{(1)}\|^2$, and $D_{2,ENV} = \|\hat{\boldsymbol{\lambda}} - \Lambda^{(2)}\|^2$. Let $\hat{\lambda}_s = \hat{\lambda}(\omega_s)$. It can be shown that
$$D_{1,ENV} - D_{2,ENV} = -2\sum_{s=1}^{K}(\hat{\lambda}_s - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) - \sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2.$$
It remains to show that
$$P(D_{1,ENV} - D_{2,ENV} > 0) = P\left( -2\sum_{s=1}^{K}(\hat{\lambda}_s - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) - \sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2 > 0 \right) \quad (4)$$
converges to zero. From the Chebyshev inequality, we have
$$P(D_{1,ENV} - D_{2,ENV} > 0) \le \frac{E\left\{ \left[ -2\sum_{s=1}^{K}(\hat{\lambda}_s - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) \right]^2 \right\}}{\left[ \sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2 \right]^2}. \quad (5)$$
Consider the numerator:
\begin{align*}
E\left\{ \left[ -2\sum_{s=1}^{K}(\hat{\lambda}_s - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) \right]^2 \right\}
&= 4E\left\{ \left[ \sum_{s=1}^{K}(\hat{\lambda}_s - E(\hat{\lambda}_s) + E(\hat{\lambda}_s) - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) \right]^2 \right\} \\
&\le 8E\left\{ \left[ \sum_{s=1}^{K}(\hat{\lambda}_s - E(\hat{\lambda}_s))(\lambda^{(1)}_s - \lambda^{(2)}_s) \right]^2 \right\} + 8\left[ \sum_{s=1}^{K}(E(\hat{\lambda}_s) - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) \right]^2. \quad (6)
\end{align*}
Combining (5) and (6), we have $P(D_{1,ENV} - D_{2,ENV} > 0) \le I + II$, where
$$I = 8E\left\{ \left[ \sum_{s=1}^{K}(\hat{\lambda}_s - E(\hat{\lambda}_s))(\lambda^{(1)}_s - \lambda^{(2)}_s) \right]^2 \right\} \Big/ \left[ \sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2 \right]^2,$$
$$II = 8\left[ \sum_{s=1}^{K}(E(\hat{\lambda}_s) - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) \right]^2 \Big/ \left[ \sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2 \right]^2.$$
We analyze these two terms separately. For term $I$, Lemma 2 gives that the numerator $8E\{[\sum_{s=1}^{K}(\hat{\lambda}_s - E(\hat{\lambda}_s))(\lambda^{(1)}_s - \lambda^{(2)}_s)]^2\} = O(B_T)$, and from Assumption 2 the sum $\sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2$ is of order $T$, so the denominator is of order $T^2$. Combining these results, $I = O(B_T T^{-2})$. Similarly, using Lemma 1 and Assumption 2, $II = O(B_T^2 T^{-2})$. Since $B_T T^{-1} \to 0$, both terms converge to zero, which completes the proof.

Proof of Theorem 2

Recall that $\hat{\boldsymbol{\gamma}} = \{\hat{\gamma}(\omega_1)', \ldots, \hat{\gamma}(\omega_K)'\}'$, a $K \times (m-1)$ matrix, $D_{1,SCA} = \|\hat{\boldsymbol{\gamma}} - \Gamma^{(1)}\|^2$, and $D_{2,SCA} = \|\hat{\boldsymbol{\gamma}} - \Gamma^{(2)}\|^2$. It can be shown that
$$D_{1,SCA} - D_{2,SCA} = -2\sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\hat{\gamma}_{\ell,s} - \gamma^{(1)}_{\ell,s})(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s}) - \sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s})^2.$$
We aim to show that $P(D_{1,SCA} - D_{2,SCA} > 0)$ converges to zero. As in the proof of Theorem 1, $P(D_{1,SCA} - D_{2,SCA} > 0) \le I + II$, where
$$I = 8E\left\{ \left[ \sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\hat{\gamma}_{\ell,s} - E(\hat{\gamma}_{\ell,s}))(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s}) \right]^2 \right\} \Big/ \left[ \sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s})^2 \right]^2,$$
$$II = 8\left[ \sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(E(\hat{\gamma}_{\ell,s}) - \gamma^{(1)}_{\ell,s})(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s}) \right]^2 \Big/ \left[ \sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s})^2 \right]^2.$$
Combining Lemmas 1 and 2 with Assumption 3, we have $P(D_{1,SCA} - D_{2,SCA} > 0) = O(B_T^2 T^{-2})$.

Proof of Theorem 3

We would like to show that $P(D_{1,EnvSca} - D_{2,EnvSca} > 0)$ converges to zero. It can be shown that $D_{1,EnvSca} - D_{2,EnvSca} = A + B$, where
$$A = \kappa\left[ \frac{-2\sum_{s=1}^{K}(\hat{\lambda}_s - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s)}{\sum_{s=1}^{K}\hat{\lambda}_s} - \frac{\sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2}{\sum_{s=1}^{K}\hat{\lambda}_s} \right],$$
$$B = (1-\kappa)\left[ \frac{-2\sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\hat{\gamma}_{\ell,s} - \gamma^{(1)}_{\ell,s})(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s})}{\sum_{\ell=1}^{m-1}\sum_{s=1}^{K}\hat{\gamma}_{\ell,s}} - \frac{\sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s})^2}{\sum_{\ell=1}^{m-1}\sum_{s=1}^{K}\hat{\gamma}_{\ell,s}} \right].$$
Using the results in the proofs of Theorems 1 and 2 together with Assumption 4, we have
$$P(A > 0) = P\left( -2\sum_{s=1}^{K}(\hat{\lambda}_s - \lambda^{(1)}_s)(\lambda^{(1)}_s - \lambda^{(2)}_s) - \sum_{s=1}^{K}(\lambda^{(1)}_s - \lambda^{(2)}_s)^2 > 0 \right) = O(B_T^2 T^{-2}),$$
$$P(B > 0) = P\left( -2\sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\hat{\gamma}_{\ell,s} - \gamma^{(1)}_{\ell,s})(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s}) - \sum_{\ell=1}^{m-1}\sum_{s=1}^{K}(\gamma^{(1)}_{\ell,s} - \gamma^{(2)}_{\ell,s})^2 > 0 \right) = O(B_T^2 T^{-2}).$$
Since $P(D_{1,EnvSca} - D_{2,EnvSca} > 0) \le P(A > 0) + P(B > 0)$, we have the desired result.

References

Aggarwal, C. C. (2002), "On Effective Classification of Strings with Wavelets," in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: Association for Computing Machinery, KDD '02, pp. 163–172.

Billingsley, P. (1961), Statistical Inference for Markov Processes, University of Chicago Press.

Brillinger, D. R. (2002), Time Series: Data Analysis and Theory, Philadelphia: SIAM.

Dai, M. and Guo, W. (2004), "Multivariate spectral analysis using Cholesky decomposition," Biometrika, 91, 629–643.

D'Cruz, O. F. and Vaughn, B. V. (1997), "Nocturnal seizures mimic REM behavior disorder," American Journal of Electroneurodiagnostic Technology, 37, 258–264.

Deshpande, M. and Karypis, G. (2002), "Evaluation of Techniques for Classifying Biological Sequences," in Advances in Knowledge Discovery and Data Mining, eds. Chen, M.-S., Yu, P. S., and Liu, B., Berlin, Heidelberg: Springer, pp. 417–431.

Fahrmeir, L. and Kaufmann, H. (1987), "Regression models for nonstationary categorical time series," Journal of Time Series Analysis, 8, 147–160.

Fokianos, K. and Kedem, B. (1998), "Prediction and classification of nonstationary categorical time series," Journal of Multivariate Analysis, 67, 277–296.

Fokianos, K. and Kedem, B. (2003), "Regression theory for categorical time series," Statistical Science, 18, 357–376.

Foldvary-Schaefer, N. and Alsheikhtaha, Z. (2013), "Complex nocturnal behaviors: Nocturnal seizures and parasomnias," Continuum: Lifelong Learning in Neurology, 19, 104–131.

Fryzlewicz, P. and Ombao, H. (2009), "Consistent classification of nonstationary time series using stochastic wavelet representations," Journal of the American Statistical Association, 104, 299–312.

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P., Mark, R., Mietus, J., Moody, G., Peng, C.-K., and Stanley, H. (2000), "PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals," Circulation, 101, e215–e220.

Huang, H., Ombao, H., and Stoffer, D. (2004), "Discrimination and classification of nonstationary time series using the SLEX model," Journal of the American Statistical Association, 99, 763–774.

Ifrim, G. and Wiuf, C. (2011), "Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, pp. 708–716.

Institute of Medicine (2006), Sleep Disorders and Sleep Deprivation: An Unmet Public Health Problem, Washington, DC: The National Academies Press.

Jurafsky, D. and Martin, J. (2009), Speech and Language Processing, Pearson Education International, 2nd ed.

Krafty, R. T. (2016), "Discriminant Analysis of Time Series in the Presence of Within-Group Spectral Variability," Journal of Time Series Analysis, 37, 435–450.

Krafty, R. T. and Collinge, W. O. (2013), "Penalized multivariate Whittle likelihood for power spectrum estimation," Biometrika, 100, 447–458.

Krafty, R. T., Xiong, S., Stoffer, D. S., Buysse, D. J., and Hall, M. (2012), "Enveloping spectral surfaces: covariate dependent spectral analysis of categorical time series," Journal of Time Series Analysis, 33, 797–806.

Navarro, G. (2001), "A Guided Tour to Approximate String Matching," ACM Computing Surveys, 33, 31–88.

Rechtschaffen, A. and Kales, A. (1968), A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects, Washington, DC: US Government Printing Office.

Rosen, O. and Stoffer, D. (2007), "Automatic estimation of multivariate spectra via smoothing splines," Biometrika, 94, 335–345.

Schenck, C. H., Bundlie, S. R., Ettinger, M. G., and Mahowald, M. W. (1986), "Chronic Behavioral Disorders of Human REM Sleep: A New Category of Parasomnia," Sleep, 9, 293–308.

Shumway, R. and Stoffer, D. (2016), Time Series Analysis and Its Applications, New York: Springer, 4th ed.

Stoffer, D., Tyler, D., and McDougall, A. (1993), "Spectral analysis for categorical time series: scaling and the spectral envelope," Biometrika, 80, 611–632.

Stoffer, D. S., Tyler, D. E., and Wendt, D. A. (2000), "The spectral envelope and its applications," Statistical Science, 15, 224–253.

Terzano, M., Parrino, L., Sherieri, A., Chervin, R., Chokroverty, S., Guilleminault, C., Hirshkowitz, M., Mahowald, M., Moldofsky, H., Rosa, A., Thomas, R., and Walters, A. (2001), "Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep," Sleep Medicine, 2, 537–553.

Tinuper, P. and Bisulli, F. (2017), "From nocturnal frontal lobe epilepsy to sleep-related hypermotor epilepsy: a 35-year diagnostic challenge," Seizure, 44, 87–92.

Zepelin, H., Siegel, J., and Tobler, I. (2005),