Multitaper Spectral Estimation HDP-HMMs for EEG Sleep Inference
Leon Chlon, Andrew Song, Sandya Subramanian, Hugo Soulat, John Tauber, Demba Ba, Michael Prerau
Leon Chlon∗, MIT Department of BCS, Cambridge, MA 02139, [email protected]
Andrew H. Song∗, MIT Department of EECS, Cambridge, MA 02139, [email protected]
Sandya Subramanian, Harvard-MIT HST, Cambridge, MA 02139, [email protected]
Hugo Soulat, MIT Department of BCS, Cambridge, MA 02139, [email protected]
John Tauber, MIT Department of BCS, Cambridge, MA 02139, [email protected]
Demba Ba, Harvard SEAS, Cambridge, MA 02138, [email protected]
Michael Prerau, MGH Department of Anesthesia, Critical Care and Pain Medicine, 55 Fruit St, GRJ 4, Boston, MA 02114, [email protected]
Abstract
Electroencephalographic (EEG) monitoring of neural activity is widely used for sleep disorder diagnostics and research. The standard of care is to manually classify 30-second epochs of EEG time-domain traces into 5 discrete sleep stages. Unfortunately, this scoring process is subjective and time-consuming, and the defined stages do not capture the heterogeneous landscape of healthy and clinical neural dynamics. This motivates the search for a data-driven and principled way to identify the number and composition of salient, reoccurring brain states present during sleep. To this end, we propose a Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), combined with wide-sense stationary (WSS) time series spectral estimation, to construct a generative model for personalized subject sleep states. In addition, we employ multitaper spectral estimation to further reduce the large variance of the spectral estimates inherent to finite-length EEG measurements. By applying our method to both simulated and human sleep data, we arrive at three main results: 1) a Bayesian nonparametric automated algorithm that recovers general temporal dynamics of sleep, 2) identification of subject-specific "microstates" within canonical sleep stages, and 3) discovery of stage-dependent sub-oscillations with shared spectral signatures across subjects.
During sleep, the brain displays highly heterogeneous cortical oscillatory dynamics consisting of a complex interplay of numerous networks related to arousal and loss of consciousness [1, 2, 3]. The current clinical standard uses 30-second epochs of electroencephalogram (EEG) time-domain traces to categorize brain state during sleep into 5 discrete sleep stages: wake, rapid eye movement (REM) sleep, and non-REM sleep, which consists of 3 sub-stages, notated NREM1 through NREM3 [4]. The progression of these sleep stages through the night, called a hypnogram, is used to characterize sleep architecture. However, as these stages are determined subjectively through visual inspection by trained sleep technicians, this process is both time-consuming and inherently suffers from ∼ inter-scorer variability [1, 5], which is greatly exacerbated in the case of pathological sleep [6]. The chief explanation for this variability is that sleep, in every dimension thus far studied, is a continuous and dynamic process [7], which is cleaved into a low-resolution, discrete state space by clinical staging. Furthermore, the current clinical standard is based on features easily observed in the time domain by eye, and is limited to a crude time and state resolution that is designed to reduce variability and scoring time in subjective visual categorization, rather than to maximize information content. Recent studies have highlighted the information-rich nature of neural oscillatory time-frequency dynamics within the sleep EEG that can be observed through the multitaper spectrogram [5]. While there is indeed a continuum of changes in the sleep EEG, the identification of relevant and recurring combinations of oscillatory activity is vital for enhancing our understanding of the interaction of neural mechanisms underlying sleep, as well as pathophysiological deviations from the norm.

∗ These authors contributed equally to this work.
Preprint. Work in progress.
It is therefore useful to devise a method for parcellating sleep that is data- rather than semantically-driven, incorporates time-frequency dynamics, and can determine the state resolution without the limitations imposed by the need for human scorers or a pre-assumed number of states. Unsupervised machine learning methods are ideal candidates for this class of problem. However, existing parametric approaches, including Deep Belief Network - Hidden Markov Models (DBN-HMMs) [8], k-Nearest Neighbor (k-NN)-based methods [9], Conditional Random Fields (CRFs) [10], and others [11, 12], constrain the total number of states. To address this shortcoming, we propose a Bayesian nonparametric framework that integrates 1) the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) [13] for the underlying state dynamics and 2) spectral representations and asymptotic properties of wide-sense stationary (WSS) time series for the emission distribution of the generative model. The multitaper spectral estimation framework [14] is incorporated to further optimize the bias and variance of the estimates of the spectral characteristics of each state, compared to estimates based on simple Fourier coefficients of the observations. Finally, we use the symmetrized form of the Kullback–Leibler (KL) divergence [15] to cluster the inferred states across our patient cohort, with interpretation provided by subject-matter experts. To our knowledge, ours is the first HDP-HMM, along with the multitaper framework, to be applied to human sleep EEG data. Our work is readily extensible beyond sleep inference to other domains of neuroscience, such as nonparametric state modeling of the neurophysiology underlying epilepsy or Parkinson's disease.

If the discrete time series $\{y_t\}_t$ can be assumed to be realizations of a wide-sense stationary (WSS) stochastic process, we can apply several useful properties for parameter estimation and inference in time series analysis.
WSS, or second-order stationarity, states that the mean and the autocovariance of the stochastic process are invariant with respect to time $t$. In practice, since a long time series (∼several hours of data for our study) may demonstrate nonstationarity, we segment the data into multiple non-overlapping windows, within which we assume WSS.

Let $N$ be the number of sample points for the entire time series and $J$ the number of samples in each window. Then $T = \lfloor N/J \rfloor$ denotes the number of windows for the data. With $t = 1, \cdots, T$ as the index for a window, we define $y_t = \{y_{(t-1)J+1}, \cdots, y_{tJ}\}$ as the samples in window $t$.

To analyze the discretely sampled time series with sampling rate $F_s$ in the frequency domain, we use the Discrete Fourier Transform (DFT) [16]. Denoting $y_t^{(F)} \in \mathbb{C}^J$ as the DFT coefficients of $y_t \in \mathbb{R}^J$,
$$y_t^{(F)}(w_j) = \frac{1}{J} \sum_{l=1}^{J} y_{(t-1)J+l}\, W_J^{(j-1)(l-1)} \quad (1)$$
where $W_J = \exp(-i 2\pi/J)$ and $w_j = (j-1)/J$ for $j = 1, \cdots, J$ is the normalized frequency. The conversion between frequency in Hz, $f_j$, and $w_j$ is given by $f_j = F_s w_j$. We will use $w_j$ to refer to the frequency index throughout this work. Also, for notational convenience, we will write $d_{j,t}^{Re} = \operatorname{Re}\{y_t^{(F)}(w_j)\}$ and $d_{j,t}^{Im} = \operatorname{Im}\{y_t^{(F)}(w_j)\}$. We now introduce the following proposition [17]:

Figure 1: The generative model for the HDP-HMM and example spectrogram. Left: Graphical model of the HDP-HMM. Top right: Multitapered spectrogram of 80 minutes of sleep EEG data. Bottom right: Extracted PSD of 15-second windows.

Proposition 1. If $y_t$ is a WSS time series, the DFT coefficients $d_{j,t}^{Re}$ and $d_{j,t}^{Im}$ are distributed as asymptotically independent normals as $J \to \infty$,
$$d_{j,t}^{Re} \sim N\left(0, \frac{f(w_j)}{2}\right) \quad \text{and} \quad d_{j,t}^{Im} \sim N\left(0, \frac{f(w_j)}{2}\right) \quad (2)$$
where $f(w_j)$ is the true underlying PSD of the stochastic process at $w_j$ [17] (for further explanation, see Supplementary Material Section 1).
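Proposition 1 lends itself to a quick numerical check. The sketch below (illustrative, not from the paper) windows a white-noise series, which is trivially WSS, applies the DFT of eq. (1), and confirms that the real and imaginary coefficients at a fixed bin have near-zero means, matching variances, and negligible correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
J, T = 256, 2000                     # window length and number of windows
y = rng.standard_normal(J * T)       # white noise: a trivially WSS process

# Segment into non-overlapping windows and apply the DFT of eq. (1).
windows = y.reshape(T, J)
Y = np.fft.fft(windows, axis=1) / J  # DFT coefficients y_t^(F)(w_j)

j = 10                               # a bin away from DC and Nyquist
d_re, d_im = Y[:, j].real, Y[:, j].imag

# Proposition 1: d_re, d_im ~ N(0, f(w_j)/2), asymptotically independent.
print(np.mean(d_re), np.mean(d_im))   # both near 0
print(np.var(d_re) / np.var(d_im))    # near 1: equal variances
print(np.corrcoef(d_re, d_im)[0, 1])  # near 0: approximately uncorrelated
```

The same check works for any WSS process; white noise is used here only because its flat PSD makes the expected behavior obvious.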
The independence property is the guiding principle for our generative model in Section 2.3 and the clustering analysis in Section 3.2.

The spectral estimation problem, i.e., the problem of estimating the true PSD $f(w_j)$ from observations (DFT coefficients), has received great attention. The simplest estimate, albeit with high bias and variance, is the periodogram, $\hat{f}(w_j) = \|y_t^{(F)}(w_j)\|^2$. Numerous methods have attempted to lower both the bias and the variance of $\hat{f}(w_j)$ [18]. Multitaper spectral estimation [19] optimizes the reduction in bias and variance of the PSD estimate by applying a set of specific tapers, or windowing functions, to the observed time series to obtain $M$ pseudo-observations $\{y_t^{(m,F)}(w_j)\}_{m=1}^{M}$,
$$y_t^{(m,F)}(w_j) = \frac{1}{J} \sum_{l=1}^{J} h_l^{(m)}\, y_{(t-1)J+l}\, W_J^{(j-1)(l-1)}$$
where $h^{(m)} \in \mathbb{R}^J$ is the $m$th taper and $M$ is the number of tapers. The discrete prolate spheroidal sequences, which are mutually orthogonal with optimal energy concentration properties, are used as tapers. With these tapers, the periodogram estimates formed from the pseudo-observations, $\{\|y_t^{(m,F)}(w_j)\|^2\}_{m=1}^{M}$, are approximately uncorrelated [14]. Therefore, we can write the final estimator $\hat{f}_{MT}(w_j)$ as
$$\hat{f}_{MT}(w_j) = \frac{1}{M} \sum_{m=1}^{M} \|y_t^{(m,F)}(w_j)\|^2 \quad (3)$$
Figure 1 shows an example of the multitapered spectrogram and PSD for a segment of sleep EEG.

Our generative model, as depicted in Figure 1, follows an HDP-HMM framework [13]. The observations, or DFT coefficients for each time window across frequency bands of interest, are generated by the spectral representation emission model. We denote the spectral representation at each time window $t$ as $y_t^{(m,F)}(w_j)$, where $t = 1, \cdots, T$ indexes the time window, $m = 1, \cdots, M$ the tapers, and $w_j$ for $j = 1, \cdots, J$ the frequency.
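As a concrete illustration (not the authors' implementation), the tapered pseudo-observations and the averaged estimator of eq. (3) can be formed with SciPy's DPSS routine. Here white noise stands in for one window $y_t$, and the normalizations are chosen only to make the single-taper and multitaper estimates comparable:

```python
import numpy as np
from scipy.signal.windows import dpss

rng = np.random.default_rng(1)
J, M, NW = 512, 5, 4          # window length, taper count, time-bandwidth
y = rng.standard_normal(J)    # one window of a WSS process (white noise)

tapers = dpss(J, NW, Kmax=M)  # M orthonormal discrete prolate spheroidal seqs
Y = np.fft.rfft(tapers * y, axis=1)     # M tapered pseudo-observations
f_mt = np.mean(np.abs(Y) ** 2, axis=0)  # eq. (3): average pseudo-periodograms

# Single-taper periodogram on the same scale: same target, higher variance.
f_per = np.abs(np.fft.rfft(y)) ** 2 / J
print(np.var(f_mt[1:-1]) < np.var(f_per[1:-1]))  # True: multitaper is smoother
```

For a flat true PSD, the bin-to-bin scatter of the multitaper estimate should shrink roughly by a factor of $M$ relative to the periodogram, which the variance comparison above reflects.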
2.3.1 Spectral Representation Emission Model

We use the normal distribution in eq. (2) as the likelihood $p(y_t^{(F)} \mid \theta^{(s_t)})$, where $\theta^{(s_t)} = \{f(w_j^{(s_t)})\}_j$,
$$[d_{1,t}^{Re}, d_{1,t}^{Im}, \cdots, d_{J,t}^{Re}, d_{J,t}^{Im}] \sim p\left(y_t^{(F)} \mid \theta^{(s_t)}\right) = N(\bar{0}, \Sigma_{s_t}) = \prod_{j=1}^{J} \left\{ N\left(0, \frac{f(w_j^{(s_t)})}{2}\right) \right\} \quad (4)$$
where $\Sigma_{s_t} = \frac{1}{2}\operatorname{diag}(f(w_1^{(s_t)}), f(w_1^{(s_t)}), \cdots, f(w_J^{(s_t)}), f(w_J^{(s_t)})) \in \mathbb{R}^{2J \times 2J}$. Note that the diagonal covariance matrix is due to Proposition 1 in Section 2.1. In practice, only the indices that correspond to the frequencies of interest are used. For the tapered case, we assume that the observations $y_t^{(m,F)}$ are generated by the same state-dependent covariance matrix $\Sigma_{s_t}$:
$$p\left(y_t^{(m,F)} \mid \theta^{(s_t)}\right) = N(\bar{0}, \Sigma_{s_t}) \quad \text{for } m = 1, \cdots, M \quad (5)$$
For the prior distribution on $\theta^{(s_t)}$, we use the Inverse Gamma distribution, due to the conjugacy between the normal likelihood and the Inverse Gamma prior.

We model $\{y_t^{(m,F)}(w_j)\}_{t=1}^{T}$ as the realization of a generative process driven by a set of local latent variables $\{s_t\}_{t=1}^{T} \in \{1, \ldots, K\}$ that we term "sleep states". Sleep state modeling for each subject proceeds via the following construction for our HDP-HMM:
$$\beta \sim \operatorname{GEM}(\gamma), \quad \pi_k \mid \beta \sim \operatorname{DP}(\alpha, \beta), \quad b_j^{(s_t)} \sim \operatorname{Gamma}(a, b), \quad (6)$$
$$f(w_j^{(s_t)}) \sim \operatorname{IG}(a, b_j^{(s_t)}), \quad s_t \mid s_{t-1} \sim \operatorname{Multinomial}(\pi_{s_{t-1}}), \quad y_t^{(m,F)} \mid s_t \sim N(\bar{0}, \Sigma_{s_t}) \quad (7)$$
where IG denotes the Inverse Gamma distribution for each frequency- and state-specific $f(w_j^{(s_t)})$, as outlined in Section 2.3.1. To capture the frequency-level characteristics, we place an uninformative Gamma prior over the Inverse Gamma hyperparameter $b_j^{(s_t)}$.
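A truncated version of this generative process is straightforward to simulate. The sketch below (an illustration, with all hyperparameter values invented) approximates the GEM weights with finite-$K$ stick-breaking and the DP transition rows with Dirichlet draws, then emits real/imaginary DFT coefficients with state-dependent variance $f/2$:

```python
import numpy as np

rng = np.random.default_rng(2)
K, J, T, M = 6, 8, 300, 3      # truncation level, freqs, windows, tapers
gamma, alpha = 3.0, 3.0

# Truncated GEM stick-breaking for the global weights beta.
b = rng.beta(1.0, gamma, size=K)
beta = b * np.concatenate(([1.0], np.cumprod(1 - b)[:-1]))
beta /= beta.sum()                       # renormalize after truncation

# Each transition row pi_k ~ DP(alpha, beta), approximated by a Dirichlet.
pi = rng.dirichlet(alpha * beta, size=K)

# State- and frequency-specific PSDs f(w_j) from an Inverse Gamma prior:
# if X ~ Gamma(a0, 1), then b/X ~ IG(a0, b).
a0 = 3.0
b_jk = rng.gamma(2.0, 2.0, size=(K, J))
f = b_jk / rng.gamma(a0, 1.0, size=(K, J))

# Simulate the Markov chain and emit Re/Im coefficients with variance f/2.
s = np.zeros(T, dtype=int)
for t in range(1, T):
    s[t] = rng.choice(K, p=pi[s[t - 1]])
d = rng.normal(0.0, np.sqrt(f[s] / 2.0)[:, None, :], size=(T, 2 * M, J))
print(d.shape)   # (300, 6, 8): T windows, stacked Re/Im for M tapers, J freqs
```

The truncation is only a simulation convenience; as the text notes, the model is fully nonparametric in the limit $K \to \infty$.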
Following standard convention, $\beta \sim \operatorname{GEM}(\gamma)$ denotes the stick-breaking construction for the Dirichlet Process (DP), given by
$$\beta_k = \beta_k' \prod_{i=1}^{k-1} (1 - \beta_i'), \quad (8)$$
with $\beta_k' \sim \operatorname{Beta}(1, \gamma)$. $\{\beta_{k'}\}_{k'=1}^{K}$ represents a prior over the transition probabilities into a state $k'$, and $\pi_k = \{\pi_{k,k'}\}_{k'=1}^{K}$ represents the transition probabilities from state $k$ to $k'$. Our HDP-HMM is fully nonparametric when we take the limit as $K \to \infty$ in the GEM stick-breaking construction. However, in our work it is reasonable to truncate $K$, since many poorly represented states will not necessarily confer any additional neurophysiological context, especially considering the finite nature of the data. Finally, we place uninformative Gamma hyperpriors over the HDP hyperparameters $\gamma$ and $\alpha$: $\gamma \sim \operatorname{Gamma}(\alpha_\gamma, \beta_\gamma)$ and $\alpha \sim \operatorname{Gamma}(\alpha_\alpha, \beta_\alpha)$.

For model inference, we utilize beam sampling [20], which integrates slice sampling and dynamic programming to sample whole state trajectories from the model posterior. HDP-HMMs are infinite-dimensional by formulation, which complicates the computation of state trajectories using ordinary dynamic programming schemes. By iteratively sampling an auxiliary variable $u_t \sim \operatorname{Uniform}(0, \pi_{s_{t-1}, s_t})$, the beam sampler imposes a constraint $\pi_{s_{t-1}, s_t} \geq u_t$ on sampled state transitions $p(s_t \mid y_{1:t}^{(m,F)}, u_t)$, and consequently sets an upper bound on the set of dynamically computed state trajectories (full details are outlined in [20]). For the posterior distribution of $f(w_j^{(s_t)})$, we use the following facts: 1) the posterior distribution is also an Inverse Gamma distribution, due to conjugacy, and 2) there are $M$ pseudo-observations of the DFT coefficients.
$$p\left(f(w_j^{(s_t)}) \,\middle|\, \{d_{j,t}^{Re,(m)}, d_{j,t}^{Im,(m)}\}_{m=1}^{M}\right) = \operatorname{IG}\left(a_j + M,\; b_j^{(s_t)} + \sum_{m=1}^{M} \left\{ (d_{j,t}^{Re,(m)})^2 + (d_{j,t}^{Im,(m)})^2 \right\}\right) \quad (9)$$
with a Gamma prior over the cluster- and frequency-specific parameter $b_j^{(s_t)}$.

Following subject-level inference, we perform a post hoc clustering analysis to investigate the shared set of states across different subjects. Since the HDP-HMM is modeled separately for each individual, there is no guarantee that any pair of states from different subjects shares similar spectral characteristics.
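The conjugate Inverse Gamma update for a single frequency bin can be verified numerically. The sketch below (illustrative hyperparameters, not the paper's settings) draws $M$ pseudo-observations from the $N(0, f/2)$ likelihood of Proposition 1 and recovers the true PSD value from the posterior mode:

```python
import numpy as np

rng = np.random.default_rng(4)
M, f_true = 200, 4.0          # taper count (large here only to tighten the check)
a_j, b_j = 2.0, 1.0           # illustrative Inverse Gamma prior hyperparameters

# M pseudo-observations of one frequency bin: d_Re, d_Im ~ N(0, f/2).
d_re = rng.normal(0.0, np.sqrt(f_true / 2), M)
d_im = rng.normal(0.0, np.sqrt(f_true / 2), M)

# Conjugate update: shape a_j + M, scale b_j + sum(d_Re^2 + d_Im^2).
a_post = a_j + M
b_post = b_j + np.sum(d_re ** 2 + d_im ** 2)

f_map = b_post / (a_post + 1)   # MAP (mode) of IG(a, b) is b / (a + 1)
print(f_map)                    # close to the true PSD value f_true = 4.0
```

With only $M = 5$ tapers per window, as in the real data, the prior contributes much more to this estimate; the large $M$ here is purely to make the consistency of the update visible.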
Symmetric KL divergence as the distance metric
We use the symmetric KL divergence between the Gaussian likelihoods [21], with the choice of likelihood justified by the asymptotic normality from Proposition 1. Let $c = 1, \cdots, C$ denote the cluster index and $f_c$ the spectral characteristic of cluster $c$. Likewise, let $\hat{f}_{s_p}$ be the spectral characteristic of state $s_p$ of subject $p$, chosen to be the MAP estimate of $p(f_{s_p} \mid \theta)$ in eq. (9). If the observation $y^{(F)}$ is generated from state $s_p$, the log-likelihood is given as (modulo constants)
$$\log p\left(y^{(F)} \mid \hat{f}_{s_p}\right) = \sum_{w_j} \left[ -\log \hat{f}_{s_p}(w_j) - \|y^{(F)}(w_j)\|^2 / \hat{f}_{s_p}(w_j) \right] \quad (10)$$
If $s_p$ is clustered into cluster $c$, the log-likelihood with respect to $f_c$ would be
$$\log p\left(y^{(F)} \mid f_c\right) = \sum_{w_j} \left[ -\log f_c(w_j) - \|y^{(F)}(w_j)\|^2 / f_c(w_j) \right] \quad (11)$$
Taking the expectation of the log-ratio of the likelihoods, we obtain both KL divergences
$$I(\hat{f}_{s_p}; f_c) = \mathbb{E}_{p(y^{(F)} \mid \hat{f}_{s_p})}\left[ \log \frac{p(y^{(F)} \mid \hat{f}_{s_p})}{p(y^{(F)} \mid f_c)} \right] = \sum_{w_j} \left\{ -\log\left( \frac{\hat{f}_{s_p}(w_j)}{f_c(w_j)} \right) + \frac{\hat{f}_{s_p}(w_j)}{f_c(w_j)} - 1 \right\} \quad (12)$$
with $I(f_c; \hat{f}_{s_p})$ computed similarly. The symmetric KL divergence $J(\hat{f}_{s_p}; f_c)$ is defined as follows:
$$J(\hat{f}_{s_p}; f_c) = I(\hat{f}_{s_p}; f_c) + I(f_c; \hat{f}_{s_p}) = \sum_{w_j} \left\{ \frac{\hat{f}_{s_p}(w_j)}{f_c(w_j)} + \frac{f_c(w_j)}{\hat{f}_{s_p}(w_j)} - 2 \right\} \quad (13)$$
Weighted K-means clustering
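A minimal sketch of this clustering step, with eq. (13) as the assignment distance and occurrence counts $n_{s_p}$ as weights. The centroid update below is a weighted mean of member PSDs, a simple heuristic stand-in for the exact centroid, and all data, shapes, and parameter values are synthetic illustrations:

```python
import numpy as np

rng = np.random.default_rng(5)

def sym_kl(F, c):
    # eq. (13), row-wise, between state PSDs F (n x J) and one centroid c (J,)
    r = F / c
    return np.sum(r + 1.0 / r - 2.0, axis=1)

def weighted_kl_kmeans(F, w, C, iters=50):
    """K-means with the symmetric KL distance; each state weighted by its count."""
    cent = F[rng.choice(len(F), C, replace=False)]
    for _ in range(iters):
        D = np.stack([sym_kl(F, c) for c in cent], axis=1)  # (n, C) distances
        z = np.argmin(D, axis=1)                            # cluster assignments
        for c in range(C):
            m = z == c
            if m.any():  # occurrence-weighted mean of member PSDs as centroid
                cent[c] = np.average(F[m], axis=0, weights=w[m])
    return z, cent

# Toy normalized state PSDs drawn from two spectral "families", with counts n_sp.
F = np.vstack([rng.dirichlet([8, 2, 1, 1], 10), rng.dirichlet([1, 1, 2, 8], 10)])
w = rng.integers(5, 100, size=20).astype(float)
z, cent = weighted_kl_kmeans(F, w, C=2)
print(z)
```

Because the input PSDs are normalized to sum to one, the fitted centroids do as well, mirroring the normalization step described in the text.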
With eq. (13) as the distance metric, we now perform K-means clustering on all the states across subjects, with $f_c$ as the centroid of cluster $c$. Furthermore, we weight [22] $J(\hat{f}_{s_p}; f_c)$ by $n_{s_p}$, the number of occurrences of $s_p$ in subject $p$; $n_{s_p}$ effectively acts as another covariate for clustering the heterogeneous states. For instance, a high $n_{s_p}$ would indicate that the state is likely to be one of the canonical sleep stages (REM or NREM).

We perform two additional preprocessing steps for the clustering analysis. First, we normalize the power within each state, $\hat{f}'_{s_p}(w_j) = \hat{f}_{s_p}(w_j) / \sum_{w_j} \hat{f}_{s_p}(w_j)$, to prevent the total power from affecting the clustering. The distance metric in eq. (13) has a tendency to assign a high-power state to a high-power cluster, and vice versa, regardless of the relative power distribution between frequency bands. After normalization, the algorithm is able to cluster the heterogeneous states based on a shared spectral signature. Second, we take the median value of $n_{s_p}$ across posterior samples as the representative number of occurrences of the specific state for the patient. We observe that after the burn-in period, $n_{s_p}$ is quite consistent, which justifies our choice of the median.

Simulated Sleep-Inspired Data
To test the robustness of our model, we simulate EEG time series with realistic spectral content and temporal dynamics. We do not try to reproduce the extremely complex short-time dynamics of sleep EEG; rather, our simulated data gauge the model's capacity to extract a set of discrete spectral characteristics from time-series data.

Figure 2: Simulated data. For each time window, an EEG signal was generated according to the current hidden state and its corresponding state-space model parameters. A typical EEG trace for one state is depicted in the top left panel. Its multitapered PSD is plotted in the top right panel (orange). The theoretical PSD (black) is determined by the state-space model parameters. The simulated and sampled state trajectories are presented in the bottom left panels. The 5 different states alternate according to a predefined transition matrix, and the difference between the simulated and sampled transition matrices is represented in the bottom right panel.

Simulated data are generated by combining the oscillation-components time series decomposition method [23] with the sleep dynamics of [5]. Our state-space model is described as follows. For $l = 1, \cdots, J$,
$$x_{l+1} = R x_l + v_l \in \mathbb{R}^{2D}, \quad y_l = (1\ 0\ 1\ 0\ \cdots)\, x_l + w_l \in \mathbb{R} \quad (14)$$
where $D$ is the number of frequency components, $v_l \sim N(\bar{0}, Q)$, and $w_l \sim N(0, \sigma_r^2)$. We define $R$ as a block-diagonal matrix composed of 2-by-2 rotation matrices $R_i$, where $F_s$ is the sampling frequency and $f_i$ is a frequency (in Hz) of interest, for $i = 1, \ldots, D$:
$$R_i = a_i \begin{pmatrix} \cos(2\pi f_i / F_s) & -\sin(2\pi f_i / F_s) \\ \sin(2\pi f_i / F_s) & \cos(2\pi f_i / F_s) \end{pmatrix}, \quad 0 < a_i < 1. \quad (15)$$
The observation PSD is the sum of the observation noise $w_l$ and the PSD of each oscillation. For any given frequency, the latter peaks around $f_i$ and is fully determined by $a_i$, $f_i$, and $Q_{i,i}$ (the $i$th diagonal element of $Q$) [23].
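Under eqs. (14)-(15), one window of such data can be generated directly; the sketch below is an illustration with invented component frequencies, damping factors, and noise levels, not the paper's simulation settings:

```python
import numpy as np

rng = np.random.default_rng(6)
Fs = 200.0                    # sampling rate (Hz), as in the sleep recordings
freqs = [1.0, 10.0]           # D = 2 illustrative component frequencies f_i
amps = [0.98, 0.95]           # damping factors a_i with 0 < a_i < 1
sigma_q, sigma_r = 1.0, 0.5   # state and observation noise std (illustrative)
J = 3000                      # one 15 s window at 200 Hz

# Block-diagonal R built from damped 2x2 rotations (eq. 15).
R = np.zeros((4, 4))
for i, (a, fi) in enumerate(zip(amps, freqs)):
    th = 2 * np.pi * fi / Fs
    R[2*i:2*i+2, 2*i:2*i+2] = a * np.array([[np.cos(th), -np.sin(th)],
                                            [np.sin(th),  np.cos(th)]])

H = np.array([1.0, 0.0, 1.0, 0.0])  # observe the first coordinate of each block
x = np.zeros(4)
y = np.empty(J)
for l in range(J):  # eq. (14): x_{l+1} = R x_l + v_l, y_l = H x_l + w_l
    x = R @ x + rng.normal(0.0, sigma_q, 4)
    y[l] = H @ x + rng.normal(0.0, sigma_r)
print(y.shape)   # (3000,): one window of oscillation-plus-noise "EEG"
```

Damping factors closer to 1 sharpen each spectral peak around its $f_i$, which is how the per-state PSDs of the simulation are shaped.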
Specific details on the frequency bands modeled and the associated PSD for a typical 15 s window of simulated data are available in Supplementary Material Section 2.

Sleep Study Data
Our experimental data consist of 200 Hz high-density (64-channel) EEG recordings from nine healthy right-handed subjects (full clinical details in the original study [5]). We used the occipital lobe (channel O1) recordings in accordance with a subject-matter expert, who selected the channel to assign canonical sleep stages to 30-second windows. The observations for our model consist of 15-second windows, intended to delineate high-frequency sleep phenomena such as sleep ripples and spindles from the signal. Windows containing a total power greater than the 95th percentile of the entire signal were rejected from further analysis. Observation data consisted of multitapered spectral observations (5 tapers, time-bandwidth = 4) in the [0.−.] Hz, [2.−.] Hz, [4.−.] Hz, [6.−.] Hz, [10.−.] Hz, and [12.−] Hz bands, where the increased resolution at the slow and alpha frequencies is believed to assist the stratification of the NREM1, NREM2, and NREM3 sleep stages [5]. Specifically, for each tapered window, we take the DFT and examine the power in each frequency band to determine the frequency index corresponding to the median. The DFT coefficients corresponding to that frequency index are selected as the spectral observations in the frequency band.

Figure 3: Top left: A multitaper spectrogram across the entire sleep study duration for Subject 5. Middle left: A state trajectory chosen from the posterior distribution samples of the HDP-HMM according to the median Spearman rho estimate. Bottom left: Sleep expert hypnogram. Top right: Histogram of Spearman's rho values computed between all sampled state trajectories and the hypnogram, where the green vertical line highlights the median of the distribution. Bottom right: The estimated power spectral density for each state in the state trajectory.
For each run of the beam sampler, we initialize the following HDP hyperprior distributions: $\gamma \sim \operatorname{Gamma}(1, \cdot)$, $\alpha \sim \operatorname{Gamma}(1, \cdot)$, and $b_j^{(k)} \sim \operatorname{IG}(1, 1)$. We "burn in" the first 2000 posterior distribution samples, and draw a subsequent 100 samples for the simulation and 1000 samples for the real data with a step size of 50 to minimize inter-sample autocorrelation.
Simulated Sleep-Inspired Data
A typical simulation result is presented in Figure 2, in which we simulate 2000 15-second time windows. For each sample from the posterior, our model successfully recovers the correct number of states and the exact state trajectory. Furthermore, the ground-truth transition matrix is sampled almost perfectly. Finally, our model recovers the discrete spectral characteristics of the different simulated states. The estimated covariance matrices correspond to the median of the theoretical PSD for each frequency band. As expected, the use of the multitaper framework results in a smaller variance in the estimated PSD when compared to a standard DFT approach. A more detailed comparison can be found in Supplementary Material Section 2.
Human Subject EEG
Hypnograms illustrate canonical sleep stage traces in terms of descending levels of arousal, where a 5 on the hypnogram corresponds to the highest level of arousal (wake) and a 1 to the lowest (NREM3). For comparison, our state labels are also reordered, using descending state-wise alpha-band power as a proxy for arousal. Spearman's rho, $\rho_t$, is used to compute the similarity in temporal dynamics between sampled state trajectories and the hypnogram. We present Subject 5 as a case study in Figure 3, where all sampled state trajectories are used to construct a distribution over $\rho_t$ values. The distribution is tightly localized around a median value of $\rho_t = 0.$, demonstrating that the model effectively captures the transition dynamics in the hypnogram. Remarkably, transitions between NREM and REM states in the hypnogram are mirrored by similar transitions between slow states S1, S2, and S3 and REM-like states such as S5, S7, and S8 in the state trajectory. Similarly, inter-NREM transitions in the hypnogram coincide with synonymous transition events in the state trajectory between S1, S2, and S3. Our model introduces additional "micro"-states beyond the standard canonical sleep stages, typically coinciding with more volatile dynamics in the spectrogram. This can clearly be discerned in Figure 3, where epochs of REM in the hypnogram are broken into alternating S8/S7 (characteristic REM PSDs) and S5 (slow-dynamics-dominated PSD) in the state trajectory. This is unsurprising considering that a) sleep consists of highly heterogeneous and fluctuating oscillatory dynamics, b) the currently defined canonical sleep stages are unable to account for inter-subject variability, and c) both additional and combined sleep stages have been reported [24, 5]. When considering the variation introduced by additional subject-specific micro-states, the general temporal dynamics are recovered well across all subjects, with an average distribution median value of $\rho_t = 0.$.

The clustering analysis aims to group states based on the underlying hypothesis that every subject has both "REM-like" and "NREM-like" states; however, their specific spectral characteristics may vary. Across the nine subjects, a total of 103 states and nine clusters were recovered. The HDP-HMM-recovered sleep states for each subject are then mapped to their respective clusters. Finally, the cluster trajectory and hypnogram are compared for each subject. We discuss their spectral characteristics in Supplementary Material Section 3.1.

Figure 4: Detailed cluster analysis for two subjects. Top: Heatmaps illustrating the proportion of time spent in each sleep-expert-scored stage that belongs to each cluster. Bottom: The cluster trajectories show the temporal dynamics of the clusters compared to the sleep-expert-scored stages.

Although nine clusters were identified, not all clusters were visited by each subject. An example of the heterogeneity in cluster dynamics across subjects is shown for two subjects in Figure 4. The heatmaps link the proportion of time spent in each sleep-expert-scored stage to the time spent in each cluster. The clusters are organized from bottom to top in order of increasing power in the alpha band (8-12 Hz). For example, for Subject 3, the fourth column (REM) indicates that 90 percent of the windows labeled REM by the sleep expert were classified as belonging to Cluster 8 and the remaining 10 percent as belonging to Cluster 7. The cluster trajectory plots in Figure 4 show the temporal dynamics of the clusters compared to the sleep-expert-scored stages. There are several noteworthy observations.
Firstly, across all subjects, including the two in Figure 4, there is a clear shift towards increasing power in the alpha band from deep sleep (NREM3) to lighter sleep (NREM1), REM, and then wake. This is evidenced by increasing proportions of time spent in higher clusters going from left to right on the heatmaps, which agrees with known sleep stage physiology [5]. Secondly, certain clusters are dominant during the same scored sleep stage across subjects, while others modulate their behavior between subjects. For example, across all nine subjects, Clusters 1 and 6 are only dominant during the deepest sleep states (NREM3 and NREM2), while Clusters 3, 5, and 9 are only dominant during wake states. However, Cluster 4 is dominant during NREM2, NREM1, and REM in different subjects, which suggests that each person may have a slightly different spectral signature for the same scored sleep stage and that certain combinations of network activity may not be relegated solely to one stage. (For more details on cluster dominance during scored sleep stages, see Table S1 in Supplementary Material Section 3.2.)

Finally, both the heatmaps and cluster trajectories demonstrate that there are faster sub-oscillations occurring within each sleep-expert-scored stage across subjects. However, these sub-oscillations occur at different times across subjects. For example, Subject 3 oscillates rapidly between clusters during deep sleep (NREM3), but is very stable during REM. On the other hand, Subject 7 is stable during deep sleep but oscillates more during lighter stages of sleep and wake. This stage-dependent oscillatory heterogeneity can be seen clearly in Table S2 (Supplementary Material Section 3.2), which lists the computed transition rates per minute for each sleep stage for both subjects. In a broader context, stage-dependent sub-oscillations may help describe heterogeneity across subjects and populations.
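The trajectory-to-hypnogram comparison used throughout these results reduces to a rank correlation between two integer sequences. A toy sketch (assuming SciPy is available; both sequences are invented for illustration, with 5 = wake down to 1 = NREM3 and state labels already reordered by descending alpha-band power):

```python
import numpy as np
from scipy.stats import spearmanr

hypnogram  = np.array([5, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 4])
trajectory = np.array([5, 5, 4, 3, 2, 1, 2, 2, 3, 4, 5, 5])

rho, _ = spearmanr(hypnogram, trajectory)
print(rho > 0.8)   # True: the two orderings agree closely
```

Repeating this for every sampled trajectory yields the distribution of $\rho_t$ values shown in Figure 3.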
Overall, this method provides a robust Bayesian nonparametric framework for identifying the spectral content, number, and dynamics of salient recurring oscillatory states during sleep across patients. In addition, we identify subject-specific "microstates" within canonical sleep stages and, furthermore, discover stage-dependent sub-oscillations with shared spectral signatures across subjects. This work can serve as a basis for novel mechanistic studies focusing on the network dynamics of these states, as well as their clinical and scientific relevance. By liberating sleep analysis from the practical necessities of human scoring and the a priori assumptions of machine learning algorithms, we pave the way for a higher-dimensional, data-driven feature space for biomarker detection and clinical intervention.

References

[1] Madeleine M. Grigg-Damberger. The AASM Scoring Manual four years later. Journal of Clinical Sleep Medicine, 8(3):323–32, 2012.
[2] A. L. Loomis, E. N. Harvey, and G. A. Hobart. Cerebral states during sleep, as studied by human brain potentials. Journal of Experimental Psychology, 21(2):127–144, 1937.
[3] Ritchie E. Brown, Radhika Basheer, James T. McKenna, Robert E. Strecker, and Robert W. McCarley. Control of sleep and wakefulness. Physiological Reviews, 92(3):1087–1187, 2012.
[4] Anthony Kales. A Manual of Standardized Terminology, Techniques and Scoring System for Sleep Stages of Human Subjects. United States Government Printing Office, Washington, DC, 1968.
[5] Michael J. Prerau, Ritchie E. Brown, Matt T. Bianchi, Jeffrey M. Ellenbogen, and Patrick L. Purdon. Sleep neurophysiological dynamics through the lens of multitaper spectral analysis. Physiology, 32(1):60–92, 2017.
[6] Michael H. Silber, Sonia Ancoli-Israel, Michael H. Bonnet, Sudhansu Chokroverty, Madeleine M. Grigg-Damberger, Max Hirshkowitz, Sheldon Kapen, Sharon A. Keenan, Meir H. Kryger, Thomas Penzel, Mark R. Pressman, and Conrad Iber. The visual scoring of sleep in adults. Journal of Clinical Sleep Medicine, 3(2):121–31, 2007.
[7] Robert D. Ogilvie. The process of falling asleep. Sleep Medicine Reviews, 5(3):247–270, 2001.
[8] Martin Längkvist, Lars Karlsson, and Amy Loutfi. Sleep stage classification using unsupervised feature learning. Advances in Artificial Neural Systems, 2012:1–9, 2012.
[9] Salih Güneş, Kemal Polat, and Şebnem Yosunkaya. Efficient sleep stage recognition system based on EEG signal using k-means clustering based feature weighting. Expert Systems with Applications, 37(12):7922–7928, 2010.
[10] Gang Luo and Wanli Min. Subject-adaptive real-time sleep stage classification based on conditional random field. AMIA Annual Symposium Proceedings, pages 488–92, 2007.
[11] I. Gath, C. Feuerstein, and A. Geva. Unsupervised classification and adaptive definition of sleep patterns. Pattern Recognition Letters, 15(10):977–984, 1994.
[12] Genshiro A. Sunagawa, Hiroyoshi Séi, Shigeki Shimba, Yoshihiro Urade, and Hiroki R. Ueda. FASTER: an unsupervised fully automated sleep staging method for mice. Genes to Cells, 18(6):502–518, 2013.
[13] Matthew J. Beal, Zoubin Ghahramani, and Carl Edward Rasmussen. The infinite hidden Markov model. Advances in Neural Information Processing Systems 14, pages 577–584, 2002.
[14] Behtash Babadi and Emery N. Brown. A review of multitaper spectral analysis. IEEE Transactions on Biomedical Engineering, 61(5):1555–1564, 2014.
[15] S. Kullback. The Kullback–Leibler distance. The American Statistician, 41(4), 1987.
[16] Alan V. Oppenheim and Ronald W. Schafer. Discrete-Time Signal Processing. Pearson, 2010.
[17] David R. Brillinger. Time Series: Data Analysis and Theory. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.
[18] Donald B. Percival and Andrew T. Walden. Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques. Cambridge University Press, 1993.
[19] D. J. Thomson. Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70(9):1055–1096, 1982.
[20] Jurgen Van Gael, Yunus Saatci, Yee Whye Teh, and Zoubin Ghahramani. Beam sampling for the infinite hidden Markov model. Proceedings of the 25th International Conference on Machine Learning, pages 1088–1095, 2008.
[21] Robert H. Shumway and David S. Stoffer. Time Series Analysis and Its Applications. Springer Texts in Statistics. Springer, Cham, 2017.
[22] G. C. Tseng. Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics, 23(17):2247–2255, 2007.
[23] Takeru Matsuda and Fumiyasu Komaki. Multivariate time series decomposition into oscillation components. Neural Computation, 29(8):2055–2075, 2017.
[24] Kyle Ulrich, David E. Carlson, Wenzhao Lian, Jana Schaich Borg, Kafui Dzirasa, and Lawrence Carin. Analysis of brain states from multi-region LFP time-series. Advances in Neural Information Processing Systems 27, pages 2483–2491, 2014.