Brain Waves Analysis Via a Non-parametric Bayesian Mixture of Autoregressive Kernels
Guillermo Granados-Garcia, Mark Fiecas, Babak Shahbaba, Norbert Fortin, Hernando Ombao
MMODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES B Y G UILLERMO G RANADOS -G ARCIA , M ARK F IECAS B ABAK S HAHBABA N ORBET F ORTIN AND H ERNANDO O MBAO King Abdullah University of Science and Technology, [email protected];[email protected] University of Minnesota, * mfi[email protected] University of California Irvine, [email protected]; [email protected]
The standard approach to analyzing brain electrical activity is to exam-ine the spectral density function (SDF) and identify predefined frequencybands that have the most substantial relative contributions to the overall vari-ance of the signal. However, a limitation of this approach is that the pre-cise frequency localization and bandwidth of oscillations vary with cognitivedemands, thus ideally should not be defined a priori in an experiment. Inthis paper, we develop a data-driven approach to identifies (i) the numberof prominent peaks, (ii) the frequency peak locations, and (iii) their corre-sponding bandwidths (or spread of power around the peaks). We propose aBayesian mixture auto-regressive decomposition method (BMARD), whichrepresents the standardized SDF as a Dirichlet process mixture based on akernel derived from second-order auto-regressive processes characterized bylocation (peak) and scale (bandwidth) parameters. We present a Metropolis-Hastings within Gibbs algorithm to sample from the posterior distribution ofthe mixture parameters. Simulation studies demonstrate the robustness andperformance of the BMARD method. Finally, we use the proposed BMARDmethod to analyze local field potential (LFP) activity from the hippocampusof laboratory rats across different conditions in a non-spatial sequence mem-ory experiment to examine the link between specific patterns of activity andtrial-specific cognitive demands.
1. Introduction.
Considerable research indicates that the hippocampus — a brain re-gion highly conserved across mammals — plays a key role in our ability to remember theorder in which daily life events occur (Eichenbaum, 2014). To identify the neuronal mech-anism underlying this capacity, we have conducted an experiment in which neural activitiesare recorded in the hippocampus of rats as they perform a complex nonspatial sequence mem-ory task. Visualization of the LFP activity in Figure 1 revealed a highly dynamic pattern ofhippocampal oscillations during task performance detailed on Section 4, reflecting the dis-tinct cognitive demands at different moments in time. Notably, the specific frequencies andbandwidth of the observed oscillations did not map well with standard predefined frequencybands (delta, theta, alpha, beta and gamma bands). Our goal in this paper is to develop astatistical method to quantify the spectral properties of the LFPs, in particular identify thefrequency peaks and bandwidth, and link them with specific types of information processing.
Keywords and phrases:
Spectral Density estimation, Bayesian nonparametrics, local field potentials, DirichletProcess, Markov Chain Monte Carlo. a r X i v : . [ s t a t . M E ] F e b F IG . LFP signals recorded from the 5 rats during a odor sequence recognition experiment. We considered twophases of a trial: before the odor is presented="PreSeq" vs. after the odor is delivered="Odor". Thee are twotrial types: odor in a correct specified order = "InSeq" vs. odor presented in incorrect order="OutSeq" The focus of the present work on estimating the spectral density function (SDF). The SDFis a good descriptive of any stochastic process because it quantifies the amount of variabilityin a signal (such as the LFP) that are explained by the different frequency bands (Shumwayand Stoffer, 2017; Kass et al., 2014; Ombao et al., 2017; Prado and West, 2010). There aretwo general classes of methods for estimating the SDF. One class based on the time domainwhere a parametric model, typically an autoregressive moving average (ARMA), is fit to thedata and an estimate of the SDF is obtained by plugging in the estimated ARMA parameters.Another class is nonparametric and is based on smoothing or denoising the observed peri-odograms. The main limitations of this standard approach of having pre-defined frequencyand the existing estimation methods are the following: (1) a lack of direct connection of theSDF estimators parameters to the time domain properties of the signal; (2) imprecise identi-fication of frequency location of the spectral peaks; and (3) the reliance on having to increasethe number of basic components of a model leading to an unnecessarily more complex repre-sentation of the estimator to ensure minimizing an error criteria, implying a less parsimoniousrepresentation of the SDF. Simpler representations of the final SDF estimators lead to easierinterpretations for practitioners to address their scientific questions.To overcome these limitations, we developed a nonparametric method, the Bayesian mix-ture auto-regressive decomposition (BMARD), by setting a Dirichlet process prior on thestandardized SDF leading to a characterization as a weighted mixture of kernels. We proposea kernel derived from the standardized SDF of a second order autoregressive process whichleads to a representation of the signal as a weighted mixture of latent second order autore-gressive processes where the number of components in the mixture is an unknown parameterin the model.The remainder of this paper is organized as follows. In Section 2, we describe our modelon Bayesian non-parametric framework through a Dirichlet Process prior and proposed a newkernel based on the standardized SDF of a second-order autoregressive (AR(2)) process. Wedemonstrate the robustness of the BMARD method to model misspecification by simulation
ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES studies in Section 3 when the target is the SDF of a process that is not a true mixture of AR(2)processes (e.g., ARMA, higher order AR, MA). We also conducted a realistic simulation set-ting where the observed simulated LFP is a mixture of AR(2) latent processes with peaks thatwere actually observed in the the LFP signals from the laboratory rats. In Section 4, we ana-lyze the hippocampal LFP signals of 5 laboratory rats using the BMARD method and addresssubstantive questions in a non-spatial working memory experiment. The MCMC algorithmfor obtaining posterior samples of the BMARD estimator is described in the Appendix.
2. Bayesian Mixture Auto-regressive Decomposition.
The standard analysis for elec-trophysiological signals examines spectral power at frequency bands that are predefined,namely, delta (0.5-4 Hertz), theta (4-8 Hertz), alpha (8-12 Hertz), beta (12-30 Hertz), gamma( >
30 Hertz). The power at the delta band indicates the contribution of slow oscillations to thetotal variance of the signal; whereas the gamma power is the contribution of fast oscillations.The current segmentation of the frequency range into the delta-to-gamma frequency bands isad-hoc and is primarily driven by pragmatic considerations (Buzsáki, 2009). However, in cur-rent research, many neuroscientists now consider some of these bands to be too wide to havethe required level of precision in order to identify differences between stimulus types andbetween patient or treatment groups. We observe a greater demand for more precise analyseswith finer subdivisions within the bands for example as "low"-alpha and "high"-alpha (seeAllen et al. (2016), Gao et al. (2016)). Moreover, different frequency bands are predefinedaccording to the species under study. The proposed BMARD method will produce spectralestimates with peaks and bandwidths that are determined by the data – rather than arbitrarilydefined by the standard bands.2.1.
Overview of spectral analysis.
To develop the specific ideas on our proposed ap-proach, we first give a brief overview. Consider a time series X t that is stationary withinan epoch with zero mean and autocovariance sequence γ ( h ) = E ( X t + h X t ) that is abso-lutely summable, i.e., (cid:80) h | γ ( h ) | < ∞ . The SDF (within this stationary epoch) is formallydefined as f ( ω ) = (cid:80) h γ ( h ) exp( − i πωh ) where ω ∈ ( − . , . . Consider now the ob-served signal within an epoch { X t } Tt =1 (where T is even). A nonparameteric estimate ofthe SDF f ( ω ) derived from the observed signal is derived from the Fourier periodograms I ( ω k ) = T | (cid:80) Tt =1 X t exp( − i πω k t ) | which is computed at the fundamental frequencies ω k = kT (where k ∈ {− ( T − . . . , T } . The periodogram is an asymptotically unbiased forthe SDF. However, it is not a consistent estimator because it is extremely noisy and its vari-ance does not decay to 0 even when the length T of the observed time series increases.One way to construct a consistent estimator for the SDF f ( ω ) is by smoothing (or de-noising) the periodogram. Several nonparametric methods have been proposed. Bandwidthselection methods for kernel and spline smoothing have been developed for this approach(see for example, Lee (1997), Ombao et al. (2001), and Wahba (1980)). These nonparamet-ric methods aim to find a spectral estimator that minimizes a well-defined global criterionsuch as complexity-penalized deviance or integrated mean squared error. In addition, Kraftyand Collinge (2013) proposed a Whittle likelihood based approach. In Krafty et al. (2011), afunctional mixed models approach was developed to account for the variation of the spectrain the setting with multiple time series data.An alternative class of methods use a Bayesian framework. In Cadonna et al. (2017),a Bayesian method was developed for estimating the log-SDF, denoted (cid:96) ( f ( ω )) , which ismodeled as a mixture of Gaussian distributions with frequency-specific means and logisticweights. A related approach, using the ideas in Bayesian nonparametric methods, uses ker-nels that are based on the Bernstein polynomial (BP). This was first proposed in Petrone (1999) to estimate a probability density function. The ideas were transported to spectral den-sity function estimation in Choudhuri et al. (2004) and Hart et al. (2018), where the estimatoruses a Dirichlet process (DP) mixture model with BP kernels. This was extended to the multi-variate time series in Macaro and Prado (2014) which gives a decomposition in the frequencydomain in terms of BP-DP approximation. Furthermore, Edwards et al. (2019) generalize theBP kernels with a procedure using B-splines prior thereby reducing the L -error. In Kirchet al. (2019), the BP-DP prior is applied to spectral estimation with a nonparametric correc-tion to the Whittle likelihood.We note that mixture models based on Bernstein polynomials have the advantage thatthey are able to provide consistency in the pseudo posterior estimates. However, a generallimitation of these methods is that they do not offer a data-generating mechanism in the timedomain and tend to either oversmooth the peaks of the spectral estimates or require a highnumber of polynomials to achieve certain levels of accuracy.A recent advancement on spectral estimation is the evolutionary state-space model (E-SSM) proposed in Gao et al. (2016). The E-SSM offers a representation of the time seriesas a mixture of second-order auto-regressive processes. The AR(2) SDF, derived from thephase of the complex-valued roots of the AR(2) polynomial function, are assumed to beconstant in time. A property of this method is that the discrepancy between the true SDF andthe approximate SDF derived from the AR(2) mixture vanishes by increasing the number ofcomponents. However, in practice requires to fix the number of components and their phasesand then, fit the optimal modulus of the AR(2) characteristic roots.In some practical scenarios, can be necessary to consider additional components in orderto achieve an smoother SDF estimator, as we show in the simulation study for a movingaverage process. However, in neuroscientific applications is a better solution to describe thepeak activity by giving an estimate of the frequency peaks location of the true spectral density.Nonparametric methods are flexible but their main disadvantage is that they are difficult touse for prediction and generally do not have a straightforward data-generating mechanism.The parametric approach, on the other hand, is efficient but may not be always directly justi-fied from the underlying brain physiology and may suffer from model misspecification. Here,we propose BMARD as a Bayesian nonparametric approach for spectral density estimationthat combines the best of both worlds.Our proposal, the BMARD, is a new statistical approach which decomposes a signal asa mixture of stochastic processes considered as building blocks. Our approach provides aframework that describes precisely each specific oscillatory content in the signal. More pre-cisely, we model the standardized SDF as a multimodal probability density function by aDP-mixture model with kernels derived from the standardized SDF of second auto-regressiveprocesses. The advantage of the proposed BMARD is that it data-adaptively provides an es-timate of the number of latent processes each with a unique location and scale parametersmatching a single peak of the target standardized SDF. Unlike standard methods for spec-tral estimation, the BMARD method can precisely estimate the frequencies that produce thehighest peaks in the standardized SDF without the need to constrain them to pre-specifiedbands, and determines how wide are each of these peaks through the bandwidths. This willprovide the neuroscientist with a more informative estimator of the oscillatory activity ofbrain signals and thus find a more direct mapping between physiology and animal behavior.2.2. Dirichlet Process Mixture Model.
Denote X t to be a zero-mean weakly station-ary time series with SDF f ( ω ) . Whittle (1957) proposed a quasi-likelihood of the joint dis-tribution of the periodogram values at frequency ω k , denoted I ( ω k ) , expressed as the log-likelihood (cid:96) ( f | x , x , . . . , x T ) = (cid:98) ( T − / (cid:99) (cid:88) k =1 − log( f ( ω k )) − I ( ω k ) /f ( ω k ) . (1) ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES Where ω k = 2 πk/T, k ∈ { , . . . , ( T / − } . This quasi log-likelihood is based on theasymptotic property of the { I ( ω k ) } being approximately jointly distributed as uncorrelatedexponential random variables with E ( I ( ω k )) ≈ f ( ω k ) when T is sufficiently large.The formulation of the model, based on the periodogram values as input data, assumes that f ∼ F , where F is a probability measure specified given a parameter vector θ with unknownrandom probability measure G associated to a Dirichlet process prior as follows: I ( ω k ) ∼ f ( ω k ) exp ( − I ( ω k ) /f ( ω k )) ,f ( ω k ) | θ ∼ F ( θ ) ,θ | G ∼ G,G ∼ DP ( G , α ) . The prior DP ( G , α ) refers to the Dirichlet process (Ferguson, 1973; Antoniak, 1974; Sethu-raman, 1994) with parameter α and probability measure G . Then for any finite partition ofmeasurable sets ( S , S , . . . , S k ) the probabilities ( G ( S ) , . . . , G ( S k )) have a Dirichlet priorwith parameters ( αG ( S ) , . . . , αG ( S k )) . In our case, due to the symmetry of the SDF, weconsider a partition over the interval (0,.5). The parameter α is a scale parameter of the DPthat gives an indication on the number of estimated components since low values of α leads toa posterior distribution of G dominated by a few components (see Müller et al. (2015); Shah-baba and Neal (2009)). The prior G is called a base measure and is associated to the priordistributions of the components of θ . Other related work (e.g., Gelman et al. (2013), Cong-don (2007), Neal (2000), Shahbaba and Neal (2009)) describe, in more detail, the role of α indetermining the estimated number of components. In the implementation, the Dirichlet pro-cess is truncated up to C components with a prior distribution over the positive integers, alsocalled the truncated Dirichlet process (TDP), introduced in Ishwaran and Zarepour (2000).Note that in the usual DP-based clustering model, individual observations are assigned toeach cluster with some probability. In contrast, our goal here is to estimate the standardizedSDF f ( ω ) (cid:82) f ( ω ) dω using the periodograms obtained via the Fourier transform of the standardizedtime series to have variance 1. These periodograms are not being "clustered"; rather, they areassociated with a convex combination of kernels.The next section constructs a kernel that is based on a parametric time series model, wherethe shape of the standardized SDFs is a single peak represented in terms of a location (fre-quency peak) and scale (frequency bandwidth) parameter in the frequency domain. The newkernel allows us to specify the standardized SDF distribution, F , in the DP mixture model.2.3. Autoregressive Kernel and the DP mixture.
In the proposed model, we construct akernel, g , as the standardized spectral density function (SDF) of an AR(2) process, leadingto representation of the observed time series as a linear mixture of multiple latent stochasticAR(2) processes.A weakly stationary process Z t is said to be autoregressive of order 2, AR(2), if it has therepresentation Z t − φ Z t − − φ Z t − = W t , where W t is a white noise process with variance σ W and the roots of the AR(2) polynomial function Φ( u ) = 1 − φ u − φ u do not lie on theunit circle. Furthermore, when the roots of Φ( u ) have magnitudes greater than 1 this AR(2)process is causal. When the roots, denoted as u and u , are non-real complex-valued, thenthey are related to be complex-conjugates of each other, that is, u = u ∗ and | u j | > ensurecausality of the process. Under causality (and if the roots are non real-valued), then both rootshave the polar representation u = M exp( i πψ ) and u = M exp( − i πψ ) with magnitude M > . The AR(2) polynomial function above is completely character-ized either by the coefficients ( φ , φ ) or by the roots ( u , u ) through the following one − to − one relation between the roots and the coefficients in terms of the log-modulus L = log( M ) > :(2) φ = 2 cos(2 πψ ) exp( − L ) , φ = − exp( − L ) ψ ∈ ( − / , / L > . Due to symmetry of the SDF f ( ω ) at , it is sufficient to specify the standardized SDF onlyat the frequency range ω ∈ (0 , / . In order to represent the SDF of Z t as a valid probabilitydensity function, it must integrate to 1 which is achieved by scaling by (or dividing by) (cid:82) / f ( ω ) dω = V ar ( Z t ) / (1 − φ ) σ W φ )((1 − φ ) − φ ) . Then the standardized SDF is defined as(3) g ( ω ; ψ, L ) = 2(1 − e − L )((1 + e − L ) − (2 πψ ) e − L )(1 + e − L ) | − πψ ) e − L ( e − i πω ) + e − L ( e − i πω ) | where ω, ψ ∈ (0 , / and L > . Here, ψ is the location parameter of the kernel that attains alocalized peak of the SDF of Z t . The role of L is to control the spread of the kernel as a scaleparameter and hence it will be called the bandwidth parameter. In Figure 2, we illustrate theroles of ψ and L in producing the different kernels. F IG . Examples of the AR(2) kernels g ( ω ; ψ, L ) valued at different parameters values of ψ = 0 . , . , . aslocation (frequency peak) parameter and L = 0 . , . as scale (bandwidth) parameter. A more formal justification for the selection of a second order autoregressive model isbased on structural modeling of time series ( see Ozaki (2012), Brockwell and Davis (1986),Shumway and Stoffer (2017)), where an AR(p) process is represented in such a way that isequivalent to a consecutive input-output system of simpler processes. If the p roots of its char-acteristic equation consist of p real roots and p pair of complex roots, then the characteristicequation is expressed as (Λ − λ )(Λ − λ ∗ ) ... (Λ − λ p )(Λ − λ ∗ p ) ... (Λ − λ p +1 ) ... (Λ − λ p + p ) ,where ∗ denotes the complex conjugate. Then the AR(p) can be represented through p + p consecutive input-output system composed by p AR(1) processes and p AR(2) processes.Each pair of roots associated to an AR(2) process are constructed with following coeffi-cients φ ( i )1 = λ i + ¯ λ i and φ ( i )2 = −| λ | for i ∈ , ..., p , while the real root associated to anAR(1) has coefficient φ ( j )2 = λ j for j ∈ p + 1 , ..., p + p . Our method propose to approxi-mate the AR(1) components by AR(2) processes based on the the flexibility to fit simple SDFillustrated previously by estimating appropriate values for ψ and M . ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES This representation along with the results in Shumway and Stoffer (2017), it follows that ingeneral the SDF of a weakly stationary process can be approximated by a mixture of AR(2)as latent factors, equivalent to a parallel structural model, furthermore, this mixture locateand fit the shape of the peak activity which describes another advantage of our proposedmethod since does not use pre-specified frequency bands but will data-adaptively identifypeaks along the whole frequency range.In general for an arbitrary set of weakly stationary process { Z ct , c = 1 , . . . , C } with auto-correlation function ρ Z c ( h ) and SDF g c ( ω ) , their linear combination is still a weakly station-ary process, then if we define it as (cid:80) Cc =1 a c Z ct = X t then we have the next relations: C (cid:88) c =1 a c g c ( ω ) = f X ( ω ) and C (cid:88) c =1 a c ρ Z c ( h ) = ρ X ( h ) where f X ( ω ) and ρ X ( h ) are the SDF and auto-correlation function of X t respectively. Go-ing backwards in this reasoning, a decomposition in the frequency domain also has a directinterpretation in the time domain. As a side note (as this is beyond the scope of the paper),one should be careful with the last interpretation when dealing with multivariate time series,because of potential identifiability issues, i.e., two independent but identical processes canhave the same SDF and thus some constraints will need to be imposed.2.4. Specification of the prior.
The definition of the parametric kernel completed thegeneral specification of the DP mixture model state in equation 2.2. Now we discuss theprior distributions of the location and scale parameters of the kernel as well as the choice ofa prior probability mass function for the number of components.In the practical implementation of the MCMC algorithm we avoid the potential labelswitching problem of the location parameter ψ c , observed in mixture models, see (Jasra et al.,2005), by defining a random partition over the interval (0 , . as < (cid:15) < · · · < (cid:15) C . We fixedthe first and last frequency block as (cid:15) = 0 and (cid:15) C = 0 . . The partition protects the mix-ture components identifiability since we restrict each location parameter ψ c over the interval (cid:15) c − < ψ c < (cid:15) c for c = 1 , . . . , C limiting only one component per frequency block.At each iteration of the MCMC algorithm the random cut off points (cid:15) c are key to the proce-dure of generating the proper number of components. First, we randomly select a componentlabel c and propose to delete (Death) or create (Birth) with equal probability. When Birth ischosen, we draw uniformly the candidate value for the partition cut off (cid:15) ∗ over the interval ( (cid:15) c − , (cid:15) c ) and propose randomly ψ ∗ over the interval (cid:15) c − < ψ c < (cid:15) ∗ . In case of Death theselected cut off (cid:15) c is eliminated and a random ψ ∗ is proposed.In the BMARD method, we select a prior for the bandwidth parameter L c that penalizes thefull log-likelihood conditional on the DP mixture model parameters to capture sharp peaks.One example of the prior takes the form L c | C ∼ L δc ; note here that δ = − defines theJeffrey’s prior. Utilizing this type of prior on the full conditional likelihood has some effecton the identification of the components as well as the smoothness of the components. In theM-H step, the resulting contribution of a new proposal L ∗ , relative a previous value L onthe log-likelihood, is δ log( L/L ∗ ) which depends on the sign of δ . If δ < ( δ > ) sharper(broader) peaks update are penalized. However, if the update steps are small (i.e., L ∗ = L + (cid:15) ) then this contribution will vanish due to the ratio L/L ∗ being close to 1.When the M-H step involves the creation of a new component, then the contribution ofa new L c +1 is − δ log( L c +1 ) ; if the generated L c +1 is close to 0 then the contribution willbe substantial. Then, for δ < ( δ > ) the penalization is over a higher (smaller) number ofcomponents. The prior for the number of components was chosen from the general form π ( c ) = exp ( λc q ) based on Choudhuri et al. (2004) which suggested λ = − . , q = 2 . This prior penal-izes decreasing the number of components (in order to guarantee a more precise identificationof peaks) since the contribution to the likelihood in the M-H step will be λ ( c q − ( c + 1) q ) > .This prior will have more significant impact in cases when the observed signal is composedof low frequencies and when all the possible peaks are not well separated. Then the modelcould collapse to the simplest representation of one component and over-smooth the peaks.Allowing the number of components to increase as needed helps to detect and isolate peaksthat are not easily distinguishable.Given the prior definitions, the base measure G of the DP Mixture model associated withthe prior distribution over the parameter vector θ = ( ψ, L, (cid:15) ) , is as follows: (cid:15) c | (cid:15) − c , C ∼ U ( (cid:15) c − , (cid:15) c +1 ) , ψ c | ψ − c , ¯ (cid:15), C ∼ U ( (cid:15) c − , (cid:15) c ) , L c | C, b ∼ L δc . The posterior distribution of the DP mixing weights are estimated with the “stick break-ing" representation introduced by Sethuraman (1994). The posterior distribution of the pa-rameter α is sampled based on two different priors: (i) Gamma prior (as motivated in Ish-waran and James (2002)) and updating the sample for α i by a Gibbs sampling through themarginal α i +1 | V ∼ gamma ( M + a − , b − log ( q M )) , where V is the vector of “breaks" V j ∼ Beta (1 , α ) , q M = V M (cid:81) M − i =1 (1 − V i ) and M is the length of V , which is the level oftruncation; (ii) log-normal prior and sampling from the distribution of α using a slice sam-pler (Robert and Casella, 2013) based on the posterior disribution given by Escobar and West(1995).
3. Simulation Study.
Setting and criteria.
We conducted a simulation study to examine the relativestrengths and weaknesses of the BMARD method compared to three methods: (i) kernelregression smoothing using the Nadaraya-Watson estimator; (ii) a cubic smoothing splineapproach and (iii.) the non-parametric Bayesian estimator developed by Choudhuri et al.(2004) based on Bernstein polynomials. Both (i) and (ii) are optimized with respect to leaveone out cross validation (LOOCV) for the smoothness parameters.
The criteria.
In this study, three different parametric processes were used (see descriptionsbelow).The methods were compared under the following criteria: (A.) The local integratedabsolute error criterion (local IAE). Let ω max be the true value of the frequency at which thetrue SDF attains a peak and define f ( ω max ) to be the value of the SDF at the peak. Moreover,define a local interval around the peak to be ( ω max − (cid:15) , ω max + (cid:15) ) where (cid:15) > and (cid:15) > satisfy f ( ω max − (cid:15) ) ≈ . f ( ω max ) and f ( ω max + (cid:15) ) ≈ . f ( ω max ) . Then the local IAE ofthe estimator ˆ f around the frequency peak ω max is defined to be(4) Local IAE ω max = (cid:90) ω max + (cid:15) ω max − (cid:15) | ˆ f ( ω ) − f ( ω ) | dω. (B.) The maximal-phase disparity criterion. This criterion is inspired by Dickinson et al.(2018). Here, the focus is on identifying the frequency at which the SDF is maximized for aspecific band – as opposed to the local IAE criterion which focused on estimating the peakvalue of the SDF. This criterion was used in particular for the alpha frequency band motivatedfrom cognitive studies where the alpha band is associated with learning (or cognitive impair-ment as studied in children with autism). Based on this potential use, we compute the absolutedifference between ω max and the location of the local maximizer of an estimator ˆ f within a ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES frequency band b , i.e., ˆ ω max = arg max ω ∈ b ( ˆ f ( ω )) and compute the maximal-phase-disparityas between the true and estimated maximizer to be | ˆ ω max − ω max | .Note that the integrated absolute error (IAE) computed through the whole frequency rangeas a global metric, i.e., it examines the performance across the entire range of frequencies.However, the local IAE measures the performance of the methods only in a local frequencyrange that contains the spectral peak. On the other hand the phase-disparity helps to evaluatethe performance of the estimator to properly locate the peaks of the SDF. The simulation settings . The first is a mixture of three AR(2) processes Z ct , c = 1 , , with peak locations chosen similar to the ones observed in the LFPs in the data analysis. Thefrequency peaks were located at 8, 30, and 60 Hz (assuming that the sampling rate is Hertz per second but we extract only 500 time points to estimate the SDF over half second).The peaks are associated to the following weights p = 0 . , p = 0 . , p = 0 . . Moreover, L c = 0 . , c = 1 , , , which produce sharp peaks in the SDF. As noted, this constructionsimulates realistic brain signals since we observe a common component at 8 Hz, a secondpeak at 30 Hz, and the third peak represents an artifact at 60 Hz common to be found in brainsignals measurements. Here, each pair of values ( ψ c , L c ) defines a unique AR(2) process Z ct .The goal of the next two settings is to test the robustness of BMARD with respect to a de-liberately misspecified parametric model. The second simulation setting is an AR(12) processthat was studied in Wahba (1980) to test the smoothing splines estimator of the standardizedSDF. Moreover, the same setting defined in 5 was also studied in Choudhuri et al. (2004)using Bernstein polynomials to estimate the AR(12) standardized SDF:(5) X t = 0 . X t − + 0 . X t − − . X t − + (cid:15) t where (cid:15) t is a white noise process. This process is useful to test the robustness of the BMARDmethod under model misspecification since the true process is not a mixture of AR (2) pro-cesses. Here, we are examining how well the mixture can approximate a model with 3 mainpeaks at ω = 0 , , Hz and two smaller peaks at ω = 150 , Hz. As a side remark,Shumway and Stoffer (2017) explain that higher order AR models can approximate the SDFof any arbitrary stationary linear process. Here, the importance of estimating this type ofmodel. The third setting is a MA(4) process generated as(6) X t = − . η t − − . η t − − . η t − + . η t − + η t , where η t is a white noise process. Similar processes are discussed in Wahba (1980), Lee(1997), Ombao et al. (2001), Fan and Gijbels (1996), and Pawitan and O’sullivan (1994).The standardized SDF of this moving average process is a smooth curve centered at ω = 250 Hz with an extra bump around 500 Hz. This model would help to test our model under thescenario of fitting broad and smooth peaks, when the misspecification is not only in termsof the order but also in the structure of dependency since the MA processes have a zerocorrelation beyond the order (or when the absolute value of the lag exceeds the order).To evaluate the performance of the BMARD method, 1000 time series were generated persetting each of T = 500 time points. The data settings match the window size in the LFP dataanalysis. The spectral spline estimation was deployed using the package connection betweenR and C++: Rcpp Eddelbuettel and François (2011), Eddelbuettel (2013), Eddelbuettel andBalamuta (2017) in order to boost its efficiency, with the number of MCMC samples fixed to100000 for six chains discarding 95000 as burn-in samples.The pointwise median of the sampled curves is computed considering 5000 after burn-in samples and all MCMC chains. The reported results are based on the implementationbased on the gamma prior for parameter α , the initial number of components was randomlyselected in the set { , . . . , } for each chain to start with different initial conditions aiming to convergence to the same posterior distribution. The level of truncation for the stick breakingrepresentation was set random per each chain in the set { , . . . , } as a conservative rulebased on Choudhuri et al. (2004).3.2. Results.
Figure 3 displays the logarithm of all the pointwise median curves per sim-ulated time series for each of the methods. The first row corresponds to the AR(2) mixturesetting. The results demonstrate that the BMARD method more accurately retrieves the peaksof the true standardized SDF compared to the other methods. It is also evident that the Bern-stein polynomial method consistently smooths out the peaks. The spline and kernel estima-tors also identify the peaks in different simulations but, in general, produce higher variabilityacross all frequencies. It capturing the main components but the estimated curves have morepeaks than the true SDF.The results from modeling the AR(12) process standardized SDF shows how the BMARDmethod outperforms the Bernstein polynomial method at retrieving the shapes of the peaks.The curves in log-scale indicate that when the SDF contains well-spaced and sharp peaks, theBMARD method produces better estimates. Regarding the algorithm sensitivity, we testeddifferent values of δ and λ for the bandwidth prior, and the level of truncation of the DPprior. The BMARD curves across chains generally consistently converged to similar curvesand parameter settings.In the MA(4) setting, the BMARD method was able to locate the peaks around the maxi-mum of the true standardized SDF. However, it requires several components to approximatethe smooth shape of the target standardized SDF due to the convexity of the autoregressivekernel - whereas the shape of the modes of MA(4) standardized SDF is concave. We pointout this behavior of the BMARD method as a limitation if there is an interest in the shapearound the peak (not only the actual location of the peak). The cubic spline estimator has abetter performance followed by the Kernel smoother, showing as well higher variability onthe curves observed in Figure 3.The local measures were compared mainly for the AR(2) mixture. The local IAE wassimilar for all methods, most likely because we sampled a window of only 500 points whilethe sampling rate is 1000 Hz, leading to compute the local IAE with only three frequencieswithin the band for each peak. The Bernstein method has a lower local error in the firsttwo peaks, at 8 and 30 Hz, while the BMARD method and the classical smoothers behavesimilarly. For the last peak at 60 Hz, the BMARD method achieved a lower local error inseveral simulations. This pattern leads us to conclude that when the peaks occur in closely-placed frequencies, implying potentially indistinguishable periodogram peaks, the BMARDmethod tends to identify them as a single peak, which turns out in higher error. Of coursethis problem can be alleviated by increasing the number of observations but keeping thesampling rate fixed, i.e., by observing the data for a longer physical time. In contrast, whenthere is sufficient space between peaks (higher frequency resolution), the BMARD methodcan identify the peak activity even when the contribution to the total SDF is small.We use each of the single MCMC chain estimates of the parameters ( ψ, L, p ) to evaluatethe average of the absolute for the maximal-phase disparity of the estimates compared tothe true values. To this end, we consider only the chains that correctly estimate the truenumber of components to compute the average difference across the six chains over the 1000simulations. Table 1 reports the average absolute disparity and its standard deviation acrossthe 1000 simulations for all the parameters of the individual components. The interval ofthe mean disparity plus two standard deviations contains 0 for the location parameters ψ c and the bandwidth parameters L c , c = 1 , , , which leads to conclude the estimation of bothparameters is unbiased, for all components. However, the weights errors show the estimationsdiffer from their true values, even when the shapes in the log scale of the mean curves inFigure 3 demonstrate the estimated curves are close enough to the true SDF. ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES This behavior is due to having two parameters that contribute to the scale of the individualcomponents of the mixture since the bandwidth L c at lower values narrows the Kernel im-plying a higher maximum. On the other hand, the weight p c directly shrinks or expands thescale of the c − th autoregressive kernel in the mixture. In further joint analysis of the parame-ters, we noted that the overall shape and contribution to the SDF estimation of the individualcomponents is more sensitive to the bandwidth values.To assess model fit, we investigated the convergence of the Whittle log-likelihood to en-sure the stationarity of the MCMC for each of the generated datasets and each chain run. Theevolution of the Whittle likelihood (not shown here) displayed an initial increase followed bya stationary behavior for all chains before reaching 50,000 iterations of the MCMC. Conver-gence of the whittle log-likelihood to a stationary pattern was also observed for the Bernsteinpolynomial method. F IG . Estimated standardized SDF curves in log scale for all 1000 simulations highlighting in black the estimatorwith median IAE for the BMARD method included the MAP of the location parameters ψ of the median IAE curvein order to visualize the components associated with the decomposition. In summary, the simulations show that the BMARD method demonstrates yields better es-timates of the SDF associated when the true underlying process has auto-regressive (evenhigher order). However, it is less desirable than the nonparametric Bernstein polynomialmethod to fit the MA(4) SDF. Based on the AR(2) mixture simulations, the BMARD methodwas able to identify the SDF peaks are more accurately located compared with other meth-ods as the BMARD estimated curves neither oversmooth nor overfit the periodogram. Thishigher accuracy of identifying peaks and bandwidth in the SDF is the central contribution ofthe BMARD and is directly addressing the current limitations of spectral analysis of electro-physiological signals in neuroscience. T ABLE Mean and standard deviation of the absolute difference with respect to the AR(2) mixture simulation parameters.Only for generated time series with at least one chain that correctly identified the true number of components.All components were generated with L = . p = . p = . p = . Mean Disparity ψ = 8 Hz ψ = 30 Hz ψ = 60 Hz | ψ c − ˆ ψ c | Hz 7.65(5.17) 11.66(10.32) 7.96(8.76) | L c − ˆ L c | | p c − ˆ p c | ABLE Mean and standard deviation of the absolute difference with respect to the AR(12) peaks in the SDF. Only forgenerated time series with at least one chain that correctly identified the true number of components (5).
Mean Disparity ψ = 0 Hz ψ = 150 Hz ψ = 250 Hz ψ = 350 Hz ψ = 500 Hz | ψ c − ˆ ψ c | Hz 4.77(2.19) 32.67(21.36) 3.94(14.38) 31.09(19.58) 3.68(2.07)
4. Analysis of hippocampal LFPs from 5 rats.
A major scientific goal in our collabo-rator’s research (Fortin laboratory, UC Irvine) is to understand how the hippocampus supportsthe ability of animals to remember the specific sequence in which events occurred which isa capacity critical to daily life function. While it is well-established that the hippocampusplays a key role in this capacity, we have little insight into how this is accomplished at theneural level. In particular, neuroscientists need to identify features in the observed signals(e.g., most prominent oscillatory patterns) that provide information about memory. To helpachieve this goal, we apply the proposed BMARD method on the hippocampal LFP activityrecorded from laboratory rats as they performed a nonspatial sequence memory task (similarto paradigms used in humans; Allen et al. (2014)). Indeed, the need for precise identificationof oscillations in LFPs has been the primary motivation for developing the BMARD method.The second goal is apply BMARD to detect stimuli-induced changes in brain signals throughchanges in the peak activity or shifts in the frequency content.In the experiment performed in the Fortin lab, 5 rats were trained to recognize a sequenceof five different odors (A = Lemon, B = Rum, C = Anise, D = Vanilla, E = Banana). A trial(i.e., a single odor presentation within the sequence) is labeled as "in sequence" (InSeq) ifthe odor is presented in the correct sequence position (e.g., ABCDE); otherwise, the trial islabeled as "out of sequence" (OutSeq; e.g., ABE. . . ). Each rat is trained to identify InSeqtrials by holding its nose in the port for 1.2s (when an auditory signal is delivered), andOutSeq trials by withdrawing its nose from the odor port before 1.2s, illustrated in Figure 4.The LFPs were recorded from 20 electrodes (tetrodes) positioned in the pyramidal layer ofthe dorsal CA1 region of the hippocampus measured at a sampling rate of 1000 Hertz (1000time points per second).
ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES F IG . In this experiment, rats receive multiple sequences of 5 odors (left; odors ABCDE). The animal is requiredto correctly identify whether the odor is presented "in sequence" (top right; by holding its nose in the port for ∼ This data was first examined in Allen et al. (2016) where the authors analyzed the LFPactivity using predefined frequency bands (4-12 Hz and 20-40 Hz). They found that, as ani-mals ran toward the odor port, power was high in the 4-12 Hz band, particularly in the 7-10Hz range. Upon odor delivery (when animals were immobile with their nose in the port), thatoscillation seems to reduce in frequency (stronger in the 5-8 Hz range). Power in the 20-40Hz range increased during odor presentations (particularly in the 19-35 Hz range), but wasweak during the running period. Notably, 20-40 Hz power showed an association with sessionperformance (higher in sessions with high performance). It also differed between InSeq andOutSeq trial types (higher on InSeq trials), although that analysis could not completely ruleout the effects of uncontrolled differences in the animal’s behavior. Clearly, such dynamicand frequency-specific (narrow band) patterns analysis in the LFP is necessary but cannotbe derived from standard analyses using broad, predefined frequency bands. To address thisissue, here we use the BMARD posterior curves to conduct inference over all the frequen-cies and locate those with significant changes in power across experimental conditions. Inthe analysis, the first odor was omitted because, regardless of the sequence, odor A is alwayspresented first (and hence always in correct order). In order to focus on the interaction be-tween temporal context and sequence type the trials in which a rat made the wrong responsewere excluded since different and more complex brain processes is expected to be presentover wrong responses. The number of the trials during the experiment is displayed in table 3. T ABLE Odors presentation numbers of trials for 5 rats per odor and temporal context.
Subject
B InSeq B OutSeq C InSeq C OutSeq D InSeq D OutSeq E InSeq E OutSeqS1 34 1 25 0 26 3 21 2S2 38 8 26 8 42 6 29 7S3 57 3 47 5 37 4 23 4S4 40 2 30 4 29 8 24 2S5 41 3 37 5 31 8 26 5
To identify the LFP dynamics associated with the processing of the odor stimuli we fo-cused the analysis on a single electrode aligned at the same brain location for all rats for two time periods: a Pre-Odor baseline period (500 ms before odor presentation), and an odorperiod (focusing on the first 500 ms post-odor presentation, during which the animal’s be-havior is consistent across trial types). LFPs are generally non-stationary but it is reasonableto model each of the LFP records to be locally stationary and hence quasi-stationary withinthe very brief intervals of 500 milliseconds. Separate analysis on a single electrode usingBMARD consistently retrieved the peak activity observed in the observed periodograms.It is also of interest to consider the joint variability in oscillatory behavior across differenttetrodes. Indeed there is a keen interest in the community to study potential lead-lag rela-tionships between tetrodes and also various types of spectral dependence between pairs oftetrodes including coherence Ombao et al. (2006), partial coherence (Park et al., 2014; Wanget al., 2016), and partial directed coherence (Baccala and Sameshima, 2001). Future workwill be on the generalization of BMARD for multivariate time series models. Under thisframework we will develop methods for inference on cross-tetrodes connectivity - but this isoutside of the scope of the current paper.The BMARD was used to explore the posterior curves of each periodogram running eightchains of size 100000 considering a burn-in period of 90000 samples. We randomly set theinitial number of components (of latent AR(2) processes) to be between 10 and 20. The sam-pler used for the posterior distribution of α is based on a gamma prior with initial parameters a = . , b = . to set a less informative prior. We set the initial value of the chain α = 1 .The level of truncation of the stick-breaking representation was selected randomly between20 and 30. The final estimated curves were computed as the point-wise median of 10000after burn-in posterior curves. With respect to the model fit assessment, we review the log-likelihood trace plots showing an increase and convergence to a stationary behavior for allthe different trials and temporal contexts.In our analysis, we use the LFPs at all trials and use the two-stage approach to estimatingthe SDF for the Inseq condition separately for the Outseq condition. In the first stage, the SDFis estimated separately for each trial; in the second stage, we combine information acrosstrials within each of the Inseq and Outseq condition. There are many possible ways to obtainsome "summary" across the SDFs including (a.) functional median curve (Ngo et al., 2015)(the option we used here); (b.) point-wise median for each frequency; (c.) weighted averageof all trial-specific SDF estimates where the weight is inversely proportional to the varianceof the estimate (a spectral curve estimate derived from a very noisy trial should have lowweight). Regardless, the summary tells us about the center of the distribution of true SDFsacross all hypothetically infinitely many trials.Some of the major questions posed in the Fortin laboratory is to examine (a.) differencesin peak activity for pre-odor presentation vs post-odor presentation for the Inseq condition;(b.) differences in peak activity for pre- vs. post-odor presentation for the Outseq condition;(c.) potential interaction between condition (Inseq vs Outseq) and temporal context (pre-odorvs post-odor presentation).We summarize the distribution of the peak frequency of each rat by considering the AR(2)component with the highest estimated contribution to the variance for each trial. More specif-ically, for each rat, trial, and temporal context, we select the component with the biggestestimated weight, usually their values were of at least with high concentration around . The main-peak activity rat-specific distributions (derived across all trials for each com-bination of pre vs post-odor and Inseq vs Outseq conditions) are shown in Figure 5. We firstconsider the Inseq trials (top row) for each of the 5 rats (subjects). For rat 1 (subject 1), thedistribution of the peak frequency pre-odor is unimodal with support over 0-25 Hertz; forpost-odor the distribution of the peak frequencies is also unimodal with the same support.The main difference between the pre-odor and post-odor for the distribution of the peak fre-quencies during the Inseq condition is the mode: it is 8 Hertz for pre-odor while it is higher ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES at 10 Hertz for post-odor. For rat 2, the distributions are unimodal for both pre- and post-odorbut the support is narrower with concentration on 0-18 Hertz. The mode for the peak fre-quency for pre-odor is 6 Hertz, while it is 8 Hertz for post-odor. For rat 3, the support is evennarrower with concentration on 0-12 Hertz and the modes are almost identical for pre-odorand post-odor peak activity at approximately 6 Hertz. Rat 4 has a similar support as rat 1but displays a unique feature because the mode for the pre-odor is 7 Hertz which is higherthan that for the post-odor which is at 5 Hertz. For rat 5, the mode for the peak frequencyat pre-odor is 5 Hertz vs 7 Hertz for the post-odor. Note that for the Inseq condition we seequite a variation in the brain functional response across the 5 rats. Indeed for this reasonwe cannot find a solid justification for developing a single unifying model for these 5 rats.Thus, we shall proceed with an individual modeling for each rat and describe similaritiesand differences in the results across the rats. We conducted a formal test for the hipothe-sis of equality of the distributions of pre-odor vs post-odor using the Kolmogorov-Smirnovmethod. These tests were conducted separately for each of the 5 rats and the findings shownon each of the distribution graph in Figure 5. Under a of significance level subject 5 isthe only rat with significant difference of the pre and post distributions. Subjects 1, 2, and 3show clear equality of distributions while subject 4 is not significant despite the bimodalityof the post-sequence distribution.Note that the identification of these precise peaks were made possible because theBMARD method gives a representation of the SDF in terms of the building blocks whichare the AR(2) spectra. These precise differences in the modes would not have been detectedunder the standard approach where the frequency bands were predetermined (rather thanadapted to the specific data that is being analyzed). F IG . Peak activity distribution for one hour session for all subjects, the vertical lines correspond to the mode ofthe localized peaks with the highest mixture weight by OutSeq and InSeq contexts accordingly. The right of eachgraph contains the p-value and test statistic D of the Kolmogorov-Smirnov test. The distribution of peaks for the Outseq condition look different from the Inseq conditionuniformly across the 5 rats. We observe some evidence of bimodality and also a greater visual separation between the distribution of the pre-odor and post-odor presentation, againuniformly across the 5 rats. Most notable is rat 5 (Super-Chris, a laboratory favorite) whosedistribution of pre-odor and post-odor peaks have a different support - despite the estimatedmodes being very similar. For Super-Chris, the distribution of the peaks for pre-odor is verytight from roughly 4-10 Hertz while the post-odor peak has more variation with the spreadfrom approximately 4 - 16 Hertz. As in the Inseq condition, a Kolmogorov-Smirnov test forthe equality of the distributions of peaks for pre-odor vs post-odor was conducted. These testswere conducted separately for each of the 5 rats which results are found on the side of eachgraph of the bottom row of Figure 5 where now subject 3 and 5 display significant differencesamong distributions while subject 1, 2, and 4 show similar test outcomes that in the InSeqtrials.To address questions (a.) and (b.) above, we first define the frequency-specific differencein the SDF within InSeq trials and within Outseq trials to be, respectively, ∆ I ( ω ) = f IA ( ω ) − f IB ( ω )∆ O ( ω ) = f OA ( ω ) − f OB ( ω ) for all ω ∈ (0 , . where f IA ( ω ) and f IB ( ω ) are the SDF for Inseq trials for, respectively,the pre-odor and post-odor presentation. Thus, the quantity ∆ I ( ω ) measures the extent ofthe frequency-specific change after the rat detects an odor presented under the setting ofcorrect sequential order. The SDFs for Outseq trials the frequency-specific change ∆ O ( ω ) are defined in a similar manner. The functional boxplot for the differences for the pre-odorvs post-odor for the Inseq and Outseq conditions ( ∆ I ( ω ) and ∆ O ( ω ) ) are given in Figure 6where is observed positive differences for higher frequencies, consistent with Figure 5 on theactivation of this frequencies after the stimulus is presented to each rat, and is stronger in theInSeq trials shown on the top row. Besides, some peaks on the boxplots between 8 and 14 Hzaligned with the mode of the main peaks locations. F IG . (top row) Functional Boxplot with of internal region computed from 100000 iterations of a resamplingscheme on the difference ∆ I ( ω ) = f IA ( ω ) − f IB ( ω ) for each rat. (bottom row) Functional Boxplot with ofinternal region computed from 100000 iterations of a resampling scheme on the difference ∆ O ( ω ) = f OA ( ω ) − f OB ( ω ) for each rat. We now address the question of (c.) interaction between conditions and temporal context.Indeed, a natural question posed by the neuroscientists is whether the change (pre vs post-odor) differs between the Inseq and Outseq conditions. In particular, the task is to identify thespecific frequencies (or bands) where the changes (pre vs post-odor) are more highlighted for
ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES Outseq trials (and which are more emphasized for Inseq trials). To conduct a formal inferenceon the interaction, we first define ∆ I − O ( ω ) = ∆ I ( ω ) − ∆ O ( ω ) . For a particular frequency ω ∗ , when ∆ I − O ( ω ∗ ) > then the change in pre vs post-odor forthe Inseq condition is greater than that for the Outseq condition. This will be important foridentifying physiological features in signals that differentiate between the two experimentalconditions. In our implementation, the last MCMC 5000 posterior samples were extractedfor each of the conditions before described for all trials.The use of the proposed "difference of the change" ∆ I − O ( ω ) can also be interpretedfrom another point of view. For example, when ∆ I − O ( ω ) > then ∆ Ii ( ω ) > ∆ Oj ( ω ) = ⇒ f IA ( ω ) − f IB ( ω ) > f OA ( ω ) − f OB ( ω ) . This can be rewritten in another form and thus leads tothe interpretation f IA ( ω ) − f OA ( ω ) > f IB ( ω ) − f OB ( ω ) . The quantity f IA ( ω ) − f OA ( ω ) measures the difference between the spectral power for Inseqvs Outseq conditions during the pre-odor presentation; whereas the f IB ( ω ) − f OB ( ω ) measurethe difference between Inseq and Outseq during the post-odor presentations.The posterior inference of the curve ∆ I − O ( ω ) , shown in Figure 7, displays the functionalboxplot from the fda package in R set to show the internal region. For rats 1, 3, and 4we observe similar pattern of decay for frequencies higher to 8 Hz but only subject 3 showssignificant differences below 0 for frequencies between 12 and 24 Hz. Rat 4 has a significant ∆ I − ( ω ) for < = ω < = 32 Hz while the significant differences for rat 4 appear only for ω < Hz . A different behavior is noted for rat 2 who shows positive ∆ I − ( ω ) for ω > Hz.Rat 5 showed quite a distinct pattern and most notably the change for pre-vs-post duringOutseq is significantly greater than the change during Inseq at a very narrow band around6-10 Hertz. Positive difference is observed on subject 5 for ω < = 22
Hz but showed quitea distinct pattern and most notably the change for pre-vs-post during Outseq significantlygreater than the change during Inseq at a very narrow band around 6-10 Hertz. Thus, theBMARD method produced a highly specific band which identifies the difference in the brainreaction to a correct sequence vs incorrect sequence of presentation of the odorThe potential impact to neuroscience brought by the new findings obtained from BMARDincludes (a.) identifying the specific frequency (or narrow bands) of the most dominant neu-ronal oscillations that are engaged in memory; (b.) leading to new sets of hypothesis aboutmemory and designing new experiments that test for intervention effects such as applyingelectrical stimulation at the identified frequency peaks. F IG . Functional Boxplots with of internal region for the curves ∆ I − O ( ω ) = ∆ I ( ω ) − ∆ O ( ω ) = f IA ( ω ) − f IB ( ω ) − ( f OA ( ω ) − f OB ( ω )) showing the median curve. The curves used in the boxplot were computed using aresample scheme of 100000 samples from the BMARD posterior SDF curves.
5. Conclusion.
The primary contribution of the BMARD method is that it provideshighly specific frequency information about spectral density function. It gives the estimatednumber of spectral components (peaks). It gives precise identification of the dominant fre-quencies where the SDF attains localized peaks. It also gives the corresponding spread (band-width) for each of the spectral peaks that were identified. The determination of the number ofcomponents, the location of the peaks and the spread are all data-adaptive rather than imposedapriori using the standard methods. The construction of the SDF under the BMARD methoduses a family of autoregressive kernels which, along with a Dirichlet process prior, givesrise to a Bayesian nonparametric discrete mixture model. BMARD decomposes a stationaryunivariate time series as a linear mixture of latent AR(2) processes, where each componentis associated to a unique peak on the SDF. The weights of the mixture provides an insightinto the components contribution of each latent process to the total variance of the observedsignal. Moreover, as demonstrated in the simulations, BMARD gives very good estimateswithout requiring a higher number of components, which is essential when the sampling rateis low. Thus, even with a relatively few components, the location of the frequencies corre-sponding to spectral peaks are well estimated because the method data-adaptively identifiesthe optimal placement of these peak frequencies.The comparison of BMARD with other approaches points to a limitation when estimatinga moving average SDF due to its concavity and smooth shape. Since the AR(2) kernel is aconvex function, fitting BMARD to a smooth SDF results into a mixture of several kernelslocalized by peaks arising from the variability of the periodogram. When the SDF is shapedby sharp peaks as the autoregressive models, BMARD provides parsimonious estimator forthe SDF, and outperform other methods. However, despite this limitation, the BMARD stillperforms very well using the metric of identifying the frequencies that produce the spectralpeaks.The LFP analysis of 5 rats shows how the signal is decomposed into different frequencycomponents during performance on an odor sequence memory task, and how the distributionof peak activity varies across trial types. The significant difference between InSeq and OutSeqtrials (in which odors were presented in the correct or incorrect order, respectively) provides
ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES compelling evidence that hippocampal LFP activity carries significant information about thesequential organization of our experiences, specifically whether or not events occurred inthe expected order. This is an important finding because LFP activity reflects the summedinfluence of large groups of neurons near the electrode tip; therefore, unlike the spiking ac-tivity of individual neurons, LFP is not the type of signal expected to carry much informationabout the content of specific trials. In fact, hippocampal oscillations are generally viewedas playing an important role in synchronizing neural activity across neuronal ensembles andcircuits, or promoting distinct information processing states (reviewed in Colgin (2016)). Be-yond our specific findings, the development of this model may have broader implications inneuroscience as a novel approach to extract additional trial-specific information from LFPrecordings, an electrophysiological approach extensively used in the field.The decomposition representation in the BMARD method gives a different insight ofweakly stationary processes as composed of latent processes with various oscillatory be-havior. The application of BMARD is broad and could extend well beyond neuroscience. Itis applicable to other kinds of data such as weather evolution composed of natural cycleswith different periodicity or financial data that exhibit economical cycles that are not nec-essarily sinusoidal but can be better represented by simpler stochastic processes explainingshort, medium and long term tendencies. Acknowledgments.
The authors thank Dr. Hart (see Hart et al. (2018)) for generouslysharing his computer codes. We acknowledge financial support from the KAUST ResearchFund and the NIH 1R01EB028753-01 to B. Shahbaba and N. Fortin.REFERENCES
Allen, T. A., A. M. Morris, A. T. Mattfeld, C. E. Stark, and N. J. Fortin (2014). A sequence of events model ofepisodic memory shows parallels in rats and humans.
Hippocampus 24 (10), 1178–1188.Allen, T. A., D. M. Salz, S. McKenzie, and N. J. Fortin (2016). Nonspatial sequence coding in ca1 neurons.
Journal of Neuroscience 36 (5), 1547–1563.Antoniak, C. E. (1974). Mixture of Dirichlet process with applications to Bayesian nonparametric problems.
Annals of Statistics 273(5281) , 1152–1174.Baccala, L. A. and K. Sameshima (2001, May). Partial directed coherence: a new concept in neural structuredetermination.
Biological Cybernetics 84 (6), 463–474.Brockwell, P. J. and R. A. Davis (1986).
Time Series: Theory and Methods . Berlin, Heidelberg: Springer-Verlag.Buzsáki, G. (2009, 01).
Rhythms of The Brain , pp. xiv, 448 p.Cadonna, A., A. Kottas, and R. Prado (2017). Bayesian mixture modeling for spectral density estimation.
Statis-tics & Probability Letters 125 , 189 – 195.Choudhuri, N., S. Ghosal, and A. Roy (2004). Bayesian estimation of the spectral density of a time series.
Journalof the American Statistical Association 99 (468), 1050–1059.Colgin, L. L. (2016). Rhythms of the hippocampal network.
Nature Reviews Neuroscience 17 (4), 239–249.Congdon, P. (2007).
Bayesian statistical modelling , Volume 704. John Wiley & Sons.Dickinson, A., C. DiStefano, D. Senturk, and S. S. Jeste (2018). Peak alpha frequency is a neural marker ofcognitive function across the autism spectrum.
European Journal of Neuroscience 47 (6), 643–651.Eddelbuettel, D. (2013).
Seamless R and C++ Integration with Rcpp . New York: Springer. ISBN 978-1-4614-6867-7.Eddelbuettel, D. and J. J. Balamuta (2017, aug). Extending extitR with extitC++: A Brief Introduction to extitR-cpp. , e3188v1.Eddelbuettel, D. and R. François (2011). Rcpp: Seamless R and C++ integration.
Journal of Statistical Soft-ware 40 (8), 1–18.Edwards, M. C., R. Meyer, and N. Christensen (2019). Bayesian nonparametric spectral density estimation usingb-spline priors.
Statistics and Computing 29 (1), 67–78.Eichenbaum, H. (2014). Time cells in the hippocampus: a new dimension for mapping memories.
Nature ReviewsNeuroscience 15 (11), 732–744.Escobar, M. D. and M. West (1995). Bayesian density estimation and inference using mixtures.
Journal of theamerican statistical association 90 (430), 577–588. Fan, J. and I. Gijbels (1996).
Local polynomial modelling and its applications: monographs on statistics andapplied probability 66 , Volume 66. CRC Press.Ferguson, T. S. (1973). A bayesian analysis of some nonparametric problems.
The annals of statistics , 209–230.Gao, X., W. Shen, B. Shahbaba, N. Fortin, and H. Ombao (2016). Evolutionary state-space model and its appli-cation to time-frequency analysis of local field potentials. arXiv preprint arXiv:1610.07271 .Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013).
Bayesian data analysis .CRC press.Hart, B., M. Guindani, S. Malone, and M. Fiecas (2018). Multi-subject spectral analysis of resting-state eegsignals from twins using a nested bernstein dirichlet prior.Ishwaran, H. and L. F. James (2002). Approximate dirichlet process computing in finite normal mixtures: smooth-ing and prior information.
Journal of Computational and Graphical statistics 11 (3), 508–532.Ishwaran, H. and M. Zarepour (2000). Markov chain monte carlo in approximate dirichlet and beta two-parameterprocess hierarchical models.
Biometrika 87 (2), 371–390.Jasra, A., C. C. Holmes, and D. A. Stephens (2005). Markov chain monte carlo methods and the label switchingproblem in bayesian mixture modeling.
Statistical Science , 50–67.Kass, R. E., U. T. Eden, E. N. Brown, and S. O. service) (2014).
Analysis of Neural Data . New York, NY:Springer New York.Kirch, C., M. C. Edwards, A. Meier, and R. Meyer (2019, 12). Beyond whittle: Nonparametric correction of aparametric likelihood with a focus on bayesian time series analysis.
Bayesian Anal. 14 (4), 1037–1073.Krafty, R. and W. Collinge (2013). Penalized multivariate whittle likelihood for power spectrum estimation.
Biometrika 100 (4), 447–458.Krafty, R., M. Hall, and W. Guo (2011). Penalized multivariate whittle likelihood for power spectrum estimation.
Biometrika 98 (3), 583–598.Lee, T. C. (1997). A simple span selector for periodogram smoothing.
Biometrika 84 (4), 965–969.Macaro, C. and R. Prado (2014). Spectral decompositions of multiple time series: a bayesian non-parametricapproach.
Psychometrika 79 (1), 105–129.Müller, P., F. A. Quintana, A. Jara, and T. Hanson (2015).
Bayesian nonparametric data analysis . Springer.Neal, R. M. (2000). Markov chain sampling methods for dirichlet process mixture models.
Journal of computa-tional and graphical statistics 9 (2), 249–265.Ngo, D., H. Ombao, Y. Sun, M. G. Genton, J. Wu, R. Srinivasan, and S. Cramer (2015). An exploratory dataanalysis of electroencephalograms using the functional boxplots approach.
Frontiers in neuroscience 9 , 282.Ombao, H., M. Lindquist, S. D. Thompson, Wesley (Of University of California, and J. O. U. o. C. Aston (2017).
Handbook of neuroimaging data analysis . Boca Raton: CRC Press, Taylor & Francis Group.Ombao, H., S. Van Bellegem, et al. (2006). Coherence analysis of nonstationary time series: a linear filteringpoint of view.
IEEE Transactions on Signal Processing 56 , 2259–2266.Ombao, H. C., J. A. Raz, R. L. Strawderman, and R. von Sachs (2001). A simple generalised crossvalidationmethod of span selection for periodogram smoothing.
Biometrika 88 (4), 1186–1192.Ozaki, T. (2012).
Time series modeling of neuroscience data . CRC press.Park, T., I. A. Eckley, and H. C. Ombao (2014). Estimating time-evolving partial coherence between signals viamultivariate locally stationary wavelet processes.
IEEE Transactions on Signal Processing 62 (20), 5240–5250.Pawitan, Y. and F. O’sullivan (1994). Nonparametric spectral density estimation using penalized whittle likeli-hood.
Journal of the American Statistical Association 89 (426), 600–610.Petrone, S. (1999). Bayesian density estimation using bernstein polynomials.
Canadian Journal of Statis-tics 27 (1), 105–126.Prado, R. and M. West (2010).
Time series: modeling, computation, and inference . CRC Press.Robert, C. and G. Casella (2013).
Monte Carlo statistical methods . Springer Science & Business Media.Sethuraman, J. (1994). A constructive definition of dirichlet priors.
Statistica sinica , 639–650.Shahbaba, B. and R. Neal (2009). Nonlinear models using dirichlet process mixtures.
Journal of MachineLearning Research 10 (Aug), 1829–1850.Shumway, R. H. and D. S. Stoffer (2017).
Time series analysis and its applications: with R examples . Springer.Wahba, G. (1980). Automatic smoothing of the log periodogram.
Journal of the American Statistical Associa-tion 75 (369), 122–132.Wang, Y., C.-M. Ting, and H. Ombao (2016). Modeling effective connectivity in high-dimensional cortical sourcesignals.
IEEE Journal of Selected Topics in Signal Processing 10 (7), 1315–1325.Whittle, P. (1957). Curve and periodogram smoothing.
Journal of the Royal Statistical Society: Series B (Method-ological) 19 (1), 38–47.ODELING BRAIN WAVES AS A MIXTURE OF LATENT PROCESSES Appendix: MCMC Algorithm.
We implemented a Metropolis-Hastings within Gibbsto sample from the posterior distribution of the parameters. Our algorithm first updates thenumber of components with a birth-death process, which at each iteration proposes withequal probability to increase the number of components by one or decrease it by one. In thecase of a birth step, the M-H ratio is.(7) q ( θ | θ ∗ ) /q ( θ ∗ | θ ) = (cid:0) I ( (cid:15) ∗ ( i ) j < ψ ( i ) j )( (cid:15) ∗ ( i ) j − (cid:15) j − ) − + I ( (cid:15) ∗ ( i ) j > ψ ( i ) j )( (cid:15) j − (cid:15) ∗ ( i ) j ) − (cid:1) − Where (cid:15) ∗ ( i ) represent a new generation for the partition in the interval ( (cid:15) j − , (cid:15) j ) assumingthe random selection of the index j to split that subinterval and I is the indicator function.While in the the case of choosing a death step the M-H probability is computed as(8) q ( θ | θ ∗ ) /q ( θ ∗ | θ ) = I ( (cid:15) j > ψ ∗ ( i ) j )( (cid:15) j − (cid:15) j − ) − + I ( (cid:15) j < ψ ∗ ( i ) j )( (cid:15) j +1 − (cid:15) j ) − where ψ ∗ represents the uniform draw when two component are joined by deleting the value (cid:15) j from the partition. The proposal distribution to update of the location parameters is uniformin the interval ( ψ c − (cid:15), ψ c + (cid:15) ) with appropriate conditions to take the modulus over thesubinterval defined by the partition. For the scale parameter, we use a similar uniform drawover the interval ( L c − (cid:15), L c + (cid:15) ) .The alpha parameter is updated based on the slice sampling to generate a random walkover the subgraph of the marginal posterior distribution of α given by:(9) π ( α | C ) ∝ π ( α ) α C − ( α + T ) β ( α + 1 , T ) where π ( α ) is the prior over α in our algorithm we set π ( α ) as log-normal, T the observedtime series size, and β ( . ) is the beta function. The next algorithm presents the steps described. Algorithm 1:
M-H within Gibbs DP AR(2) mixture for stationary time series
Propose: C (0) , M, d ;Initialize randomly: ψ (0)1 , . . . , ψ (0) C , L (0)1 , . . . , L (0)
C , V (0)1 , . . . , V (0)
M , (cid:15) (0)1 , . . . , (cid:15) (0) C − ; while i ≤ MCMC chain size do Choose Death or Birth with probability equal to 1/2; if Death then
Choose j randomly with probability /C ( i ) ;Remove (cid:15)j ;Propose a new ψ ∗ ( i ) j uniformly on ( (cid:15)j − , (cid:15)j +1) ;Propose a new L ∗ ( i ) j uniformly on (0 , d ) ;Compute the M-H probability ;Make a Reject-Acceptance M-H step for the new C ∗ ( i ) = C ( i − − , ¯ psi ∗ , ¯ L ∗ ; else Birth: Generate a new component ;Choose a j randomly with probability /C ( i ) ;Propose a new (cid:15) ∗ ( i ) j uniformly on ( (cid:15)j − , (cid:15)j ) ;Propose a new ψ ∗ ( i ) j uniformly on the interval where is not yet a ψ ;Propose a new L ∗ ( i ) j uniformly on (0 , d ) ;Compute the M-H probability ;Make a Reject-Acceptance M-H step for the new C ∗ ( i ) = C ( i −
1) + 1 , ¯ psi ∗ , ¯ L ∗ ; end Sample jointly the vector ¯ ψ ( i ) ;Sample jointly the vector ¯ L ( i ) ;Sample jointly the vector ¯ V ( i ) ;Sample jointly the vector ¯ Z ( i ) ;Sample α ( i ) ;;