Sequential Detection of Regime Changes in Neural Data
Taposh Banerjee, Stephen Allsop, Kay M. Tye, Demba Ba, Vahid Tarokh
Abstract — The problem of detecting changes in firing patterns in neural data is studied. The problem is formulated as a quickest change detection problem. Important algorithms from the literature are reviewed. A new algorithmic technique is discussed to detect deviations from learned baseline behavior. The algorithms studied can be applied to both spike and local field potential data. The algorithms are applied to mice spike data to verify the presence of behavioral learning.
I. INTRODUCTION

A major research direction in the area of brain-computer interfaces (BCIs) is the decoding of brain signals [1]. Most of the research in this area focuses on the classification of brain signals based on spike data or local field potential (LFP) data [2], [3], [4]. The classification algorithms studied in the literature are based on the idea that the firing pattern of neurons will be different under different classes. The different firing patterns affect the number and positions of spikes in the spike data, and may also affect the frequency spectrum of the LFP data. The latter fact is used to train classifiers based on the Fourier or wavelet coefficients, or based on the power spectrum of the LFP data.

In this paper, we study another important aspect of brain signal processing: the detection of changes in neural firing patterns. It is envisioned that future BCIs will be capable of decoding brain signals and controlling external outputs, e.g., a robotic arm. In these applications, while it is important to decode what the brain is trying to do in a given task, it is also important to learn the transition boundaries between different types of tasks. Thus, algorithms are needed that can observe spike or LFP data in real time and detect regime changes in neural data. Such change detection algorithms can also be used for classification purposes. For example, a change in firing pattern relative to a baseline pattern may be used to test a hypothesis, e.g., to verify a change in behavior or to test whether an animal has learned to associate an activity with a cue.

In this paper, we study algorithms that can be used to detect changes in the statistical behavior of spike and LFP data. In statistics, such change detection algorithms are developed
Banerjee is with the University of Texas at San Antonio (ECE), Allsop is with the Harvard Medical School, Tye is with the MIT Dept. of Brain and Cognitive Sciences, Ba is with Harvard University (SEAS), and Tarokh is with Duke University (ECE). Corresponding author email: [email protected]. Ba and Tarokh acknowledge support of the Army Research Office under Contract Number W911NF-16-1-0368. This is part of the collaboration between US DOD, UK MOD and UK Engineering and Physical Sciences Research Council (EPSRC) under the Multidisciplinary University Research Initiative. Tye acknowledges support of the Pioneer Award DP1-AT009925 (NCCIH). Allsop acknowledges support of the Jeffrey and Nancy Halis Fellowship, the Henry E. Singleton Fund, and an NLM training grant.
Fig. 1: A change in the mean and variance of a sequence of independent Gaussian random variables.

in the framework of quickest change detection [5], [6], [7]. In Section II, we review some fundamental algorithms from the literature. As will be discussed, to effectively use the algorithms, we need knowledge of both the pre- and post-change distributions of the data. In Section III, we propose change detection algorithms that can be used to detect deviations from learned baseline behavior, and hence can be used without knowledge of the post-change distribution. In Section IV, we apply the algorithms to spike data to verify behavioral learning in mice.

II. REVIEW OF QUICKEST CHANGE DETECTION
We model the spike data or the LFP data as a stochastic process $\{X_n\}$ with probability law in a parametric family $P_\theta$. Here, $\theta$ could be infinite dimensional, making the probability law nonparametric. We assume that in the nominal regime the law of the process is $P_{\theta_0}$. We assume that at some point in time $\gamma$, called the change point in the following, the law of the process changes to $P_{\theta_1}$. See Fig. 1. The objective is to observe this process in real time and detect this change in the law as quickly as possible. The algorithm to detect this change is expressed in terms of a stopping time $\tau$: an integer-valued random variable such that the event $\{\tau \le n\}$ is only a function of the first $n$ observations $(X_1, \cdots, X_n)$. The variable $\tau$ has to be selected so as to minimize a version of the delay $\tau - \gamma$ subject to a constraint on the event of false alarm $\{\tau < \gamma\}$. The random variables can be dependent, and the parameters $\theta_0$ and $\theta_1$ may not be known. In the subsections below we discuss change detection algorithms that are effective under various modeling assumptions.

A. IID Data with Known $\theta_0$ and $\theta_1$

We assume that the random variables are i.i.d. with probability density function (p.d.f.) $f_0$ (under law $P_{\theta_0}$ observations are i.i.d. with density $f_0$). At time $\gamma$ the density of the random variables changes to $f_1 \neq f_0$ (under law $P_{\theta_1}$ observations are i.i.d. with density $f_1$). Thus, the variables $\{X_n\}$ are independent conditioned on the change point $\gamma$. There are many popular formulations of the QCD problem; the most popular ones are the formulations of Lorden [8] and Pollak [9]. We do not discuss the problem formulations here, but an algorithm that is optimal in some well-defined sense with respect to both these formulations is the Cumulative Sum (CUSUM) algorithm. The CUSUM algorithm was proposed by Page in [10]. It is described as follows.
We compute a sequence of statistics $\{W_n\}$ using the log likelihood ratio of the observations:
$$W_n = \max_{1 \le k \le n+1} \sum_{i=k}^{n} \log \frac{f_1(X_i)}{f_0(X_i)}, \qquad (1)$$
and a change is declared, i.e., an alarm is raised, the first time the statistic is above a threshold $A$:
$$\tau_C = \min\{n \ge 1 : W_n > A\}. \qquad (2)$$
The statistic $W_n$ is a maximum likelihood statistic: the term $\sum_{i=k}^{n} \log f_1(X_i)/f_0(X_i)$ is the log likelihood ratio of the observations given $\gamma = k$. The statistic $W_n$ is the maximum of this conditional log likelihood ratio over all possible change points $1 \le k \le n$ before $n$ and $k = n+1$. The latter represents no change, for which the log likelihood ratio is zero, keeping the statistic nonnegative. The statistic $W_n$ can be computed recursively as follows: $W_0 = 0$, and
$$W_n = \left(W_{n-1} + \log \frac{f_1(X_n)}{f_0(X_n)}\right)^+, \qquad (3)$$
where $(x)^+ := \max\{x, 0\}$. It is possible that a change never occurs ($\gamma = \infty$). In that case, $\tau_C$ is the time to false alarm and can be controlled by the threshold $A$. Thus, the threshold $A$ provides a trade-off between delay and false alarm, because a larger value of the threshold also leads to a larger delay when the change actually occurs. We refer the readers to [7] and [11] for delay and false alarm analysis of the CUSUM algorithm.

To understand why the CUSUM algorithm works, define the notion of Kullback-Leibler divergence between probability densities $f$ and $g$:
$$D(f \,\|\, g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx.$$
It is well known [7] that $D(f \,\|\, g) \ge 0$, with equality iff $f = g$. At each time step, the log likelihood ratio of the observations $\log f_1(X_n)/f_0(X_n)$ is added to the statistic $W_n$. If there is no change or anomaly, then the observations have density $f_0$, and the average value of the log likelihood ratio under $f_0$ is $-D(f_0 \,\|\, f_1) < 0$. After the change, the observations have density $f_1$ and the mean of the log likelihood ratio is $D(f_1 \,\|\, f_0) > 0$.
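The recursion in (3) is simple to implement. The sketch below runs it on simulated binary spike data under an illustrative Bernoulli model; the function names and the rates $p_0$, $p_1$, and threshold $A$ are hypothetical choices for this example, not values from the paper.

```python
import numpy as np

def cusum(x, llr, A):
    """CUSUM recursion W_n = (W_{n-1} + llr(x_n))^+; return the first n
    with W_n > A (the alarm time), or None if the threshold is never crossed."""
    W = 0.0
    for n, xn in enumerate(x, start=1):
        W = max(W + llr(xn), 0.0)
        if W > A:
            return n
    return None

# Hypothetical Bernoulli spike model: pre-change rate p0, post-change rate p1.
p0, p1 = 0.1, 0.4

def llr(x):
    # log f1(x)/f0(x) for a single binary observation x
    return np.log(p1 / p0) if x else np.log((1 - p1) / (1 - p0))

rng = np.random.default_rng(0)
x = np.concatenate([rng.random(200) < p0, rng.random(200) < p1]).astype(int)
tau = cusum(x, llr, A=8.0)  # alarm expected shortly after the change at n = 200
```

The per-sample update makes the memory cost constant, which is what makes the test usable in real time on streaming spike data.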
Thus, before the change, the statistic $W_n$ has a tendency to go to $-\infty$ (but is stopped at $0$ by the $(\cdot)^+$ operation). After the change, the statistic $W_n$ has a tendency to grow to $\infty$; this is detected by using a suitably large threshold $A$.

If the spike data is modeled as a Poisson or a Bernoulli process with known pre- and post-change parameters, then we can use the CUSUM algorithm to detect a change in the firing rate or pattern by detecting a change in the Poisson or Bernoulli parameters.

B. IID Data with $\theta_0$ and $\theta_1$ Unknown but Finite Dimensional
The CUSUM algorithm can be applied only when both the pre- and post-change densities $f_0$ and $f_1$ are precisely known. If $f_0$ and $f_1$ are not known beforehand and have to be estimated from training data, then the CUSUM algorithm is no longer optimal. In fact, the algorithm may even fail to detect changes accurately due to the error in estimating the densities $f_0$ and/or $f_1$. In such a situation we can use the Generalized CUSUM (GCUSUM) algorithm based on the Generalized Likelihood Ratio (GLR) approach. In the GLR approach, roughly speaking, we replace the unknown by its Maximum Likelihood (ML) estimate. The CUSUM algorithm is itself an example of a GLR test, where the GLR part is the max operation over the unknown change point. In the GCUSUM algorithm, in addition to the max over the change point, we also replace the unknown parameters $\theta_0$ and $\theta_1$ by their ML estimates [8], [12], [11].

We assume that the densities $f_0$ and $f_1$ belong to the parametric family $\{f_\theta\}$, $\theta \in \Theta$. Also, let $f_0 = f_{\theta_0}$ and $f_1 = f_{\theta_1}$. We do not know the values of $\theta_0$ and $\theta_1$ but know that $\theta_0 \in \Theta_0$ and $\theta_1 \in \Theta_1$; $\Theta_i \subset \Theta$, $i = 0, 1$, $\Theta_0 \cap \Theta_1 = \emptyset$. Then the GCUSUM test statistic based on observations $\{X_1, \cdots, X_n\}$ when the change occurs at $\gamma = k$ is given by
$$G_n(k) = \max_{\theta \in \Theta_0} \sum_{i=1}^{k-1} \log f_\theta(X_i) + \max_{\theta \in \Theta_1} \sum_{i=k}^{n} \log f_\theta(X_i) - \max_{\theta \in \Theta_0} \sum_{i=1}^{n} \log f_\theta(X_i), \qquad (4)$$
and the GLR statistic at time $n$ is defined as
$$G_n = \max_{1 \le k \le n} G_n(k). \qquad (5)$$
A change is declared at the stopping time
$$\tau_G = \min\{n \ge 1 : G_n > A\}. \qquad (6)$$
The above GCUSUM test is discussed in [12]. Although the test performs well in practice, it is known to be optimal or asymptotically optimal only if some additional assumptions are made about the family of densities. One specific case is when the densities belong to an exponential family and the pre-change distribution is known. Such an analysis was carried out by Lorden in [8].
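To make the GLR idea concrete, the sketch below implements it in the special case just mentioned: a known pre-change Bernoulli rate $p_0$ and an unknown post-change rate restricted to $\Theta_1 = [p_{\min}, 1]$, so that (4) reduces to maximizing the segment likelihood over the post-change parameter only. The rates, the set $\Theta_1$, and the threshold are illustrative assumptions; the direct $O(n)$ scan over change points $k$ at each time $n$ is the naive form of (5), not an optimized implementation.

```python
import numpy as np

def bern_loglik(s, m, p):
    """Log likelihood of m Bernoulli(p) samples containing s successes."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
    return s * np.log(p) + (m - s) * np.log(1 - p)

def gcusum_bernoulli(x, p0, p_min, A):
    """GLR test with known pre-change rate p0 and unknown post-change
    rate p >= p_min; returns the alarm time or None."""
    x = np.asarray(x, dtype=float)
    for n in range(1, len(x) + 1):
        G = 0.0  # the "no change yet" hypothesis contributes zero
        for k in range(1, n + 1):
            seg = x[k - 1:n]
            s, m = seg.sum(), len(seg)
            p_hat = max(s / m, p_min)  # ML estimate restricted to Theta_1
            G = max(G, bern_loglik(s, m, p_hat) - bern_loglik(s, m, p0))
        if G > A:
            return n
    return None

# Hypothetical rates: baseline firing 0.1 per bin, post-change 0.5 per bin.
rng = np.random.default_rng(1)
x = np.concatenate([rng.random(100) < 0.1, rng.random(100) < 0.5]).astype(int)
tau = gcusum_bernoulli(x, p0=0.1, p_min=0.25, A=8.0)
```

The restriction $p \ge p_{\min}$ plays the role of the separation $\Theta_0 \cap \Theta_1 = \emptyset$: without it, small fluctuations of the ML estimate around $p_0$ would inflate the statistic under the nominal regime.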
We refer the readers to [13] for more details on Lorden's test and its detailed analysis under misspecification of the pre-change distribution. There are also other approaches and algorithms with which one can detect changes under model uncertainty. We refer the readers to [7] for a review.

If the spike data is modeled as a Poisson or a Bernoulli process with unknown pre- and post-change parameters, then we can use the GCUSUM algorithm to detect a change in the firing rate or pattern by detecting a change in the Poisson or Bernoulli parameters.

C. Algorithm for Dependent Data
Both the CUSUM algorithm and the GCUSUM algorithm are designed to work with i.i.d. data. In general, the data sequence need not be i.i.d., and we need more general algorithms to detect changes. Algorithms for non-i.i.d. data can be obtained by replacing the product densities in the definition of the CUSUM or GCUSUM algorithms by joint densities.

Let $X^n$ denote the vector $(X_1, \cdots, X_n)$. Also, let $f^{(k)}(x^n)$ denote the joint density of $X^n$ given that the change occurred at time $\gamma = k$. Then the CUSUM statistic for non-i.i.d. data is given by (compare with (1))
$$W_n = \max_{1 \le k \le n+1} \log \frac{f^{(k)}(X^n)}{f^{(\infty)}(X^n)}. \qquad (7)$$
Let the joint density of $X^n$ under law $P_\theta$ be of the form
$$f_\theta(x^n) = \prod_{i=1}^{n} f(x_i \mid x^{i-1}; \theta).$$
Then one way to write the CUSUM statistic for the non-i.i.d. data above is
$$W_n = \max_{1 \le k \le n+1} \sum_{i=k}^{n} \log \frac{f(X_i \mid X^{i-1}; \theta_1)}{f(X_i \mid X^{i-1}; \theta_0)}. \qquad (8)$$
Another way is to assume independent pre- and post-change data:
$$W_n = \max_{1 \le k \le n+1} \sum_{i=k}^{n} \log \frac{f(X_i \mid X_k^{i-1}; \theta_1)}{f(X_i \mid X_k^{i-1}; \theta_0)}, \qquad (9)$$
where $X_k^{i-1} = (X_k, \cdots, X_{i-1})$, so that each term conditions only on the samples after the hypothesized change point.

If the spike data is modeled as a hidden Markov model or a state-space process, then we can use the modified CUSUM algorithm for dependent data to detect a change in the firing pattern. Also, if the LFP data is modeled as an ARMA time series, then we can use the algorithms in (8) or (9) to detect the change.

III. DETECTING DEVIATIONS FROM A KNOWN BASELINE BEHAVIOR
In order to employ the CUSUM and the GCUSUM algorithms, we need prior information on the post-change distribution: we need to know the exact post-change distribution for the CUSUM algorithm, and the post-change parametric family for the GCUSUM algorithm. In many applications, we may not know the statistical characteristics of the data in the anomalous regime, or we may have too few samples from the anomalous regime to learn the post-change distribution. In this section, we discuss an algorithm that can be employed in such a scenario.

Suppose we have a summary statistic $h : \mathbb{R}^d \to \mathbb{R}$ of the data such that
$$\mathbb{E}[h(X_{n-d}, \cdots, X_n)] = \mu, \quad \forall n, \qquad (10)$$
in the nominal regime, and we expect this mean to increase in the post-change regime. Then we can use
$$W_n = (W_{n-1} + h(X_{n-d}, \cdots, X_n) - \mu - \lambda)^+, \qquad N = \min\{n \ge 1 : W_n > A\} \qquad (11)$$
to detect the change. Here, $\lambda > 0$ is the minimum amount of change in the mean of $h(X_{n-d}, \cdots, X_n)$ that the algorithm can detect, and is a design parameter. Before the change, the mean of the increment to $W_n$ is $\mathbb{E}[h(X_{n-d}, \cdots, X_n) - \mu - \lambda] = -\lambda < 0$. Thus, like the CUSUM algorithm, the statistic $W_n$ here also has a negative drift before the change. Further, if $\mathbb{E}[h(X_{n-d}, \cdots, X_n)] > \mu + \lambda$ after the change, then the drift will be positive, and the change can be detected using this algorithm. In the following, we call this algorithm the Deviation-CUSUM algorithm.

We now provide some examples of summary statistics that can be employed in practice.

1) Change in observation mean: $h(X_{n-d}, \cdots, X_n) = X_n$.
2) Change in variance: Suppose the process is zero mean and $h(X_{n-d}, \cdots, X_n) = X_n^2$.
3) Change in entropy: $h(X_{n-d}, \cdots, X_n) = \log 1/f_0(X_n)$, where $f_0$ is the pre-change density.
4) Change in power spectrum: Let $\{X_n\}$ be a zero-mean stationary time series. Then it has the spectral representation in terms of a Levy process $Z$ [14]
$$X_n = \int_{-\pi}^{\pi} e^{in\nu}\, dZ(\nu). \qquad (12)$$
This implies that the autocorrelation function $\{R(h)\}$ of the process $X$ has the spectral representation
$$R(h) = \int_{-\pi}^{\pi} e^{ih\nu}\, dF(\nu), \qquad (13)$$
where $F$ is the quadratic variation of the process $Z$:
$$dF = |dZ|^2. \qquad (14)$$
The function $F$ is also called the power spectral density of the process $X$. In [15], Priestley proposed a class of nonstationary processes with the representation
$$X_n = \int_{-\pi}^{\pi} A_n(\nu) e^{in\nu}\, dZ(\nu). \qquad (15)$$
Priestley called these processes locally stationary. The properties of the function $A_n(\nu)$ can be found in [15]. The quantity
$$dF_n = |A_n|^2\, dF \qquad (16)$$
is called the evolutionary spectrum of the process. Then we can define
$$h(X_{n-d}, \cdots, X_n) = \int_{\nu} \widehat{dF}_n(\nu), \qquad (17)$$
where $\widehat{dF}_n$ is an estimate of the evolutionary spectrum using the variables $(X_{n-d}, \cdots, X_n)$, with mean
$$\mu = \mathbb{E}[h(X_{n-d}, \cdots, X_n)] = \int_{\nu} dF(\nu) \qquad (18)$$
before the change. There are other ways the summary statistic can be defined to capture a change in the power spectrum. We do not discuss them here.

Fig. 2: Mice Experiment.

Fig. 3: The Deviation-CUSUM algorithm applied to binned spike data. The algorithm detects the change in the firing pattern.

IV. BEHAVIORAL LEARNING IN MICE
In a mice experiment, an observer mouse learns to associate a cue with a shock given to a demonstrator mouse. See Fig. 2. The details of the two-day experiment can be found in [16]. There were a total of 45 trials on Day 1. In the first 15 trials, a cue is not followed by a shock. The firing pattern in these 15 trials is used as a baseline. In the next 30 trials, the cue is followed by a shock to the demonstrator mouse. Invasive data is collected from the observer mouse during the trial. The objective is to detect a possible change in the neural firing pattern in the observer mouse after the shocks start at trial 15, after the cue but before the shock is actually applied. This change in firing pattern from the baseline is seen as an indication of behavioral learning.

In Fig. 3, we have plotted the result of applying the Deviation-CUSUM algorithm to the mice data, also shown in the figure. The binned spike data is modeled as a Bernoulli process. The baseline is learned from the data from the first five trials. The sequence $\{X_n\}$ for the algorithm is obtained by concatenating the data from different trials into a single binary sequence. As seen in the figure, the algorithm successfully detects the change, indicated by a change in the drift of the statistic.

Fig. 4: The Deviation-CUSUM algorithm applied to delayed binned spike data. The algorithm detects the change in the firing pattern starting at trial 15.

In Fig. 4, we have shown spike data where the response is delayed. For this type of data, it may be hard to detect the change using the techniques used for Fig. 3 due to a lack of persistent firing. As a result, we apply the spectrum-based technique (17) to the data. Here, to obtain the spectrum, we used $d$ equal to the length of a trial. Also, the value of $W_n$ is computed only at the beginning of the trials. Hence, the time index here is trials rather than bin-level slots. The baseline is again learned from the first five trials.
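A bin-level Deviation-CUSUM of the kind used for the Bernoulli-modeled data can be sketched as follows, with $h_n = X_n$ (detecting an increase in the observation mean) and the baseline mean $\mu$ learned from nominal data. The firing rates, the slack $\lambda$, and the threshold $A$ below are illustrative assumptions, not the values used in the experiment.

```python
import numpy as np

def deviation_cusum(x, mu, lam, A):
    """Deviation-CUSUM of (11): W_n = (W_{n-1} + h_n - mu - lam)^+ with
    h_n = x_n; returns (first alarm time or None, full statistic path)."""
    W, path = 0.0, []
    for h in x:
        W = max(W + h - mu - lam, 0.0)
        path.append(W)
    alarms = [n + 1 for n, w in enumerate(path) if w > A]
    return (alarms[0] if alarms else None), path

# Hypothetical binned spike data: baseline rate 0.05 per bin, rising to 0.3.
rng = np.random.default_rng(2)
baseline = (rng.random(500) < 0.05).astype(int)
mu = baseline.mean()  # baseline mean learned from nominal data only
x = np.concatenate([rng.random(300) < 0.05, rng.random(300) < 0.3]).astype(int)
tau, path = deviation_cusum(x, mu, lam=0.05, A=10.0)
```

Note that only the baseline is learned; no post-change model is needed, which is the point of the Deviation-CUSUM in this setting.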
As seen in the figure, the Deviation-CUSUM detects the change in firing pattern here as well.

REFERENCES

[1] R. P. Rao, Brain-Computer Interfacing: An Introduction. Cambridge University Press, 2013.
[2] D. A. Markowitz, Y. T. Wong, C. M. Gray, and B. Pesaran, "Optimizing the decoding of movement goals from local field potentials in macaque cortex," Journal of Neuroscience, vol. 31, no. 50, pp. 18412–18422, 2011.
[3] T. Banerjee, J. Choi, B. Pesaran, D. Ba, and V. Tarokh, "Classification of local field potentials using Gaussian sequence model," in Proc. of the IEEE Statistical Signal Processing Workshop, Apr. 2018.
[4] T. Banerjee, J. Choi, B. Pesaran, D. Ba, and V. Tarokh, "Wavelet shrinkage and thresholding based robust classification for brain computer interface," in Proc. of the 43rd IEEE International Conference on Acoustics, Speech and Signal Processing, July 2018.
[5] H. V. Poor and O. Hadjiliadis, Quickest Detection. Cambridge University Press, 2009.
[6] A. G. Tartakovsky, I. V. Nikiforov, and M. Basseville, Sequential Analysis: Hypothesis Testing and Change-Point Detection. Statistics, CRC Press, 2014.
[7] V. V. Veeravalli and T. Banerjee, Quickest Change Detection. Academic Press Library in Signal Processing: Volume 3 – Array and Statistical Signal Processing, 2014. http://arxiv.org/abs/1210.5552.
[8] G. Lorden, "Procedures for reacting to a change in distribution," Ann. Math. Statist., vol. 42, pp. 1897–1908, Dec. 1971.
[9] M. Pollak, "Optimal detection of a change in distribution," Ann. Statist., vol. 13, pp. 206–227, Mar. 1985.
[10] E. S. Page, "Continuous inspection schemes," Biometrika, vol. 41, pp. 100–115, June 1954.
[11] T. L. Lai, "Information bounds and quick detection of parameter changes in stochastic systems," IEEE Trans. Inf. Theory, vol. 44, pp. 2917–2929, Nov. 1998.
[12] T. L. Lai, "Sequential changepoint detection in quality control and dynamical systems," J. Roy. Statist. Soc. Suppl., vol. 57, no. 4, pp. 613–658, 1995.
[13] T. Banerjee, H. Firouzi, and A. O. Hero III, "Quickest detection for changes in maximal kNN coherence of random matrices," IEEE Transactions on Signal Processing, vol. 66, no. 17, pp. 4490–4503, 2018.
[14] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods. Springer Science & Business Media, 2013.
[15] M. B. Priestley, "Evolutionary spectra and non-stationary processes," Journal of the Royal Statistical Society. Series B (Methodological), pp. 204–237, 1965.
[16] Y. Zhang, N. Malem-Shinitski, S. A. Allsop, K. M. Tye, and D. Ba, "Estimating a separably Markov random field from binary observations,"