Smooth Online Parameter Estimation for time varying VAR models with application to rat's LFP data
Anass El Yaagoubi Bourakna, Marco Pinto, Norbert Fortin, Hernando Ombao
Submitted to the Annals of Applied Statistics
Anass El Yaagoubi Bourakna and Hernando Ombao: Statistics Program, King Abdullah University of Science and Technology, [email protected], [email protected]
Marco Pinto: Oslo Metropolitan University, [email protected]
Norbert Fortin: University of California Irvine, [email protected]
Multivariate time series data often appear as realizations of non-stationary processes whose covariance matrix or spectral matrix smoothly evolves over time. Most current approaches estimate the time-varying spectral properties only retrospectively, that is, after the entire data set has been observed. Retrospective estimation is a major limitation, especially in situations where it is important to estimate these properties and detect changes in the system as they happen in real-time. Thus, the goal of this paper is to develop a method that estimates the statistical properties in real-time. Online estimation allows a real-time update of the time-varying parameters as new observations arrive; this is a critical task in many adaptive control applications. One approach to modeling non-stationary time series is to fit time-varying vector autoregressive models (tv-VAR). However, one major obstacle in online estimation of such models is the computational cost due to the high dimensionality of the parameters. Existing methods such as the Kalman filter, parametric models or local least squares are feasible but, in some situations, they are not suitable because they provide noisy estimates and can become prohibitively costly as the dimension of the time series increases. In our brain signal application, it is critical to develop a robust method that can estimate, in real-time, the underlying stochastic process, and in particular the spectral brain connectivity measures. For these reasons we propose a new smooth online parameter estimation approach (SOPE) that has the ability to control the smoothness of the estimates with a reasonable computational complexity. Consequently, we are able to fit models, in real-time, for high-dimensional time series. We demonstrate that our proposed SOPE approach can be as good as the Kalman filter in terms of mean squared error for small dimensions. However, unlike the Kalman filter, the SOPE has a much smaller computational cost and hence scales to higher dimensions. Finally, we apply the SOPE method to rat local field potential data recorded during a hippocampus-dependent sequence-memory task. The proposed SOPE method is able to capture the dynamics of the connectivity as the rat performs the sequence of non-spatial working memory tasks.
Keywords and phrases: online parameter estimation, locally stationary processes, dynamic brain connectivity, dynamic spectral connectivity.
1. Introduction.
Many time series recorded in fields such as finance, econometrics, biology, atmospheric sciences and neuroscience exhibit nonstationarity. These time series may have a second moment structure (variance and cross-covariance) that is not constant over the duration of the observation period. In this paper, we examine local field potential (LFP) signals recorded from a rat brain (see Figures 1 and 2) that display changes in the second moment as the rat brain responds to the sequence of odors presented during an experiment. Our goal here is to develop a computationally efficient approach for estimating rat brain functional connectivity that can track these changes in real-time.

FIG 1. Non-spatial memory task experiment.
FIG 2. Rat's LFP recordings from seven channels (T3, T4, T5, T6, T10, T18, T20).

There are many available methods for analyzing non-stationary time series data. The most common approach is to model the data as a realization of some "locally stationary" process, which essentially assumes that the second moment structure (i.e., the covariance matrix or, equivalently, the spectral matrix) is approximately constant within a narrow time interval. This idea was introduced in Priestley (1965) and then reformulated in Dahlhaus (1997) to admit consistent estimators of the time-varying spectral matrix. These two approaches use the Fourier complex exponentials as the stochastic building blocks for representing signals. However, there are other potential approaches to signal representation; see Ombao, Sachs and Guo (2005) and Nason, Von Sachs and Kroisandt (2000) for approaches that use stochastic representations with time-localized bases (e.g., wavelets, wavelet packets and the SLEX, smooth localized complex exponentials). Using these stochastic representations, one can then define the time-varying auto-spectra and the time-varying cross-coherence.

Another approach to modeling non-stationary signals is to use time-domain representations with time-varying coefficients. Vector autoregressive (VAR) models have become very popular over the last forty years, mainly because they are a natural extension of the autoregressive model to the multivariate setting and can thus capture the cross-dependence between different components. In a VAR model, we explicitly express the conditional mean of one time series as a linear combination of its own past values as well as the past values of the other time series. Thus, the VAR model provides a direct mechanism for forecasting future values of one time series based on its own past as well as the past of the others, see Sims (1980) and Lütkepohl (1991). VAR models are extensively used in numerous fields such as economics, weather forecasting and brain imaging, see Brockwell and Davis (1986) and Shumway and Stoffer (2005). The main qualities that make VAR models attractive to practitioners are their ability to capture contemporaneous and lagged linear relationships between different components of a time series and their ability to provide analytical definitions of connectivity based on the spectral density, see Granger (1969), Baccala and Sameshima (2001), Wang, Ting and Ombao (2016) and Ting et al. (2018). For a technical review of VAR models see Lütkepohl (1991). There are several approaches that allow offline estimation of the parameters of high-dimensional time series under the VAR model.
For example, regularization techniques such as the ℓ1 regularization (LASSO-type) can ensure sparsity in high-dimensional settings, see Basu and Michailidis (2015). Other examples leverage the low-rank property of the transition matrix of the VAR model to gain insight into the network of Granger causal interactions between time series components, see Basu, Li and Michailidis (2019).

Over the last two decades, several methods have been proposed for analyzing brain signals, including Ombao, Sachs and Guo (2005), Cole et al. (2010), Park, Eckley and Ombao (2014), Samdin et al. (2016), Nason, Von Sachs and Kroisandt (2000), Ting et al. (2018), Li et al. (2019) and Zhao and Prado (2020). However, none of these methods can track changes in brain signals in real-time. This is a limitation because these methods cannot provide immediate feedback to the experimenter on the efficacy of the stimulus (type and intensity) that is being applied; the adequacy of the experimental settings can be checked only after the data acquisition process is over. Here, we develop a procedure that allows the experimenter to adapt the stimulus type or stimulus intensity over the course of the experiment. This avoids unnecessary repetitions of the experiment, which matters because in human studies it is difficult to ask a participant to repeat an experiment over and over again.

Our goal in this paper is to develop a procedure that is able to describe these changes of connectivity in brain signals in real-time, rather than after the end of the experiment. These non-stationary brain signals will be modeled using time-varying vector autoregressive (tv-VAR) processes. Under the tv-VAR framework, we develop an online procedure for estimating the parameters. Consequently, the procedure will also be able to estimate any functional of these parameters, including cross-coherence, partial cross-coherence and partial directed coherence. While the tv-VAR model has the ability to capture the dynamics in the signal, see Pagnotta and Plomp (2018), this comes at a heavy computational cost due to the high dimensionality of the parameter space (which is quadratic in the dimension of the multivariate time series). In this paper we present a novel smooth online parameter estimation procedure based on a penalized least squares criterion. Compared to the industry standard, the Kalman filter, our method has the advantage of being computationally very efficient, and it admits a Bayesian interpretation. In Section 2 we describe the model and introduce the idea behind smooth online parameter estimation (SOPE), as well as the infill behavior of our estimator as a function of the penalization parameters. In Section 3 we present a brief summary of some classical brain connectivity measures that rely on the tv-VAR model. In Section 4 we show, using simulated data, that our method can provide estimates similar to the Kalman filter's in terms of mean squared error (MSE), with the advantage of a much shorter computational time. In Section 5 we apply our method to LFP data recorded from a rat during a hippocampus-dependent sequence-memory task; our method yields results that are consistent with findings in the literature. Finally, in Section 6 we present a brief discussion of our results.
2. Smooth Online Parameter Estimation (SOPE).
2.1. Model definition.
Let X(t) = [X_1(t), ..., X_P(t)]' be a P-dimensional time series from a network with P nodes. X(t) could represent the observed brain signal (e.g., electroencephalogram or local field potential) over P locations (electrodes) on the scalp or cortex, at time points t = 1, 2, ..., T. The tv-VAR model of order K, denoted tv-VAR(K), is defined as follows:

X(t) = Σ_{ℓ=1}^{K} Φ_{t,ℓ} X(t−ℓ) + E(t),   (1)

where the set {Φ_{t,ℓ}}_{ℓ=1}^{K} collects the tv-VAR parameters at time t (K matrices of dimension P × P), and E(t) is a Gaussian white noise term with zero mean and covariance Σ_E. In the following we denote the concatenated unknown time-varying parameter matrices by Φ(t) and the concatenated last K observations at time t by U(t):

Φ(t) = [Φ_{t,1}, ..., Φ_{t,K}],   (2)
U(t) = [X(t−1)', ..., X(t−K)']'.   (3)

The model definition then simplifies to

X(t) = Φ(t) U(t) + E(t),   (4)

and the parameter Φ(t) will be estimated in real-time.
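To make the notation concrete, the following minimal Python/NumPy sketch builds U(t) from the last K observations and generates one step of a tv-VAR(K) process; the array shapes and the illustrative choice of coefficients are our own assumptions, not part of the paper.

import numpy as np

def build_U(X_hist):
    """Stack the last K observations X(t-1), ..., X(t-K) into the KP-dimensional vector U(t).

    X_hist: list [X(t-1), ..., X(t-K)], each a length-P array (most recent first).
    """
    return np.concatenate(X_hist)          # shape (K*P,)

def tv_var_step(Phi_t, X_hist, Sigma_E, rng):
    """One step of X(t) = Phi(t) U(t) + E(t) with Gaussian noise E(t) ~ N(0, Sigma_E)."""
    U_t = build_U(X_hist)                   # shape (K*P,)
    E_t = rng.multivariate_normal(np.zeros(len(Sigma_E)), Sigma_E)
    return Phi_t @ U_t + E_t                # shape (P,)

# Illustrative usage: P = 2 channels, order K = 1, constant Phi for simplicity.
rng = np.random.default_rng(0)
P, K = 2, 1
Phi_t = 0.5 * np.eye(P)                     # P x (K*P) coefficient matrix at time t
X_hist = [np.zeros(P)]                      # last K observations
x_new = tv_var_step(Phi_t, X_hist, np.eye(P), rng)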
2.2. Recursive least squares estimators.

Least squares is a powerful and fundamental estimation technique in statistics, dating back to the late eighteenth/early nineteenth century and the work of Carl Friedrich Gauss and Adrien-Marie Legendre. For almost a century and a half the least squares procedure did not attract much attention, until the early 1940s, when Kolmogorov (1939) and Wiener (1949) independently introduced linear filtering from the stochastic processes point of view. About a decade later, Plackett (1950) derived the efficient matrix formulation of the recursive least squares (RLS). The next major development in linear filtering was made by Kalman (1960), who proposed a new approach using the state-space formulation of the problem. For more details on the historical development of RLS and linear filtering see Sorenson (1970) and Young (2011). Dahlhaus and Subba Rao investigated online and recursive inference for time-varying ARCH models in Dahlhaus and Subba Rao (2006) and Dahlhaus and Subba Rao (2007); however, their model handles only one-dimensional time series, which is not suitable for multivariate signals such as brain recordings.

There are two main paradigms for RLS. First, the classical RLS corresponds to the stationary case where the parameters are static. Second, the weighted RLS corresponds to the non-stationary case where the parameters evolve over time. Naturally, our interest lies in the second case. Weighted RLS can in turn be decomposed into two main approaches: sliding-window RLS and exponentially weighted RLS.

The sliding-window recursive least squares (SWRLS) uses a fixed number of observations around the point of interest to estimate the parameters. This yields time-varying parameter estimates, since the estimator only borrows information from time-neighboring observations. However, this approach suffers from high sensitivity to noise in the observations, and the only remedy is to take a larger window (more observations), which results in a more biased estimator (bias-variance trade-off).

Similarly, the exponentially weighted recursive least squares (EWRLS) uses exponentially decaying weights to estimate the time-varying parameters (a textbook update is sketched below). This approach uses all past observations, but it limits the contribution of older observations by assigning them smaller weights. To control the bias-variance trade-off, a forgetting factor between zero and one has to be selected: a forgetting factor close to zero gives more importance to recent observations and therefore yields a less biased but higher-variance estimator, while a forgetting factor close to one results in a more biased estimator with a smaller variance. There is an extensive literature on the choice of the decaying weights for RLS algorithms depending on the problem at hand. However, the flexibility of such algorithms is limited by the linear nature of the estimator: the above approaches all reduce the variance by creating a more biased estimator, and they cannot leverage prior information about the smoothness of the estimated curves, as is available under local stationarity.
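As a concrete illustration of this classical approach, the following Python sketch implements a standard exponentially weighted RLS update with forgetting factor lam; it is generic textbook background, not the SOPE method, and the function and variable names are our own.

import numpy as np

def ewrls_update(Phi, Pmat, x_t, u_t, lam=0.98):
    """One exponentially weighted RLS step for x_t = Phi u_t + e_t.

    Phi  : current (P x KP) coefficient estimate
    Pmat : current (KP x KP) inverse "information" matrix
    x_t  : new observation, shape (P,)
    u_t  : regressor (stacked past observations), shape (KP,)
    lam  : forgetting factor in (0, 1]; smaller values forget faster
    """
    Pu = Pmat @ u_t
    gain = Pu / (lam + u_t @ Pu)               # gain vector, shape (KP,)
    err = x_t - Phi @ u_t                      # one-step prediction error, shape (P,)
    Phi_new = Phi + np.outer(err, gain)        # rank-one coefficient update
    P_new = (Pmat - np.outer(gain, Pu)) / lam  # update of the inverse matrix
    return Phi_new, P_new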
Our method, in contrast, has the ability to penalize the roughness of the estimated curves, which results in a smaller variance.

2.3. Kalman filter.
The Kalman filter is also a recursive algorithm. Under some simplifying conditions it is equivalent to the RLS algorithm, but it is usually viewed as a generalization of RLS. In order to estimate the parameters Φ(t) of the tv-VAR model using the Kalman filter, we need to formulate the problem in the framework of state-space models. First, vectorize the parameters (5) and define the observation matrix (6):

a_t = vec(Φ(t)'),   (5)
C_t = I_P ⊗ U(t)',   (6)

where U(t) stacks the last K observations as in Equation 3. Using this notation, the model is defined in terms of the state transition and observation equations, respectively, as follows:

a_t = a_{t−1} + w_t,   (7)
X(t) = C_t a_t + v_t,   (8)

where w_t and v_t are the transition noise and the observation noise. The dynamics of the parameters are governed by Equation 7. Under appropriate assumptions, the Kalman filter (see Kalman (1960)) delivers optimal results in terms of minimal error variance. However, the problem at hand is more complicated, because the optimal parameters of the filter are unknown and have to be estimated along with the tv-VAR parameters. Hence, the Kalman filter will not necessarily be optimal in the sense of minimal error variance. Automatic selection of the filter variance matrices is possible, but that is a separate problem in its own right; for more details on adaptive filtering see Mehra (1972), Lippuner and Moschytz (2004) and Emami and Taban (2018).

There are two main limitations of the Kalman filter as a potential tool for online estimation. The primary problem is the high level of uncertainty (variance) of the parameter estimators. Since connectivity measures (e.g., coherence, partial directed coherence) are highly non-linear functions of the tv-VAR parameters, small perturbations in the tv-VAR estimates can substantially change the connectivity estimates, resulting in an even higher level of uncertainty in the connectivity estimators. The other problem of the Kalman filter is its prohibitive computational cost. Indeed, the Kalman filter requires manipulation of the covariance matrix of the state vector, which in our case is the vectorized form of the unknown parameters. Since the state vector lives in a KP²-dimensional space, its covariance matrix is very expensive to compute and to store in memory when the dimension P of the problem is high, which is the case with EEG and LFP data.
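For reference, a minimal random-walk Kalman filter step for the state-space model in Equations 7 and 8 can be sketched as follows; this is a generic textbook implementation, and the noise covariances Q and R as well as all names are our own illustrative choices.

import numpy as np

def kalman_step(a, Pcov, x_t, C_t, Q, R):
    """One predict/update step for a_t = a_{t-1} + w_t, X(t) = C_t a_t + v_t.

    a    : state mean, shape (K*P**2,)          -- vec of the tv-VAR coefficients
    Pcov : state covariance, (K*P**2, K*P**2)   -- this large matrix is what makes the KF costly
    x_t  : observation, shape (P,)
    C_t  : observation matrix, shape (P, K*P**2)
    Q, R : transition and observation noise covariances
    """
    # Predict (random-walk state): mean unchanged, covariance inflated by Q.
    a_pred = a
    P_pred = Pcov + Q
    # Update with the new observation.
    S = C_t @ P_pred @ C_t.T + R                  # innovation covariance, (P, P)
    K = P_pred @ C_t.T @ np.linalg.inv(S)         # Kalman gain
    a_new = a_pred + K @ (x_t - C_t @ a_pred)
    P_new = (np.eye(len(a)) - K @ C_t) @ P_pred
    return a_new, P_new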
2.4. Proposed Method: SOPE.

The SOPE method we propose leverages the local stationarity assumption to provide real-time estimates of the tv-VAR model parameters. For the local stationarity assumption to hold, some smoothness conditions on the parameters must be imposed. Thus, in our approach we consider the parameters to be differentiable to some given order with respect to rescaled time.

Under Gaussianity (E(t) iid ~ N(0, Σ_E)), the generalized least squares (GLS) approach is equivalent to maximizing the conditional likelihood. Using the notation from Section 2.1, Equations 2 and 3, we get the following:

X(t) | Φ(t), U(t) ~ N(Φ(t)U(t), Σ_E)  ⟹  Φ̂(t) = arg min_{b ∈ R^{P×KP}} ||X(t) − bU(t)||²_{Σ_E^{-1}}.   (9)

In this particular problem the MLE is not an optimal choice, for the following reasons. First, the problem is ill-posed, since U(t)U(t)' is singular: we only have one observation X(t) at time t to estimate the tv-VAR parameters Φ(t). Second, even if we address this issue, for example with a ridge regularization, the resulting estimator will be biased (shrinkage towards zero) and will have a high variance, because it only uses the data from a single time point. We therefore need to borrow information from neighboring time points; since our goal is online estimation, this neighborhood consists only of past observations.

Our proposed method is based on the assumption that the immediate past observations contain relevant information for the current observation, which is reasonable under local stationarity. Thus, the proposed method is constructed within the framework of locally stationary processes. Moreover, one important assumption is that, for each time t, the true physiological signal-generating process can be approximated by a VAR model with parameters Φ(t) that change smoothly with respect to time. A meaningful approach is therefore to obtain an estimator that is reasonably smooth while providing a good fit (based on a least squares and/or likelihood criterion). This can be formalized in terms of the penalized least squares criterion below:

Φ̂(t) = arg min_{b ∈ R^{P×KP}} ||X(t) − bU(t)||²_{Σ_E^{-1}} + λ P(b),   (10)

where λ > 0 is a regularization parameter and P(b) denotes a penalization term that controls the smoothness of the estimated function Φ̂(t). In this paper, we investigate different forms of the penalty term P(b), including the Frobenius norm of the first-order difference with previous estimates, of the second-order difference, or a combination of the two:

P(b) = ||b − Φ̂(t−1)||²_F,   (11)
P(b) = ||b − 2Φ̂(t−1) + Φ̂(t−2)||²_F,   (12)
P(b) = ||b − [Φ̂(t−1) + β(Φ̂(t−1) − Φ̂(t−2))]||²_F.   (13)

The penalty function P(b) in Equation 11 corresponds to first-order difference penalization; P(b) in Equation 12 corresponds to second-order difference penalization. Clearly, P(b) in Equation 13 is a combination of the previous two penalty functions with penalization parameters λ₁ and λ₂, where λ = λ₁ + λ₂ and β = λ₂/(λ₁ + λ₂). For the following derivations, we will assume that Σ_E = I. This assumption greatly simplifies computations without unnecessarily limiting the flexibility of the tv-VAR model for capturing the connectivity structure of the brain network. It does not imply that the components of the multivariate time series are independent; rather, it implies that the connectivity in the brain network is fully captured by Φ(t). Moreover, this assumption allows the computation to be carried out in an online fashion that is computationally robust. The problems above are all quadratic in b; minimizing with respect to b therefore leads to the following recursive formulas (the Frobenius norm of a matrix A is the square root of the sum of its squared elements, i.e., ||A||_F = sqrt(tr(A'A))):

Φ̂(t) = (X(t)U(t)' + λΦ̂(t−1)) (U(t)U(t)' + λI)^{−1},   (14)
Φ̂(t) = (X(t)U(t)' + λ[Φ̂(t−1) + (Φ̂(t−1) − Φ̂(t−2))]) (U(t)U(t)' + λI)^{−1},   (15)
Φ̂(t) = (X(t)U(t)' + λ[Φ̂(t−1) + β(Φ̂(t−1) − Φ̂(t−2))]) (U(t)U(t)' + λI)^{−1}.   (16)

In order to invert U(t)U(t)' + λI efficiently, it is necessary to use the Sherman-Morrison-Woodbury inversion formula, see Hager (1989).
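Because U(t)U(t)' is a rank-one matrix, the inverse appearing in Equations 14-16 reduces to a special case of the Sherman-Morrison identity; the following short Python sketch (our own illustration) shows the corresponding computation and checks it against a direct inverse.

import numpy as np

def inv_rank_one_plus_ridge(u, lam):
    """Return (u u' + lam * I)^{-1} via the Sherman-Morrison identity.

    u   : regressor vector U(t), shape (KP,)
    lam : penalization parameter lambda > 0
    """
    n = len(u)
    # (lam I + u u')^{-1} = (1/lam) * (I - u u' / (lam + u'u))
    return (np.eye(n) - np.outer(u, u) / (lam + u @ u)) / lam

# Quick illustrative check against a direct inverse.
u = np.random.default_rng(1).standard_normal(6)
lam = 10.0
direct = np.linalg.inv(np.outer(u, u) + lam * np.eye(6))
assert np.allclose(direct, inv_rank_one_plus_ridge(u, lam))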
2.5. Bayesian interpretation.

The tv-VAR parameters are assumed to vary smoothly over time and therefore admit a Taylor expansion at every time point t. Hence, Φ(t) ≈ Φ(t−1) + Φ'(t−1), where the derivative Φ'(t−1) can be approximated by Φ(t−1) − Φ(t−2), which leads to Φ(t) ≈ Φ(t−1) + (Φ(t−1) − Φ(t−2)). Depending on the information available about the smoothness of the estimated function and on the density of the observations, i.e., the Δt between consecutive observations X(t) and X(t−1), we will have more or less confidence in this approximation, leading to Φ(t) ≈ Φ(t−1) + β(Φ(t−1) − Φ(t−2)), with β close to one when the function is smooth and the observation density is very high, and close to zero otherwise.

Thus, given estimates Φ̂ of Φ at times t−1 and t−2, a potential prior for the parameters at time t is N(μ, Σ), with μ = Φ̂(t−1) + β(Φ̂(t−1) − Φ̂(t−2)) and Σ = λ^{−1} I, where λ is the precision parameter of the prior, representing our confidence in the "location" of the next iterate, and I is the KP² × KP² identity matrix; the smoother the function, the larger λ. Of course, the parameters need to be vectorized for this interpretation to make sense. Under Gaussianity of the noise, this Bayesian formulation implies the following conditional posterior:

f(b | X(t), U(t)) ∝ f(X(t) | b, U(t)) f(b).   (17)

Therefore, we define the point estimate Φ̂(t) of Φ(t) at time t to be:

Φ̂(t) = arg max_{b ∈ R^{P×KP}} f(X(t) | b, U(t)) f(b | I_{t−1}) = arg min_{b ∈ R^{P×KP}} ||X(t) − bU(t)||²_{Σ_E^{-1}} + λ P(b).   (18)

Notice that the maximization of the (log-)posterior in Equation 18 is equivalent to the minimization problem stated in Equation 10, with the penalties defined in Equations 11, 12 and 13 corresponding to different prior choices. Note that β = 0 is equivalent to a Gaussian prior that assumes a constant function; this is consistent with penalizing the gradient, since this penalization drives the estimate's gradient towards zero. Moreover, β = 1 is equivalent to a Gaussian prior that assumes a constant gradient, i.e., a linear function of time; this is consistent with penalizing the curvature, since this penalization produces a vanishing second derivative. It is possible to pursue higher-order penalties: one could extend this reasoning to choose a prior that assumes any given degree of smoothness.

Given the penalization parameters α, β and the time series X(t), the following recursive algorithm (SOPE) provides online estimates of the parameters:
Algorithm 1 Smooth Online Parameter Estimation for tv-VAR models
1: procedure GETSMOOTHESTIMATES(X(1), ..., X(T))
2:   Initialize:
3:   Φ̂(K−1) = least squares estimate
4:   Φ̂(K) = Φ̂(K−1)
5:   α ∈ (0, ∞), β ∈ [0, 1]
6:   for t = K+1, ..., T do
7:     U(t) = [X(t−1)', ..., X(t−K)']'
8:     A = X(t)U(t)' + λ[Φ̂(t−1) + β(Φ̂(t−1) − Φ̂(t−2))]
9:     B = (U(t)U(t)' + λI)^{−1}
10:    Φ̂(t) = AB
11:   end for
12: end procedure
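A compact NumPy translation of Algorithm 1 might look as follows. This is our own sketch: it assumes Σ_E = I as in Section 2.4, identifies the penalization parameter λ in the update with the input α, and, for brevity, initializes the first two estimates to zero rather than by least squares. In practice the explicit matrix inverse would be replaced by the Sherman-Morrison step sketched earlier.

import numpy as np

def sope(X, K, lam, beta):
    """Smooth online parameter estimation for a tv-VAR(K) model (sketch of Algorithm 1).

    X    : array of shape (T, P) holding the observed multivariate time series
    K    : VAR order
    lam  : smoothness penalization parameter (lambda > 0)
    beta : mixing parameter in [0, 1] between first- and second-order penalties
    Returns an array of shape (T, P, K*P) with the online estimates of Phi(t).
    """
    T, P = X.shape
    Phi = np.zeros((T, P, K * P))             # Phi[t] is the estimate at time t
    I = np.eye(K * P)
    for t in range(K + 1, T):
        # U(t): stacked past observations X(t-1), ..., X(t-K), shape (K*P,)
        U = np.concatenate([X[t - l] for l in range(1, K + 1)])
        target = Phi[t - 1] + beta * (Phi[t - 1] - Phi[t - 2])    # prior mean
        A = np.outer(X[t], U) + lam * target                      # P x KP
        B = np.linalg.inv(np.outer(U, U) + lam * I)               # KP x KP
        Phi[t] = A @ B
    return Phi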
2.6. Infill asymptotics.

Leaving the stationary world results in many complications. Classical asymptotic results do not necessarily apply here, since the distant future or the distant past might not provide relevant information about the present due to the non-stationarity. Therefore, when considering locally stationary processes, one should study infill asymptotics on the parameter Φ(t). Following Dahlhaus (2012), we analyze the behavior of our estimator using infill asymptotics by rescaling the time index of the tv-VAR parameter to the unit interval [0, 1]. Define u = t/T; then, as T → ∞, the range of u becomes dense in [0, 1].

Asymptotically, when T → ∞, the time between consecutive observations becomes infinitesimal, i.e., h = 1/T → dt. Furthermore, if the penalization coefficients are selected correctly, the finite difference penalization terms converge to the derivatives, see Appendix 7.2. Moreover, minimizing the problem globally (for all t ∈ {1, ..., T}) with the penalty terms defined in Equations 11 and 12 asymptotically results in the following calculus of variations problems:

L(u, b, b') = ||X(⌊uT⌋) − b(u)U(⌊uT⌋)||²₂ + c||b'(u)||²_F,   J[b] = ∫₀¹ L(u, b, b') du,   Φ̂ = arg min_{b ∈ B} J[b],   (19)

L(u, b, b', b'') = ||X(⌊uT⌋) − b(u)U(⌊uT⌋)||²₂ + c||b''(u)||²_F,   J[b] = ∫₀¹ L(u, b, b', b'') du,   Φ̂ = arg min_{b ∈ B} J[b],   (20)

where in each display the Lagrangian L is the least squares term (or the likelihood of the observations) plus a penalty term for the roughness of the function, and J is the functional to be minimized over B (a class of sufficiently smooth functions). To solve such problems, we consider the necessary condition for optimality, the Euler-Lagrange equations (see Appendix 7.2 for more details, and Gelfand and Fomin (1964) for more background on Euler-Lagrange equations and the calculus of variations in general):

d²Φ̂(u)/du² = (1/(2c)) ∇_Φ̂ ||X(⌊uT⌋) − Φ̂(u) U(⌊uT⌋)||²₂,   (21)
d³Φ̂(u)/du³ = −(1/(2c)) ∇_Φ̂ ||X(⌊uT⌋) − Φ̂(u) U(⌊uT⌋)||²₂.   (22)

If we penalize the first derivative with c small (asymptotically "weak" penalization), the solution boils down to the least squares estimator; if c is large, the second derivative has to vanish, which implies that the solution is affine:

c → 0 ⟹ ∀u, ∇_Φ̂ ||X(⌊uT⌋) − Φ̂(u)U(⌊uT⌋)||²₂ → 0,
c → ∞ ⟹ ∀u, d²Φ̂(u)/du² → 0.

Similarly, if we penalize the second derivative with c small (asymptotically "weak" penalization), the solution boils down to the least squares estimator; if c is large, the third derivative vanishes, which means that the solution is quadratic in time:

c → 0 ⟹ ∀u, ∇_Φ̂ ||X(⌊uT⌋) − Φ̂(u)U(⌊uT⌋)||²₂ → 0,
c → ∞ ⟹ ∀u, d³Φ̂(u)/du³ → 0.
Therefore, asymptotically our procedure leads to an estimator that lies between the LSE (or MLE) on one side and the best linear (or parabolic) curve on the other, depending on the kind of regularization being used.

If Σ_E is unknown, it needs to be estimated online, and the previous results no longer apply directly. However, if the covariance estimates improve as more data are observed, the previous results hold after a burn-in period that is necessary for obtaining a good estimate of Σ_E; see Appendix 7.1 for the generalized SOPE algorithm.
3. Online Estimates of Brain Connectivity Measures.
The pioneering work of Franz Joseph Gall on the anatomy of the brain triggered a long process of scientific endeavor whose aim is to localize mental functions in the brain, also known as functional brain segregation. With the technological advances made during the last century in neuroimaging techniques, neuroscientists have made numerous discoveries that shed light on the map of the brain. However, the brain remains a mysterious organ in many ways. The idea of functional brain integration is much more difficult to assess because it involves the notion of causality; see Friston (2011) for more details on functional segregation and integration of the brain. Functional connectivity is often defined as the statistical dependence between distant populations or groups of neurons; it is usually assessed using correlation (time domain) or coherence (frequency domain). Over the last twenty years there has been an increasing trend in the literature on the assessment of effective connectivity, which can be understood as the influence of one group of neurons over another, see Cribben et al. (2012), Xu and Lindquist (2015), Fiecas and Ombao (2016), Zhou et al. (2016), Ombao et al. (2018) and Ting et al. (2020). Consequently, effective connectivity is a measure of causal influence or information flow between two regions. Baccala and Sameshima developed the concept of partial directed coherence (PDC) between groups of neurons, which measures the direction of information flow inside the brain structure, see Saito et al. (1981) and Baccala and Sameshima (2001).

In this paper, we mention three measures of connectivity: coherence, partial coherence and partial directed coherence. However, we will only use coherence and partial directed coherence. Coherence is probably the most frequently used measure of association in the frequency domain. It measures the linear association between signals, it can also be interpreted as the relative synchrony of two signals, and it is the frequency-domain analog of cross-correlation. Partial coherence between two signals can be defined as the conditional coherence given all other signals; in other words, partial coherence is the remaining coherence that cannot be explained by all other signals. Partial directed coherence can be defined as the conditional Granger causality from one signal to another, normalized by the total causal outflow from the first signal.

In the following, we provide the parametric definitions of these spectral measures of connectivity in the context of a tv-VAR model. First, we define the following time-varying quantities of interest at frequency ω with sampling frequency ω_s:

Φ(t, ω) = I_P − Σ_{ℓ=1}^{K} Φ_{t,ℓ} exp(−√−1 · 2πℓω/ω_s),   (23)
H(t, ω) = Φ(t, ω)^{−1},   (24)
S(t, ω) = H(t, ω) Σ_E H(t, ω)*,   (25)
G(t, ω) = S(t, ω)^{−1},   (26)
Γ(t, ω) = Diag(G(t, ω))^{−1/2} G(t, ω) Diag(G(t, ω))^{−1/2},   (27)

where Diag is the operator that takes a matrix and returns a diagonal matrix containing the original diagonal, i.e., (Diag(A))_{i,i} = (A)_{i,i} and (Diag(A))_{i,j} = 0 if i ≠ j. Here S(t, ω) is the spectral matrix of the time series, S_{i,j}(t, ω) represents the cross-spectrum between channel i and channel j at frequency ω and time t, H(t, ω) is the time-varying and frequency-specific transfer function matrix, and Γ(t, ω) is the partial coherence matrix.
Using the notation in Equations 23, 24, 25, 26 and 27, we can provide the parametric definitions of the above spectral measures of dependence between channel i and channel j at frequency ω and time t:

ρ_{i,j}(t, ω) = |S_{i,j}(t, ω)|² / (S_{i,i}(t, ω) S_{j,j}(t, ω)),   (28)
γ_{i,j}(t, ω) = Γ_{i,j}(t, ω),   (29)
π_{i,j}(t, ω) = |Φ_{i,j}(t, ω)| / sqrt(Σ_{k=1}^{P} |Φ_{k,j}(t, ω)|²).   (30)

Equations 28, 29 and 30 define, respectively, coherence, partial coherence and partial directed coherence. To estimate these spectral connectivity measures at time t, we first fit the tv-VAR model and then plug the estimates into the formulas. Therefore, online estimates of the tv-VAR model parameters naturally provide online plug-in estimators of spectral connectivity:

ρ̂_{i,j}(t, ω) = |Ŝ_{i,j}(t, ω)|² / (Ŝ_{i,i}(t, ω) Ŝ_{j,j}(t, ω)),   (31)
γ̂_{i,j}(t, ω) = Γ̂_{i,j}(t, ω),   (32)
π̂_{i,j}(t, ω) = |Φ̂_{i,j}(t, ω)| / sqrt(Σ_{k=1}^{P} |Φ̂_{k,j}(t, ω)|²).   (33)
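The plug-in computation in Equations 31-33 can be illustrated with the following NumPy sketch; it is our own illustration, assuming Σ_E = I by default and the coherence definition in Equation 28.

import numpy as np

def connectivity_at(Phi_t, omega, omega_s, Sigma_E=None):
    """Plug-in coherence and PDC at time t and frequency omega from tv-VAR estimates.

    Phi_t   : (P, K*P) estimated coefficient matrix [Phi_{t,1}, ..., Phi_{t,K}]
    omega   : frequency of interest; omega_s : sampling frequency
    Returns (coherence, pdc), both (P, P) arrays.
    """
    P, KP = Phi_t.shape
    K = KP // P
    if Sigma_E is None:
        Sigma_E = np.eye(P)
    # Phi(t, omega) = I_P - sum_l Phi_{t,l} exp(-i 2 pi l omega / omega_s)   (Eq. 23)
    Phi_w = np.eye(P, dtype=complex)
    for l in range(1, K + 1):
        block = Phi_t[:, (l - 1) * P : l * P]
        Phi_w -= block * np.exp(-1j * 2 * np.pi * l * omega / omega_s)
    H = np.linalg.inv(Phi_w)                       # transfer function, Eq. 24
    S = H @ Sigma_E @ H.conj().T                   # spectral matrix, Eq. 25
    d2 = np.real(np.diag(S))                       # auto-spectra S_{i,i}
    coherence = np.abs(S) ** 2 / np.outer(d2, d2)  # Eq. 28 / 31
    pdc = np.abs(Phi_w) / np.sqrt((np.abs(Phi_w) ** 2).sum(axis=0, keepdims=True))  # Eq. 30 / 33
    return coherence, pdc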
4. Simulation Studies.
In order to investigate the SOPE method, we need access to a model with time-varying parameters and connectivity. We therefore use simulation to generate observations from a tv-VAR model with known time-varying parameters and connectivity.

4.1. Computational time.
As mentioned in the previous sections, our method has the advantage of being computationally faster than the Kalman filter, since we do not need to keep track of a covariance matrix of the vectorized parameters. We report in the tables below the average execution time per iteration, in milliseconds, for the SOPE method as well as for the Kalman filter.
TABLE 1. Execution time per iteration for the Kalman filter, in milliseconds, for P ∈ {2, 5, 10, 15, 20, 25} and K ∈ {1, 3, 5}, where P is the dimension and K is the order of the tv-VAR model.

TABLE 2. Execution time per iteration for SOPE, in milliseconds, for P ∈ {20, 50, 100, 150, 200, 250} and K ∈ {1, 3, 5}, where P is the dimension and K is the order of the tv-VAR model.

Typically, brain signals such as EEG and LFP are well known for their high time resolution (roughly from 256 Hz to 2000 Hz). Therefore, any method that aims to be applied in real-time needs an execution time of at most a few milliseconds per iteration in order to update the estimates for every observation. Tables 1 and 2 show that the SOPE method is computationally faster than the Kalman filter by orders of magnitude. The SOPE method can handle dimensions on the order of 256 (dense EEG setting), whereas the Kalman filter can barely handle small dimensions below 25. Furthermore, our implementation uses the Python language, which is known to be slow compared with lower-level languages such as C; an implementation of our method in the C programming language could therefore allow the SOPE method to be applied to even higher dimensions.
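Per-iteration timings of the kind reported in Tables 1 and 2 can be obtained with a simple harness such as the one below; this is our own illustrative code, not the benchmark used in the paper, and the update shown is the SOPE recursion with an explicit inverse.

import time
import numpy as np

def time_per_iteration(update_fn, n_iter=1000):
    """Average wall-clock time (in milliseconds) of one call to update_fn."""
    start = time.perf_counter()
    for _ in range(n_iter):
        update_fn()
    return 1000.0 * (time.perf_counter() - start) / n_iter

# Example: time one SOPE-style update for P = 20, K = 3 (illustrative settings).
P, K, lam = 20, 3, 100.0
rng = np.random.default_rng(0)
Phi_prev = np.zeros((P, K * P))

def one_sope_update():
    U = rng.standard_normal(K * P)
    x = rng.standard_normal(P)
    A = np.outer(x, U) + lam * Phi_prev
    B = np.linalg.inv(np.outer(U, U) + lam * np.eye(K * P))
    return A @ B

print(f"SOPE: {time_per_iteration(one_sope_update):.3f} ms per iteration")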
4.2. Parameter estimation.
In the following, we start with a simple example to show the quality of the estimates obtained using the SOPE method; we then present a more rigorous comparison between the Kalman filter and the SOPE method in terms of average sum of squared errors (SSE).

In this example we use a five-dimensional tv-VAR model of order K = 1. The five components are split into two groups of sizes three and two; the two groups could equivalently be represented by individual tv-VAR models of dimensions three and two, respectively. Every entry of the coefficient matrix Φ(t) is either a cosine function with random amplitude and random phase shift or the zero function, i.e., Φ(t)_{i,j} = A_{i,j} cos(2π t/T + B_{i,j}) if i and j are in the same group and Φ(t)_{i,j} = 0 otherwise, where A_{i,j} is the random amplitude and B_{i,j} is the random phase shift. In order to ensure stationarity at every time t, we make the amplitudes larger when i = j and smaller when i ≠ j (a simulation sketch of this design is given below). A straightforward implementation of Algorithm 1 in Section 2.4 with identity noise covariance matrix provides the estimates shown in Figure 3, with the 2.5, 50 and 97.5 percentiles represented by dotted lines based on 1000 sample estimates.

FIG 3. SOPE results for the tv-VAR model: the plot in the i-th row and j-th column shows (Φ(t))_{i,j} (red line) and (Φ̂(t))_{i,j} (blue line); the 2.5, 50 and 97.5 percentiles (based on 1000 samples) are represented by dotted blue lines.

After obtaining the tv-VAR model parameter estimates we can compute the connectivity measures of interest, for example coherence and PDC as defined in Section 3; see Figure 4. Equations 31 and 33 clearly show the nonlinear dependence of the connectivity estimates on the tv-VAR parameter estimates: a small amount of noise in the parameter estimates can result in a large deviation of the connectivity estimator from the true values. Therefore, a method that can control the smoothness offers a big practical advantage for practitioners, as they can tune the penalization parameter in real-time in order to get valuable feedback that allows them to correct or adapt the experiment settings.
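The following Python sketch generates data from a block-structured tv-VAR(1) of this kind; it is our own illustration, and the specific amplitude values, the phase distribution and the use of a full-period cosine are assumptions rather than the paper's exact settings.

import numpy as np

def simulate_blocked_tvvar(T=1000, seed=0):
    """Simulate a 5-dimensional tv-VAR(1) with two independent groups {0,1,2} and {3,4}."""
    rng = np.random.default_rng(seed)
    P = 5
    groups = [0, 0, 0, 1, 1]                       # group label of each component
    # Amplitudes: larger on the diagonal, smaller off-diagonal, zero across groups.
    A = np.zeros((P, P))
    B = rng.uniform(0, 2 * np.pi, size=(P, P))     # random phase shifts
    for i in range(P):
        for j in range(P):
            if groups[i] == groups[j]:
                A[i, j] = 0.5 if i == j else 0.1
    X = np.zeros((T, P))
    for t in range(1, T):
        Phi_t = A * np.cos(2 * np.pi * t / T + B)  # time-varying coefficient matrix
        X[t] = Phi_t @ X[t - 1] + rng.standard_normal(P)
    return X

X = simulate_blocked_tvvar()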
FIG 4. SOPE results for the tv-VAR model: the plot in the i-th row and j-th column shows the connectivity estimates between channels i and j. Solid lines represent the truth and dotted lines represent the estimates (delta band).

The SOPE approach is not only computationally faster, and hence applicable in higher dimensions; it also provides estimates that compete with the Kalman filter in terms of mean squared error. Thus, the SOPE approach does not need to trade estimation accuracy for shorter computational time.

Both the Kalman filter and the SOPE algorithm have parameters that need to be tuned: the state covariance matrix for the Kalman filter and the regularization parameters α and β for SOPE. In the following simulation, we select the optimal parameters that minimize the MSE of the tv-VAR parameter estimates (for P = 3 and K = 2). For simplicity we take the state covariance as Σ_Q = σ²I; experimentally, a fixed value of β seems to work very well in practice. We report in Figures 5 and 6 the average MSE as a function of the hyperparameters σ and α.

FIG 5. Kalman filter: average MSE per parameter as a function of σ for a tv-VAR model with P = 3 and K = 2.
FIG 6. SOPE: average MSE per parameter as a function of α for a tv-VAR model with P = 3 and K = 2.

From these two figures we can make two remarks. First, both approaches have a similar MSE per parameter, with the Kalman filter being slightly better. Second, the optimal σ is orders of magnitude smaller than the optimal α, which means that the Kalman filter is much more sensitive to the choice of σ than SOPE is to the choice of α.

Given the previous simulations assessing the behaviour of the estimators in terms of MSE with respect to the selection of the hyperparameters, we now show that such knowledge about the hyperparameters can be transferred to a similar but different problem. Using the optimal hyperparameters from the example above, we apply the KF and the SOPE methods to a tv-VAR model with P = 6 and K = 5 and investigate the parameter estimates for both methods. We report below the results for the pairs of channels (2, 2) and (6, 2).

FIG 7. Parameter estimates for a tv-VAR model with P = 6 and K = 5, pair of channels (2, 2). True parameters are in solid red lines, estimated parameters in solid blue lines, and the 2.5, 50 and 97.5 percentiles (based on 1000 samples) in dotted blue lines. KF in the left column and SOPE in the right column.

FIG 8. Parameter estimates for a tv-VAR model with P = 6 and K = 5, pair of channels (6, 2). True parameters are in solid red lines, estimated parameters in solid blue lines, and the 2.5, 50 and 97.5 percentiles (based on 1000 samples) in dotted blue lines. KF in the left column and SOPE in the right column.

This example clearly shows that the SOPE method can provide results similar to the Kalman filter's, even when the hyperparameters have been selected on another example. However, the Kalman filter does not assume anything about the smoothness of the parameters being estimated. Therefore, it is natural to wonder how the SOPE method behaves in the presence of abrupt changes such as a discontinuity. In the following example we again use the KF and the SOPE to fit a tv-VAR model that includes some discontinuous parameters; see Figure 9.
FIG 9. Parameter estimates for a tv-VAR model with discontinuities, P = 3 and K = 2, pair of channels (2, 2). True parameters are in solid red lines, estimated parameters in solid blue lines, and the 2.5, 50 and 97.5 percentiles (based on 1000 samples) in dotted blue lines. KF in the left column and SOPE in the right column.

The previous examples clearly show that in small dimensions the SOPE method is flexible enough to provide a wide range of estimates, from very smooth to very rough, depending on the choice of the penalization hyperparameter, with the additional advantage of being computationally fast, which enables us to obtain results of similar quality in higher dimensions at a reasonable computational cost.
5. Online Estimation of Connectivity of Rat Brain Signals.
The ability to remember events and the relationships between these events is a fundamental characteristic of most species. It has been shown that the hippocampus plays a critical role in episodic memory and in particular in the encoding of sequences of events; however, the details of this mechanism are not yet well understood. In Allen et al. (2016), a cross-species (humans and rats) study was conducted at the UCI Neurobiology Lab (PI: Fortin) to unravel the underlying neural mechanisms that allow the hippocampus to encode non-spatial sequences of events. A group of rats was trained to recognize a sequence of five odors (lemon, rum, anise, vanilla, banana). After reaching a given performance level, the rats were tested and their brain electrical activity in the CA1 hippocampus region was recorded; see Figures 10 and 11.

FIG 10. Rat with the implanted tetrodes.
FIG 11. Tetrode locations on the hippocampus.

In order to guarantee good data quality, only seven tetrodes from the LFP recordings of the rat have been retained, namely T3, T4, T5, T6, T10, T18 and T20. In this application we want to show that our method has the ability to capture the interesting dynamics of brain connectivity in real-time, which allows practitioners to get immediate feedback for adapting the experiment settings to the goal of the study.

In the following, we start by presenting an offline analysis of this dataset using the SOPE method; our goal is to show that by controlling the smoothness of the connectivity estimates we can detect interesting patterns that could otherwise have been missed. The reader can also access a real-time visualization of the estimated brain connectivity network through the following link: brain connectivity visualization.

After fitting a simple tv-VAR model of order K = 1 and dimension P = 7 to this dataset, we can compute the different connectivity measures presented in Section 3 (coherence and partial directed coherence) for the slow gamma (20-40 Hz) and theta (4-12 Hz) oscillation bands. As mentioned earlier, these measures of connectivity are very sensitive to noise. Therefore, when we do not account sufficiently for the estimator's noise (small penalization), we clearly see in Figure 12 that we get poor quality estimates regardless of the frequency band.

FIG 12. Coherence estimates using SOPE (slow gamma) during correct sequences of odors (lemon, rum, anise, vanilla, banana); vertical bars indicate the start of each odor (A, B, C, D, E). Regularization parameters: α = 100 and a fixed β.

However, when more adequate levels of smoothing are selected, the coherence estimates appear to be strongly modulated by the sequence of smells in the slow gamma band, see Figure 13. Nevertheless, this amplitude modulation does not seem to be present at lower frequencies such as the theta band, see Figure 14. Furthermore, the amplitude modulation is not present between all pairs of channels, which suggests some underlying activity in the hippocampus that is driven by the sequence of odors. The presence of this amplitude modulation of the dependence between channels in the slow gamma band only does not seem to contradict the results found in Allen et al. (2016).
FIG 13. Coherence estimates using SOPE (slow gamma) during correct sequences of odors (lemon, rum, anise, vanilla, banana); vertical bars indicate the start of each odor. Regularization parameters: α = 1000 (red line), α = 1500 (black line), α = 2000 (blue line), with a fixed β.

FIG 14. Coherence estimates using SOPE (theta) during correct sequences of odors (lemon, rum, anise, vanilla, banana); vertical bars indicate the start of each odor. Regularization parameters: α = 3000 (red line), α = 3500 (black line), α = 4000 (blue line), with a fixed β.

The offline analysis of the tetrode connectivity network shows some interesting dynamics after an odor is introduced. To analyze this activity, we look at the average connectivity measures immediately before (-250 ms) and after (+250 ms) the odor is presented and compare them with given quantile levels of connectivity. We present below the results for the 0.75-quantile.

In Figures 15, 16, 17 and 18 we notice that one of the tetrodes seems to play a central role, as it is very often involved in changes in the graph structure. Also, the anise odor seems to disrupt the connectivity between all channels more often than the other odors. Results for other quantile levels can be found in the Appendix.
FIG 15. In the before-odor networks, an edge is present if the high-frequency coherence connectivity is higher than a threshold defined by the 0.75 quantile. In the after-odor networks, a blue edge indicates that this connectivity was high before and remained high after the odor, a dotted gray edge indicates that this connectivity was high before and is no longer high, and a green edge indicates that this connectivity was low before and became high after the introduction of the smell.

FIG 16. As in Figure 15, with edges defined by the 0.75 quantile of the high-frequency PDC connectivity.

FIG 17. As in Figure 15, with edges defined by the 0.75 quantile of the high-frequency coherence connectivity.

FIG 18. As in Figure 15, with edges defined by the 0.75 quantile of the low-frequency PDC connectivity.

The above example shows that the SOPE method can successfully control the level of smoothness of the connectivity estimates. Thus, the SOPE method provides, in real-time, smooth estimates (robust to noise and artefacts) that can reliably capture brain connectivity. Using the SOPE method allows practitioners to obtain immediate feedback on the efficacy of the stimulus (type and intensity) being applied, so that they can adjust the experiment settings before the completion of the data acquisition process.
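A possible implementation of the before/after network rule used in Figures 15-18 is sketched below; the 250 ms windows and the 0.75 quantile threshold follow the description above, while the array layout and the use of a single global quantile (rather than a per-pair quantile) are our own assumptions.

import numpy as np

def before_after_networks(conn, odor_onset, fs, quantile=0.75, window_ms=250):
    """Build before/after adjacency matrices around an odor onset.

    conn       : array (T, P, P) of a connectivity measure (e.g. coherence) over time
    odor_onset : sample index at which the odor is presented
    fs         : sampling frequency in Hz
    Returns (before, after) boolean (P, P) adjacency matrices.
    """
    w = int(window_ms * fs / 1000)
    threshold = np.quantile(conn, quantile)                    # global quantile threshold
    before = conn[odor_onset - w : odor_onset].mean(axis=0) > threshold
    after = conn[odor_onset : odor_onset + w].mean(axis=0) > threshold
    return before, after

def edge_changes(before, after):
    """Edge categories used in Figures 15-18: kept (blue), lost (gray), gained (green)."""
    return {"kept": before & after, "lost": before & ~after, "gained": ~before & after}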
6. Conclusion.
We presented a new online parameter estimation method (SOPE) for locally stationary time series that can be modeled by a tv-VAR model. Our method provides smooth online estimates with a reasonable computational cost that allows it to scale to higher dimensions. Furthermore, our approach is well motivated, since it relies on a penalized least squares/likelihood criterion, and we showed that the penalization term has a Bayesian interpretation, with different penalization terms corresponding to different prior choices.

When compared to the Kalman filter, the SOPE method provides very similar estimates, but with the major advantage of being computationally more efficient, so that it can be applied to higher-dimensional time series. Furthermore, this new method provides a meaningful way to control the smoothness of the estimates using only two hyperparameters (α and β), which are much simpler to tune than the Kalman filter parameters. Our approach also has asymptotic guarantees that characterize its limiting behaviour depending on the choice of the penalization coefficients. The purpose of this approach is to increase the robustness of the estimates and to stabilize the connectivity measures, which often involve non-smooth functions of the parameters (e.g., coherence), as was shown in Figures 12 and 14.

Using our approach we estimated coherence and PDC between the different tetrodes in the CA1 region of the hippocampus. We were able to detect interesting dynamics in connectivity: the slow-gamma oscillations were more strongly modulated by temporal context (the sequence of odors) than the theta oscillations, which is consistent with the findings in Allen et al. (2016).
7. Appendix.
7.1. Generalized SOPE.
If the innovation components are not iid (Σ_E ≠ I), the algorithm presented in Section 2.4 can be adapted in the same way as generalized least squares:

−log f(X(t) | b, U(t), Σ_E) ∝ ||X(t) − bU(t)||²_{Σ_E^{-1}},   (34)

which in turn leads to the following problem:

Φ̂(t) = arg max_{b ∈ R^{P×KP}} f(X(t) | b, U(t)) f(b | I_{t−1})  ⟺  Φ̂(t) = arg min_{b ∈ R^{P×KP}} ||X(t) − bU(t)||²_{Σ_E^{-1}} + λ P(b).   (35)

Solving Equation 35 is not straightforward. Assuming the covariance is known, a change of variables, Ũ = [(Σ_E^{−1/2} X(t−1))', ..., (Σ_E^{−1/2} X(t−K))']' and Φ̃(t) = Σ_E^{−1/2} Φ(t) (I_K ⊗ Σ_E^{1/2}), can be performed to decorrelate the time series components, which allows the problem to be solved in the same way as the least squares problem; another change of variables is then needed to undo the initial transformation. In practice, the covariance matrix Σ_E is unknown and consequently has to be estimated simultaneously with the parameters of interest Φ(t), based on the residuals:

Σ̂_E = Σ_T = ((T−1)/T) Σ_{T−1} + (1/T) R(T)R(T)' = (1/T) Σ_{t=1}^{T} R(t)R(t)'.

This leads to the following general SOPE algorithm:
Algorithm 2 Smooth Online Parameter Estimation for tv-VAR models (general covariance)
1: procedure GETSMOOTHESTIMATES(X(1), ..., X(T))
2:   Initialize:
3:   Φ̃(K−1) = least squares estimate
4:   Φ̃(K) = Φ̃(K−1)
5:   Σ_K = I (noise covariance)
6:   α ∈ (0, ∞), β ∈ [0, 1]
7:   for t = K+1, ..., T do
8:     X̃ = Σ_{t−1}^{−1/2} X(t)
9:     Ũ = [(Σ_{t−1}^{−1/2} X(t−1))', ..., (Σ_{t−1}^{−1/2} X(t−K))']'
10:    Φ̃(t) = (X̃Ũ' + λ[Φ̃(t−1) + β(Φ̃(t−1) − Φ̃(t−2))]) (ŨŨ' + λI)^{−1}
11:    Φ̂(t) = Σ_{t−1}^{1/2} Φ̃(t) (I_K ⊗ Σ_{t−1}^{−1/2})
12:    R(t) = X(t) − Φ̂(t)U(t)
13:    Σ_t = ((t−1)/t) Σ_{t−1} + (1/t) R(t)R(t)'
14:   end for
15: end procedure
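A hedged NumPy sketch of one pass through lines 8-13 of Algorithm 2 is given below; the symmetric square roots are computed by eigendecomposition, and all names and shapes are our own illustrative choices.

import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive-definite matrix S."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def generalized_sope_step(x_t, X_hist, Phi_prev, Phi_prev2, Sigma, t, lam, beta):
    """One step of Algorithm 2: whiten, update in whitened coordinates, map back, update Sigma.

    x_t       : current observation, shape (P,)
    X_hist    : list [X(t-1), ..., X(t-K)] of past observations
    Phi_prev, Phi_prev2 : previous two estimates in the whitened coordinates, (P, K*P)
    Sigma     : current residual covariance estimate Sigma_{t-1}, (P, P)
    """
    K, P = len(X_hist), len(x_t)
    W = inv_sqrt(Sigma)                                        # Sigma^{-1/2}
    x_w = W @ x_t
    U_w = np.concatenate([W @ x for x in X_hist])              # whitened regressors
    target = Phi_prev + beta * (Phi_prev - Phi_prev2)
    Phi_w = (np.outer(x_w, U_w) + lam * target) @ np.linalg.inv(
        np.outer(U_w, U_w) + lam * np.eye(K * P))
    Phi_t = np.linalg.inv(W) @ Phi_w @ np.kron(np.eye(K), W)   # back to original coordinates
    resid = x_t - Phi_t @ np.concatenate(X_hist)
    Sigma_new = ((t - 1) * Sigma + np.outer(resid, resid)) / t  # running residual covariance
    return Phi_w, Phi_t, Sigma_new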
Infill asymptotics.
Let $h = \tfrac{1}{T}$, which plays the role of the infinitesimal $dt$ as $T \to \infty$. In the following we use the dot notation as a shortcut, $\dot{b} = \frac{d}{du} b$ and $\ddot{b} = \frac{d^2}{du^2} b$, etc. Then, for a properly specified penalty coefficient $\lambda(T)$, the following limits are derived:
$$ \lambda h^2 \xrightarrow[T\to\infty]{} c \;\Longrightarrow\; \lambda \Big\lVert b\big(\tfrac{t}{T}\big) - b\big(\tfrac{t-1}{T}\big) \Big\rVert_F^2 = \lambda h^2\, \Big\lVert \frac{b(\tfrac{t}{T}) - b(\tfrac{t-1}{T})}{h} \Big\rVert_F^2 \;\longrightarrow\; c\, \big\lVert \dot{b}(u) \big\rVert_F^2, $$
$$ \lambda h^4 \xrightarrow[T\to\infty]{} c \;\Longrightarrow\; \lambda \Big\lVert b\big(\tfrac{t}{T}\big) - 2\,b\big(\tfrac{t-1}{T}\big) + b\big(\tfrac{t-2}{T}\big) \Big\rVert_F^2 = \lambda h^4\, \Big\lVert \frac{b(\tfrac{t}{T}) - 2\,b(\tfrac{t-1}{T}) + b(\tfrac{t-2}{T})}{h^2} \Big\rVert_F^2 \;\longrightarrow\; c\, \big\lVert \ddot{b}(u) \big\rVert_F^2. $$
Hence, considering the overall estimation problem, the minimization of the penalized sum of squared errors turns into the minimization of an integral, provided that the penalization parameter grows at the adequate rate:
$$ \sum_{t=1}^{T} \frac{1}{T}\Big[ \big\lVert X(t) - b(t)U(t) \big\rVert_2^2 + \lambda \big\lVert b(t) - b(t-1) \big\rVert_F^2 \Big] \;\longrightarrow\; \int_{u=0}^{1} \big\lVert X(\lfloor uT\rfloor) - b(u)\,U(\lfloor uT\rfloor) \big\rVert_2^2 + c\, \big\lVert \dot{b}(u) \big\rVert_F^2 \, du, \qquad (36) $$
$$ \sum_{t=1}^{T} \frac{1}{T}\Big[ \big\lVert X(t) - b(t)U(t) \big\rVert_2^2 + \lambda \big\lVert b(t) - 2\,b(t-1) + b(t-2) \big\rVert_F^2 \Big] \;\longrightarrow\; \int_{u=0}^{1} \big\lVert X(\lfloor uT\rfloor) - b(u)\,U(\lfloor uT\rfloor) \big\rVert_2^2 + c\, \big\lVert \ddot{b}(u) \big\rVert_F^2 \, du, \qquad (37) $$
which leads to the following calculus of variations problems:
$$ \widehat{\Phi} = \underset{b \in \mathcal{C}([0,1],\, \mathbb{R}^{P\times KP})}{\arg\min} \int_{u=0}^{1} \big\lVert X(\lfloor uT\rfloor) - b(u)\,U(\lfloor uT\rfloor) \big\rVert_2^2 + c\, \big\lVert \dot{b}(u) \big\rVert_F^2 \, du, \qquad (38) $$
$$ \widehat{\Phi} = \underset{b \in \mathcal{C}([0,1],\, \mathbb{R}^{P\times KP})}{\arg\min} \int_{u=0}^{1} \big\lVert X(\lfloor uT\rfloor) - b(u)\,U(\lfloor uT\rfloor) \big\rVert_2^2 + c\, \big\lVert \ddot{b}(u) \big\rVert_F^2 \, du. \qquad (39) $$
To be rigorous, one should also state the boundary conditions of the problem. Of course, in practice we cannot impose such boundary conditions since the problem has to be solved online; however, conditioning on the previous estimates automatically imposes some boundary conditions.
The two problems in (38) and (39) involve a minimization over a space of functions, and can be stated in the framework of the calculus of variations as follows. For the first-order difference penalty term in (9) we asymptotically get
$$ \mathcal{L}(u, b, \dot{b}) = \big\lVert X(\lfloor uT\rfloor) - b(u)\,U(\lfloor uT\rfloor) \big\rVert_2^2 + c\, \big\lVert \dot{b}(u) \big\rVert_F^2, \qquad J[b] = \int_0^1 \mathcal{L}(u, b, \dot{b})\, du, \qquad \widehat{\Phi} = \underset{b \in \mathcal{B}}{\arg\min}\; J[b], \qquad (40) $$
where the Lagrangian is the least squares term corresponding to the likelihood of the observations plus a penalty term for the roughness of the function, and $\mathcal{B}$ is a sufficiently smooth class of functions. To solve such problems, we consider a necessary condition for optimality, also known as the Euler-Lagrange equation of the Lagrangian:
$$ \frac{\partial \mathcal{L}}{\partial b} - \frac{d}{du}\frac{\partial \mathcal{L}}{\partial \dot{b}} = 0 \,\bigg|_{b = \widehat{\Phi}} \;\Longleftrightarrow\; \ddot{\widehat{\Phi}}(u) = \frac{1}{2c}\, \nabla_{\widehat{\Phi}}\, \big\lVert X(\lfloor uT\rfloor) - \widehat{\Phi}\, U(\lfloor uT\rfloor) \big\rVert_2^2. $$
Similarly, using the second-order difference penalty term in (10) leads asymptotically to the problem
$$ \mathcal{L}(u, b, \dot{b}, \ddot{b}) = \big\lVert X(\lfloor uT\rfloor) - b(u)\,U(\lfloor uT\rfloor) \big\rVert_2^2 + c\, \big\lVert \ddot{b}(u) \big\rVert_F^2, \qquad J[b] = \int_0^1 \mathcal{L}(u, b, \dot{b}, \ddot{b})\, du, \qquad \widehat{\Phi} = \underset{b \in \mathcal{B}}{\arg\min}\; J[b], \qquad (41) $$
where again the Lagrangian is the least squares term corresponding to the likelihood of the observations, plus a penalization term for the roughness (curvature) of the function. The Euler-Lagrange equation becomes
$$ \frac{\partial \mathcal{L}}{\partial b} - \frac{d}{du}\frac{\partial \mathcal{L}}{\partial \dot{b}} + \frac{d^2}{du^2}\frac{\partial \mathcal{L}}{\partial \ddot{b}} = 0 \,\bigg|_{b = \widehat{\Phi}} \;\Longleftrightarrow\; \widehat{\Phi}^{(4)}(u) = -\frac{1}{2c}\, \nabla_{\widehat{\Phi}}\, \big\lVert X(\lfloor uT\rfloor) - \widehat{\Phi}\, U(\lfloor uT\rfloor) \big\rVert_2^2. $$
To use results from the theory of the calculus of variations, some smoothness assumptions are necessary: in particular, $b$ must be smooth enough (differentiable), and the Lagrangian must be a smooth function of $b$, $\dot{b}$ and $\ddot{b}$, which is clearly the case here since it is quadratic in $b$, $\dot{b}$ and $\ddot{b}$. The smoothness of $b$ can be controlled in simulation ($\mathcal{C}^1$, $\mathcal{C}^2$, etc.). In practice this may not always hold, but we assume that it does since we are concerned with locally stationary processes. The above conditions are necessary for deriving the Euler-Lagrange equation.
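As a simple numerical illustration of the rate condition $\lambda h^2 \to c$ behind (36), the sketch below compares the scaled discrete first-difference penalty with the limiting integral $c\int_0^1 \lVert \dot b(u)\rVert^2\,du$ for a toy scalar path; the choice $b(u) = \sin(2\pi u)$ and the rate $\lambda = cT^2$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy check of the infill limit: with lambda = c * T^2 (so that lambda * h^2 = c, h = 1/T),
# the scaled first-difference penalty should approach c * int_0^1 b'(u)^2 du.
# The path b(u) = sin(2*pi*u) is an arbitrary smooth example, not taken from the paper.
c = 2.0
b = lambda u: np.sin(2 * np.pi * u)
b_dot = lambda u: 2 * np.pi * np.cos(2 * np.pi * u)

for T in (100, 1_000, 10_000):
    h = 1.0 / T
    lam = c * T**2                                    # penalty growing at the adequate rate
    u = np.arange(T + 1) / T                          # grid points t/T, t = 0, ..., T
    discrete = (1.0 / T) * np.sum(lam * (b(u[1:]) - b(u[:-1])) ** 2)
    limit = c * np.sum(b_dot(u[1:]) ** 2) * h         # Riemann sum for c * int b'(u)^2 du
    print(f"T={T:6d}   discrete penalty={discrete:8.4f}   limiting integral={limit:8.4f}")
```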
7.3. Connectivity networks before and after odor presentation.
Connectivity networks before and after the presentation of each odor (lemon, rum, anise, vanilla and banana) are displayed in the figures below, for coherence and partial directed coherence (PDC) with edges thresholded at the 0.5 and 0.9 quantiles.
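Both connectivity measures summarized in the figures can be computed directly from fitted VAR coefficients. The following is a minimal sketch (the function name var_connectivity and the assumption that $\widehat{\Phi}(t)$ is stored as a list of $K$ lag blocks are illustrative choices, not the paper's code) that evaluates the spectral matrix, coherence and the standard definition of PDC at a single frequency.

```python
import numpy as np

def var_connectivity(Phi_blocks, Sigma_E, freq, fs=1.0):
    """Coherence and PDC at a single frequency from VAR(K) coefficients.

    Phi_blocks : list of K (P, P) lag coefficient matrices [A_1, ..., A_K],
                 e.g. obtained by splitting a (P, K*P) estimate column-wise.
    Sigma_E    : (P, P) innovation covariance.
    freq, fs   : frequency and sampling rate in the same units.
    """
    P = Phi_blocks[0].shape[0]
    w = freq / fs
    # A_bar(w) = I - sum_k A_k exp(-i 2 pi w k)
    A_bar = np.eye(P, dtype=complex)
    for k, A_k in enumerate(Phi_blocks, start=1):
        A_bar -= A_k * np.exp(-2j * np.pi * w * k)
    # Spectral matrix S(w) = H(w) Sigma_E H(w)^*, with transfer function H = A_bar^{-1}.
    H = np.linalg.inv(A_bar)
    S = H @ Sigma_E @ H.conj().T
    # Coherence: |S_ij|^2 / (S_ii S_jj).
    d = np.real(np.diag(S))
    coherence = np.abs(S) ** 2 / np.outer(d, d)
    # PDC: |A_bar_ij| normalized by the column norm sqrt(sum_m |A_bar_mj|^2).
    pdc = np.abs(A_bar) / np.sqrt((np.abs(A_bar) ** 2).sum(axis=0, keepdims=True))
    return coherence, pdc
```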
FIG. Before and after odor connectivity networks between the seven channels (T3, T4, T5, T6, T10, T18, T20) for each of the five odors (lemon, rum, anise, vanilla, banana). In the before-odor networks an edge is present if the high frequency coherence connectivity is higher than a threshold defined by the 0.5 quantile. In the after-odor networks, a blue edge indicates that this connectivity was high before and remained high after the odor, a dotted gray edge indicates that this connectivity was high before and is no longer high, and a green edge indicates that this connectivity was low before and became high after the introduction of the smell.
FIG. As above, for the high frequency PDC connectivity with the threshold defined by the 0.5 quantile.
FIG. As above, for the low frequency PDC connectivity with the threshold defined by the 0.5 quantile.
FIG. As above, for the high frequency coherence connectivity with the threshold defined by the 0.9 quantile.
FIG. As above, for the high frequency PDC connectivity with the threshold defined by the 0.9 quantile.
FIG. As above, for the low frequency PDC connectivity with the threshold defined by the 0.9 quantile.
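The before/after edge classification described in the captions above can be reproduced from any pair of connectivity matrices. The sketch below is a hypothetical illustration (the function classify_edges, the channel list, and the choice to take the threshold as a quantile of the before-odor values are assumptions, not the paper's code); it handles a symmetric measure such as coherence, while the directed PDC case would iterate over all ordered channel pairs.

```python
import numpy as np

CHANNELS = ["T3", "T4", "T5", "T6", "T10", "T18", "T20"]

def classify_edges(before, after, q=0.5):
    """Threshold two symmetric connectivity matrices at the q-quantile of the
    before-odor values and label each channel pair.

    Returns a dict mapping (channel_i, channel_j) to one of:
      'kept'   - high before and still high after  (blue edge in the figures)
      'lost'   - high before, no longer high after (dotted gray edge)
      'gained' - low before, high after            (green edge)
    Pairs that are low in both windows are omitted.
    """
    iu = np.triu_indices(len(CHANNELS), k=1)        # upper-triangular channel pairs
    thr = np.quantile(before[iu], q)                # threshold from the before-odor window
    labels = {}
    for i, j in zip(*iu):
        high_before, high_after = before[i, j] > thr, after[i, j] > thr
        if high_before and high_after:
            labels[(CHANNELS[i], CHANNELS[j])] = "kept"
        elif high_before:
            labels[(CHANNELS[i], CHANNELS[j])] = "lost"
        elif high_after:
            labels[(CHANNELS[i], CHANNELS[j])] = "gained"
    return labels

# Toy usage with random symmetric matrices standing in for coherence estimates.
rng = np.random.default_rng(0)
A = rng.uniform(size=(7, 7)); before = (A + A.T) / 2
B = rng.uniform(size=(7, 7)); after = (B + B.T) / 2
print(classify_edges(before, after, q=0.5))
```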