Improving J-divergence of brain connectivity states by graph Laplacian denoising
Tiziana Cattai, Gaetano Scarano, Marie-Constance Corsi, Danielle S. Bassett, Fabrizio De Vico Fallani, Stefania Colonnese
Inria Paris, Aramis Project Team, Paris, France
Institut du Cerveau et de la Moelle épinière, ICM, Inserm U 1127, CNRS UMR 7225, Sorbonne Université, Paris, France
Dept. of Information Engineering, Electronics and Telecommunication, Sapienza University of Rome, Italy
Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA
Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, 19104, USA
Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA, 19104, USA
Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA, 19104, USA
Department of Psychiatry, Hospital of the University of Pennsylvania, Philadelphia, PA, 19104, USA
The Santa Fe Institute, Santa Fe, NM, 87501, USA
Corresponding author: [email protected]
Abstract
Functional connectivity (FC) can be represented as a network, and is frequently used to better understand the neural underpinnings of complex tasks such as motor imagery (MI) detection in brain-computer interfaces (BCIs). However, errors in the estimation of connectivity can affect the detection performance. In this work, we address the problem of denoising common connectivity estimates to improve the detectability of different connectivity states. Specifically, we propose a denoising algorithm that acts on the network graph Laplacian and leverages recent graph signal processing results. Further, we derive a novel formulation of the Jensen divergence for the denoised Laplacian under different states. Numerical simulations on synthetic data show that the denoising method improves the Jensen divergence of connectivity patterns corresponding to different task conditions. Furthermore, we apply the Laplacian denoising technique to brain networks estimated from real EEG data recorded during MI-BCI experiments. Using our novel formulation of the J-divergence, we are able to quantify the distance between the FC networks in the motor imagery and resting states, as well as to understand the contribution of each Laplacian variable to the total J-divergence between the two states. Experimental results on real MI-BCI EEG data demonstrate that the Laplacian denoising improves the separation of motor imagery and resting mental states, and shortens the time interval required for connectivity estimation. We conclude that the approach shows promise for the robust detection of connectivity states while being appealing for implementation in real-time BCI applications.
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
December 23, 2020

Introduction
Functional connectivity describes how brain areas mutually interact [1]. This information can be modeled as a graph, which is one of the most common formalisms to characterize networked data [2, 3]. Many recent studies show that mental states can be characterized by graph statistics, such as node strength, efficiency, and modularity [4]. Detecting brain connectivity-related features corresponding to different mental states can enhance several technologies, such as brain-computer interfaces (BCIs). BCI systems allow subjects to communicate and interact without peripheral neuro-muscular activity [5]. A prerequisite for BCI functioning is therefore the correct detection of the user's mental states. While research on the subject has significantly advanced over the last decade, there is still a key limitation known as BCI inefficiency [6]: a percentage of users cannot be trained to use the interface. This limitation, together with system-user interaction problems [6], motivated us to develop new tools for a more robust estimate of brain connectivity, with the final goal of better separating two cognitive states. Obtaining this estimate from signals acquired at graph nodes (e.g. EEG electrodes) is a difficult task because of the inherent noise, the number of links to estimate, the presence of artifacts, and the non-stationarity of the signal.

To address the problem of connectivity estimation together with the improvement of separability between mental states to optimally control a BCI, it is necessary to combine tools from different fields, such as neuroscience and signal processing. For example, graph signal processing (GSP) can be applied in this scenario [7–9]. GSP has already been used to deal with biological data, and in particular brain data [10, 11].
Indeed, GSP is potentially able to integrate information regarding brain structure, as represented by the graph itself, with information regarding brain function, as represented by the graph signals.

Another helpful tool for the brain connectivity estimation problem is signal detection theory. Detection procedures can be applied to investigate statistical differences between the brain connectivity features of two different mental states, which correspond to motor imagery and resting state in our application. In this context, widely adopted statistics are the Likelihood Ratio (LR) of the features [12–19], as well as the linear detector maximizing the so-called deflection [20–23]. With the aim of obtaining a distance metric between features under two states, the case of normally distributed observations simplifies the analysis. Indeed, for normally distributed observations with equal conditional variance and different conditional means, the maximum deflection test coincides with the LR test. Moreover, the latter can be extended to a linear-quadratic detector so as to cope with observations characterized by different conditional variances [20, 21]. To obtain a measure of separability between features under the two states, one possibility is the Jensen divergence, which reflects the maximum deflection test performance [12, 14, 19]. In the following sub-section, we present our original contributions.
Paper Contributions

• This paper proposes a novel graph Laplacian denoising algorithm to enhance the accuracy of brain connectivity estimates. In the recent literature, several studies have been conducted to improve the accuracy of link estimation, whereas few studies approach this problem in terms of the description of the graph algebraic structure (see Section 2). We address this limitation by proposing a subspace-based Laplacian denoising algorithm that preserves relevant connectivity features while rejecting noise-dominated components. In particular, the Laplacian denoising preserves i) the sub-spaces more directly related to the graph topology, as summarized by the eigenvectors corresponding to the smallest Laplacian eigenvalues, and ii) the sub-spaces estimated under a favourable signal-to-noise ratio, as summarized by the eigenvectors corresponding to the largest Laplacian eigenvalues. The noise rejection obtained by this twofold sub-space selection notably improves the separability of two connectivity states. For the sake of clarity, we refer to connectivity states as the patterns of connectivity estimated while the brain performs distinct cognitive tasks.

• In order to measure the improvement achieved by the proposed brain connectivity denoising algorithm, we provide an analysis of the J-divergence of the Laplacian coefficients, we make explicit their contribution to the states' separability in terms of the first- and second-order moments of the test statistics, and we show that the proposed Laplacian denoising actually increases the J-divergence of the brain connectivity features under the rest (null) and motor imagery (alternative) states. The improvement of the J-divergence of the graph Laplacian coefficients under different connectivity states is assessed by numerical simulations on synthetic data.

• Finally, we present experimental results on real EEG data acquired during motor imagery-based BCI experiments, and we show that the proposed novel denoising algorithm increases the J-divergence of brain connectivity states and paves the way for a reduction of the connectivity estimation time interval. As a relevant by-product of the theoretical J-divergence analysis, we are able to attribute a score to each and every Laplacian coefficient representing its marginal contribution to the J-divergence. The score admits a relevant biological interpretation confirming the efficacy of the approach. These results can be assessed by further studies on brain connectivity features.

The structure of the paper is as follows. Section 2 reviews the scientific literature related to our work. Section 3 describes the signal model used in the analysis. Section 4 details the novel graph denoising we propose. Section 5 describes the problem of Gaussian detection, and it presents the novel formulation of the J-divergence we use throughout the paper. In Section 6, we test our filtering method on synthetic graphs to verify its ability to separate two graphs. Section 7 applies our method to real EEG data, exhibiting its capacity to estimate graph connectivity during motor imagery tasks and to discriminate between two mental states. We conclude in Section 8. In Table 1, we list the main notation used in the paper.

Table 1. Table of main notation.
A, Â : adjacency matrix (real, estimated)
V : set of all nodes
N : total number of nodes
E : set of all links
D, D̂ : degree matrix (real, estimated)
L, L̂ : Laplacian matrix (real, estimated)
λ, λ̂ : eigenvalue (real, estimated)
u, û : eigenvector (real, estimated)
U_L, U_M, U_H : subsets of smallest, central, largest eigenvalues
L̃, l̃ : filtered graph Laplacian matrix and vector
T : transformation matrix
x : vectorized Laplacian in the transformed domain
J : J-divergence
S : score
Related Work
The problem of graph connectivity estimation has been well studied in the literature across different domains, from neuroscience to signal processing and graph theory [24–26]. State-of-the-art graph learning methods have the limitation that they usually adopt over-simplified models of the signal on the graph to contain computational and memory costs. Some recent works, such as [27], propose different strategies to deal with graph learning problems. Specifically, in the context of mental state identification, the authors of [27] present a novel technique to create and modify embeddings associated with each graph node so as to efficiently compute the adjacency matrix. Since FC computation requires considerable time and computational power, one possibility consists in clustering FC into relevant communities of synchronous components. One approach, recently proposed in [28], goes in this direction with the application of the k-means clustering algorithm followed by a tensor decomposition to reduce the FC data.

FC estimation can leverage the generalization of classical signal processing operations to the graph setting, where signals are localized on graph nodes, giving rise to the novel research domain of graph signal processing (GSP) [7–9]. GSP has already shown its potential to describe brain functioning in [29] and [10]. Indeed, the GSP representation naturally fits the brain, where the structure can be described by the graph itself while brain functioning directly corresponds to graph signals. An interesting application is graph filtering [7, 30], which can be useful to extract meaningful brain behaviour [31]. In [32], the authors propose a mathematical model for brain fibers able to describe neurophysiological mechanisms. The model, based on GSP techniques, extracts a subset of graph eigenvectors which represent a suitable basis for filtering fiber tracts from brain imaging data.

GSP techniques have also been applied to brain-computer interfaces with NIRS signals [33]. Specifically, GSP analysis is leveraged in [33] for feature extraction, to obtain spatial information from the NIRS signals, and it has been shown to improve classification performance.

Classical signal processing techniques and eigenvector-based filtering have already been used with brain data [34, 35]. In [36] and [37], eigenvector-based filters are applied to fetal magnetic signals and diffuse optical imaging data to obtain more localized activities and to reduce artifacts and noise. Specifically, in [37], classical eigenvector filtering, i.e. based on the dominant eigenvectors, is used in diffuse optical imaging with the aim of improving connectivity estimation.

In the following, stemming from the GSP approach to FC estimation, we propose a novel Laplacian denoising algorithm, and we show that it improves the statistical separation of distinct connectivity states. To this aim, we provide an analysis of the J-divergence, which naturally provides a metric to quantify the distance between two distributions, for the problem under concern. Recently, the J-divergence has been applied in [38] to investigate time series' irreversibility. Another recent application of the J-divergence is proposed in [39], where the authors present a novel approach to vector-skew the J-divergence. This method is able to preserve the J-divergence properties and simultaneously to fine-tune parameters for specific applications.

The J-divergence has also been applied in the context of BCI design, to tackle one of the most challenging issues of EEG-based BCIs, namely the long calibration time. In general, the amount of data required to calibrate the model is very large, because of the noise and the non-stationarity of brain signals. One solution comes from [40], where a subject-to-subject transfer learning approach is proposed to improve the classification performance with limited training data. The J-divergence is used in a transfer learning framework to test their method by comparing the data of the target subject with the data from previous subjects. In the following, we investigate the J-divergence from a different point of view, namely i) we assess the performance of the denoising algorithm in separating connectivity states and ii) we provide a criterion for scoring the Laplacian coefficients based on their contribution to the connectivity state separation.
Signal Model

We are interested in analyzing signals defined on an undirected, connected, weighted graph G = {V, E, A}, which consists of a finite set of vertices V with |V| = N, a set of edges E, and a weighted adjacency matrix A. If there is an edge e = (i, j) connecting vertices i and j, the element A_{i,j} represents the weight of the edge; otherwise, A_{i,j} = 0. The graph Laplacian is a real symmetric matrix defined as:

L = D − A    (1)

where the degree matrix D is a diagonal matrix whose i-th diagonal element d_i is equal to the sum of the weights of all the edges incident to vertex i. (We refer here to the non-normalized graph Laplacian, also called the combinatorial Laplacian.) We denote by {u_i}, i = 0, 1, …, N−1, the set of orthonormal eigenvectors, corresponding to the increasingly ordered eigenvalues 0 = λ_0 ≤ λ_1 ≤ λ_2 ≤ … ≤ λ_{N−1} = λ_max.

In GSP, the Laplacian eigenvectors are considered as signals on graphs (SoGs) and provide a basis for the Graph Fourier Transform (GFT). For a SoG s, the GFT is defined as the projection of s on the eigenvectors of the graph Laplacian: ŝ(λ_l) = s^H u_l. The graph Laplacian eigenvalues λ_l, l = 0, …, N−1, play the role of graph frequencies.

The signals at the N nodes are observed over a time interval T_obs with sampling period T_s as y_n[kT_s], n = 0, …, N−1, k = 0, …, N_s, with N_s = ⌊T_obs/T_s⌋, or in vector form as y[kT_s] = [y_0[kT_s] … y_{N−1}[kT_s]]^T. The vector sequence y[kT_s], k = 0, …, N_s, is used to estimate the graph adjacency matrix A by computing a similarity metric on each and every node pair. There are many state-of-the-art methods to estimate A_{i,j}, i, j = 0, …, N−1, which associate link weights according to different interaction properties [1, 41–43]. Thereby, the adjacency matrix A is actually represented by its estimated version Â, which contains the connectivity values Â_{i,j} estimated for each graph node pair (i, j), i, j = 0, …, N−1. Accordingly, the estimated degree matrix D̂ is computed, so as to derive the estimated Laplacian L̂ through Eq. (1), which here becomes:

L̂ = D̂ − Â    (2)

Any estimation error on the adjacency matrix affects the Laplacian estimate, and results in less distinguishable connectivity states. In the following section we address the denoising of the estimated Laplacian for the purpose of improving the separation of connectivity states.

Graph Laplacian Denoising

In order to introduce the Laplacian denoising algorithm, we consider the eigenvalue decomposition of the estimated Laplacian matrix L̂:

\hat{L} = \sum_{i=0}^{N-1} \hat{\lambda}_i \hat{u}_i \hat{u}_i^H    (3)
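As an illustrative sketch (not part of the paper's pipeline), the constructions in Eqs. (1)-(3) and the GFT projection take a few lines of NumPy; the 3-node adjacency matrix and the signal s below are toy assumptions for demonstration only.

```python
import numpy as np

def laplacian(A):
    """Combinatorial Laplacian L = D - A, Eq. (1)."""
    return np.diag(A.sum(axis=1)) - A

# toy symmetric weighted adjacency (assumed for illustration)
A = np.array([[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.5],
              [0.2, 0.5, 0.0]])
L = laplacian(A)

# eigh returns eigenvalues in increasing order; for a connected graph
# lam[0] = 0 with a constant eigenvector
lam, U = np.linalg.eigh(L)

# GFT of a (real) signal on the 3 nodes: s_hat(lam_l) = u_l^H s
s = np.array([1.0, -0.5, 0.3])
s_hat = U.conj().T @ s
```

Since the eigenvector matrix is orthonormal, the inverse GFT is simply U @ s_hat.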
Perturbations affect the graph Laplacian estimate in terms of both its eigenvalues and its eigenvectors. To elaborate on the effect of perturbations, we make explicit the first-, second-, and third-order error contributions to the estimated Laplacian L̂:

\hat{L} = \sum_{i=0}^{N-1} (\lambda_i + \epsilon_{\lambda_i})(u_i + \epsilon_{u_i})(u_i + \epsilon_{u_i})^H
        = \underbrace{\sum_{i=0}^{N-1} \lambda_i u_i u_i^H}_{L}
        + \sum_{i=0}^{N-1} \Big[ \underbrace{\lambda_i u_i \epsilon_{u_i}^H + \lambda_i \epsilon_{u_i} u_i^H + \epsilon_{\lambda_i} u_i u_i^H}_{\text{first-order error}}
        + \underbrace{\lambda_i \epsilon_{u_i} \epsilon_{u_i}^H + \epsilon_{\lambda_i} u_i \epsilon_{u_i}^H + \epsilon_{\lambda_i} \epsilon_{u_i} u_i^H}_{\text{second-order error}}
        + \underbrace{\epsilon_{\lambda_i} \epsilon_{u_i} \epsilon_{u_i}^H}_{\text{third-order error}} \Big]    (4)

Thereby, the estimated Laplacian can be approximated at first order as the sum of N terms:

\hat{L} \approx \sum_{i=0}^{N-1} (\lambda_i + \epsilon_{\lambda_i}) u_i u_i^H + \lambda_i \left( u_i \epsilon_{u_i}^H + \epsilon_{u_i} u_i^H \right)    (5)

Eq. (5) highlights the first-order error contribution due to the relative perturbation of the Laplacian eigenvalues as well as of the eigenvector directions. We are interested in the Laplacian components whose perturbation is contained because either the relative eigenvalue perturbation ε_{λ_i}/λ_i or the eigenvector perturbation ε_{u_i} is (relatively) small. To this aim, we order the set of orthonormal eigenvectors U_ALL = {û_l, l = 0, 1, …, N−1} by increasing eigenvalues 0 = λ̂_0 ≤ λ̂_1 ≤ … ≤ λ̂_{N−1} := λ̂_max, and we consider three subsets of eigenvalues and associated eigenvectors: 1) the subset U_L containing the N_L smallest eigenvalues; 2) the subset U_H containing the N_H largest eigenvalues; and 3) the subset U_M containing the remaining N_M = N − N_L − N_H central eigenvalues, with U_L ∪ U_M ∪ U_H = U_ALL.

Firstly, we remark that the N_H largest eigenvalues are more robust to eigenvalue perturbation; this assumption is usually exploited in classical signal processing, where the subspace U_H is used for the estimation of the covariance matrix because of its favorable signal-to-noise ratio [44].

Secondly, stemming from recent literature results [45], it can be expected that the subspace U_L is partially robust in terms of eigenvector perturbation. In fact, in [45] the authors state that a connectivity estimation error on the adjacency matrix element A_{mn}, i.e. on the weight of the link between the m-th and the n-th nodes, induces a perturbation ε_{u_i} of the i-th eigenvector depending on the difference between the m-th and the n-th coefficients of u_i. Thereby, eigenvectors smoothly varying across the m-th and the n-th nodes are less affected by estimation errors on A_{mn}. In GSP, it is well known that the U_L eigenvectors, corresponding to small eigenvalues, represent low-frequency basis elements in the Graph Fourier Transform [46], [9], since they are characterized by the smallest variations across strongly connected graph regions. Thereby, the eigenvectors in U_L are equipped with an inherent resilience to connectivity estimation errors within these regions. On the other hand, the eigenvectors in U_L are tightly related to the network connectivity, and therefore they need to be retained by the devised denoising method.

Stemming from these observations, we propose a denoising method based on preserving the contributions to the Laplacian due to the subspaces U_L and U_H, while discarding those relative to the subspace U_M.
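The expansion of Eqs. (4)-(5) can be checked numerically: discarding the second- and third-order terms leaves a residual that shrinks quadratically with the perturbation size. A minimal sketch follows; the random toy graph and the fixed perturbation directions are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
A = rng.random((N, N))
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

# fixed (assumed) perturbation directions for eigenvalues and eigenvectors
d_lam = rng.standard_normal(N)
d_U = rng.standard_normal((N, N))

def L_hat(eps):
    """All-order perturbed Laplacian, as in Eq. (4)."""
    return sum((lam[i] + eps * d_lam[i])
               * np.outer(U[:, i] + eps * d_U[:, i], U[:, i] + eps * d_U[:, i])
               for i in range(N))

def L_first(eps):
    """First-order approximation, Eq. (5)."""
    return sum((lam[i] + eps * d_lam[i]) * np.outer(U[:, i], U[:, i])
               + lam[i] * eps * (np.outer(U[:, i], d_U[:, i])
                                 + np.outer(d_U[:, i], U[:, i]))
               for i in range(N))

# residual is dominated by the second-order terms, hence O(eps^2):
# shrinking eps by 10x should shrink the residual by roughly 100x
err = [np.linalg.norm(L_hat(e) - L_first(e)) for e in (1e-2, 1e-3)]
```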
In formulas, given the estimated Laplacian

\hat{L} = \sum_{i \in U_L \cup U_M \cup U_H} \hat{\lambda}_i \hat{u}_i \hat{u}_i^H    (6)

we compute the denoised Laplacian L̃ as follows:

\tilde{L} = \sum_{i \in U_L} \hat{\lambda}_i \hat{u}_i \hat{u}_i^H + \sum_{i \in U_H} \hat{\lambda}_i \hat{u}_i \hat{u}_i^H    (7)

We recognize that the proposed, subspace-based, Laplacian denoising approach allows preservation of
• the sub-space U_L, which is directly related to the graph connectivity features;
• the sub-space U_H, which is estimated with a favourable signal-to-noise ratio.

The proposed Laplacian denoising method, synthetically presented in Algorithm 1, preserves the information relevant for the purpose of graph connectivity identification, while rejecting noisy components. In order to quantify the improvement achieved in terms of connectivity state separability, we resort to the J-divergence as a metric of the distance between connectivity states. In the following we derive a formulation of the J-divergence for the problem under concern.

J-divergence of Connectivity States

Several metrics can be adopted to determine the separability of two connectivity states [47], as represented by the Laplacian matrix L. Herein, we resort to the notion of J-divergence for characterizing the separability of connectivity states, and we reformulate it for the problem under concern. Thus, the J-divergence is used to identify the Laplacian coefficients that are most relevant for detection purposes, and later on to measure the improvement achieved by the proposed denoising algorithm.

For the purpose of the analysis, we will assume that the Laplacian coefficients obtained at the output of the denoising algorithm are normally distributed. Let us remark that the Gaussian assumption holds in many applications, including the case of connectivity estimates carried out on real brain signals, and thereby it is often assumed in the literature, e.g. for Laplacian estimation purposes [7].
Specifically, we assume that the vector l̃ = Vec(L̃) is distributed according to a multidimensional Gaussian probability density whose mean vector and covariance matrix differ under the two connectivity states, referred to as the null and the alternative hypotheses H_0 and H_1 in the following:

H_0 : l̃ ∼ N(η_0, K_0)
H_1 : l̃ ∼ N(η_1, K_1)    (8)

(The notation l̃ ∼ N(η_j, K_j), with j ∈ {0, 1}, indicates that the random vector l̃ is Gaussian distributed with mean vector η_j and covariance matrix K_j. The Gaussian assumption tightly models the Laplacian diagonal elements, computed as the sum of the extra-diagonal elements in the corresponding row, as well as the extra-diagonal elements, which are often computed as the result of correlation estimates.)

As an information-theoretic measure of the distance between l̃ under H_0 and H_1, we now compute in analytical form the J-divergence, defined as the expected value of the difference of the Log-Likelihood Ratio under the two hypotheses H_0 and H_1 [14]. The J-divergence formulation will allow us to evaluate to what extent the connectivity states represented by the Laplacian coefficients are distinguishable from each other.

Let us first assume that the Laplacian moments η_0, η_1, K_0, K_1 are known. Detection can then be conducted on a linear transformation of the observations:

x = T (l̃ − η_0)

where η := T(η_1 − η_0) and T = T(K_0, K_1) is an affine transform that simultaneously whitens the observations under the H_0 hypothesis and yields uncorrelated observations under the H_1 hypothesis. An example of the action of the transform T is shown in Fig. 1 for the case of 2-dimensional Gaussian data whose mean and covariance matrix differ under H_0 and H_1. The original data points are plotted in Fig. 1(a), whereas their transformed counterparts are represented in Fig. 1(b).
The transformed data have unit variance and are zero-centered under H_0, and are uncorrelated under H_1.

Fig 1. Example of the transformation effect. In a) we show 2-dimensional Gaussian distributions which differ in mean and covariance matrix. In b) we report the same distributions after the transformation T.
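As a sketch, one standard construction of such a transform T proceeds by simultaneous diagonalization of the pair (K_0, K_1): first whiten with respect to K_0, then rotate onto the eigenvectors of the whitened K_1. The toy covariance matrices below are assumptions for illustration.

```python
import numpy as np

def whitening_transform(K0, K1):
    """Build T with T K0 T^H = I and T K1 T^H diagonal, by simultaneous
    diagonalization of the pair (K0, K1)."""
    w, U = np.linalg.eigh(K0)
    Q = U @ np.diag(1.0 / np.sqrt(w))   # square root of K0^{-1}: Q^H K0 Q = I
    M = Q.conj().T @ K1 @ Q             # H1 covariance after H0-whitening
    lam, V = np.linalg.eigh(M)          # M = V diag(lam) V^H
    T = V.conj().T @ Q.conj().T
    return T, lam                       # lam = conditional variances under H1

# assumed toy covariance matrices under the two hypotheses
K0 = np.array([[2.0, 0.5],
               [0.5, 1.0]])
K1 = np.array([[1.0, -0.3],
               [-0.3, 3.0]])
T, lam = whitening_transform(K0, K1)
```

By construction T K0 T^H is the identity and T K1 T^H is diagonal with entries lam, matching the behavior illustrated in Fig. 1(b).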
In real detection systems, the moments η_0, η_1, K_0, K_1 can be either estimated from a training set, e.g. during a BCI training phase, or coarsely initialized and tracked throughout the system life, using methodologies, priors, and heuristics related to the application-specific problem under concern [48–50]. Besides, the transformed data x can be obtained even without computing the moments and T, by feeding the Laplacian coefficients l̃ to a suitably trained network [51] that enforces the afore-mentioned statistical constraints.

With these positions, the observation model becomes:

H_0 : x ∼ N(0, I)  versus  H_1 : x ∼ N(η, Σ²)    (9)

(The matrix T and the diagonal matrix Σ = diag(σ_1, …, σ_n, …, σ_N) are computed as the generalized eigenvector and generalized eigenvalue matrices of the pencil (K_0, K_1), respectively. Given any square root Q of K_0^{-1}, i.e. such that Q^H · K_0 · Q = I, we may conveniently employ the unitary transformation V obtained from the eigenanalysis Q^H · K_1 · Q = V · Λ · V^H; in fact, it is easily proved that the matrix T = V^H · Q^H verifies T · K_0 · T^H = I and T · K_1 · T^H = Σ², with Λ = Σ².)

The J-divergence is then defined as:

J := E{R(x) | H_1} − E{R(x) | H_0}    (10)

R(x) being the Log-Likelihood Ratio:

R(x) = x^H (I − Σ^{-2}) x + 2 η^H Σ^{-2} x    (11)

(The Log-Likelihood Ratio R(x) is the widely adopted classical LLR detection statistic, compared against a threshold θ selected according to the desired detection versus missed-detection probability tradeoff.)

Let us now associate the variables x_n whose variance σ_n² ≠ 1 to the first P indexes and the remaining ones to the indexes n = P+1, …, N (possibly, we might have P = N or P = 0), so as to rewrite the LLR as follows:
R(x) = \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \left[ (\sigma_n^2 - 1) x_n^2 + 2 \eta_n x_n \right] = \sum_{n=1}^{P} \frac{(\sigma_n^2 - 1) |x_n|^2 + 2 \eta_n x_n}{\sigma_n^2} + 2 \sum_{n=P+1}^{N} \eta_n x_n    (12)

By adding and subtracting the term |η_n|² / [σ_n²(σ_n² − 1)], we obtain:

R(x) = \underbrace{\sum_{n=1}^{P} \frac{\sigma_n^2 - 1}{\sigma_n^2} \left| x_n + \frac{\eta_n}{\sigma_n^2 - 1} \right|^2}_{P \text{ quadratic terms}} + \underbrace{2 \sum_{n=P+1}^{N} \eta_n x_n}_{N-P \text{ linear terms}} - \underbrace{\sum_{n=1}^{P} \frac{|\eta_n|^2}{\sigma_n^2 (\sigma_n^2 - 1)}}_{\text{constant, to be included in the threshold}}    (13)

The P variates x_n, n = 1, …, P, having different conditional variances under the hypotheses H_0 and H_1, contribute to the LLR by the P quadratic terms. The N − P variates x_n, n = P+1, …, N, having equal unitary conditional variances under the hypotheses H_0 and H_1, contribute to the LLR by the N − P linear terms. To gain further insight on the J-divergence, we resort to the following theorem, whose proof is reported in the Appendix.

Theorem 1
Let ξ be the vector formed by the N statistically independent random variables:

\xi_n = \left| x_n + \frac{\eta_n}{\sigma_n^2 - 1} \right|^2, \quad n = 1, \ldots, P; \qquad \xi_n = x_n, \quad n = P+1, \ldots, N    (14)

The LLR is expressed as R(x) = a_{LLR}^H · ξ, with the constant coefficients a_{LLR} defined as in Eq. (25), and the J-divergence in Eq. (10) is computed as follows:

J = \sum_{n=1}^{P} \left[ \left( \sigma_n - \sigma_n^{-1} \right)^2 + |\eta_n|^2 \left( 1 + \sigma_n^{-2} \right) \right] + 2 \sum_{n=P+1}^{N} |\eta_n|^2 = \sum_{n=1}^{P} J_n^{(\sigma,\eta)} + \sum_{n=P+1}^{N} J_n^{(\eta)}    (15)

Theorem 1 generalizes the result in [13, 14], where only the case of variables having equal conditional means and different covariances (i.e. η_0 = η_1, K_0 ≠ K_1) was addressed.

The J-divergence as formulated in Eq. (15) is a measure of the statistical distance of the graph Laplacian coefficients under the two connectivity states H_0 and H_1, and it will be used to quantify the improvement in the separability of brain connectivity states achieved by the denoising algorithm described in Section 4.

Furthermore, the above analysis sheds light on the variables that contribute most to the states' separability. From Eq. (15), we see a one-to-one correspondence between the transformed-space variables x_n and the terms of the J-divergence J; besides, each term can be of two kinds:

J_n^{(\sigma,\eta)} = \left( \sigma_n - \sigma_n^{-1} \right)^2 + |\eta_n|^2 \left( 1 + \sigma_n^{-2} \right), \qquad J_n^{(\eta)} = 2 |\eta_n|^2    (16)

depending on whether the variable changes both in conditional mean and standard deviation, or in conditional mean only. The functions J_n^{(σ,η)} and J_n^{(η)} are plotted in Fig. 2 for η ranging between 0 and 1 and σ ranging from 0 to 10. Interestingly, in Fig. 2 we recognize that a

Fig 2.
J-divergence contributions as functions of the mean η and the standard deviation σ: a) J^{(σ,η)} for variables whose conditional standard deviation differs under H_0 and H_1, and b) J^{(η)} for variables with invariant conditional standard deviation.

conditional variance change gives a higher contribution to J than an equal conditional mean change.

To sum up, the above analysis highlights the contribution of each transformed variable to the separability of the connectivity states, and allows ranking their relevance through the terms J_n, so as to identify the transformed variables which differ the most under the two hypotheses. This paves the way for an information-theoretic scoring of the Laplacian coefficients, described in the following.

As a by-product of the analysis, we are now able to identify which Laplacian coefficients (i.e. link weights or node degrees) contribute mostly to the connectivity states' separability. This is obtained by attributing to each Laplacian coefficient a score measuring its contribution to the J-divergence J.

By definition, the Laplacian coefficient l̃_{n'}, n' = 0, …, N−1, contributes to each transformed variable x_n. Thereby, we introduce a score S_{n'}, evaluated by a suitable backpropagation of the J_n terms onto each contributing Laplacian coefficient. Specifically, the n'-th coefficient score is computed as

S_{n'} = \sum_{n} J_n \, \frac{t_{n n'}}{\sum_k t_{n k}}    (17)

where we recognize that the weight t_{n n'}, representing the contribution of the n'-th Laplacian coefficient to the n-th transformed variable, is normalized with respect to the sum Σ_k t_{nk} of the weights of all the contributing coefficients.

Fig 3.
Graphical interpretation of the score computed for the first element of the vector l̃.

A graphical interpretation of the score is provided in Fig. 3, where we represent the set of variables l̃ belonging to the original domain (left), the set of variables x belonging to the transformed domain (center), and the corresponding marginal contributions to J (right). The relationship between l̃ and x is given by the transformation matrix T, which blends variables from the original domain into the transformed one. Each J-divergence component J_n (colored circle on the right) is associated with the variable x_n in the transformed domain, which in turn originates from many l̃_{n'} (shaded colored boxes on the left). Thereby, J_n is backprojected to the original domain by weighting its contribution as in Eq. (17). Back-projection and accumulation can also be applied by limiting the summation in Eq. (17) to the largest-ranking J_n terms. The score computation allows quantifying the relevance of the Laplacian coefficients l̃_{n'} for state separability, and in the experimental results we show that it leads to meaningful results in the case of real BCI data. Algorithm 2 reviews the main steps of the J-divergence computation and scoring procedures.

Performance Assessment on Synthetic Data

In this section, we test the performance of the Laplacian denoising presented in Section 4 in improving the J-divergence of two estimated connectivity states over synthetic SoGs. To this end, we first consider a graph and a model for the signals at the nodes under two connectivity states, selected to represent an over-simplified model of brain EEG functional connectivity; real brain signals are considered in the next section. We compare our approach with the case of the Laplacian without filtering (which we refer to as U_ALL) and with several eigenvector-based filters (i.e. U_L, U_H). Then, we explain in detail the analysis we performed and the related results.

Algorithm 1 Graph Laplacian denoising
Input: Estimated Laplacian L̂
Output: Denoised Laplacian L̃
1: Compute the eigen-decomposition L̂ = \sum_{i=0}^{N-1} \hat{\lambda}_i \hat{u}_i \hat{u}_i^H
2: Compute the denoised Laplacian L̃ by:
   a: selecting the number N_L of smallest eigenvalues and the number N_H of largest eigenvalues to retain;
   b: computing \tilde{L} = \sum_{i=0}^{N_L-1} \hat{\lambda}_i \hat{u}_i \hat{u}_i^H + \sum_{i=N-N_H}^{N-1} \hat{\lambda}_i \hat{u}_i \hat{u}_i^H
1. Each generator signal simultaneously contributes to the signals measured over a subset $\mathcal{G}^{(h)}$ of nodes, described by the $N \times 1$ mixing vector $g^{(h)}$, h = 0, …, H −
1. A noise component $w[kT_s]$ and a component $b[kT_s] \cdot \mathbf{1}$, common to all the nodes, are also present. Under $H_0$, only these latter components are observed. With these definitions, we obtain the following expression for the vector of observed signals $y[kT_s]$ under the two hypotheses $H_0$ and $H_1$:
$$H_1: \; y[kT_s] = \sum_{h=0}^{H-1} s^{(h)}[kT_s] \cdot g^{(h)} + w[kT_s] + b[kT_s] \cdot \mathbf{1}$$
$$H_0: \; y[kT_s] = w[kT_s] + b[kT_s] \cdot \mathbf{1} \qquad (18)$$
In the simulations, the noise $w[kT_s]$ is a realization of a discrete, stationary, white Gaussian process, with $E\{w[k]\} = 0$ and $E\{w[k]w[k]^T\} = \sigma_w^2 I$ for all $k$; the samples of the discrete sequence $b[kT_s]$ are realizations of a zero-mean Gaussian distribution with variance $\sigma_b^2$; and $s^{(h)}[kT_s]$, $h = 0, \dots, H-1$, are the generator signals. Once the signals $y[kT_s]$ are obtained, we estimate the adjacency matrix. There are many state-of-the-art methods to perform the estimation, such as spectral coherence [42], imaginary coherence [43], and the phase-locking value [52], which characterize brain interactions differently. Here, we adopt the spectral coherence of the signals at two nodes $i, j$:
$$C_{ij}(\omega_k) = \frac{|\hat{P}_{ij}(\omega_k)|}{\sqrt{\hat{P}_i(\omega_k) \cdot \hat{P}_j(\omega_k)}} \qquad (19)$$
In Eq. (19), $\hat{P}_i(\omega_k)$, $\hat{P}_j(\omega_k)$ and $\hat{P}_{ij}(\omega_k)$ are the estimated auto-spectra and cross-spectrum of the signals $y_i[kT_s]$, $y_j[kT_s]$ at the nodes $i$ and $j$, computed at the frequency
bin $\omega_k = \frac{2\pi}{N_s} k$.

Algorithm 2 J-divergence and score computation
Input: Conditional means $\mu_0$, $\mu_1$ and covariance matrices $K_0$, $K_1$ of $\tilde{\mathbf{l}} = \mathrm{Vec}(\tilde{L})$ under $H_0$ and $H_1$
Output: $J_n$, $S_n$, $n = 0, \dots, N-1$
Step 1: Transform computation
a: Compute the square-root matrix $Q \leftarrow K_0^{-1/2}$, and the eigenvectors $V$ and eigenvalues $\sigma_0^2, \dots, \sigma_{N-1}^2$ of the eigen-decomposition $Q^H K_1 Q = V \,\mathrm{diag}\left(\sigma_0^2, \dots, \sigma_{N-1}^2\right) V^H$
b: Compute $T \leftarrow V^H Q^H$, $\Sigma \leftarrow \sqrt{\mathrm{eig}(Q^H K_1 Q)}$, and the transformed mean difference $\eta \leftarrow T(\mu_1 - \mu_0)$
Step 2: J-divergence computation
a: Define a threshold $\theta$
b: Compute $J_n$, $n = 0, \dots, N-1$:
$$J_n \leftarrow \begin{cases} 2|\eta_n|^2, & |\sigma_n^2 - 1| \le \theta \\[4pt] \left(\sigma_n^2 - \sigma_n^{-2}\right)\left[\dfrac{|\eta_n|^2}{\sigma_n^2 - 1} + \dfrac{\sigma_n^2 - 1}{\sigma_n^2 + 1}\right], & \text{otherwise} \end{cases}$$
Step 3: Score computation
a: Compute $S_n$, $n = 0, \dots, N-1$:
$$S_n = \sum_{n'} J_{n'} \, \frac{t_{n'n}}{\sum_k t_{n'k}}$$

Given $C_{ij}(\omega_k)$ as in Eq. (19), the adjacency matrix $\hat{A}$ is estimated by averaging across the $N_s$ frequency bins as follows:
$$\hat{A}_{ij} = \frac{1}{N_s} \sum_{k=0}^{N_s-1} C_{ij}(\omega_k) \qquad (20)$$
To sum up, our proposed signal model for synthetic data determines a simple graph connectivity under the two hypotheses $H_0$ and $H_1$. The model successfully reproduces a network characterized by distinguishable connectivity states in the presence of controlled perturbations. In Fig. 4, the estimated adjacency matrix is plotted under the two conditions $H_0$ and $H_1$ in the presence of perturbations. We recognize that under $H_0$ there are no links and $\hat{A}$ fluctuates around zero because of the perturbations. Under $H_1$, some connections exist but their values are affected by the perturbations. Once we have obtained the adjacency matrix estimates under $H_0$ and $H_1$, we compute the estimated Laplacians as in Eq.
(2) and then we decompose them into their eigenvalues and eigenvectors as in Eq. (3). To recall the eigenvectors' behaviour, we plot in Fig. 5 the first and the 10th eigenvector on the graph under the $H_1$ hypothesis. We can see that the first eigenvector, in Fig. 5(a), appears perfectly smooth both on the graph and within the subset of linked nodes. Fig. 5(b) describes the 10th eigenvector on the graph; here, the eigenvector is mostly smooth and highlights another community, but it shows higher variability than the first eigenvector over the linked nodes.

All the power spectral estimates are computed with the Welch method, with 1 s long Hanning windows and 50% overlap.

Fig 4. Adjacency matrix with synthetic data. The estimated $\hat{A}_{ij}$ is represented under the two hypotheses in panels a) and b).

Fig 5.
Eigenvectors on the graph. Panel a) shows the first eigenvector; panel b) shows the 10th eigenvector.

This model will be used to test the proposed Laplacian denoising algorithm. To this goal, we randomly produce 20 repetitions (or trials) for each statistical hypothesis, as i.i.d. realizations of our model with a fixed set of parameters. Different noise and polarization levels will be considered.
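As an illustration, the two-state model of Eq. (18) and the coherence-based adjacency estimation of Eqs. (19)-(20) can be sketched as follows. This is a minimal sketch: the sizes, perturbation levels, mixing vectors, and the plain segment-averaged spectral estimator (used here in place of the paper's Welch setup) are all illustrative assumptions.

```python
import numpy as np

# Illustrative sizes and perturbation levels (not the paper's exact settings).
N, K, H = 16, 1024, 2          # nodes, samples per trial, number of generators
sigma_w, sigma_b = 0.5, 1.0    # noise and polarization standard deviations

rng = np.random.default_rng(0)
g = np.zeros((H, N))           # mixing vectors g^(h): disjoint node subsets
g[0, :4] = 1.0
g[1, 4:8] = 1.0

def trial(h1):
    """One trial of Eq. (18): generators + noise + polarization under H1,
    only noise + polarization under H0."""
    y = sigma_w * rng.standard_normal((N, K))          # w[kTs]
    y += sigma_b * rng.standard_normal(K)[None, :]     # b[kTs] * 1 (shared by all nodes)
    if h1:
        y += g.T @ rng.standard_normal((H, K))         # sum_h s^(h)[kTs] g^(h)
    return y

def coherence_adjacency(y, n_seg=8):
    """Spectral-coherence adjacency, Eqs. (19)-(20), with a simplified
    segment-averaged cross-spectral estimator."""
    n, L = y.shape[0], y.shape[1] // n_seg
    S = np.fft.rfft(y[:, :n_seg * L].reshape(n, n_seg, L), axis=-1)
    P = np.einsum('isf,jsf->ijf', S, S.conj()) / n_seg  # cross-spectra P_ij(w_k)
    auto = np.real(np.einsum('iif->if', P))             # auto-spectra P_i(w_k)
    C = np.abs(P) / np.sqrt(auto[:, None, :] * auto[None, :, :])  # Eq. (19)
    A = C.mean(axis=-1)                                 # Eq. (20): average over bins
    np.fill_diagonal(A, 0.0)                            # no self-loops
    return A

A1 = coherence_adjacency(trial(True))    # links emerge within generator communities
A0 = coherence_adjacency(trial(False))   # only perturbation-driven fluctuations
```

Note that, because of the common polarization term $b[kT_s]$, the $H_0$ adjacency does not vanish: it fluctuates around a nonzero floor, which is precisely the perturbation effect discussed above.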
In this sub-section, we compare our Laplacian-based filtering based on $U_L \cup U_H$, shortly denoted as $U_{L \cup H}$, with $U_{ALL}$, $U_L$ and $U_H$. Specifically, we investigate the robustness of the different sub-spaces with simulated data. With the final goal of measuring sub-space robustness, for each sub-space we take into account two cases, namely the absence and the presence of perturbation, to which we refer as the ground truth (GT) and the noisy cases, respectively. Each ground truth sub-space is compared to different noisy cases, corresponding to different values of $\sigma_w$ and $\sigma_b$. We compute the Frobenius distance $F$ between the GT case and the noisy configurations, varying the perturbation levels (for each sub-space configuration, $F$ is computed between the noisy and GT versions of the sub-space; see Definition 2 in [53] for the mathematical formulation). The results of this analysis are shown in Fig. 6, where we plot $F$ versus the trial index for the subspace $U_H$ (red), the subspace $U_L$ (green), and the subspace $U_{L \cup H}$ (blue), in several perturbation conditions. In Fig. 6(a) we have the zero-perturbation case, in which, not surprisingly, $F = 0$ for every filter and every trial. With a gradual increase of perturbations (i.e. the noise-only case in Fig. 6(b) or the polarization
only case in Fig. 6(c)), the most favorable case is $U_H$ for almost every trial. When perturbations increase dramatically, as in Fig. 6(d), performances decrease, in particular for the $U_H$ filter. In this figure we do not report results for the $U_{ALL}$ case because, when all the eigenvectors are retained, $F = 0$ for all the trials.
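The sub-space comparison above can be reproduced in a few lines. Here we sketch one common formulation of the distance, the Frobenius norm of the difference between the orthogonal projectors onto the two sub-spaces, in the spirit of Definition 2 in [53]; the exact definition used in the paper may differ.

```python
import numpy as np

def subspace_distance(U_a, U_b):
    """Frobenius distance between the sub-spaces spanned by the (orthonormal)
    columns of U_a and U_b, computed via their orthogonal projectors."""
    P_a = U_a @ U_a.conj().T
    P_b = U_b @ U_b.conj().T
    return np.linalg.norm(P_a - P_b, 'fro')

# Two 1-D sub-spaces of R^2 as a toy check.
e1 = np.array([[1.0], [0.0]])
e2 = np.array([[0.0], [1.0]])
d_same = subspace_distance(e1, e1)   # identical sub-spaces: distance 0
d_orth = subspace_distance(e1, e2)   # orthogonal sub-spaces: distance sqrt(2)
```

In the GT-versus-noisy comparison, `U_a` would hold the retained eigenvectors of the unperturbed Laplacian and `U_b` those of the perturbed one.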
Fig 6. Results of the Frobenius distance on synthetic data. Several perturbation configurations are represented: in panel a) $\sigma_w = 0$ and $\sigma_b = 0$; in panel b) noise only ($\sigma_b = 0$); in panel c) polarization only ($\sigma_w = 0$, $\sigma_b = 2$); in panel d) both noise and polarization ($\sigma_b = 2$). The different colors (in the legend) represent the different sub-spaces.

It is then clear that the eigenvectors in $U_H$ are significantly robust. This is not surprising, since in classical signal processing the eigenvectors associated with the largest eigenvalues are used because of their advantages in signal-to-noise ratio (SNR). Still, the subspace $U_{L \cup H}$ maintains this robustness, while remaining relevant to describe the inherent topology of the graph. In the next results, we show that the Laplacian denoising leveraging the subspace $U_{L \cup H}$ leads to better distinguishable connectivity states both in the absence and in the presence of perturbation.

Fig 7.
Results of the J-divergence analysis on synthetic data. Several perturbation configurations are represented: in panel a) $\sigma_w = 0$ and $\sigma_b = 0$; in panel b) noise only ($\sigma_b = 0$); in panel c) both noise and polarization ($\sigma_b = 2$). The different colors (shown in the legend) represent the different sub-spaces used for filtering.

Finally, we test the ability to separate graph Laplacians under the hypotheses $H_0$ and $H_1$. The J-divergence analysis presented in Section 5 ends with a measure of the statistical distance $J$ between two states. Here, the analysis of separability between two states is applied to graph Laplacians in the simulated scenario, with the goal of comparing the discriminating ability of our denoising method with respect to the other sub-space configurations, i.e. $U_{ALL}$, $U_L$ and $U_H$. Here the two conditions are the two hypotheses $H_0$ and $H_1$, and we perform simulations for several perturbation levels. In every case, we compute the total $J$ as a measure of the statistical distance between the two conditions, and we also evaluate the marginal $J_n$ as a measure of the contribution of each $n$-th variable to the final separability. Table 2 collects the J-divergence values for several perturbation sets and for different sub-spaces. The results show that in the absence of perturbations the most favorable case is $U_{ALL}$. This result is intuitive because, in the absence of perturbations, there is no reason why reduced sub-spaces should discriminate better. Interestingly, with increasing perturbations (i.e. noise and polarization), the most favorable case is $U_{L \cup H}$, which gives the highest $J$, i.e. the best separability. This means that graph Laplacian denoising through $U_{L \cup H}$ preserves the highest separability between the two hypotheses, even in the presence of strong perturbation. Fig. 7 shows the behavior of $J_n$ as a function of the first 20 variables, for the $U_{ALL}$ and $U_{L \cup H}$ cases only.
These representations clarify the contribution of the individual variables to the final $J$. From Fig. 7 we recognize that, as perturbations increase, the variables in $U_{L \cup H}$ generally give higher $J_n$ contributions than those in $U_{ALL}$.
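Under the Gaussian model of Section 5, the marginal divergences $J_n$ and the scores $S_n$ of Eq. (17) can be sketched as follows. This is a simplified reading of Algorithm 2: the threshold handling, the use of absolute values in the score weights, and all variable names are our assumptions.

```python
import numpy as np

def j_components(mu0, mu1, K0, K1, theta=1e-6):
    """Sketch of Algorithm 2: whiten with respect to K0, diagonalize K1 in
    the whitened domain, evaluate the marginal divergences J_n, and
    backproject them into coefficient scores S_n as in Eq. (17)."""
    # Step 1: T = V^H Q^H, with Q = K0^{-1/2} and Q^H K1 Q = V diag(sigma^2) V^H
    w0, V0 = np.linalg.eigh(K0)
    Q = (V0 * w0 ** -0.5) @ V0.T
    s2, V = np.linalg.eigh(Q.T @ K1 @ Q)
    T = V.T @ Q.T
    eta = T @ (mu1 - mu0)                    # transformed mean difference
    # Step 2: marginal J_n (linear-term form when sigma_n^2 is close to 1)
    J = np.empty_like(s2)
    lin = np.abs(s2 - 1.0) <= theta
    J[lin] = 2.0 * eta[lin] ** 2
    q = ~lin
    J[q] = (s2[q] - 1 / s2[q]) * (eta[q] ** 2 / (s2[q] - 1) + (s2[q] - 1) / (s2[q] + 1))
    # Step 3: Eq. (17) -- weights |t_{n'n}| normalized over each row of T
    W = np.abs(T) / np.abs(T).sum(axis=1, keepdims=True)
    S = J @ W
    return J, S

# Toy two-variable example: one variable differs in mean, the other in variance.
J, S = j_components(np.zeros(2), np.array([0.0, 1.0]), np.eye(2), np.diag([4.0, 1.0]))
```

Since every row of the weight matrix sums to one, the scores redistribute, without losing, the total divergence: $\sum_n S_n = \sum_n J_n$.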
Table 2. J-divergence values on synthetic data. The highest J-divergence value for each perturbation configuration is reported in boldface.
Our results with synthetic data show that, in the presence of perturbations, our Laplacian filtering succeeds in distinguishing graphs under the two conditions. This conclusion remains true if the system is perturbed by noise, but also if there is an artifact of a different nature, i.e. a common artifact that we denoted as polarization and which can represent real phenomena.
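For reference, the sub-space filtering of Algorithm 1 amounts to a truncated eigen-expansion; a minimal sketch follows, in which the example graph and the retained numbers $N_L$, $N_H$ are illustrative.

```python
import numpy as np

def denoise_laplacian(L_hat, n_low, n_high):
    """Algorithm 1: keep only the n_low smallest- and n_high largest-eigenvalue
    components of the estimated Laplacian (the U_{L u H} sub-space)."""
    lam, U = np.linalg.eigh(L_hat)                 # eigh returns ascending eigenvalues
    keep = list(range(n_low)) + list(range(len(lam) - n_high, len(lam)))
    Uk = U[:, keep]
    return (Uk * lam[keep]) @ Uk.T                 # sum of lambda_i u_i u_i^T over kept i

# Laplacian of a 4-node path graph (degree matrix minus adjacency).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
L = np.diag(A.sum(1)) - A
L_tilde = denoise_laplacian(L, 1, 1)               # keep only the extreme eigen-pairs
```

Setting `n_low + n_high` equal to the number of nodes reproduces the input Laplacian exactly, which is a convenient sanity check for the decomposition.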
In this section, we present experimental results of our graph Laplacian filtering on real data, recorded during motor-imagery BCI experiments. In this case, the hypotheses directly correspond to the subject performing motor imagery ($H_1$) or being in the resting state ($H_0$).

The study was conducted on twenty healthy subjects (mean age about 27 years; 8 women), all right-handed. None of the subjects presented any disorder; all received financial compensation for their participation and signed a written informed consent. The ethical committee CPP-IDF-VI of Paris approved the experimental protocol. During the BCI experiments, every subject sat in front of a screen showing a target. Subjects were instructed to perform a right-hand motor imagery task when the target was up, and to remain at rest when the target was down [54]. A 74-channel system in a standard 10-10 configuration was used to record the EEG data. The EEG reference was set to the mastoid signals, the ground electrode was located on the left scapula, and impedances were kept below 20 kOhm. The EEG was sampled at 1 kHz and then downsampled to 250 Hz. For each subject, the recorded sequences were segmented to obtain $N_T$ trials of motor imagery and $N_T$ trials of resting state. The total length of each trial was 5 s.

EEG data analysis was preceded by a pre-processing stage. Specifically, an Independent Component Analysis (ICA) was performed to eliminate artifacts such as ocular and cardiac signals [55]; in particular, the Infomax algorithm [56] was implemented with the FieldTrip toolbox [57].

Here, we perform the J-divergence analysis on real motor-imagery data. To this aim, we take the EEG signals from one subject, with $N_T$ trials for $H_1$ and $N_T$ trials for $H_0$, $N_T = 20$. We use the spectral coherence to build the connectivity matrix, as in Eq. (19). Then, the estimated adjacency matrix $\hat{A}$ is computed as in Eq. (20), and from Eq. (2) we derive $\hat{L}$. As for the synthetic data, we compute the filtered graph Laplacian $\tilde{L}$ with the subset $U_{L \cup H}$. We compare the results on real data with $U_{ALL}$, $U_H$ and $U_L$. In each case, we compute the J-divergence $J$ as in Eq. (10) and the marginal contribution $J_n$ associated with the $n$-th variable as in Eq. (15).

Table 3.
J-divergence values on real data. The highest J-divergence value is reported in boldface.
Fig 8.
Results of the J-divergence analysis for real data. We report the $CJ_n$ of Eq. (21) as a function of the involved variables. The different colors (shown in the legend) represent the different sub-spaces used for filtering.

In Table 3, we report the J-divergence results for each sub-space configuration. Comparing all the methods, the highest $J$ value is obtained in the $U_{L \cup H}$ case. This result is very important because it means that the sub-space $U_{L \cup H}$ is suitable for separating real EEG data and is useful to correctly detect the subject's mental state. With the aim of understanding the contributions of the different variables, we first compute the marginal contributions $J_n$ to obtain a weight for each variable in the transformed domain. Then, we compute the cumulative sum over the first $n$ variables: once the $J_n$ vector is sorted, we evaluate the cumulative J-divergence $CJ_n$ to investigate the impact of the variables on the total J-divergence, as follows:
$$CJ_n = \sum_{k=1}^{n} J_k \qquad (21)$$
The results in Fig. 8 show that the cumulative sum of the first 20 variables is always higher for $U_{L \cup H}$ than for all the other sub-space configurations. In other words, if a given number of variables is retained, the overall achieved J-divergence is always larger using the proposed denoising algorithm. This confirms the contribution of the proposed Laplacian denoising to the discrimination of the two mental states.
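The cumulative curve of Eq. (21) is then a sorted cumulative sum; sorting in decreasing order is our assumption, implied by the ranking discussed above.

```python
import numpy as np

def cumulative_j(J):
    """Eq. (21): cumulative J-divergence over the sorted marginal terms,
    largest contributions first."""
    return np.cumsum(np.sort(np.asarray(J))[::-1])

cj = cumulative_j([0.5, 2.0, 1.0])   # cumulative values: 2.0, 3.0, 3.5
```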
β band

Given that the filtering with $U_{L \cup H}$ enables a better discrimination between motor imagery and resting state, we now exploit the scoring procedure introduced above to determine, based on an information-theoretically grounded criterion, which connectivity coefficients contribute most to separating the resting and motor imagery states. To proceed, it is important to underline that the brain response to motor tasks is in general not uniform across frequencies, but is mostly evident in the α (8-13 Hz) or β (14-29 Hz) band [58], depending on the subject. As a proof of concept, we show results in the β band; in a BCI training scenario, the frequency band of interest can be tuned according to the subject's response. Here, we filter the connectivity matrix in the selected frequency band, as follows:
$$\hat{A}_{ij} = \sum_{\omega_k / T_s \in \beta} C_{ij}(\omega_k) \qquad (22)$$
Thereby, having established that the denoising based on the $U_{L \cup H}$ sub-space provides the best results in separating the Laplacians under $H_0$ and $H_1$, we restrict the score analysis in the β band to $U_{L \cup H}$ and $U_{ALL}$. We compute the scores as described in Section 5(A) and report the results in Fig. 9. In the first row, we collect the results referring to the extra-diagonal elements of $\tilde{L}$, i.e. links, and in the second row we report the results for the diagonal elements, i.e. node weights. Moreover, the left and right columns provide the results achieved without and with the application of the proposed Laplacian denoising, respectively. The first interesting observation is that, also when the analysis is restricted to the β band, the application of the proposed denoising improves the J-divergence of the observed connectivity states (from 79.
47 to over 160). Furthermore, the scoring obtained without denoising is larger on nodes located in the frontal, temporal or parietal areas, such as
FPz and parietal (P) electrodes. After denoising, the scores are more pronounced over the sensory-motor areas, and we are able to pinpoint some more relevant nodes among the C and FC electrodes. Let us now analyze the scores associated with the link weights (Laplacian extra-diagonal elements). In the absence of denoising, recognizing the contributions of the different brain areas is difficult because all the link weights are generally low. After denoising, the strongest links are localized in the sensory-motor areas, and links connecting contro-lateral motor areas (between CP and C electrodes) rank highest. Thereby, the scoring based on the denoised Laplacian provides a means for the analysis and interpretation of the observed connectivity states.

Fig 9. Results of the score computation for real data. We report results for $U_{ALL}$ subspace filtering in panels a,c) and for $U_{L \cup H}$ subspace filtering in panels b,d). In the first row, the score values refer to links (i.e. extra-diagonal elements); in the second row, they relate to the elements on the principal diagonal (i.e. nodes). For the sake of clarity, in all the panels we plot the 20 nodes or links with the highest scores.

β band

BCIs aim to provide real-time interaction between the subject and the interface [54, 59]; thereby, reducing the observation time $T_{oss}$ required for the Laplacian estimation is beneficial for the potential applicability to online motor-imagery BCI. With this application framework in mind, we test the Laplacian denoising when the observation window length is reduced to $T_{oss} = 1$ s. In the following, we consider a moving window of length $T_{oss} = 1$ s and shift it by mΔt, m = 0, …, M −
1, with $M = 9$ and $\Delta t = 0.5$ s. For the $m$-th temporal interval, $m = 0, \dots, M-1$, we consider the trials of the resting ($H_0$) and motor imagery ($H_1$) states. Then we derive the conditional ($H_0$, $H_1$) estimated adjacency matrix $\hat{A}$ as in Eq. (22), the estimated graph Laplacian $\hat{L}$ as in Eq. (2), and its denoised version $\tilde{L}$ as in Eq. (7). Then, we evaluate the J-divergence between the two hypotheses as in Eq. (15). Finally, we average, across subjects, the $J$ obtained on the $m$-th window, m = 0, …, M −
1. Our findings show that in the majority of the considered time intervals, i.e. in 7 intervals out of 9, the denoised Laplacian $\tilde{L}$, leveraging the $U_{L \cup H}$ subspace, leads to a higher J-divergence than the estimated Laplacian $\hat{L}$ (the $U_{ALL}$ subspace). This result shows that, even with short time intervals, our method succeeds in separating the two mental states. These findings on real EEG data show that the proposed Laplacian denoising also applies to short time windows and improves the potential to correctly detect the motor imagery state. This paves the way to the application of the proposed Laplacian denoising in real BCI systems.

Fig 10. Results of the J-divergence analysis over a moving window on real data. We plot the J-divergence over $M = 9$ time intervals, each 1 s long with 50% overlap, versus the time interval index. The J-divergence is computed in the β band and averaged across subjects.

This work has proposed a Laplacian denoising algorithm for the purpose of detecting graph connectivity states. A novel formulation of the Jensen divergence has been derived. The J-divergence formulation is used to quantify the performance of the denoising algorithm, as well as to attribute a score to the Laplacian coefficients in terms of their contribution to the separability of the connectivity states. The performance of the Laplacian denoising algorithm is assessed by numerical simulations on synthetic data. Furthermore, the algorithm has been applied to real EEG data acquired in motor imagery BCI experiments. The proposed Laplacian denoising improves the separation of the two mental states of motor imagery and resting, even under restrained observation time intervals. Moreover, the J-divergence-based scoring sheds light on the contribution of the different connectivity coefficients to motor imagery state detection. Thereby, the proposed approach is promising for the robust detection of connectivity states while being appealing for implementation in real-time BCI applications.
Appendix A

Theorem. Let us consider the problem of binary classification of Gaussian variables, $H_0: x \sim \mathcal{N}(0, I)$, $H_1: x \sim \mathcal{N}(\eta, \Sigma^2)$, corresponding to the uncommon-mean, uncommon-covariance case, by means of the LLRT formulation in Eq. (11). By simple algebraic manipulation, we recognize that the test $\mathcal{R}(x) \underset{H_0}{\overset{H_1}{\gtrless}} t'$ corresponds to:
$$\mathcal{R}'(x) = \underbrace{\sum_{n=1}^{P} \frac{\sigma_n^2 - 1}{\sigma_n^2} \left( x_n + \frac{\eta_n}{\sigma_n^2 - 1} \right)^2}_{P \text{ quadratic terms}} + \underbrace{\sum_{n=P+1}^{N} 2\eta_n \cdot x_n}_{N-P \text{ linear terms}} \underset{H_0}{\overset{H_1}{\gtrless}} t'' \qquad (23)$$
with $t'' = t' + \sum_{n=1}^{P} |\eta_n|^2 \left( \sigma_n^2 - 1 \right)^{-1} + \sum_{n=P+1}^{N} |\eta_n|^2$.

Let us consider the linear-quadratic observation space $\Xi$ of the $N$-dimensional random vector $\xi \overset{def}{=} [\xi_1 \dots \xi_N]^T$ defined as (see Eq. (14)):
$$\xi_n = \left( x_n + \frac{\eta_n}{\sigma_n^2 - 1} \right)^2 = x_n^2 + \frac{2 x_n \eta_n}{\sigma_n^2 - 1} + \left( \frac{\eta_n}{\sigma_n^2 - 1} \right)^2, \quad n = 1, \dots, P; \qquad \xi_n = x_n, \quad n = P+1, \dots, N \qquad (24)$$
In the space $\Xi$, the LLRT $\mathcal{R}'(x) \underset{H_0}{\overset{H_1}{\gtrless}} t''$ rewrites as follows:
$$\sum_{n=1}^{N} a_{LLR,n}\, \xi_n = a_{LLR}^H \cdot \xi \underset{H_0}{\overset{H_1}{\gtrless}} t'' \qquad (25)$$
where the elements of $a_{LLR} \overset{def}{=} [a_{LLR,1}, \dots, a_{LLR,N}]^T$ are:
$$a_{LLR,n} \overset{def}{=} \begin{cases} \dfrac{\sigma_n^2 - 1}{\sigma_n^2}, & n = 1, \dots, P \\[4pt] 2\eta_n, & n = P+1, \dots, N \end{cases} \qquad (26)$$
With these definitions,
$$J \overset{def}{=} E\{\mathcal{R}(x)|H_1\} - E\{\mathcal{R}(x)|H_0\} = E\{\mathcal{R}'(x)|H_1\} - E\{\mathcal{R}'(x)|H_0\} = \sum_{n=1}^{N} a_{LLR,n} \left( E\{\xi_n|H_1\} - E\{\xi_n|H_0\} \right) \qquad (27)$$
By computing the above expectations, it can be straightforwardly shown that the $n$-th term $a_{LLR,n}(E\{\xi_n|H_1\} - E\{\xi_n|H_0\})$ of the above sum equals
$$J_n(\sigma_n, \eta_n) = \frac{\sigma_n^2 - 1}{\sigma_n^2} \left( \sigma_n^2 + \eta_n^2 + \frac{2\eta_n^2}{\sigma_n^2 - 1} - 1 \right) = \left( \sigma_n^2 - \sigma_n^{-2} \right) \left[ \frac{\eta_n^2}{\sigma_n^2 - 1} + \frac{\sigma_n^2 - 1}{\sigma_n^2 + 1} \right], \quad n = 1, \dots, P \qquad (28)$$
and
$$J_n(\eta_n) = 2\eta_n^2, \quad n = P+1, \dots, N. \qquad (29)$$
QED.

Acknowledgements

FD acknowledges support from the Agence Nationale de la Recherche through contract number ANR-15-NEUC-
02; and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 864729). TC acknowledges Juliana Gonzalez-Astudillo for useful discussions and suggestions.
References
1. A. M. Bastos and J.-M. Schoffelen, "A tutorial review of functional connectivity analysis methods and their interpretational pitfalls," Frontiers in Systems Neuroscience, vol. 9, p. 175, 2016.
2. M. Newman, Networks: An Introduction. Oxford University Press, 2010.
3. L. Torres, A. S. Blevins, D. S. Bassett, and T. Eliassi-Rad, "The why, how, and when of representations for complex systems," arXiv preprint arXiv:2006.02870, 2020.
4. J. Gonzalez-Astudillo, T. Cattai, G. Bassignana, M.-C. Corsi, and F. D. V. Fallani, "Network-based brain computer interfaces: principles and applications," Journal of Neural Engineering, 2020.
5. J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, "Brain–computer interfaces for communication and control," Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
6. M. C. Thompson, "Critiquing the concept of BCI illiteracy," Science and Engineering Ethics, vol. 25, no. 4, pp. 1217–1233, 2019.
7. A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges, and applications," Proceedings of the IEEE, vol. 106, no. 5, pp. 808–828, 2018.
8. A. Sandryhaila and J. M. Moura, "Discrete signal processing on graphs," IEEE Transactions on Signal Processing, vol. 61, no. 7, pp. 1644–1656, 2013.
9. D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
10. W. Huang, T. A. Bolton, J. D. Medaglia, D. S. Bassett, A. Ribeiro, and D. Van De Ville, "A graph signal processing perspective on functional brain imaging," Proceedings of the IEEE, vol. 106, no. 5, pp. 868–885, 2018.
11. J. D. Medaglia, W. Huang, E. A. Karuza, A. Kelkar, S. L. Thompson-Schill, A. Ribeiro, and D. S. Bassett, "Functional alignment with anatomical networks is associated with cognitive flexibility," Nature Human Behaviour, vol. 2, no. 2, pp. 156–164, 2018.
12. L. Scharf and B. Van Veen, "Low rank detectors for Gaussian random vectors," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 11, pp. 1579–1582, 1987.
13. L. L. Scharf, Statistical Signal Processing, vol. 98. Addison-Wesley, Reading, MA, 1991.
14. A. Pezeshki, L. L. Scharf, J. K. Thomas, and B. D. Van Veen, "Canonical coordinates are the right coordinates for low-rank Gauss–Gauss detection and estimation," IEEE Transactions on Signal Processing, vol. 54, no. 12, pp. 4817–4820, 2006.
15. H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I: Detection, Estimation, and Linear Modulation Theory. John Wiley & Sons, New York, 2001.
16. H. V. Poor, An Introduction to Signal Detection and Estimation. Springer Science & Business Media, 2013.
17. S. Kullback, Information Theory and Statistics. Courier Corporation, 1997.
18. S. Theodoridis and K. Koutroumbas, Pattern Recognition. 2009.
19. J. Schürmann, Pattern Classification: A Unified View of Statistical and Neural Approaches. John Wiley & Sons, Inc., 1996.
20. B. Picinbono and P. Duvaut, "Detection and contrast," in Stochastic Processes in Underwater Acoustics, pp. 181–203, Springer, 1986.
21. B. Picinbono and P. Duvaut, "Optimal linear-quadratic systems for detection and estimation," IEEE Transactions on Information Theory, vol. 34, no. 2, pp. 304–311, 1988.
22. B. Picinbono, "On deflection as a performance criterion in detection," IEEE Transactions on Aerospace and Electronic Systems, vol. 31, no. 3, pp. 1072–1081, 1995.
23. P. Chevalier and B. Picinbono, "Complex linear-quadratic systems for detection and array processing," IEEE Transactions on Signal Processing, vol. 44, no. 10, pp. 2631–2634, 1996.
24. E. Bullmore and O. Sporns, "Complex brain networks: graph theoretical analysis of structural and functional systems," Nature Reviews Neuroscience, vol. 10, no. 3, pp. 186–198, 2009.
25. S. Segarra, S. P. Chepuri, A. G. Marques, and G. Leus, "Statistical graph signal processing: Stationarity and spectral estimation," in Cooperative and Graph Signal Processing, pp. 325–347, Elsevier, 2018.
26. S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, "Complex networks: Structure and dynamics," Physics Reports, vol. 424, no. 4–5, pp. 175–308, 2006.
27. N. Ghoroghchian, D. M. Groppe, R. Genov, T. A. Valiante, and S. C. Draper, "Node-centric graph learning from data for brain state identification," IEEE Transactions on Signal and Information Processing over Networks, vol. 6, pp. 120–132, 2020.
28. G. Frusque, J. Jung, P. Borgnat, and P. Gonçalves, "Multiplex network inference with sparse tensor decomposition for functional connectivity," IEEE Transactions on Signal and Information Processing over Networks, vol. 6, pp. 316–328, 2020.
29. W. Huang, L. Goldsberry, N. F. Wymbs, S. T. Grafton, D. S. Bassett, and A. Ribeiro, "Graph frequency analysis of brain signals," IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 7, pp. 1189–1203, 2016.
30. M. Rabbat, M. Coates, and S. Blouin, "Graph Laplacian distributed particle filtering," pp. 1493–1497, IEEE, 2016.
31. L. Rui, H. Nejati, and N.-M. Cheung, "Dimensionality reduction of brain imaging data using graph signal processing," pp. 1329–1333, IEEE, 2016.
32. J. Wang, D. B. Aydogan, R. Varma, A. W. Toga, and Y. Shi, "Topographic regularity for tract filtering in brain connectivity," in International Conference on Information Processing in Medical Imaging, pp. 263–274, Springer, 2017.
33. P. C. Petrantonakis and I. Kompatsiaris, "Single-trial NIRS data classification for brain–computer interfaces using graph signal processing," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, no. 9, pp. 1700–1709, 2018.
34. M. E. Spencer, R. M. Leahy, J. Mosher, and P. Lewis, "Adaptive filters for monitoring localized brain activity from surface potential time series," in Asilomar Conference on Signals, Systems and Computers, pp. 156–156, Computer Society Press, 1992.
35. P. Strobach, K. Abraham-Fuchs, and W. Harer, "Event-synchronous cancellation of the heart interference in biomedical signals," IEEE Transactions on Biomedical Engineering, vol. 41, no. 4, pp. 343–350, 1994.
36. M. Chen, R. T. Wakai, and B. Van Veen, "Eigenvector based spatial filtering of fetal biomagnetic signals," Journal of Perinatal Medicine, vol. 29, no. 6, pp. 486–496, 2001.
37. Y. Zhang, D. H. Brooks, M. A. Franceschini, and D. A. Boas, "Eigenvector-based spatial filtering for reduction of physiological interference in diffuse optical imaging," Journal of Biomedical Optics, vol. 10, no. 1, p. 011014, 2005.
38. J. Li, P. Shang, and X. Zhang, "Time series irreversibility analysis using Jensen–Shannon divergence calculated by permutation pattern," Nonlinear Dynamics, vol. 96, no. 4, pp. 2637–2652, 2019.
39. F. Nielsen, "On a generalization of the Jensen–Shannon divergence," arXiv preprint arXiv:1912.00610, 2019.
40. J. Giles, K. K. Ang, L. S. Mihaylova, and M. Arvaneh, "A subject-to-subject transfer learning framework based on Jensen–Shannon divergence for improving brain-computer interface," in ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3087–3091, IEEE, 2019.
41. K. J. Friston, "Functional and effective connectivity: a review," Brain Connectivity, vol. 1, no. 1, pp. 13–36, 2011.
42. G. C. Carter, "Coherence and time delay estimation," Proceedings of the IEEE, vol. 75, no. 2, pp. 236–255, 1987.
43. G. Nolte, O. Bai, L. Wheaton, Z. Mari, S. Vorbach, and M. Hallett, "Identifying true brain interaction from EEG data using the imaginary part of coherency," Clinical Neurophysiology, vol. 115, no. 10, pp. 2292–2307, 2004.
44. E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor, "Complex elliptically symmetric distributions: Survey, new results and applications," IEEE Transactions on Signal Processing, vol. 60, no. 11, pp. 5597–5625, 2012.
45. E. Ceci and S. Barbarossa, "Graph signal processing in the presence of topology uncertainties," IEEE Transactions on Signal Processing, vol. 68, pp. 1558–1573, 2020.
46. U. von Luxburg, "A tutorial on spectral clustering," Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
47. M. Basseville, "Divergence measures for statistical data processing—an annotated bibliography," Signal Processing, vol. 93, no. 4, pp. 621–633, 2013.
48. Y.-M. Chang, N.-J. Hsu, and H.-C. Huang, "Semiparametric estimation and selection for nonstationary spatial covariance functions," Journal of Computational and Graphical Statistics, vol. 19, no. 1, pp. 117–139, 2010.
49. P. Guttorp and A. M. Schmidt, "Covariance structure of spatial and spatiotemporal processes," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 5, no. 4, pp. 279–287, 2013.
50. M. Greco, S. Fortunati, and F. Gini, "Maximum likelihood covariance matrix estimation for complex elliptically symmetric distributions under mismatched conditions," Signal Processing, vol. 104, pp. 381–386, 2014.
51. A. Ali, M. Testa, T. Bianchi, and E. Magli, "BiometricNet: deep unconstrained face verification through learning of metrics regularized onto Gaussian distributions," arXiv preprint arXiv:2008.06021, 2020.
52. T. Cattai, S. Colonnese, M.-C. Corsi, D. S. Bassett, G. Scarano, and F. D. V. Fallani, "Characterization of mental states through node connectivity between brain signals," pp. 1377–1381, IEEE, 2018.
53. O. M. Baksalary and G. Trenkler, "On subspace distances determined by the Frobenius norm," Linear Algebra and its Applications, vol. 448, pp. 245–263, 2014.
54. J. R. Wolpaw, D. J. McFarland, T. M. Vaughan, and G. Schalk, "The Wadsworth Center brain-computer interface (BCI) research and development program," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, no. 2, pp. 1–4, 2003.
55. A. Delorme, T. Sejnowski, and S. Makeig, "Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis," NeuroImage, vol. 34, no. 4, pp. 1443–1449, 2007.
56. A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.
57. R. Oostenveld, P. Fries, E. Maris, and J.-M. Schoffelen, "FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data," Computational Intelligence and Neuroscience, vol. 2011, p. 1, 2011.
58. Y. Meirovitch, H. Harris, E. Dayan, A. Arieli, and T. Flash, "Alpha and beta band event-related desynchronization reflects kinematic regularities," Journal of Neuroscience, vol. 35, no. 4, pp. 1627–1637, 2015.
59. J. Wolpaw and E. W. Wolpaw, Brain-Computer Interfaces: Principles and Practice. Oxford University Press, 2012.
60. P. Shenoy, M. Krauledat, B. Blankertz, R. P. Rao, and K.-R. Müller, "Towards adaptive classification for BCI,"