Construction of embedded fMRI resting state functional connectivity networks using manifold learning
aa r X i v : . [ q - b i o . N C ] M a y C ONSTRUCTION OF EMB EDDED F
M RI
R ESTING STATEFUNC TIONAL C ONNECTIVITY NETWOR KS USING MANIFOLDLEAR NING
A P
REPRINT
Ioannis K. Gallos
School of Applied Mathematical and Physical SciencesNational Technical University of Athens, Greece [email protected]
Evangelos Galaris
Dipartimento di Matematica e ApplicazioniUniversità degli Studi di Napoli Federico II [email protected]
Constantinos I. Siettos ∗ Dipartimento di Matematica e ApplicazioniUniversità degli Studi di Napoli Federico IITel.: +39 0816-75615 [email protected]
May 27, 2020 A BSTRACT
We construct embedded functional connectivity networks (FCN) from benchmark resting-state func-tional magnetic resonance imaging (rsfMRI) data acquired from patients with schizophrenia andhealthy controls based on linear and nonlinear manifold learning algorithms, namely, Multidimen-sional Scaling (MDS), Isometric Feature Mapping (ISOMAP) and Diffusion Maps. Furthermore,based on key global graph-theoretical properties of the embedded FCN, we compare their classifi-cation potential using machine learning techniques. We also assess the performance of two metricsthat are widely used for the construction of FCN from fMRI, namely the Euclidean distance and thelagged cross-correlation metric. We show that the FCN constructed with Diffusion Maps and thelagged cross-correlation metric outperform the other combinations. K eywords resting-state fMRI · Functional Connectivity Networks · Schizophrenia · Manifold Learning · MachineLearning
Over the past years, functional magnetic resonance imaging (fMRI) has been widely used for the identification of brainregions that are related to both functional segregation and integration. Regarding functional segregation, the conven-tional analysis relies on the identification of the activated voxels based on functional response models and multivariatestatistics between experimental conditions (e.g. resting state vs. task-stimulated activity). A representative exampleis the General Linear Model (GLM) that is implemented in well established software packages such as SPM [1] andFSL [2]. On the other hand, for the assessment of functional integration, there is a distinction between functional andeffective connectivity [3]. Functional connectivity (FC) analysis seeks for statistical dependencies (e.g. correlations,coherence) between brain regions. Effective connectivity (EC) analysis [3] tries to reveal the influence that one neuralsystem exerts on another [3]. A detailed review on the differences between FC and EC approaches can be found in [3].Here, we focus on the construction of FCN based on resting-state fMRI (rsfMRI) recordings. In rsfMRI, there is nostimuli and thus the assessment of functional integration is more complex and not so straightforward compared to task-related experiments [4]. Furthermore, spontaneous/ resting state brain activity as measured with fMRI has been also ∗ Corresponding author: [email protected]
PREPRINT - M AY
27, 2020considered as a potential biomarker in psychiatric disorders (see e.g. the review of Zhou et al. [5]). In general, for theconstruction of FCN, two basic general frameworks are explored: (a) Seed-based Analysis (SBA) and (b) IndependentComponent-based Analysis (ICA). In the SBA [6], the (averaged) fMRI signals of the regions of interest (ROI) arecorrelated with each other; correlations above a threshold are considered functional connections between seeds/ROIs.Even though the SBA has been proved extremely useful in identifying functional networks of specific brain regions[7, 8, 9], its major disadvantage is the requirement of the a-priori knowledge of the functional organization of thebrain, while possible correlations between seeds can be due to structured spatial confounds (e.g. scanner artifacts)[10]. Furthermore, seed selection is based on standard coordinates, while at the subject level, anatomical differencesmay lead to the selection of functionally irrelevant voxels at the group level. Thus, despite the use of normalizationtechniques, the accuracy of this approach is shown to be limited [11]. On the other hand, ICA [12] has arisen as analternative/complementary approach since the early 2000s [13, 14, 15]. ICA decomposes the 4D fMRI data to a setof spatial components with maximum statistical independence and their associated time series. Smith et al.[16] ina meta-analytic study of 30,000 rsfmRI scans with the aid of ICA revealed a functional “partition" of the brain intoresting-state Networks (RSNs), such as the Sensorimotor, Default mode and Auditory networks. Applications of ICAinclude also data pre-processing, where noise-related components are regressed out from the original fMRI signals[17]. However, while ICA produces spatial components that are statistically independent to each other, there is noclear link between the spatial components and specific brain functions and furthermore the spatial components cannotin general be ordered by relative importance [10]. Another issue is that most of the standard algorithms that computeICs utilize gradient based optimization algorithms that use an iterative scheme; the initial guesses in these algorithmsare generated randomly making the whole process stochastic: for the same dataset, the obtained spatial componentsmay differ significantly over repeated runs [18]. Thus, the robustness/reproducibility of the ICA results over repeatedruns may be questioned.In order to tackle the above issues, several techniques have been proposed for the classification of ICs and the con-struction of subject specific ROIs [19, 20]. Advances have been also made regarding the selection of the model orderof the ICA decomposition, such as the Bayesian dimensionality estimation technique [13] and the use of theoreticalinformation criteria for model order selection [21]. Finally, the so-called ranking and averaging ICA by reproducibil-ity (RAICAR) [20, 18] (see also [10] for a critical discussion) aims at resolving issues regarding stochasticity androbustness of the ICA decomposition. RAICAR utilizes a sufficient number of ICA realizations and based on the re-producibility of the ICs aims to rank them in terms of the most “reliable" components. Reliable ICs among realizationsare assessed via correlations and the final estimate of each component is averaged.Alternatively and/or complementary to the above analysis, linear manifold learning algorithms such as Principal Com-ponent Analysis (PCA) [22, 23, 24] and Multidimensional Scaling (MDS) [25, 26] have been exploited. PCA hasbeen succesfully applied in the pre-processing routine for dimensionality reduction (often prior to ICA) [27]. Appli-cations of PCA, include also the recovery of signals of interest [28] and the construction of FCN from fMRI scans intask-related experiments [23, 24]. In these studies, the performance of PCA with respect to the detection of regions ofcorrelated voxels has been shown to be satisfactory but not without problems. For example, a study by Baumgartner etal. [24] highlighted the limits of PCA to correctly identify activation of brain regions in cases of low contrast-to-noiseratios (CNR) appearing when signal sources of e.g. physiological noise are present [29].MDS has been also widely used in fMRI (mostly for task-based studies) mainly for the identification of similaritiesbetween brain regions in terms of voxel-wise connectivity [30, 31, 32, 33, 34, 35]. The implementation of MDS inneuroimaging dates back to the work of Friston et al. [26]. There, it was investigated the embedded (voxel-wise)connectivity of fmRI data acquired during tasks of word generation between healthy and schizophrenia subjects. Sal-vador et al. [36] used MDS to investigate the embedded connectivity of anatomical regions of the brain from rsfMRIdata. Benjaminsson et al. [37] used MDS to embed high-dimensional rsfMRI data from the mutual information spaceto a low dimensional euclidean space for the identification of RSNs. Herve et al. [38] used MDS to acquire a lowdimensional approximation of interregional correlations for the investigation of the affective speech comprehension.Finally, in a meta-analytic study by Etkin et al. [39], MDS was exploited to provide a low-dimensional visualization ofco-activation interrelations of ROIs. MDS has been also used in works investigating the functional (dys)connectivityassociated with schizophrenia [40] and Asperger’s Syndrome [41].However, thus far, only a few studies have exploited non-linear manifold learning algorithms such as Local LinearEmbedding (LLE) [42], Isometric Feature Mapping (ISOMAP) [43] and Diffusion Maps [44] for the analysis of fMRIdata and particularly for the construction of FCN. LLE has been applied in rsfMRI studies for the improvement ofpredictions in ageing studies [45] for low-dimensional clustering towards the classification of healthy subjects andpatients with schizophrenia [46] and as an alternative method for dimensionality reduction before the application ofICA in task-related fMRI where non-linear relationships in the BOLD signal are introduced [47].In Anderson et al. [48], ISOMAP was employed to a benchmark rsfMRI dataset of 146 subjects for the constructionof embedded low-dimensional FCN for the classification between controls and schizophrenia patients. ROIs wereselected using single-subject ICA and the similarities between the ICA components were assessed using a pseudo-distance measure based on lagged cross-correlation. Graph-theoretical measures were then used for classification2
PREPRINT - M AY
27, 2020between patients and healthy controls. Another study based on single-subject ICA, exploited ISOMAP to classifyspatially unaligned fMRI scans [49]. The study considered patients with schizophrenia versus healthy controls anddifferent age groups (young/old) of healthy controls versus patients with alzheimer’s disease. Despite the relativelylow sample sizes, results were promising with good classification rates. Recently, Haak et al. [50] utilized ISOMAP inan effort to create individualised connectopies from rsfMRI recordings taken from the WU-Minn Human ConnectomeProject in a fully data-driven manner.Thus, only a handful of studies have used Diffusion Maps for the analysis of fMRI data, focused on the clustering ofspatial maps of task-related experiments [51, 52]. Shen et al. [51] employed Diffusion Maps to separate activated fromnon-activated voxels. Sipola et al.[52] used Diffusion Maps with a Gaussian kernel to cluster selected fMRI spatialmaps that are derived by ICA. They demonstrated their approach using fMRI recordings acquired from healthy par-ticipants listening to a stimulus with a rich musical structure. Other applications of Diffusion Maps in neuroimagingconcern the epileptic-seizure prediction and the identification pre-seizure state in EEG timeseries [53, 54]. A reviewon the intersection between manifold learning methods and the construction of FCN can be found in [55] and [56].Here, we used MDS, ISOMAP and Diffusion Maps to construct embedded FCN from single-subject ICA analysisof rsfMRI data taken from healthy controls and schizophrenia patients. For our demonstrations, we used the CO-BRE rsfMRI data that are publicly available and have been used recently in many studies [57, 58, 48, 59]. Basedon key global graph-theoretical measures of the embedded graphs, we assessed their classification efficiency usingseveral machine learning algorithms, namely Linear standard Support Vector Machines (LSVM), Radial (Radial ba-sis function kernel) Support Vector Machines (RSVM), k-Nearest Neighbour (k-NN), and Artificial Neural Networks(ANN). We also investigated the performance of two most-commonly used metrics, namely the cross-correlation andthe euclidean distance. Our analysis showed that Diffusion Maps outperformed all other methods and lagged cross-correlation proved to be a better choice than the euclidean metric.At this point, we should note, that our study does not aim at the best classification performance by trying to find thebest possible pre-processing pipe-line of the raw fMRI data and/or the selection of subjects and/or the selection of thebest set of graph-theoretical measures that provide the maximum classification. Yet, we aim at using state-of-the-artmanifold learning methods for the construction of the FCN and compare their classification efficiency using only a fewkey global theoretical graph-measures and compare the obtained results with those derived by similar studies (see e.g.[48]) using the same pipe-line for data pre-processing and single-subject ICA. To the best of our knowledge, this paperis the first to perform such a thorough comparative analysis of both linear and nonlinear manifold learning on rsfMRIdata. It is also the first study to show how Diffusion Maps can be used for the construction of FCN from rsfMRI,assessing also the efficiency of two basic distance metrics, the cross-correlation-based and the Euclidean distance.
For our demonstrations, we performed a basic pre-processing of the raw fMRI data using SPM as also implemented inother studies (see e.g. [48]). In particular, for the fMRI data processing, we used FEAT (FMRI Expert Analysis Tool)Version 6.00, part of FSL (FMRIB’s Software Library, ). In particular, the followingpre-statistics processing was applied: motion correction using MCFLIRT [60], slice-timing correction using Fourier-space time-series phase-shifting; non-brain removal using BET [61], spatial smoothing using a Gaussian kernel ofFWHM 5mm, grand-mean intensity normalization of the entire 4D dataset by a single multiplicative factor. ICAAROMA [17] was also implemented to detect and factor out noise-related independent components (ICs) along witha high-pass temporal filtering at 0.01 Hz (100 seconds) that was applied after ICA AROMA procedure as it is highlyrecommended [17].We then proceeded with the decomposition of the pre-processed fMRI data to spatial ICs (for each suject) using theRAICAR methodology [20]. In this way, we found the most reproducible spatial ICs over repeated runs as a solutionto the well known problem of the variability of the ICA decomposition [18]. This choice was related to the benchmarkfMRIdata per se as there was only a single session per subject with relatively small duration (6 minutes); therefore wewouldn’t expect a robust ICA decomposition for all subjects (see also the discussion in [10]). Another choice wouldbe to perform group-ICA analysis, but we decided to use single-subject ICA in order to have a common ground withthe methodologically similar work presented in [48]. Group-ICA analysis will be performed in a future study.3
PREPRINT - M AY
27, 2020
ICA is a linear data-driven technique that reduces the high-dimensional fMRI F ( t, x, y, z ) space in a set of M statisti-cally independent spatial components (ICs). This reduction can be represented as: F ( t, x, y, z ) = M X i =1 A i ( t ) C i ( x, y, z ) , (1)where F ( t, x, y, z ) is the measured BOLD signal, A i ( t ) is the temporal amplitude (the matrix A containing all tem-poral amplitudes is known as mixing matrix) and C i ( x, y, z ) is the spatial magnitude of the i-th ICA component.While PCA requires that the principal components are uncorrelated and orthogonal, ICA asks for statistical indepen-dence between the ICs. Generally, ICA algorithms are based either on the minimization of mutual information or themaximization of non-gaussianity among components. As discussed in the introduction, most of the standard imple-mentations of ICA, such as the one in MELODIC (Multivariate Exploratory Linear Optimized Decomposition intoIndependent Components) [62], which is part of FSL fMRI analysis software package, share similar gradient basedoptimization algorithms using an iterative scheme whose initial values are generated randomly, thus making the wholeprocess stochastic. As a consequence, results over repeated runs may differ significantly [18]. A solution to thisproblem is provided by the so-called ranking and averaging ICA by reproducibility (RAICAR) [20] that we brieflydescribe in the following section. The RAICAR methodology developed by Yang et al. [20] was addressed to tackle the problem of the ICs variabilityby performing K ICA realizations. Thus, RAICAR leads to K “slightly" different mixing matrices A , A . . . A K and K different sets of ICs S , S . . . S K . Each realization finds a fixed number M of spatial ICs. Then, a crossrealization correlation matrix ( CRCM ) of size M · K × M · K is constructed and the alignment (ICA producesunaligned components) of ICs of each realization takes place on the basis of the absolute maximum cross correlationamong components. Thus, the cross realization correlation matrix reads: CRCM = R , R , . . . R ,K − R ,K R , . . . . . . R ,K ... . . . . . . . . . ... R K − , . . . . . . R K − ,K R K, R K, . . . R K,K − R K,K R i,j with i, j = 1 , ...K are submatrices of size M × M and their elements represent the absolute spatial cross corre-lation coefficients among components and across realizations. CRCM is a symmetric matrix and its diagonal consistsof identity matrices which are ignored for the next steps of the algorithm.The procedure starts with the identification of the global maximum of the CRCM, thus finding the matched componentbased on two realizations. At the next step, the methodology seeks for the highest absolute spatial cross correlation co-efficients of the identified component in the remaining realizations factoring out all others. The procedure is repeated M times until M aligned components are found.The next step involves the computation of the reproducibility index for each of the aligned components. This is doneby constructing the histogram of the absolute spatial cross correlation coefficients of the upper triangle matrix of theCRCM. This histogram tends to be bimodal, as in general, we expect low spatial cross correlation among most of theICs and high spatial cross correlation only for a few of them. A spatial cross correlation threshold is applied withthe desired value lying in the valley of the histogram between the two modes [20]. Finally, the reproducibility indexis computed for each one of the aligned components. This is done by aggregating the supra-threshold absolute crosscorrelation coefficients of the CRCM for each of the aligned components.The last step of the algorithm is the ranking and averaging of the aligned components in descending order based onthe reproducibility index. The selective averaging is applied so that the components are averaged if and only if, thegiven aligned component has at least one absolute spatial cross correlation coefficient above the threshold across real-izations.After applying RAICAR, the ICs are chosen via a cut-off threshold based on the reproducibility index (of each com-ponent) that indicates how consistent is the appearance of an IC across realizations.Here, we have set K = 30 realizations (same as also in [20]); taking more realizations did not change the outcomes ofthe analysis. The spatial cross correlation threshold was chosen by localizing the minimum of the histogram of absolute4 PREPRINT - M AY
27, 2020cross correlation coefficients of the CRCM. This threshold was specified separately for each subject. The reproducibleICs were determined by the histogram of reproducibility index over the number of components in descending order.The cut-off threshold was set as the half of the maximum reproducibility index value possible K ( K − · . (this choiceis the same with the one used in [20]). This cut-off threshold was set equal for all subjects.Subjects with less than 20 reproducible ICs were excluded from further analysis as this number of components re-sulted in disconnected graphs. Thus, we ended up with 104 subjects out of which 57 were healthy controls and 47schizophrenia patients. For the construction of FCN, we used all combinations between three manifold learning algorithms, namely MDS,ISOMAP and Diffusion Maps and two widely used metrics, namely the lagged cross-correlation [48, 63, 64] and theeuclidian distance [52, 65, 66].
For every pair of the associated time courses of the ICs, say A i and A j , the cross-correlation function (CCF) over amaximum of three time lags (as in [48]) reads: CCF ( A i , A j , l ) = E [( A i,t + l − A i )( A j,t − A j )] q E [( A i,t − A i )( A j,t − A j ] , (2)where l is the time lag, and A i is the mean value of the whole time series.For the construction of the FCN connectivity/ correlation matrices, we used a pseudo-distance measure d c defined as(see also [48]): d c ( A i , A j ) = 1 − max l =0 , , , ( | CCF ( A i , A j , l ) | ) . (3)The resulting dis(similarity) matrices are fully connected and therefore are hardly comparable between subjects (seethe discussion in [48]). Thus, here as a standard practice, (and in all other algorithms described below), we appliedthresholding to the (dis)similarity matrices in order to keep the strongest connections of the derived functional connec-tivity matrix. In order to factor out the influence of the variable network density on the computation and comparison ofgraph-theoretical measures across groups [67], we have implemented the approach of proportional thresholding (PT)[67]. In particular, we considered a range of levels of PT from 20% to 70% with a step of 2%. Below the threshold of20%, graphs became too fragmented, while thresholds above the 70% resulted in dense graphs, that mostly includednoisy and less significant edges (see also the discussion in [68] about this issue). If a graph was still fragmented afterthresholding, the giant component was used for further analysis. The euclidean distance is used in many studies to assess (dis)similarities between fMRI data points [52, 65, 66]. Fortime series associated with the independent spatial maps, A i and A j , the euclidean distance reads: L ( A i , A j ) = vuut T X t =1 ( A i,t − A j,t ) . (4)For the construction of FCN, PT was applied to the euclidean similarity matrices for each individual over the range of %- %. Below we present how MDS, ISOMAP and Diffusion Maps can be exploited to construct (embedded) FCN.5
PREPRINT - M AY
27, 2020
Multidimensional Scaling [25] is a linear method of dimensionality reduction that can be used to find similar-ities between pairs of objects in a low-dimensional (embedded) space. Given a set of M objects/observables x , x , . . . , x M ∈ R N , MDS produces a low-dimensional data representation y , y , . . . , y M ∈ R p , p ≪ N mini-mizing the objective function: X i,j, i = j (cid:16) k x i − x j k − d ( x i , x j ) (cid:17) . (5) d ( x i , x j ) is the (dis)similarity obtained (by any (dis)similarity measure of choice) between all pairs of points x , x , . . . , x M ∈ R N . In our case, the observables x i are the amplitudes of the spatial ICs A i, i =1 ,..M ∈ R N .Here, N = 150 (number of time points).The coordinates of the embedded manifold y , y , . . . , y M are given by: [ y , . . . , y M ] = Λ p × p · V Tp × M . (6) Λ p × p contains the square roots of the p largest eigenvalues, and V Tp × M are the corresponding eigenvectors of thematrix: B = − HD H T . (7) H M × M is the double centering matrix defined as: H = I − M · T , = [ . . . ] × M . (8)Using MDS, the dimensionality reduction of the original data X = x , x , . . . , x M ∈ R N yields the embedding of Y = y , y , . . . , y M ∈ R p , p ≪ N .Here, for the construction of the embedded FCN, we produced distance matrices (using either lagged cross-correlationor the euclidean distance) D Y of size M × M .For the implementation of the MDS algorithm, we used the “cmdscale" function contained in the software pack-age“Stats" in the R free Software Environment [69]. ISOMAP is a non-linear manifold learning algorithm that given a set of M objects/observables x , x , . . . , x M ∈ R N produces a low-dimensional data representation y , y , . . . , y M ∈ R p , p ≪ N minimizing the objective function: X i,j, i = j (cid:16) d G ( x i , x j ) − d ( x i , x j ) (cid:17) , (9)where d G ( x i , x j ) is the shortest path (geodesic distance) and d ( x i , x j ) is the (dis)similarity obtained (by any(dis)similarity measure of choice) between all pairs of points x , x , . . . , x M ∈ R N .In our case, the observables x i are the amplitudes of the spatial ICs A i, i =1 ,..M ∈ R N .The above minimization problem is solved as follows: [43]: • Construct a graph G = ( V, E ) , where the vertices V are the ICs A i ; its links E are created by usingeither the k -nearest neighbors algorithm or a fixed distance between nodes, known as the ǫ distance. Forexample, a link between two ICs is created if d i,j ≡ d ( A i , A j ) < ǫ , ∀ i = j . Here, we used the k nearestneighbours algorithm with k = 3 , , , [43]). Set the weight w i,j of the link (if any) between A i , A j as w i,j = d ( A i , A j ) . If there is not a link set: w i,j = 0 .6 PREPRINT - M AY
27, 2020 • Approximate the embedded manifold by estimating the shortest path (geodesic distance) d G ( A i , A j ) foreach pair of nodes based on the distances d i,j ; this step can be implemented for example using the Dijkstraalgorithm [70]. This procedure results to a matrix, D G with elements the shortest paths: D G ij ≡ d G ( A i , A j ) = min (cid:8) d i,j , d i,k + d k,j (cid:9) , k = 1 , , . . . , M k = i, j. (10) • Estimate the coordinates of the low-dimensional (embedded) manifold y , y , . . . , y M exploiting the MDSalgorithm [25] on the geodesic distance matrix D G .Here, for the implementation of the ISOMAP algorithm, we used the package “vegan" [71] contained in the R freeSoftware Environment [69]. Diffusion Maps [44] is a non-linear manifold learning algorithm that given a set of M objects/observables X = x , x , . . . , x M ∈ R N produces a low-dimensional representation Y = y , y , . . . , y M ∈ R p , p ≪ N , address-ing the diffusion distance among data points as the preserved metric[72]. The embedding of the data in the low-dimensional space is obtained by the projections on the eigenvectors of a normalized graph Laplacian [73]. TheDiffusion Maps algorithm can be described in a nutshell in the following steps: • Construction of the affinity matrix W M × M , here M is the number of ICs for each subject. The elements W ij represent the weighted edges connecting nodes i and j using the so-called heat kernel: W i,j = exp ( − d ( x i , x j ) σ ) , (11)where x i is a N -dimensional point (here, N=150), d ( x i , x j ) are the (dis)similarities obtained (by any dissim-ilarity measure of choice) between all pairs of points x , x , . . . , x M ∈ R N and σ is an appropriately chosenparameter which can be physically described as a scale parameter of the heat kernel [44]. The heat kernel W satisfies two important properties, the one of symmetry and the other of positive semi-definite matrix. Thelatter property is crucial and allows the interpretation of weights as scaled probabilities of “jumping" fromone node to another.The parameter σ of the neighborhood size is data-dependent and here, it was determined by finding the linearregion in the sum of all weights in W , say S w , using different values of σ [74, 52]. S w is calculated throughthe formula: S w = M X i M X j W ij , (12)In order to use a single value of σ for all participants, we computed a super-distribution of the sum ofweights across subjects (taking the median value of the distributions) using different values of σ . Thus, weconsidered values of σ lying in the linear region of the super-distribution. After inspection, every value of σ was lying in the linear region of every single subject’s logarithmic plot. • Formulation of the diagonal M × M normalization matrix K along with the diffusion matrix P : K ii = M X j =1 W ij , (13) P = K − W . (14)Each element of the symmetric and normalized diffusion matrix P reflects the connectivity between two datapoints x and x . As an analogy, this connectivity can be seen as the probability of “jumping" from one pointto another in a random walk process. Consequently, raising P to a power of t can be thought as a diffusionprocess. As the number of t increases, paths with low probability tend to zero, while the connectivity be-tween paths with high probability remains high enough, thus governing the diffusion process [44]. Thus, the7 PREPRINT - M AY
27, 2020algorithm of Diffusion Maps preserves the diffusion distance among points in a low-dimensional euclideanspace. The diffusion distance is closely related to the diffusion matrix P and for two distinct points x i , x j and for specific time instance t is defined as [75]: D t ( x i , x j ) = X m | P tim − P tmj | . (15)Unlike geodesic distance, the diffusion distance is robust to noise perturbations, as it sums over all possiblepaths (of t steps) between points [44]. • Construction of the conjugate matrix P = K / PK − / , (16)substituting Eq.(14) to Eq.(16) we get P = K − / WK − / . (17)This is the so-called Graph Laplacian matrix [73]. The matrix P is adjoint to the symmetric matrix P . Thus, P and P share the same eigenvalues [76]. • Singular Value Decomposition (SVD) of P yields P = U ΛU ∗ , (18)where Λ is a diagonal matrix containing the M eigenvalues of P and U the eigenvectors of P . The eigenvec-tors V of P can be bow found by [76]: V = K / U . (19) • By taking out the trivial eigenvalue λ = 1 of the matrix Λ and the corresponding eigenvector contained in V ,the coordinates of the low dimensional embedded manifold y , y , . . . , y M are given by: [ y , . . . , y M ] = Λ p × p · V Tp × M , (20)where Λ p × p contains the p largest eigenvalues, and V Tp × M are the corresponding eigenvectors of the diffusionmatrix P .For the implementation of the above algorithm we used the package “diffusionMap" [77] contained in the R freeSoftware Environment [69]. Here, the embedded dimension was determined via the eigenspectrum of the final decomposition for every dimen-sionality reduction/Manifold learning algorithm. If there is a gap between the first few larger eigenvalues of the finaldecomposition and the rest of the eigenspectrum, these few eigenmodes capture most of the distance differences be-tween data points and are able to represent and uncover intrinsic properties of the data structure [76, 78, 79]. In orderto determine the embedded dimension for the methods described above, we considered the following steps: We sortedthe eigenvalues in decreasing order eg. λ ≥ λ ≥ λ · · · ≥ λ M (for diffusion maps λ is discarded). Then, foreach subject, we calculated the consequent pairwise differences λ − λ , λ − λ , ... , λ M − − λ M . A gap betweenthe mean (over all subjects) differences determines from which point and further, the relative contribution of anothereigendimension is redundant (this having a small contribution to the reconstruction of the embedded FCN). Thus, herefor our computations, we considered low-dimensional embeddings of p = (2 , , , dimensions for each one of themanifold learning methods described above. Here, we analyzed the topological properties of the binary FCN graphs on the basis of three major graph measures,namely, the average path length, the global clustering coefficient, and the median degree, which are the most frequentlyused in neuroscience [80]. In particular, given a graph G = ( V, E ) with g ij representing the link (0: unconnected or1: connected) from node i to node j and k i = P N V j =1 g ij the degree of node i, the graph measures are computed asfollows:a) The average path length is defined by: L = N V ( N V − P i = j D G ij , i.e. is the average number of steps along the8 PREPRINT - M AY
27, 2020shortest paths D G ij for all possible pairs of the network nodes. This is a measure of the efficiency of information ormass transport on a network between all possible pairs of nodes.b) The global clustering coefficient is defined by: C g = P t c P t , where t is a triplet and t c is a closed triplet. A triplet of agraph consists of three nodes that are connected by either to open (i.e open triplet) or closed (i.e closed triplet) ties. Ingeneral, this measure indicates how random or structured a graph is (in our case, in terms of functional segregation).c) The median degree M k is the median value of the degree distribution of G . This measure reflects how wellconnected is the “median" network node in terms of the number of links that coincide with it.An extensive review of definitions and meaning of the above and other graph-theoretical measures with respect tobrain functional networks can be found in [81, 80].The computations for the graph analysis were performed with the “igraph" [82] package contained in the free softwareenvironment R [69]. Based on the three key graph measures, the classification was assessed using machine learning algorithms, namely Lin-ear Support Vector Machines(LSVM), Radial Support Vector Machines (RSVM), Artificial Neural Networks (ANN)and k-NN classification (for a brief description of the above algorithms and their parameters see in the Appendix).The classification algorithms were trained, validated and tested using a 10-fold cross-validation scheme which wasrepeated 100 times. Thus, we separated the data in ten separate sub-samples; nine of them were used as trainingsets and one of them was used for validation. This process was repeated 10 times leaving out each time a differentsub-sample which served as a validation set. The whole procedure was repeated 100 times. The overall classificationrate was determined via the computation of the average classification rate over all the repetitions of the 10-fold crossvalidation for each model.The average confusion matrix (over all repetitions of the 10-fold cross validation) was also computed for each classi-fication model. The confusion matrix is a × (in the case of binary classification) square matrix containing all truepositives T P , false positives
F P , true negatives
T N and false negatives
F N . Here, we considered as positives P theschizophrenia cases and as negatives N the healthy control cases. Sensitivity (also called the True Positive Rate) andSpecificity (also called the True Negative Rate) are basic statistical measures for the assessment of binary classifica-tions. The sensitivity T P R is given by
T P R = T PT P + F N , while the specificity
T N R is given by
T N R = T NT N + F P .Here, sensitivity characterizes the ability of the classifier to correctly identify a schizophrenic subject, while specificityis the ability of the classifier to correctly identify a healthy subject.Here, we used the algorithms contained in the package “caret" [83] contained in the R free software environment [69].
In table 1, we present the best classification accuracy, along with the corresponding sensitivity and specificity rate foreach manifold learning algorithm, threshold and classifier when using the lagged cross-correlation metric. At the endof the table 1, we provide also the results obtained by the “conventional" (thresholded) cross-correlation matrix.Fig. 1 shows the super-distribution of the sum of weights of all subjects against different values of σ used forconstruction of the FCN using the Diffusion Maps algorithm; the red dotted vertical line shows the optimal σ (here, σ = 0.325) while the black vertical lines bound the linear region ( σ ∈ (0 . , . ). The results were robust to differentchoices of the time step of the Diffusion Maps algorithm, namely t = 0 , , . The investigation of the optimal σ wasperformed with respect to the best classification accuracy over all classifiers.Fig.2 depicts the best classification rates for MDS, ISOMAP, Diffusion Maps and lagged-cross correlation matrix fordifferent classifiers are reported. Diffusion Maps provided the best classification accuracy (79.3% for SVM with anRBF kernel and 52% PT), thus appearing more robust over a wide range of PT. With respect to the maximum classifi-cation accuracy obtained by Diffusion Maps, results were robust over a wide range of values of σ ∈ (0 . , . .Fig. 3 depicts the classification performance obtained with Diffusion Maps for different values of σ .Isomap performed better for low thresholds. Overall, the performance of Isomap was more or less the same with thecross-correlation matrix with the best classification rate being slightly higher for Isomap (74.4% for SVM with anRBF kernel and 24% PT). Its performance was however sensitive to the choice of the number k of nearest neighbors;with k = 5 we got a 74.4% classification accuracy for SVM with an RBF kernel and 24% PT) while for k = 6 we9 PREPRINT - M AY
27, 2020Table 1: Best classification rates over all manifold and machine learning methods using the lagged cross-correlationpseudo-distance measure d c (see 2.3.1); the embedded dimension (Emb. Dim) is also shown for each method alongwith the corresponding best threshold point (Thres), Classifier, Accuracy (Acc), Sensitivity (Sens) and Specificity(Spec) rate. Method Emb. Dim Thres Classifier Acc ± SD Sens Spec
MDS ± ± ± ± Isomap ± ± ± ± Diffusion Maps ± ± ± ± Correlation Matrix - 0.52 RSVM 69.5 ± ± ± ± Figure 1:
Super-distribution of all subjects of the sum of the weights (see the equation in 12) (e.g. by taking themedian value of the distribution across subjects per value of σ ) using different values of σ parameter. The reddashed vertical line shows the optimal σ that was found to be σ = 0.325. The other two vertical black lines depictthe linear zone in which we investigated values of σ . PREPRINT - M AY
27, 2020Figure 2:
Overall classification performance for all thresholds (from 20%-70% of the strongest edges with 2%as pace) and classifiers. The metric used is the lagged cross-correlation based pseudo-distance measure d c (see2.3.1). Figure 3:
Classification performance of Diffusion Maps for different values of σ and classifiers. The metric usedis the pseudo-distance measure d c (see 2.3.1) based on lagged cross-correlation. got a 71% classification accuracy for ANN and 20% PT), thus providing the best performance for ISOMAP. MDSwas outperformed by both ISOMAP and Diffusion Maps as well as the cross correlation matrix; the classificationperformance of MDS was overall relatively poor. 11 PREPRINT - M AY
27, 2020Fig. 4 shows characteristic eigenspectrums of the final decomposition for MDS, ISOMAL and Diffusion Maps. Asit is shown, there are three gaps: the first gap appears between the first eigenvalue and the rest of the spectrum, thesecond gap between the first two eigenvalues and the rest of the spectrum, and a third gap appears between the firstfour-five eigenvalues and the rest of the spectrum.Figure 4:
Mean Differences of the first larger 15 eigenvalues (see 2.4.4) for all manifold learning algorithmswith the cross correlation metric. A) MDS (see 2.4.1), B) ISOMAP (see 2.4.2) and C) Diffusion Maps (see 2.4.3),based on the optimal parameters. The red dashed vertical line marks the maximum number of dimensionsconsidered in this study (i.e. five dimensions).3.2 Classification performance using the euclidean distance
The same analysis was performed for the euclidean distance. The best classification rates using the euclidean distancefor all manifold learning methods and classifiers are presented in table 2.Overall, the classification rates with the euclidean distance were more noisy compared to the ones computed with thelagged cross-correlation distance.For a wide range of PT, the classification rates obtained were under 60% and sometimes at the level of a random ora completely biased classifier (towards the larger class, here, the healthy controls). In our case, a random classifierresulted to a 50.5% classification accuracy, while for a strict majority rule for imbalanced datasets, a completelybiased (and thus useless) classification model results to a 54.8 % classification rate.Fig. 5 depicts the accuracy of all methods across all thresholds using all different classifiers.In this case, the best classification rate was obtained with Isomap (72.9% for RSVM and 68% PT). Despite the fact thatfor certain thresholds, the maximum classification rates obtained with the euclidean distance were enough high, theperformance of the methods was generally poorer compared to the ones computed with the lagged cross-correlationdistance for all the range of PT.Characteristic eigenspectrums for all manifold learning algorithms utilizing the euclidean distance are shown in Fig.6.Finally, in Fig. 7 are given the boxplots of the classification rates among metrics used. Overall, the lagged cross-correlation distance resulted in higher classification rates compared to the euclidean distance with the exception ofMDS. Except from some specific thresholds, MDS in both cases performed poorly.12
PREPRINT - M AY
27, 2020Table 2: Best classification rates over all manifold learning methods and classifiers with the use of the euclideandistance L (see 2.3.2); the embedded dimension (Emb. Dim), threshold point (Thres), classifier, Accuracy (Acc),Sensitivity (Sens) and Specificity (Spec) rates. Method Emb. Dim Thres Classifier Acc ± SD Sens Spec
MDS ± ± ± ± Isomap ± ± ± ± Diffusion Maps ± ± ± ± Euclidean Matrix - 0.64 RSVM 71.6 ± ± ± ± Figure 5:
Classification performance using the euclidean distance (see 2.3.2) for all thresholds (from 20%-70%of the strongest edges with 2% as pace) and classifiers. PREPRINT - M AY
27, 2020Figure 6:
Mean Differences of the first larger 15 eigenvalues (see 2.4.4) for all manifold learning algorithms withthe euclidean metric. A) MDS (see 2.4.1), B) ISOMAP (see 2.4.2) and C) Diffusion Maps (see 2.4.3) using theoptimal parameters. The red dashed vertical line shows marks the maximum number of dimensions consideredin this study (i.e. five dimensions).
Figure 7:
Boxplots of classification rates over all classifiers and thresholds, using the A) lagged cross-correlationpseudo-distance d c (see 2.3.1). B) the euclidean distance L (see 2.3.2). The labels at the bottom of each panelcorrespond to the method used for the construction of the FCN: the Cross Correlation matrix (Corr. Matrix),Diffusion Maps (DMaps), ISOMAP, MDS. The black horizontal lines mark the median values of the distribu-tions. PREPRINT - M AY
27, 2020
In this study, we constructed embedded FCN from rsfMRI data using linear and non-linear manifold learning tech-niques. Based on the graph theoretical measures of the constructed FCN we then used machine learning algorithmsfor classification purposes. We also compared the performance of two widely used metrics, namely the lagged cross-correlation and the euclidian metric. For our illustrations, we used a publicly available dataset of resting fMRI record-ings taken from healthy and patients with schizophrenia. This is the first study that performs such a systematiccomparative analysis between various manifold learning algorithms, machine learning algorithms and metrics. To thebest of our knowledge, it is also the first study that shows how Diffusion Maps can be exploited to construct FCN fromfMRI data.At this point we should note that our intention was not to try to obtain the best possible classification performance by“optimising" the pre-processing of the raw fMRI data and/or by trying to find the best set of graph-theoretical measures.Instead we used a very basic pre-processing of the raw data (as the one performed in [48]), while classification wasbased only on three key graph-theoretical measures in neuroscience [80], namely, the average path length, the globalclustering coefficient and the median degree of the embedded binary networks. Based on the above, our best reportedaccuracy was 79.3 % obtained with Diffusion Maps and lagged cross-correlation (using 10 fold cross validation re-peated 100 times).For the same benchmark fMRI dataset Anderson and Cohen [48] used ISOMAP for the construction of embeddedlow-dimensional FCN for the classification between controls and schizophrenia patients. ROIs were acquired usingsingle subject ICA and functional connectivity was accessed using the lagged cross-correlation distance. The analysisrevealed differences in small world properties among groups and 13 graph theoretic features led to a reported 65%accuracy rate. Algunaid et al. [68] reported a 95% accuracy (with FSV feature selection method and 82.5% with aWelch’s t-test) testing more than 360 graph-based features. AAL was used for signal extraction and ROI selection,while an SVM-based classifier was used along with 10 fold cross validation scheme to evaluate its performance.An important outcome of our analysis is that the Diffusion Maps result in more robust results and higher classificationaccuracy and that the lagged cross-correlation distance outperforms the euclidean distance which is used traditionallyto construct connectivity matrices from fMRI data.In a future work, we aim at analysing the local-graph theoretical properties, thus matching the derived RAICAR ICswith known resting state networks and also perform group ICA analysis.
The COBRE dataset is publicly available at http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html .For our analysis, we used the “R" software subroutines as described in 2 and 7.
Conceptualization:Constantinos Siettos; Methodology: Constantinos Siettos; Formal analysis and investigation: Ioan-nis Gallos; Data acquisition and processing: Ioannis Gallos and Evangelos Galaris; Writing- original draft preparation:Ioannis Gallos; Writing - review and editing: Constantinos Siettos and Ioannis Gallos; Supervision:Constantinos Siet-tos
For classification we used Linear Support Vector Machines (LSVM), Radial (Radial basis function kernel) SupportVector Machines (RSVM), one hidden layer Artificial Neural Networks (ANN) and k-NN classifier(k-NN). All classi-fiers were trained and evaluated via repeated 10-fold cross-validation scheme repeated 100 times. For the classificationwe used the three key graph theoretical measures as described in subsection 2.5 in Materials and Methods. Trainingand classification were implemented using algorithms contained in package “caret" [83] publicly available in R freesoftware environment [69].
Support vectors machines (SVM) aim at finding the optimal separating plane or hyperplane in the feature space amonggroups. In particular, for a set of points ( x i , y i ) i =1 , ..N , where N is the number of subjects, x i ∈ R d contains d attributes/features selected for subject i and y i ∈ ( − , the subject’s class (here, either healthy or patient), SVMattempts to find the optimal plane or hyperplane that divides the two classes by maximizing the margin of separation.15 PREPRINT - M AY
27, 2020Any hyperplane with a given set of points x i can be modelled as w · x i + b = 0 where w represent the weightsof features x i . Parallel hyperplanes can be described as w · x i + b ≥ if y i = 1 and w · x i + b ≤ − if y i = − . The optimization problem then aims at maximizing the margin between hyperplanes k w k such that for every ( y i ) i =1 , ..N , y i · ( w · x i + b ) ≥ . One can take advantage of a regularization parameter C indicating the penalty oferror z i that gives a trade-off between misclassifications and the width of the separating margin. This leads to the finaloptimization problem, which minimizes k w k + C · P i z i subject to y i · ( w · x i + b ) ≥ − z i , i = 1 , ..N .Based on the idea that the data maybe better separable in a higher dimensional space, SVM may utilize a kernelfunction to map x i ∈ R d to φ ( x i ) ∈ R D , D > d . In our study, besides standard Linear SVM (LSVM), we also usedRadial SVM (RSVM) making use of the Radial basis functions kernel given by K ( x i , x j ) = exp ( − k x i − x j k · γ ) , where γ is the kernel’s scale parameter. k-nearest neighbours algorithm is one of the simplest classification/machine learning algorithms. Given ( x i , y i ) i =1 , ..N , where N is the number of subjects, x i ∈ R d contains d attributes/features selected for subject i and y i the subject’s class (here, either healthy or patient), k-NN utilizes euclidean distance in the feature space to per-form a voting system among κ closest neighbours. In this manner, each point is classified as “control", if the numberof“control" neighbours is greater than the number of “patient" neighbours and inversely. The number κ of closestneighbours is a parameter of choice that plays a crucial role in method’s performance. In this study, it is important tonote that we chose odd values of κ (i.e how many neighbours we take into consideration) in order not to have to breakpossible ties in the voting system among neighbours In this study, we also used feed-forward Artificial Neural Networks (ANN) consisting of one hidden layer. Theinput units were three as the number of the features considered for classification. We have chosen one hidden layerconsisting from 1 to 5 neurons along with a bias term. The activation function used for all neurons was the logistictransfer function [84]. The output was one node (reflecting simple binary classification control/patient). The trainingprocedure of the model was done via back-propagation [85] using a 10-fold cross validation scheme repeated 100times. Finally a weight decay parameter a (regularization parameter) was used to prevent over-fitting and improvegeneralization [86] of the final model. For the implementation of the ANN we used the “nnet" software package [87]publicly available in R free software environment [69]. We tuned the parameters of the algorithms via grid search.For the SVM: C = (0 . , . , . , . , , . , , . , , , , , , , , , γ = (0 . , . , . , . , . , . , , . , , . , , , , , , , , , .For the k-NN classifier: κ = (1 , , , , .For the ANN: number of neurons in the hidden layer p = (1 , , , , ,decay level a = (0 . , . , . , . , . , . , . . References [1] Karl J Friston, Andrew P Holmes, Keith J Worsley, J-P Poline, Chris D Frith, and Richard SJ Frackowiak.Statistical parametric maps in functional imaging: a general linear approach.
Human brain mapping , 2(4):189–210, 1994.[2] Stephen M Smith, Mark Jenkinson, Mark W Woolrich, Christian F Beckmann, Timothy EJ Behrens, HeidiJohansen-Berg, Peter R Bannister, Marilena De Luca, Ivana Drobnjak, David E Flitney, et al. Advances infunctional and structural mr image analysis and implementation as fsl.
Neuroimage , 23:S208–S219, 2004.[3] Karl J Friston. Functional and effective connectivity: a review.
Brain connectivity , 1(1):13–36, 2011.[4] Meenakshi Khosla, Keith Jamison, Gia H Ngo, Amy Kuceyeski, and Mert R Sabuncu. Machine learning inresting-state fmri analysis.
Magnetic resonance imaging , 2019.[5] Yuan Zhou, Kun Wang, Yong Liu, Ming Song, Sonya W Song, and Tianzi Jiang. Spontaneous brain activityobserved with functional magnetic resonance imaging as a potential biomarker in neuropsychiatric disorders.
Cognitive neurodynamics , 4(4):275–294, 2010. 16
PREPRINT - M AY
27, 2020[6] Bharat Biswal, F Zerrin Yetkin, Victor M Haughton, and James S Hyde. Functional connectivity in the motorcortex of resting human brain using echo-planar mri.
Magnetic resonance in medicine , 34, 1995.[7] Michael D Greicius, Ben Krasnow, Allan L Reiss, and Vinod Menon. Functional connectivity in the restingbrain: a network analysis of the default mode hypothesis.
Proceedings of the National Academy of Sciences ,100(1):253–258, 2003.[8] Michael D Fox, Abraham Z Snyder, Justin L Vincent, Maurizio Corbetta, David C Van Essen, and Marcus ERaichle. The human brain is intrinsically organized into dynamic, anticorrelated functional networks.
Proceed-ings of the National Academy of Sciences , 102(27):9673–9678, 2005.[9] Daniel S Margulies, AM Clare Kelly, Lucina Q Uddin, Bharat B Biswal, F Xavier Castellanos, and Michael PMilham. Mapping the functional connectivity of anterior cingulate cortex.
Neuroimage , 37(2):579–588, 2007.[10] David M Cole, Stephen M Smith, and Christian F Beckmann. Advances and pitfalls in the analysis and interpre-tation of resting-state fmri data.
Frontiers in systems neuroscience , 4:8, 2010.[11] Michael D Saxe, Fortunato Battaglia, Jing-Wen Wang, Gael Malleret, Denis J David, James E Monckton,A Denise R Garcia, Michael V Sofroniew, Eric R Kandel, Luca Santarelli, et al. Ablation of hippocampalneurogenesis impairs contextual fear conditioning and synaptic plasticity in the dentate gyrus.
Proceedings ofthe National Academy of Sciences , 103(46):17501–17506, 2006.[12] Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications.
Neural networks ,13(4-5):411–430, 2000.[13] Christian F Beckmann, Marilena DeLuca, Joseph T Devlin, and Stephen M Smith. Investigations into resting-state connectivity using independent component analysis.
Philosophical Transactions of the Royal Society B:Biological Sciences , 360(1457):1001–1013, 2005.[14] Christian F Beckmann and Stephen M Smith. Tensorial extensions of independent component analysis for mul-tisubject fmri analysis.
Neuroimage , 25(1):294–311, 2005.[15] Dae Il Kim, Jing Sui, Srinivas Rachakonda, Tonya White, Dara S. Manoach, V. P. Clark, Beng-Choon Ho,S. Charles Schulz, and Vince D. Calhoun. Identification of imaging biomarkers in schizophrenia: A coefficient-constrained independent component analysis of the mind multi-site schizophrenia study.
Neuroinformatics ,8(4):213–229, jul 2010.[16] Stephen M Smith, Peter T Fox, Karla L Miller, David C Glahn, P Mickle Fox, Clare E Mackay, Nicola Filippini,Kate E Watkins, Roberto Toro, Angela R Laird, et al. Correspondence of the brain’s functional architectureduring activation and rest.
Proceedings of the National Academy of Sciences , 106(31):13040–13045, 2009.[17] Raimon HR Pruim, Maarten Mennes, Daan van Rooij, Alberto Llera, Jan K Buitelaar, and Christian F Beckmann.Ica-aroma: A robust ica-based strategy for removing motion artifacts from fmri data.
Neuroimage , 112:267–277,2015.[18] Johan Himberg, Aapo Hyvärinen, and Fabrizio Esposito. Validating the independent components of neuroimag-ing time series via clustering and visualization.
Neuroimage , 22(3):1214–1222, 2004.[19] Gustavo SP Pamplona, Bruno H Vieira, Frank Scharnowski, and Carlos EG Salmon. Personode: A toolbox forica map classification and individualized roi definition.
Neuroinformatics , pages 1–11, 2020.[20] Zhi Yang, Stephen LaConte, Xuchu Weng, and Xiaoping Hu. Ranking and averaging independent componentanalysis by reproducibility (raicar).
Human brain mapping , 29(6):711–725, 2008.[21] Yi-Ou Li, Tülay Adalı, and Vince D Calhoun. Estimating the number of independent components for functionalmagnetic resonance imaging data.
Human brain mapping , 28(11):1251–1266, 2007.[22] I.T Jollife.
Principal Component Analysis, Second Edition . Springer-Verlag, 2002.[23] Keith J Worsley, Jen-I Chen, Jason Lerch, and Alan C Evans. Comparing functional connectivity via thresholdingcorrelations and singular value decomposition.
Philosophical Transactions of the Royal Society B: BiologicalSciences , 360(1457):913–920, 2005.[24] R Baumgartner, L Ryner, W Richter, R Summers, M Jarmasz, and R Somorjai. Comparison of two exploratorydata analysis methods for fmri: fuzzy clustering vs. principal component analysis.
Magnetic Resonance Imaging ,18(1):89–94, 2000.[25] Joseph B Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis.
Psychome-trika , 29(1):1–27, 1964.[26] Karl J Friston, Chris D Frith, Paul Fletcher, PF Liddle, and Richard SJ Frackowiak. Functional topography:multidimensional scaling and functional connectivity in the brain.
Cerebral cortex , 6(2):156–164, 1996.17
PREPRINT - M AY
27, 2020[27] Armin Iraji, Vince D Calhoun, Natalie M Wiseman, Esmaeil Davoodi-Bojd, Mohammad RN Avanaki, E MarkHaacke, and Zhifeng Kou. The connectivity domain: Analyzing resting state fmri data using feature-baseddata-driven and model-based methods.
Neuroimage , 134:494–507, 2016.[28] Roberto Viviani, Georg Grön, and Manfred Spitzer. Functional principal component analysis of fmri data.
Humanbrain mapping , 24(2):109–129, 2005.[29] Kaiming Li, Lei Guo, Jingxin Nie, Gang Li, and Tianming Liu. Review of methods for functional brain connec-tivity detection using fmri.
Computerized Medical Imaging and Graphics , 33(2):131–139, 2009.[30] Svetlana V Shinkareva, Jing Wang, and Douglas H Wedell. Examining similarity structure: multidimensionalscaling and related approaches in neuroimaging.
Computational and mathematical methods in medicine , 2013,2013.[31] Charidimos Tzagarakis, Trenton A Jerde, Scott M Lewis, Kâmil U˘gurbil, and Apostolos P Georgopoulos. Cere-bral cortical mechanisms of copying geometrical shapes: a multidimensional scaling analysis of fmri patterns ofactivation.
Experimental brain research , 194(3):369–380, 2009.[32] Alice J O’Toole, Fang Jiang, Hervé Abdi, Nils Pénard, Joseph P Dunlop, and Marc A Parent. Theoretical,statistical, and practical perspectives on pattern-based classification approaches to the analysis of functionalneuroimaging data.
Journal of cognitive neuroscience , 19(11):1735–1752, 2007.[33] James V Haxby, M Ida Gobbini, Maura L Furey, Alumit Ishai, Jennifer L Schouten, and Pietro Pietrini.Distributed and overlapping representations of faces and objects in ventral temporal cortex.
Science ,293(5539):2425–2430, 2001.[34] Svetlana V Shinkareva, Vicente L Malave, Marcel Adam Just, and Tom M Mitchell. Exploring commonalitiesacross participants in the neural representation of objects.
Human brain mapping , 33(6):1375–1383, 2012.[35] Hans P Op de Beeck, Marijke Brants, Annelies Baeck, and Johan Wagemans. Distributed subordinate specificityfor bodies, faces, and buildings in human ventral visual cortex.
Neuroimage , 49(4):3414–3425, 2010.[36] Raymond Salvador, John Suckling, Martin R Coleman, John D Pickard, David Menon, and ED Bullmore. Neuro-physiological architecture of functional magnetic resonance images of human brain.
Cerebral cortex , 15(9):1332–1342, 2005.[37] Simon Benjaminsson, Peter Fransson, and Anders Lansner. A novel model-free data analysis technique basedon clustering in a mutual information space: application to resting-state fmri.
Frontiers in systems neuroscience ,4:34, 2010.[38] Pierre-Yves Hervé, Annick Razafimandimby, Mathieu Vigneau, Bernard Mazoyer, and Nathalie Tzourio-Mazoyer. Disentangling the brain networks supporting affective speech comprehension.
NeuroImage ,61(4):1255–1267, 2012.[39] Amit Etkin and Tor D Wager. Functional neuroimaging of anxiety: a meta-analysis of emotional processing inptsd, social anxiety disorder, and specific phobia.
American Journal of Psychiatry , 164(10):1476–1488, 2007.[40] DE Welchew, GD Honey, Tonmoy Sharma, TW Robbins, and ET Bullmore. Multidimensional scaling of inte-grated neurocognitive function and schizophrenia as a disconnexion disorder.
NeuroImage , 17(3):1227–1239,2002.[41] David E Welchew, Chris Ashwin, Karim Berkouk, Raymond Salvador, John Suckling, Simon Baron-Cohen,and Ed Bullmore. Functional disconnectivity of the medial temporal lobe in asperger’s syndrome.
Biologicalpsychiatry , 57(9):991–998, 2005.[42] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. science ,290(5500):2323–2326, 2000.[43] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimen-sionality reduction. science , 290(5500):2319–2323, 2000.[44] Ronald R Coifman and Stéphane Lafon. Diffusion maps.
Applied and computational harmonic analysis , 21(1):5–30, 2006.[45] Anqi Qiu, Annie Lee, Mingzhen Tan, and Moo K Chung. Manifold learning on brain functional networks inaging.
Medical image analysis , 20(1):52–60, 2015.[46] Hui Shen, Lubin Wang, Yadong Liu, and Dewen Hu. Discriminative analysis of resting-state functional con-nectivity patterns of schizophrenia using low dimensional embedding of fmri.
Neuroimage , 49(4):3110–3121,2010. 18
PREPRINT - M AY
27, 2020[47] Peter Mannfolk, Ronnie Wirestam, Markus Nilsson, Freddy Ståhlberg, and Johan Olsrud. Dimensionality reduc-tion of fmri time series data using locally linear embedding.
Magnetic Resonance Materials in Physics, Biologyand Medicine , 23(5-6):327–338, 2010.[48] Ariana Anderson and Mark Steven Cohen. Decreased small-world functional network connectivity and clusteringacross resting state networks in schizophrenia: an fmri classification tutorial.
Frontiers in Human Neuroscience ,7:520, 2013.[49] Ariana Anderson, Ivo D Dinov, Jonathan E Sherin, Javier Quintana, Alan L Yuille, and Mark S Cohen. Classifi-cation of spatially unaligned fmri scans. neuroimage , 49(3):2509–2519, 2010.[50] Koen V Haak, Andre F Marquand, and Christian F Beckmann. Connectopic mapping with resting-state fmri.
Neuroimage , 170:83–94, 2018.[51] Xilin Shen and François G Meyer. Analysis of event-related fmri data using diffusion maps. In
Biennial Interna-tional Conference on Information Processing in Medical Imaging , pages 652–663. Springer, 2005.[52] Tuomo Sipola, Fengyu Cong, Tapani Ristaniemi, Vinoo Alluri, Petri Toiviainen, Elvira Brattico, and Asoke KNandi. Diffusion map for clustering fmri spatial maps extracted by independent component analysis. In , pages 1–6. IEEE, 2013.[53] Wenzhao Lian, Ronen Talmon, Hitten Zaveri, Lawrence Carin, and Ronald Coifman. Multivariate time-seriesanalysis and diffusion maps.
Signal Processing , 116:13–28, 2015.[54] Dominique Duncan, Ronen Talmon, Hitten P Zaveri, and Ronald R Coifman. Identifying preseizure state inintracranial eeg data using diffusion kernels.
Math. Biosci. Eng , 10(3):579–590, 2013.[55] Constantinos Siettos and Jens Starke. Multiscale modeling of brain dynamics: from single neurons and networksto mathematical tools.
Wiley Interdisciplinary Reviews: Systems Biology and Medicine , 8(5):438–458, 2016.[56] Jonas Richiardi, Sophie Achard, Horst Bunke, and Dimitri Van De Ville. Machine learning with brain graphs:predictive modeling approaches for functional imaging in systems neuroscience.
IEEE Signal Processing Maga-zine , 30(3):58–70, 2013.[57] Vince D Calhoun, Jing Sui, Kent Kiehl, Jessica A Turner, Elena A Allen, and Godfrey Pearlson. Exploring thepsychosis functional connectome: aberrant intrinsic networks in schizophrenia and bipolar disorder.
Frontiers inpsychiatry , 2:75, 2012.[58] Andrew R Mayer, David Ruhl, Flannery Merideth, Josef Ling, Faith M Hanlon, Juan Bustillo, and Jose Cañive.Functional imaging of the hemodynamic sensory gating response in schizophrenia.
Human brain mapping ,34(9):2302–2312, 2013.[59] Muhammad Naveed Iqbal Qureshi, Jooyoung Oh, Dongrae Cho, Hang Joon Jo, and Boreom Lee. Multimodaldiscrimination of schizophrenia using hybrid weighted feature concatenation of brain functional connectivity andanatomical features with an extreme learning machine.
Frontiers in neuroinformatics , 11:59, 2017.[60] Mark Jenkinson, Peter Bannister, Michael Brady, and Stephen Smith. Improved optimization for the robust andaccurate linear registration and motion correction of brain images.
Neuroimage , 17(2):825–841, 2002.[61] Stephen M Smith. Fast robust automated brain extraction.
Human brain mapping , 17(3):143–155, 2002.[62] Christian F Beckmann and Stephen M Smith. Probabilistic independent component analysis for functional mag-netic resonance imaging.
IEEE transactions on medical imaging , 23(2):137–152, 2004.[63] Regina J Meszlényi, Petra Hermann, Krisztian Buza, Viktor Gál, and Zoltán Vidnyánszky. Resting state fmrifunctional connectivity analysis using dynamic time warping.
Frontiers in neuroscience , 11:75, 2017.[64] James S Hyde and Andrzej Jesmanowicz. Cross-correlation: an fmri signal-processing strategy.
NeuroImage ,62(2):848–851, 2012.[65] Archana Venkataraman, Koene RA Van Dijk, Randy L Buckner, and Polina Golland. Exploring functional con-nectivity in fmri via clustering. In
Proceedings of the... IEEE International Conference on Acoustics, Speech, andSignal Processing/sponsored by the Institute of Electrical and Electronics Engineers Signal Processing Society.ICASSP (Conference) , volume 2009, page 441. NIH Public Access, 2009.[66] Cyril Goutte, Peter Toft, Egill Rostrup, Finn Å Nielsen, and Lars Kai Hansen. On clustering fmri time series.
NeuroImage , 9(3):298–310, 1999.[67] Martijn P. van den Heuvel, Siemon C. de Lange, Andrew Zalesky, Caio Seguin, B.T. Thomas Yeo, and RubenSchmidt. Proportional thresholding in resting-state fMRI functional connectivity networks and consequences forpatient-control connectome studies: Issues and recommendations.
NeuroImage , 152:437–449, may 2017.19
PREPRINT - M AY
27, 2020[68] Rami F Algunaid, Ali H Algumaei, Muhammad A Rushdi, and Inas A Yassine. Schizophrenic patient iden-tification using graph-theoretic features of resting-state fmri data.
Biomedical Signal Processing and Control ,43:289–299, 2018.[69] R Core Team.
R: A language and environment for statistical computing. Vienna, Austria: R Foundation forStatistical Computing; 2014 , 2014.[70] Edsger W Dijkstra. A note on two problems in connexion with graphs.
Numerische mathematik , 1(1):269–271,1959.[71] Jari Oksanen, Roeland Kindt, Pierre Legendre, Bob O’Hara, M Henry H Stevens, Maintainer Jari Oksanen, andMASS Suggests. The vegan package.
Community ecology package , 10:631–637, 2007.[72] Boaz Nadler, Stephane Lafon, Ioannis Kevrekidis, and Ronald R Coifman. Diffusion maps, spectral clusteringand eigenfunctions of fokker-planck operators. In
Advances in neural information processing systems , pages955–962, 2006.[73] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation.
Neural computation , 15(6):1373–1396, 2003.[74] Amit Singer, Radek Erban, Ioannis G Kevrekidis, and Ronald R Coifman. Detecting intrinsic slow variables instochastic dynamical systems by anisotropic diffusion maps.
Proceedings of the National Academy of Sciences ,106(38):16090–16095, 2009.[75] J De la Porte, BM Herbst, W Hereman, and SJ Van Der Walt. An introduction to diffusion maps. In
Proceedingsof the 19th Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, SouthAfrica , pages 15–25, 2008.[76] Boaz Nadler, Stephane Lafon, Ronald Coifman, and Ioannis G Kevrekidis. Diffusion maps-a probabilistic in-terpretation for spectral embedding and clustering algorithms. In
Principal manifolds for data visualization anddimension reduction , pages 238–260. Springer, 2008.[77] Joseph Richards. Diffusion map.
R package version , pages 1–1, 2014.[78] Harry Strange and Reyer Zwiggelaar.
Open Problems in Spectral Dimensionality Reduction . Springer, 2014.[79] Lawrence K Saul, Kilian Q Weinberger, Jihun H Ham, Fei Sha, and Daniel D Lee. Spectral methods for dimen-sionality reduction.
Semisupervised learning , pages 293–308, 2006.[80] Cornelis J Stam and Jaap C Reijneveld. Graph theoretical analysis of complex networks in the brain.
Nonlinearbiomedical physics , 1(1):3, 2007.[81] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations.
Neuroimage , 52(3):1059–1069, 2010.[82] Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research.
InterJournal,Complex Systems , 1695(5):1–9, 2006.[83] Max Kuhn et al. Building predictive models in r using the caret package.
Journal of statistical software , 28(5):1–26, 2008.[84] Brian D Ripley.
Pattern recognition and neural networks . Cambridge university press, 2007.[85] Robert Hecht-Nielsen. Theory of the backpropagation neural network. In
Neural networks for perception , pages65–93. Elsevier, 1992.[86] Anders Krogh and John A Hertz. A simple weight decay can improve generalization. In
Advances in neuralinformation processing systems , pages 950–957, 1992.[87] Brian Ripley and William Venables. nnet: Feed-forward neural networks and multinomial log-linear models.