[PDF] Construction of embedded fMRI resting state functional connectivity networks using manifold learning

Abstract

We construct embedded functional connectivity networks (FCN) from benchmark resting-state functional magnetic resonance imaging (rsfMRI) data acquired from patients with schizophrenia and healthy controls based on linear and nonlinear manifold learning algorithms, namely, Multidimensional Scaling (MDS), Isometric Feature Mapping (ISOMAP) and Diffusion Maps. Furthermore, based on key global graph-theoretical properties of the embedded FCN, we compare their classification potential using machine learning techniques. We also assess the performance of two metrics that are widely used for the construction of FCN from fMRI, namely the Euclidean distance and the lagged cross-correlation metric. We show that the FCN constructed with Diffusion Maps and the lagged cross-correlation metric outperform the other combinations.

Full PDF

aa r X i v : . [ q - b i o . N C ] M a y C ONSTRUCTION OF EMB EDDED F

M RI

R ESTING STATEFUNC TIONAL C ONNECTIVITY NETWOR KS USING MANIFOLDLEAR NING

A P

REPRINT

Ioannis K. Gallos

School of Applied Mathematical and Physical SciencesNational Technical University of Athens, Greece [email protected]

Evangelos Galaris

Dipartimento di Matematica e ApplicazioniUniversità degli Studi di Napoli Federico II [email protected]

Constantinos I. Siettos ∗ Dipartimento di Matematica e ApplicazioniUniversità degli Studi di Napoli Federico IITel.: +39 0816-75615 [email protected]

May 27, 2020 A BSTRACT

We construct embedded functional connectivity networks (FCN) from benchmark resting-state func-tional magnetic resonance imaging (rsfMRI) data acquired from patients with schizophrenia andhealthy controls based on linear and nonlinear manifold learning algorithms, namely, Multidimen-sional Scaling (MDS), Isometric Feature Mapping (ISOMAP) and Diffusion Maps. Furthermore,based on key global graph-theoretical properties of the embedded FCN, we compare their classiﬁ-cation potential using machine learning techniques. We also assess the performance of two metricsthat are widely used for the construction of FCN from fMRI, namely the Euclidean distance and thelagged cross-correlation metric. We show that the FCN constructed with Diffusion Maps and thelagged cross-correlation metric outperform the other combinations. K eywords resting-state fMRI · Functional Connectivity Networks · Schizophrenia · Manifold Learning · MachineLearning

Over the past years, functional magnetic resonance imaging (fMRI) has been widely used for the identiﬁcation of brainregions that are related to both functional segregation and integration. Regarding functional segregation, the conven-tional analysis relies on the identiﬁcation of the activated voxels based on functional response models and multivariatestatistics between experimental conditions (e.g. resting state vs. task-stimulated activity). A representative exampleis the General Linear Model (GLM) that is implemented in well established software packages such as SPM [1] andFSL [2]. On the other hand, for the assessment of functional integration, there is a distinction between functional andeffective connectivity [3]. Functional connectivity (FC) analysis seeks for statistical dependencies (e.g. correlations,coherence) between brain regions. Effective connectivity (EC) analysis [3] tries to reveal the inﬂuence that one neuralsystem exerts on another [3]. A detailed review on the differences between FC and EC approaches can be found in [3].Here, we focus on the construction of FCN based on resting-state fMRI (rsfMRI) recordings. In rsfMRI, there is nostimuli and thus the assessment of functional integration is more complex and not so straightforward compared to task-related experiments [4]. Furthermore, spontaneous/ resting state brain activity as measured with fMRI has been also ∗ Corresponding author: [email protected]

PREPRINT - M AY

27, 2020considered as a potential biomarker in psychiatric disorders (see e.g. the review of Zhou et al. [5]). In general, for theconstruction of FCN, two basic general frameworks are explored: (a) Seed-based Analysis (SBA) and (b) IndependentComponent-based Analysis (ICA). In the SBA [6], the (averaged) fMRI signals of the regions of interest (ROI) arecorrelated with each other; correlations above a threshold are considered functional connections between seeds/ROIs.Even though the SBA has been proved extremely useful in identifying functional networks of speciﬁc brain regions[7, 8, 9], its major disadvantage is the requirement of the a-priori knowledge of the functional organization of thebrain, while possible correlations between seeds can be due to structured spatial confounds (e.g. scanner artifacts)[10]. Furthermore, seed selection is based on standard coordinates, while at the subject level, anatomical differencesmay lead to the selection of functionally irrelevant voxels at the group level. Thus, despite the use of normalizationtechniques, the accuracy of this approach is shown to be limited [11]. On the other hand, ICA [12] has arisen as analternative/complementary approach since the early 2000s [13, 14, 15]. ICA decomposes the 4D fMRI data to a setof spatial components with maximum statistical independence and their associated time series. Smith et al.[16] ina meta-analytic study of 30,000 rsfmRI scans with the aid of ICA revealed a functional “partition" of the brain intoresting-state Networks (RSNs), such as the Sensorimotor, Default mode and Auditory networks. Applications of ICAinclude also data pre-processing, where noise-related components are regressed out from the original fMRI signals[17]. However, while ICA produces spatial components that are statistically independent to each other, there is noclear link between the spatial components and speciﬁc brain functions and furthermore the spatial components cannotin general be ordered by relative importance [10]. Another issue is that most of the standard algorithms that computeICs utilize gradient based optimization algorithms that use an iterative scheme; the initial guesses in these algorithmsare generated randomly making the whole process stochastic: for the same dataset, the obtained spatial componentsmay differ signiﬁcantly over repeated runs [18]. Thus, the robustness/reproducibility of the ICA results over repeatedruns may be questioned.In order to tackle the above issues, several techniques have been proposed for the classiﬁcation of ICs and the con-struction of subject speciﬁc ROIs [19, 20]. Advances have been also made regarding the selection of the model orderof the ICA decomposition, such as the Bayesian dimensionality estimation technique [13] and the use of theoreticalinformation criteria for model order selection [21]. Finally, the so-called ranking and averaging ICA by reproducibil-ity (RAICAR) [20, 18] (see also [10] for a critical discussion) aims at resolving issues regarding stochasticity androbustness of the ICA decomposition. RAICAR utilizes a sufﬁcient number of ICA realizations and based on the re-producibility of the ICs aims to rank them in terms of the most “reliable" components. Reliable ICs among realizationsare assessed via correlations and the ﬁnal estimate of each component is averaged.Alternatively and/or complementary to the above analysis, linear manifold learning algorithms such as Principal Com-ponent Analysis (PCA) [22, 23, 24] and Multidimensional Scaling (MDS) [25, 26] have been exploited. PCA hasbeen succesfully applied in the pre-processing routine for dimensionality reduction (often prior to ICA) [27]. Appli-cations of PCA, include also the recovery of signals of interest [28] and the construction of FCN from fMRI scans intask-related experiments [23, 24]. In these studies, the performance of PCA with respect to the detection of regions ofcorrelated voxels has been shown to be satisfactory but not without problems. For example, a study by Baumgartner etal. [24] highlighted the limits of PCA to correctly identify activation of brain regions in cases of low contrast-to-noiseratios (CNR) appearing when signal sources of e.g. physiological noise are present [29].MDS has been also widely used in fMRI (mostly for task-based studies) mainly for the identiﬁcation of similaritiesbetween brain regions in terms of voxel-wise connectivity [30, 31, 32, 33, 34, 35]. The implementation of MDS inneuroimaging dates back to the work of Friston et al. [26]. There, it was investigated the embedded (voxel-wise)connectivity of fmRI data acquired during tasks of word generation between healthy and schizophrenia subjects. Sal-vador et al. [36] used MDS to investigate the embedded connectivity of anatomical regions of the brain from rsfMRIdata. Benjaminsson et al. [37] used MDS to embed high-dimensional rsfMRI data from the mutual information spaceto a low dimensional euclidean space for the identiﬁcation of RSNs. Herve et al. [38] used MDS to acquire a lowdimensional approximation of interregional correlations for the investigation of the affective speech comprehension.Finally, in a meta-analytic study by Etkin et al. [39], MDS was exploited to provide a low-dimensional visualization ofco-activation interrelations of ROIs. MDS has been also used in works investigating the functional (dys)connectivityassociated with schizophrenia [40] and Asperger’s Syndrome [41].However, thus far, only a few studies have exploited non-linear manifold learning algorithms such as Local LinearEmbedding (LLE) [42], Isometric Feature Mapping (ISOMAP) [43] and Diffusion Maps [44] for the analysis of fMRIdata and particularly for the construction of FCN. LLE has been applied in rsfMRI studies for the improvement ofpredictions in ageing studies [45] for low-dimensional clustering towards the classiﬁcation of healthy subjects andpatients with schizophrenia [46] and as an alternative method for dimensionality reduction before the application ofICA in task-related fMRI where non-linear relationships in the BOLD signal are introduced [47].In Anderson et al. [48], ISOMAP was employed to a benchmark rsfMRI dataset of 146 subjects for the constructionof embedded low-dimensional FCN for the classiﬁcation between controls and schizophrenia patients. ROIs wereselected using single-subject ICA and the similarities between the ICA components were assessed using a pseudo-distance measure based on lagged cross-correlation. Graph-theoretical measures were then used for classiﬁcation2

PREPRINT - M AY

27, 2020between patients and healthy controls. Another study based on single-subject ICA, exploited ISOMAP to classifyspatially unaligned fMRI scans [49]. The study considered patients with schizophrenia versus healthy controls anddifferent age groups (young/old) of healthy controls versus patients with alzheimer’s disease. Despite the relativelylow sample sizes, results were promising with good classiﬁcation rates. Recently, Haak et al. [50] utilized ISOMAP inan effort to create individualised connectopies from rsfMRI recordings taken from the WU-Minn Human ConnectomeProject in a fully data-driven manner.Thus, only a handful of studies have used Diffusion Maps for the analysis of fMRI data, focused on the clustering ofspatial maps of task-related experiments [51, 52]. Shen et al. [51] employed Diffusion Maps to separate activated fromnon-activated voxels. Sipola et al.[52] used Diffusion Maps with a Gaussian kernel to cluster selected fMRI spatialmaps that are derived by ICA. They demonstrated their approach using fMRI recordings acquired from healthy par-ticipants listening to a stimulus with a rich musical structure. Other applications of Diffusion Maps in neuroimagingconcern the epileptic-seizure prediction and the identiﬁcation pre-seizure state in EEG timeseries [53, 54]. A reviewon the intersection between manifold learning methods and the construction of FCN can be found in [55] and [56].Here, we used MDS, ISOMAP and Diffusion Maps to construct embedded FCN from single-subject ICA analysisof rsfMRI data taken from healthy controls and schizophrenia patients. For our demonstrations, we used the CO-BRE rsfMRI data that are publicly available and have been used recently in many studies [57, 58, 48, 59]. Basedon key global graph-theoretical measures of the embedded graphs, we assessed their classiﬁcation efﬁciency usingseveral machine learning algorithms, namely Linear standard Support Vector Machines (LSVM), Radial (Radial ba-sis function kernel) Support Vector Machines (RSVM), k-Nearest Neighbour (k-NN), and Artiﬁcial Neural Networks(ANN). We also investigated the performance of two most-commonly used metrics, namely the cross-correlation andthe euclidean distance. Our analysis showed that Diffusion Maps outperformed all other methods and lagged cross-correlation proved to be a better choice than the euclidean metric.At this point, we should note, that our study does not aim at the best classiﬁcation performance by trying to ﬁnd thebest possible pre-processing pipe-line of the raw fMRI data and/or the selection of subjects and/or the selection of thebest set of graph-theoretical measures that provide the maximum classiﬁcation. Yet, we aim at using state-of-the-artmanifold learning methods for the construction of the FCN and compare their classiﬁcation efﬁciency using only a fewkey global theoretical graph-measures and compare the obtained results with those derived by similar studies (see e.g.[48]) using the same pipe-line for data pre-processing and single-subject ICA. To the best of our knowledge, this paperis the ﬁrst to perform such a thorough comparative analysis of both linear and nonlinear manifold learning on rsfMRIdata. It is also the ﬁrst study to show how Diffusion Maps can be used for the construction of FCN from rsfMRI,assessing also the efﬁciency of two basic distance metrics, the cross-correlation-based and the Euclidean distance.

For our demonstrations, we performed a basic pre-processing of the raw fMRI data using SPM as also implemented inother studies (see e.g. [48]). In particular, for the fMRI data processing, we used FEAT (FMRI Expert Analysis Tool)Version 6.00, part of FSL (FMRIB’s Software Library, ). In particular, the followingpre-statistics processing was applied: motion correction using MCFLIRT [60], slice-timing correction using Fourier-space time-series phase-shifting; non-brain removal using BET [61], spatial smoothing using a Gaussian kernel ofFWHM 5mm, grand-mean intensity normalization of the entire 4D dataset by a single multiplicative factor. ICAAROMA [17] was also implemented to detect and factor out noise-related independent components (ICs) along witha high-pass temporal ﬁltering at 0.01 Hz (100 seconds) that was applied after ICA AROMA procedure as it is highlyrecommended [17].We then proceeded with the decomposition of the pre-processed fMRI data to spatial ICs (for each suject) using theRAICAR methodology [20]. In this way, we found the most reproducible spatial ICs over repeated runs as a solutionto the well known problem of the variability of the ICA decomposition [18]. This choice was related to the benchmarkfMRIdata per se as there was only a single session per subject with relatively small duration (6 minutes); therefore wewouldn’t expect a robust ICA decomposition for all subjects (see also the discussion in [10]). Another choice wouldbe to perform group-ICA analysis, but we decided to use single-subject ICA in order to have a common ground withthe methodologically similar work presented in [48]. Group-ICA analysis will be performed in a future study.3

PREPRINT - M AY

27, 2020

ICA is a linear data-driven technique that reduces the high-dimensional fMRI F ( t, x, y, z ) space in a set of M statisti-cally independent spatial components (ICs). This reduction can be represented as: F ( t, x, y, z ) = M X i =1 A i ( t ) C i ( x, y, z ) , (1)where F ( t, x, y, z ) is the measured BOLD signal, A i ( t ) is the temporal amplitude (the matrix A containing all tem-poral amplitudes is known as mixing matrix) and C i ( x, y, z ) is the spatial magnitude of the i-th ICA component.While PCA requires that the principal components are uncorrelated and orthogonal, ICA asks for statistical indepen-dence between the ICs. Generally, ICA algorithms are based either on the minimization of mutual information or themaximization of non-gaussianity among components. As discussed in the introduction, most of the standard imple-mentations of ICA, such as the one in MELODIC (Multivariate Exploratory Linear Optimized Decomposition intoIndependent Components) [62], which is part of FSL fMRI analysis software package, share similar gradient basedoptimization algorithms using an iterative scheme whose initial values are generated randomly, thus making the wholeprocess stochastic. As a consequence, results over repeated runs may differ signiﬁcantly [18]. A solution to thisproblem is provided by the so-called ranking and averaging ICA by reproducibility (RAICAR) [20] that we brieﬂydescribe in the following section. The RAICAR methodology developed by Yang et al. [20] was addressed to tackle the problem of the ICs variabilityby performing K ICA realizations. Thus, RAICAR leads to K “slightly" different mixing matrices A , A . . . A K and K different sets of ICs S , S . . . S K . Each realization ﬁnds a ﬁxed number M of spatial ICs. Then, a crossrealization correlation matrix ( CRCM ) of size M · K × M · K is constructed and the alignment (ICA producesunaligned components) of ICs of each realization takes place on the basis of the absolute maximum cross correlationamong components. Thus, the cross realization correlation matrix reads: CRCM =  R , R , . . . R ,K − R ,K R , . . . . . . R ,K ... . . . . . . . . . ... R K − , . . . . . . R K − ,K R K, R K, . . . R K,K − R K,K  R i,j with i, j = 1 , ...K are submatrices of size M × M and their elements represent the absolute spatial cross corre-lation coefﬁcients among components and across realizations. CRCM is a symmetric matrix and its diagonal consistsof identity matrices which are ignored for the next steps of the algorithm.The procedure starts with the identiﬁcation of the global maximum of the CRCM, thus ﬁnding the matched componentbased on two realizations. At the next step, the methodology seeks for the highest absolute spatial cross correlation co-efﬁcients of the identiﬁed component in the remaining realizations factoring out all others. The procedure is repeated M times until M aligned components are found.The next step involves the computation of the reproducibility index for each of the aligned components. This is doneby constructing the histogram of the absolute spatial cross correlation coefﬁcients of the upper triangle matrix of theCRCM. This histogram tends to be bimodal, as in general, we expect low spatial cross correlation among most of theICs and high spatial cross correlation only for a few of them. A spatial cross correlation threshold is applied withthe desired value lying in the valley of the histogram between the two modes [20]. Finally, the reproducibility indexis computed for each one of the aligned components. This is done by aggregating the supra-threshold absolute crosscorrelation coefﬁcients of the CRCM for each of the aligned components.The last step of the algorithm is the ranking and averaging of the aligned components in descending order based onthe reproducibility index. The selective averaging is applied so that the components are averaged if and only if, thegiven aligned component has at least one absolute spatial cross correlation coefﬁcient above the threshold across real-izations.After applying RAICAR, the ICs are chosen via a cut-off threshold based on the reproducibility index (of each com-ponent) that indicates how consistent is the appearance of an IC across realizations.Here, we have set K = 30 realizations (same as also in [20]); taking more realizations did not change the outcomes ofthe analysis. The spatial cross correlation threshold was chosen by localizing the minimum of the histogram of absolute4 PREPRINT - M AY

27, 2020cross correlation coefﬁcients of the CRCM. This threshold was speciﬁed separately for each subject. The reproducibleICs were determined by the histogram of reproducibility index over the number of components in descending order.The cut-off threshold was set as the half of the maximum reproducibility index value possible K ( K − · . (this choiceis the same with the one used in [20]). This cut-off threshold was set equal for all subjects.Subjects with less than 20 reproducible ICs were excluded from further analysis as this number of components re-sulted in disconnected graphs. Thus, we ended up with 104 subjects out of which 57 were healthy controls and 47schizophrenia patients. For the construction of FCN, we used all combinations between three manifold learning algorithms, namely MDS,ISOMAP and Diffusion Maps and two widely used metrics, namely the lagged cross-correlation [48, 63, 64] and theeuclidian distance [52, 65, 66].

For every pair of the associated time courses of the ICs, say A i and A j , the cross-correlation function (CCF) over amaximum of three time lags (as in [48]) reads: CCF ( A i , A j , l ) = E [( A i,t + l − A i )( A j,t − A j )] q E [( A i,t − A i )( A j,t − A j ] , (2)where l is the time lag, and A i is the mean value of the whole time series.For the construction of the FCN connectivity/ correlation matrices, we used a pseudo-distance measure d c deﬁned as(see also [48]): d c ( A i , A j ) = 1 − max l =0 , , , ( | CCF ( A i , A j , l ) | ) . (3)The resulting dis(similarity) matrices are fully connected and therefore are hardly comparable between subjects (seethe discussion in [48]). Thus, here as a standard practice, (and in all other algorithms described below), we appliedthresholding to the (dis)similarity matrices in order to keep the strongest connections of the derived functional connec-tivity matrix. In order to factor out the inﬂuence of the variable network density on the computation and comparison ofgraph-theoretical measures across groups [67], we have implemented the approach of proportional thresholding (PT)[67]. In particular, we considered a range of levels of PT from 20% to 70% with a step of 2%. Below the threshold of20%, graphs became too fragmented, while thresholds above the 70% resulted in dense graphs, that mostly includednoisy and less signiﬁcant edges (see also the discussion in [68] about this issue). If a graph was still fragmented afterthresholding, the giant component was used for further analysis. The euclidean distance is used in many studies to assess (dis)similarities between fMRI data points [52, 65, 66]. Fortime series associated with the independent spatial maps, A i and A j , the euclidean distance reads: L ( A i , A j ) = vuut T X t =1 ( A i,t − A j,t ) . (4)For the construction of FCN, PT was applied to the euclidean similarity matrices for each individual over the range of %- %. Below we present how MDS, ISOMAP and Diffusion Maps can be exploited to construct (embedded) FCN.5

PREPRINT - M AY

27, 2020

Multidimensional Scaling [25] is a linear method of dimensionality reduction that can be used to ﬁnd similar-ities between pairs of objects in a low-dimensional (embedded) space. Given a set of M objects/observables x , x , . . . , x M ∈ R N , MDS produces a low-dimensional data representation y , y , . . . , y M ∈ R p , p ≪ N mini-mizing the objective function: X i,j, i = j (cid:16) k x i − x j k − d ( x i , x j ) (cid:17) . (5) d ( x i , x j ) is the (dis)similarity obtained (by any (dis)similarity measure of choice) between all pairs of points x , x , . . . , x M ∈ R N . In our case, the observables x i are the amplitudes of the spatial ICs A i, i =1 ,..M ∈ R N .Here, N = 150 (number of time points).The coordinates of the embedded manifold y , y , . . . , y M are given by: [ y , . . . , y M ] = Λ p × p · V Tp × M . (6) Λ p × p contains the square roots of the p largest eigenvalues, and V Tp × M are the corresponding eigenvectors of thematrix: B = − HD H T . (7) H M × M is the double centering matrix deﬁned as: H = I − M · T , = [ . . . ] × M . (8)Using MDS, the dimensionality reduction of the original data X = x , x , . . . , x M ∈ R N yields the embedding of Y = y , y , . . . , y M ∈ R p , p ≪ N .Here, for the construction of the embedded FCN, we produced distance matrices (using either lagged cross-correlationor the euclidean distance) D Y of size M × M .For the implementation of the MDS algorithm, we used the “cmdscale" function contained in the software pack-age“Stats" in the R free Software Environment [69]. ISOMAP is a non-linear manifold learning algorithm that given a set of M objects/observables x , x , . . . , x M ∈ R N produces a low-dimensional data representation y , y , . . . , y M ∈ R p , p ≪ N minimizing the objective function: X i,j, i = j (cid:16) d G ( x i , x j ) − d ( x i , x j ) (cid:17) , (9)where d G ( x i , x j ) is the shortest path (geodesic distance) and d ( x i , x j ) is the (dis)similarity obtained (by any(dis)similarity measure of choice) between all pairs of points x , x , . . . , x M ∈ R N .In our case, the observables x i are the amplitudes of the spatial ICs A i, i =1 ,..M ∈ R N .The above minimization problem is solved as follows: [43]: • Construct a graph G = ( V, E ) , where the vertices V are the ICs A i ; its links E are created by usingeither the k -nearest neighbors algorithm or a ﬁxed distance between nodes, known as the ǫ distance. Forexample, a link between two ICs is created if d i,j ≡ d ( A i , A j ) < ǫ , ∀ i = j . Here, we used the k nearestneighbours algorithm with k = 3 , , , [43]). Set the weight w i,j of the link (if any) between A i , A j as w i,j = d ( A i , A j ) . If there is not a link set: w i,j = 0 .6 PREPRINT - M AY

27, 2020 • Approximate the embedded manifold by estimating the shortest path (geodesic distance) d G ( A i , A j ) foreach pair of nodes based on the distances d i,j ; this step can be implemented for example using the Dijkstraalgorithm [70]. This procedure results to a matrix, D G with elements the shortest paths: D G ij ≡ d G ( A i , A j ) = min (cid:8) d i,j , d i,k + d k,j (cid:9) , k = 1 , , . . . , M k = i, j. (10) • Estimate the coordinates of the low-dimensional (embedded) manifold y , y , . . . , y M exploiting the MDSalgorithm [25] on the geodesic distance matrix D G .Here, for the implementation of the ISOMAP algorithm, we used the package “vegan" [71] contained in the R freeSoftware Environment [69]. Diffusion Maps [44] is a non-linear manifold learning algorithm that given a set of M objects/observables X = x , x , . . . , x M ∈ R N produces a low-dimensional representation Y = y , y , . . . , y M ∈ R p , p ≪ N , address-ing the diffusion distance among data points as the preserved metric[72]. The embedding of the data in the low-dimensional space is obtained by the projections on the eigenvectors of a normalized graph Laplacian [73]. TheDiffusion Maps algorithm can be described in a nutshell in the following steps: • Construction of the afﬁnity matrix W M × M , here M is the number of ICs for each subject. The elements W ij represent the weighted edges connecting nodes i and j using the so-called heat kernel: W i,j = exp ( − d ( x i , x j ) σ ) , (11)where x i is a N -dimensional point (here, N=150), d ( x i , x j ) are the (dis)similarities obtained (by any dissim-ilarity measure of choice) between all pairs of points x , x , . . . , x M ∈ R N and σ is an appropriately chosenparameter which can be physically described as a scale parameter of the heat kernel [44]. The heat kernel W satisﬁes two important properties, the one of symmetry and the other of positive semi-deﬁnite matrix. Thelatter property is crucial and allows the interpretation of weights as scaled probabilities of “jumping" fromone node to another.The parameter σ of the neighborhood size is data-dependent and here, it was determined by ﬁnding the linearregion in the sum of all weights in W , say S w , using different values of σ [74, 52]. S w is calculated throughthe formula: S w = M X i M X j W ij , (12)In order to use a single value of σ for all participants, we computed a super-distribution of the sum ofweights across subjects (taking the median value of the distributions) using different values of σ . Thus, weconsidered values of σ lying in the linear region of the super-distribution. After inspection, every value of σ was lying in the linear region of every single subject’s logarithmic plot. • Formulation of the diagonal M × M normalization matrix K along with the diffusion matrix P : K ii = M X j =1 W ij , (13) P = K − W . (14)Each element of the symmetric and normalized diffusion matrix P reﬂects the connectivity between two datapoints x and x . As an analogy, this connectivity can be seen as the probability of “jumping" from one pointto another in a random walk process. Consequently, raising P to a power of t can be thought as a diffusionprocess. As the number of t increases, paths with low probability tend to zero, while the connectivity be-tween paths with high probability remains high enough, thus governing the diffusion process [44]. Thus, the7 PREPRINT - M AY

27, 2020algorithm of Diffusion Maps preserves the diffusion distance among points in a low-dimensional euclideanspace. The diffusion distance is closely related to the diffusion matrix P and for two distinct points x i , x j and for speciﬁc time instance t is deﬁned as [75]: D t ( x i , x j ) = X m | P tim − P tmj | . (15)Unlike geodesic distance, the diffusion distance is robust to noise perturbations, as it sums over all possiblepaths (of t steps) between points [44]. • Construction of the conjugate matrix P = K / PK − / , (16)substituting Eq.(14) to Eq.(16) we get P = K − / WK − / . (17)This is the so-called Graph Laplacian matrix [73]. The matrix P is adjoint to the symmetric matrix P . Thus, P and P share the same eigenvalues [76]. • Singular Value Decomposition (SVD) of P yields P = U ΛU ∗ , (18)where Λ is a diagonal matrix containing the M eigenvalues of P and U the eigenvectors of P . The eigenvec-tors V of P can be bow found by [76]: V = K / U . (19) • By taking out the trivial eigenvalue λ = 1 of the matrix Λ and the corresponding eigenvector contained in V ,the coordinates of the low dimensional embedded manifold y , y , . . . , y M are given by: [ y , . . . , y M ] = Λ p × p · V Tp × M , (20)where Λ p × p contains the p largest eigenvalues, and V Tp × M are the corresponding eigenvectors of the diffusionmatrix P .For the implementation of the above algorithm we used the package “diffusionMap" [77] contained in the R freeSoftware Environment [69]. Here, the embedded dimension was determined via the eigenspectrum of the ﬁnal decomposition for every dimen-sionality reduction/Manifold learning algorithm. If there is a gap between the ﬁrst few larger eigenvalues of the ﬁnaldecomposition and the rest of the eigenspectrum, these few eigenmodes capture most of the distance differences be-tween data points and are able to represent and uncover intrinsic properties of the data structure [76, 78, 79]. In orderto determine the embedded dimension for the methods described above, we considered the following steps: We sortedthe eigenvalues in decreasing order eg. λ ≥ λ ≥ λ · · · ≥ λ M (for diffusion maps λ is discarded). Then, foreach subject, we calculated the consequent pairwise differences λ − λ , λ − λ , ... , λ M − − λ M . A gap betweenthe mean (over all subjects) differences determines from which point and further, the relative contribution of anothereigendimension is redundant (this having a small contribution to the reconstruction of the embedded FCN). Thus, herefor our computations, we considered low-dimensional embeddings of p = (2 , , , dimensions for each one of themanifold learning methods described above. Here, we analyzed the topological properties of the binary FCN graphs on the basis of three major graph measures,namely, the average path length, the global clustering coefﬁcient, and the median degree, which are the most frequentlyused in neuroscience [80]. In particular, given a graph G = ( V, E ) with g ij representing the link (0: unconnected or1: connected) from node i to node j and k i = P N V j =1 g ij the degree of node i, the graph measures are computed asfollows:a) The average path length is deﬁned by: L = N V ( N V − P i = j D G ij , i.e. is the average number of steps along the8 PREPRINT - M AY

27, 2020shortest paths D G ij for all possible pairs of the network nodes. This is a measure of the efﬁciency of information ormass transport on a network between all possible pairs of nodes.b) The global clustering coefﬁcient is deﬁned by: C g = P t c P t , where t is a triplet and t c is a closed triplet. A triplet of agraph consists of three nodes that are connected by either to open (i.e open triplet) or closed (i.e closed triplet) ties. Ingeneral, this measure indicates how random or structured a graph is (in our case, in terms of functional segregation).c) The median degree M k is the median value of the degree distribution of G . This measure reﬂects how wellconnected is the “median" network node in terms of the number of links that coincide with it.An extensive review of deﬁnitions and meaning of the above and other graph-theoretical measures with respect tobrain functional networks can be found in [81, 80].The computations for the graph analysis were performed with the “igraph" [82] package contained in the free softwareenvironment R [69]. Based on the three key graph measures, the classiﬁcation was assessed using machine learning algorithms, namely Lin-ear Support Vector Machines(LSVM), Radial Support Vector Machines (RSVM), Artiﬁcial Neural Networks (ANN)and k-NN classiﬁcation (for a brief description of the above algorithms and their parameters see in the Appendix).The classiﬁcation algorithms were trained, validated and tested using a 10-fold cross-validation scheme which wasrepeated 100 times. Thus, we separated the data in ten separate sub-samples; nine of them were used as trainingsets and one of them was used for validation. This process was repeated 10 times leaving out each time a differentsub-sample which served as a validation set. The whole procedure was repeated 100 times. The overall classiﬁcationrate was determined via the computation of the average classiﬁcation rate over all the repetitions of the 10-fold crossvalidation for each model.The average confusion matrix (over all repetitions of the 10-fold cross validation) was also computed for each classi-ﬁcation model. The confusion matrix is a × (in the case of binary classiﬁcation) square matrix containing all truepositives T P , false positives

F P , true negatives

T N and false negatives

F N . Here, we considered as positives P theschizophrenia cases and as negatives N the healthy control cases. Sensitivity (also called the True Positive Rate) andSpeciﬁcity (also called the True Negative Rate) are basic statistical measures for the assessment of binary classiﬁca-tions. The sensitivity T P R is given by

T P R = T PT P + F N , while the speciﬁcity

T N R is given by

T N R = T NT N + F P .Here, sensitivity characterizes the ability of the classiﬁer to correctly identify a schizophrenic subject, while speciﬁcityis the ability of the classiﬁer to correctly identify a healthy subject.Here, we used the algorithms contained in the package “caret" [83] contained in the R free software environment [69].

In table 1, we present the best classiﬁcation accuracy, along with the corresponding sensitivity and speciﬁcity rate foreach manifold learning algorithm, threshold and classiﬁer when using the lagged cross-correlation metric. At the endof the table 1, we provide also the results obtained by the “conventional" (thresholded) cross-correlation matrix.Fig. 1 shows the super-distribution of the sum of weights of all subjects against different values of σ used forconstruction of the FCN using the Diffusion Maps algorithm; the red dotted vertical line shows the optimal σ (here, σ = 0.325) while the black vertical lines bound the linear region ( σ ∈ (0 . , . ). The results were robust to differentchoices of the time step of the Diffusion Maps algorithm, namely t = 0 , , . The investigation of the optimal σ wasperformed with respect to the best classiﬁcation accuracy over all classiﬁers.Fig.2 depicts the best classiﬁcation rates for MDS, ISOMAP, Diffusion Maps and lagged-cross correlation matrix fordifferent classiﬁers are reported. Diffusion Maps provided the best classiﬁcation accuracy (79.3% for SVM with anRBF kernel and 52% PT), thus appearing more robust over a wide range of PT. With respect to the maximum classiﬁ-cation accuracy obtained by Diffusion Maps, results were robust over a wide range of values of σ ∈ (0 . , . .Fig. 3 depicts the classiﬁcation performance obtained with Diffusion Maps for different values of σ .Isomap performed better for low thresholds. Overall, the performance of Isomap was more or less the same with thecross-correlation matrix with the best classiﬁcation rate being slightly higher for Isomap (74.4% for SVM with anRBF kernel and 24% PT). Its performance was however sensitive to the choice of the number k of nearest neighbors;with k = 5 we got a 74.4% classiﬁcation accuracy for SVM with an RBF kernel and 24% PT) while for k = 6 we9 PREPRINT - M AY

27, 2020Table 1: Best classiﬁcation rates over all manifold and machine learning methods using the lagged cross-correlationpseudo-distance measure d c (see 2.3.1); the embedded dimension (Emb. Dim) is also shown for each method alongwith the corresponding best threshold point (Thres), Classiﬁer, Accuracy (Acc), Sensitivity (Sens) and Speciﬁcity(Spec) rate. Method Emb. Dim Thres Classiﬁer Acc ± SD Sens Spec

MDS ± ± ± ± Isomap ± ± ± ± Diffusion Maps ± ± ± ± Correlation Matrix - 0.52 RSVM 69.5 ± ± ± ± Figure 1:

Super-distribution of all subjects of the sum of the weights (see the equation in 12) (e.g. by taking themedian value of the distribution across subjects per value of σ ) using different values of σ parameter. The reddashed vertical line shows the optimal σ that was found to be σ = 0.325. The other two vertical black lines depictthe linear zone in which we investigated values of σ . PREPRINT - M AY

27, 2020Figure 2:

Overall classiﬁcation performance for all thresholds (from 20%-70% of the strongest edges with 2%as pace) and classiﬁers. The metric used is the lagged cross-correlation based pseudo-distance measure d c (see2.3.1). Figure 3:

Classiﬁcation performance of Diffusion Maps for different values of σ and classiﬁers. The metric usedis the pseudo-distance measure d c (see 2.3.1) based on lagged cross-correlation. got a 71% classiﬁcation accuracy for ANN and 20% PT), thus providing the best performance for ISOMAP. MDSwas outperformed by both ISOMAP and Diffusion Maps as well as the cross correlation matrix; the classiﬁcationperformance of MDS was overall relatively poor. 11 PREPRINT - M AY

27, 2020Fig. 4 shows characteristic eigenspectrums of the ﬁnal decomposition for MDS, ISOMAL and Diffusion Maps. Asit is shown, there are three gaps: the ﬁrst gap appears between the ﬁrst eigenvalue and the rest of the spectrum, thesecond gap between the ﬁrst two eigenvalues and the rest of the spectrum, and a third gap appears between the ﬁrstfour-ﬁve eigenvalues and the rest of the spectrum.Figure 4:

Mean Differences of the ﬁrst larger 15 eigenvalues (see 2.4.4) for all manifold learning algorithmswith the cross correlation metric. A) MDS (see 2.4.1), B) ISOMAP (see 2.4.2) and C) Diffusion Maps (see 2.4.3),based on the optimal parameters. The red dashed vertical line marks the maximum number of dimensionsconsidered in this study (i.e. ﬁve dimensions).3.2 Classiﬁcation performance using the euclidean distance

The same analysis was performed for the euclidean distance. The best classiﬁcation rates using the euclidean distancefor all manifold learning methods and classiﬁers are presented in table 2.Overall, the classiﬁcation rates with the euclidean distance were more noisy compared to the ones computed with thelagged cross-correlation distance.For a wide range of PT, the classiﬁcation rates obtained were under 60% and sometimes at the level of a random ora completely biased classiﬁer (towards the larger class, here, the healthy controls). In our case, a random classiﬁerresulted to a 50.5% classiﬁcation accuracy, while for a strict majority rule for imbalanced datasets, a completelybiased (and thus useless) classiﬁcation model results to a 54.8 % classiﬁcation rate.Fig. 5 depicts the accuracy of all methods across all thresholds using all different classiﬁers.In this case, the best classiﬁcation rate was obtained with Isomap (72.9% for RSVM and 68% PT). Despite the fact thatfor certain thresholds, the maximum classiﬁcation rates obtained with the euclidean distance were enough high, theperformance of the methods was generally poorer compared to the ones computed with the lagged cross-correlationdistance for all the range of PT.Characteristic eigenspectrums for all manifold learning algorithms utilizing the euclidean distance are shown in Fig.6.Finally, in Fig. 7 are given the boxplots of the classiﬁcation rates among metrics used. Overall, the lagged cross-correlation distance resulted in higher classiﬁcation rates compared to the euclidean distance with the exception ofMDS. Except from some speciﬁc thresholds, MDS in both cases performed poorly.12

PREPRINT - M AY

27, 2020Table 2: Best classiﬁcation rates over all manifold learning methods and classiﬁers with the use of the euclideandistance L (see 2.3.2); the embedded dimension (Emb. Dim), threshold point (Thres), classiﬁer, Accuracy (Acc),Sensitivity (Sens) and Speciﬁcity (Spec) rates. Method Emb. Dim Thres Classiﬁer Acc ± SD Sens Spec

MDS ± ± ± ± Isomap ± ± ± ± Diffusion Maps ± ± ± ± Euclidean Matrix - 0.64 RSVM 71.6 ± ± ± ± Figure 5:

Classiﬁcation performance using the euclidean distance (see 2.3.2) for all thresholds (from 20%-70%of the strongest edges with 2% as pace) and classiﬁers. PREPRINT - M AY

27, 2020Figure 6:

Mean Differences of the ﬁrst larger 15 eigenvalues (see 2.4.4) for all manifold learning algorithms withthe euclidean metric. A) MDS (see 2.4.1), B) ISOMAP (see 2.4.2) and C) Diffusion Maps (see 2.4.3) using theoptimal parameters. The red dashed vertical line shows marks the maximum number of dimensions consideredin this study (i.e. ﬁve dimensions).

Figure 7:

Boxplots of classiﬁcation rates over all classiﬁers and thresholds, using the A) lagged cross-correlationpseudo-distance d c (see 2.3.1). B) the euclidean distance L (see 2.3.2). The labels at the bottom of each panelcorrespond to the method used for the construction of the FCN: the Cross Correlation matrix (Corr. Matrix),Diffusion Maps (DMaps), ISOMAP, MDS. The black horizontal lines mark the median values of the distribu-tions. PREPRINT - M AY

27, 2020

In this study, we constructed embedded FCN from rsfMRI data using linear and non-linear manifold learning tech-niques. Based on the graph theoretical measures of the constructed FCN we then used machine learning algorithmsfor classiﬁcation purposes. We also compared the performance of two widely used metrics, namely the lagged cross-correlation and the euclidian metric. For our illustrations, we used a publicly available dataset of resting fMRI record-ings taken from healthy and patients with schizophrenia. This is the ﬁrst study that performs such a systematiccomparative analysis between various manifold learning algorithms, machine learning algorithms and metrics. To thebest of our knowledge, it is also the ﬁrst study that shows how Diffusion Maps can be exploited to construct FCN fromfMRI data.At this point we should note that our intention was not to try to obtain the best possible classiﬁcation performance by“optimising" the pre-processing of the raw fMRI data and/or by trying to ﬁnd the best set of graph-theoretical measures.Instead we used a very basic pre-processing of the raw data (as the one performed in [48]), while classiﬁcation wasbased only on three key graph-theoretical measures in neuroscience [80], namely, the average path length, the globalclustering coefﬁcient and the median degree of the embedded binary networks. Based on the above, our best reportedaccuracy was 79.3 % obtained with Diffusion Maps and lagged cross-correlation (using 10 fold cross validation re-peated 100 times).For the same benchmark fMRI dataset Anderson and Cohen [48] used ISOMAP for the construction of embeddedlow-dimensional FCN for the classiﬁcation between controls and schizophrenia patients. ROIs were acquired usingsingle subject ICA and functional connectivity was accessed using the lagged cross-correlation distance. The analysisrevealed differences in small world properties among groups and 13 graph theoretic features led to a reported 65%accuracy rate. Algunaid et al. [68] reported a 95% accuracy (with FSV feature selection method and 82.5% with aWelch’s t-test) testing more than 360 graph-based features. AAL was used for signal extraction and ROI selection,while an SVM-based classiﬁer was used along with 10 fold cross validation scheme to evaluate its performance.An important outcome of our analysis is that the Diffusion Maps result in more robust results and higher classiﬁcationaccuracy and that the lagged cross-correlation distance outperforms the euclidean distance which is used traditionallyto construct connectivity matrices from fMRI data.In a future work, we aim at analysing the local-graph theoretical properties, thus matching the derived RAICAR ICswith known resting state networks and also perform group ICA analysis.

The COBRE dataset is publicly available at http://fcon_1000.projects.nitrc.org/indi/retro/cobre.html .For our analysis, we used the “R" software subroutines as described in 2 and 7.

Conceptualization:Constantinos Siettos; Methodology: Constantinos Siettos; Formal analysis and investigation: Ioan-nis Gallos; Data acquisition and processing: Ioannis Gallos and Evangelos Galaris; Writing- original draft preparation:Ioannis Gallos; Writing - review and editing: Constantinos Siettos and Ioannis Gallos; Supervision:Constantinos Siet-tos

For classiﬁcation we used Linear Support Vector Machines (LSVM), Radial (Radial basis function kernel) SupportVector Machines (RSVM), one hidden layer Artiﬁcial Neural Networks (ANN) and k-NN classiﬁer(k-NN). All classi-ﬁers were trained and evaluated via repeated 10-fold cross-validation scheme repeated 100 times. For the classiﬁcationwe used the three key graph theoretical measures as described in subsection 2.5 in Materials and Methods. Trainingand classiﬁcation were implemented using algorithms contained in package “caret" [83] publicly available in R freesoftware environment [69].

Support vectors machines (SVM) aim at ﬁnding the optimal separating plane or hyperplane in the feature space amonggroups. In particular, for a set of points ( x i , y i ) i =1 , ..N , where N is the number of subjects, x i ∈ R d contains d attributes/features selected for subject i and y i ∈ ( − , the subject’s class (here, either healthy or patient), SVMattempts to ﬁnd the optimal plane or hyperplane that divides the two classes by maximizing the margin of separation.15 PREPRINT - M AY

27, 2020Any hyperplane with a given set of points x i can be modelled as w · x i + b = 0 where w represent the weightsof features x i . Parallel hyperplanes can be described as w · x i + b ≥ if y i = 1 and w · x i + b ≤ − if y i = − . The optimization problem then aims at maximizing the margin between hyperplanes k w k such that for every ( y i ) i =1 , ..N , y i · ( w · x i + b ) ≥ . One can take advantage of a regularization parameter C indicating the penalty oferror z i that gives a trade-off between misclassiﬁcations and the width of the separating margin. This leads to the ﬁnaloptimization problem, which minimizes k w k + C · P i z i subject to y i · ( w · x i + b ) ≥ − z i , i = 1 , ..N .Based on the idea that the data maybe better separable in a higher dimensional space, SVM may utilize a kernelfunction to map x i ∈ R d to φ ( x i ) ∈ R D , D > d . In our study, besides standard Linear SVM (LSVM), we also usedRadial SVM (RSVM) making use of the Radial basis functions kernel given by K ( x i , x j ) = exp ( − k x i − x j k · γ ) , where γ is the kernel’s scale parameter. k-nearest neighbours algorithm is one of the simplest classiﬁcation/machine learning algorithms. Given ( x i , y i ) i =1 , ..N , where N is the number of subjects, x i ∈ R d contains d attributes/features selected for subject i and y i the subject’s class (here, either healthy or patient), k-NN utilizes euclidean distance in the feature space to per-form a voting system among κ closest neighbours. In this manner, each point is classiﬁed as “control", if the numberof“control" neighbours is greater than the number of “patient" neighbours and inversely. The number κ of closestneighbours is a parameter of choice that plays a crucial role in method’s performance. In this study, it is important tonote that we chose odd values of κ (i.e how many neighbours we take into consideration) in order not to have to breakpossible ties in the voting system among neighbours In this study, we also used feed-forward Artiﬁcial Neural Networks (ANN) consisting of one hidden layer. Theinput units were three as the number of the features considered for classiﬁcation. We have chosen one hidden layerconsisting from 1 to 5 neurons along with a bias term. The activation function used for all neurons was the logistictransfer function [84]. The output was one node (reﬂecting simple binary classiﬁcation control/patient). The trainingprocedure of the model was done via back-propagation [85] using a 10-fold cross validation scheme repeated 100times. Finally a weight decay parameter a (regularization parameter) was used to prevent over-ﬁtting and improvegeneralization [86] of the ﬁnal model. For the implementation of the ANN we used the “nnet" software package [87]publicly available in R free software environment [69]. We tuned the parameters of the algorithms via grid search.For the SVM: C = (0 . , . , . , . , , . , , . , , , , , , , , , γ = (0 . , . , . , . , . , . , , . , , . , , , , , , , , , .For the k-NN classiﬁer: κ = (1 , , , , .For the ANN: number of neurons in the hidden layer p = (1 , , , , ,decay level a = (0 . , . , . , . , . , . , . . References [1] Karl J Friston, Andrew P Holmes, Keith J Worsley, J-P Poline, Chris D Frith, and Richard SJ Frackowiak.Statistical parametric maps in functional imaging: a general linear approach.

Human brain mapping , 2(4):189–210, 1994.[2] Stephen M Smith, Mark Jenkinson, Mark W Woolrich, Christian F Beckmann, Timothy EJ Behrens, HeidiJohansen-Berg, Peter R Bannister, Marilena De Luca, Ivana Drobnjak, David E Flitney, et al. Advances infunctional and structural mr image analysis and implementation as fsl.

Neuroimage , 23:S208–S219, 2004.[3] Karl J Friston. Functional and effective connectivity: a review.

Brain connectivity , 1(1):13–36, 2011.[4] Meenakshi Khosla, Keith Jamison, Gia H Ngo, Amy Kuceyeski, and Mert R Sabuncu. Machine learning inresting-state fmri analysis.

Magnetic resonance imaging , 2019.[5] Yuan Zhou, Kun Wang, Yong Liu, Ming Song, Sonya W Song, and Tianzi Jiang. Spontaneous brain activityobserved with functional magnetic resonance imaging as a potential biomarker in neuropsychiatric disorders.

Cognitive neurodynamics , 4(4):275–294, 2010. 16

PREPRINT - M AY

27, 2020[6] Bharat Biswal, F Zerrin Yetkin, Victor M Haughton, and James S Hyde. Functional connectivity in the motorcortex of resting human brain using echo-planar mri.

Magnetic resonance in medicine , 34, 1995.[7] Michael D Greicius, Ben Krasnow, Allan L Reiss, and Vinod Menon. Functional connectivity in the restingbrain: a network analysis of the default mode hypothesis.

Proceedings of the National Academy of Sciences ,100(1):253–258, 2003.[8] Michael D Fox, Abraham Z Snyder, Justin L Vincent, Maurizio Corbetta, David C Van Essen, and Marcus ERaichle. The human brain is intrinsically organized into dynamic, anticorrelated functional networks.

Proceed-ings of the National Academy of Sciences , 102(27):9673–9678, 2005.[9] Daniel S Margulies, AM Clare Kelly, Lucina Q Uddin, Bharat B Biswal, F Xavier Castellanos, and Michael PMilham. Mapping the functional connectivity of anterior cingulate cortex.

Neuroimage , 37(2):579–588, 2007.[10] David M Cole, Stephen M Smith, and Christian F Beckmann. Advances and pitfalls in the analysis and interpre-tation of resting-state fmri data.

Frontiers in systems neuroscience , 4:8, 2010.[11] Michael D Saxe, Fortunato Battaglia, Jing-Wen Wang, Gael Malleret, Denis J David, James E Monckton,A Denise R Garcia, Michael V Sofroniew, Eric R Kandel, Luca Santarelli, et al. Ablation of hippocampalneurogenesis impairs contextual fear conditioning and synaptic plasticity in the dentate gyrus.

Proceedings ofthe National Academy of Sciences , 103(46):17501–17506, 2006.[12] Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications.

Neural networks ,13(4-5):411–430, 2000.[13] Christian F Beckmann, Marilena DeLuca, Joseph T Devlin, and Stephen M Smith. Investigations into resting-state connectivity using independent component analysis.

Philosophical Transactions of the Royal Society B:Biological Sciences , 360(1457):1001–1013, 2005.[14] Christian F Beckmann and Stephen M Smith. Tensorial extensions of independent component analysis for mul-tisubject fmri analysis.

Neuroimage , 25(1):294–311, 2005.[15] Dae Il Kim, Jing Sui, Srinivas Rachakonda, Tonya White, Dara S. Manoach, V. P. Clark, Beng-Choon Ho,S. Charles Schulz, and Vince D. Calhoun. Identiﬁcation of imaging biomarkers in schizophrenia: A coefﬁcient-constrained independent component analysis of the mind multi-site schizophrenia study.

Neuroinformatics ,8(4):213–229, jul 2010.[16] Stephen M Smith, Peter T Fox, Karla L Miller, David C Glahn, P Mickle Fox, Clare E Mackay, Nicola Filippini,Kate E Watkins, Roberto Toro, Angela R Laird, et al. Correspondence of the brain’s functional architectureduring activation and rest.

Proceedings of the National Academy of Sciences , 106(31):13040–13045, 2009.[17] Raimon HR Pruim, Maarten Mennes, Daan van Rooij, Alberto Llera, Jan K Buitelaar, and Christian F Beckmann.Ica-aroma: A robust ica-based strategy for removing motion artifacts from fmri data.

Neuroimage , 112:267–277,2015.[18] Johan Himberg, Aapo Hyvärinen, and Fabrizio Esposito. Validating the independent components of neuroimag-ing time series via clustering and visualization.

Neuroimage , 22(3):1214–1222, 2004.[19] Gustavo SP Pamplona, Bruno H Vieira, Frank Scharnowski, and Carlos EG Salmon. Personode: A toolbox forica map classiﬁcation and individualized roi deﬁnition.

Neuroinformatics , pages 1–11, 2020.[20] Zhi Yang, Stephen LaConte, Xuchu Weng, and Xiaoping Hu. Ranking and averaging independent componentanalysis by reproducibility (raicar).

Human brain mapping , 29(6):711–725, 2008.[21] Yi-Ou Li, Tülay Adalı, and Vince D Calhoun. Estimating the number of independent components for functionalmagnetic resonance imaging data.

Human brain mapping , 28(11):1251–1266, 2007.[22] I.T Jollife.

Principal Component Analysis, Second Edition . Springer-Verlag, 2002.[23] Keith J Worsley, Jen-I Chen, Jason Lerch, and Alan C Evans. Comparing functional connectivity via thresholdingcorrelations and singular value decomposition.

Philosophical Transactions of the Royal Society B: BiologicalSciences , 360(1457):913–920, 2005.[24] R Baumgartner, L Ryner, W Richter, R Summers, M Jarmasz, and R Somorjai. Comparison of two exploratorydata analysis methods for fmri: fuzzy clustering vs. principal component analysis.

Magnetic Resonance Imaging ,18(1):89–94, 2000.[25] Joseph B Kruskal. Multidimensional scaling by optimizing goodness of ﬁt to a nonmetric hypothesis.

Psychome-trika , 29(1):1–27, 1964.[26] Karl J Friston, Chris D Frith, Paul Fletcher, PF Liddle, and Richard SJ Frackowiak. Functional topography:multidimensional scaling and functional connectivity in the brain.

Cerebral cortex , 6(2):156–164, 1996.17

PREPRINT - M AY

27, 2020[27] Armin Iraji, Vince D Calhoun, Natalie M Wiseman, Esmaeil Davoodi-Bojd, Mohammad RN Avanaki, E MarkHaacke, and Zhifeng Kou. The connectivity domain: Analyzing resting state fmri data using feature-baseddata-driven and model-based methods.

Neuroimage , 134:494–507, 2016.[28] Roberto Viviani, Georg Grön, and Manfred Spitzer. Functional principal component analysis of fmri data.

Humanbrain mapping , 24(2):109–129, 2005.[29] Kaiming Li, Lei Guo, Jingxin Nie, Gang Li, and Tianming Liu. Review of methods for functional brain connec-tivity detection using fmri.

Computerized Medical Imaging and Graphics , 33(2):131–139, 2009.[30] Svetlana V Shinkareva, Jing Wang, and Douglas H Wedell. Examining similarity structure: multidimensionalscaling and related approaches in neuroimaging.

Computational and mathematical methods in medicine , 2013,2013.[31] Charidimos Tzagarakis, Trenton A Jerde, Scott M Lewis, Kâmil U˘gurbil, and Apostolos P Georgopoulos. Cere-bral cortical mechanisms of copying geometrical shapes: a multidimensional scaling analysis of fmri patterns ofactivation.

Experimental brain research , 194(3):369–380, 2009.[32] Alice J O’Toole, Fang Jiang, Hervé Abdi, Nils Pénard, Joseph P Dunlop, and Marc A Parent. Theoretical,statistical, and practical perspectives on pattern-based classiﬁcation approaches to the analysis of functionalneuroimaging data.

Journal of cognitive neuroscience , 19(11):1735–1752, 2007.[33] James V Haxby, M Ida Gobbini, Maura L Furey, Alumit Ishai, Jennifer L Schouten, and Pietro Pietrini.Distributed and overlapping representations of faces and objects in ventral temporal cortex.

Science ,293(5539):2425–2430, 2001.[34] Svetlana V Shinkareva, Vicente L Malave, Marcel Adam Just, and Tom M Mitchell. Exploring commonalitiesacross participants in the neural representation of objects.

Human brain mapping , 33(6):1375–1383, 2012.[35] Hans P Op de Beeck, Marijke Brants, Annelies Baeck, and Johan Wagemans. Distributed subordinate speciﬁcityfor bodies, faces, and buildings in human ventral visual cortex.

Neuroimage , 49(4):3414–3425, 2010.[36] Raymond Salvador, John Suckling, Martin R Coleman, John D Pickard, David Menon, and ED Bullmore. Neuro-physiological architecture of functional magnetic resonance images of human brain.

Cerebral cortex , 15(9):1332–1342, 2005.[37] Simon Benjaminsson, Peter Fransson, and Anders Lansner. A novel model-free data analysis technique basedon clustering in a mutual information space: application to resting-state fmri.

Frontiers in systems neuroscience ,4:34, 2010.[38] Pierre-Yves Hervé, Annick Razaﬁmandimby, Mathieu Vigneau, Bernard Mazoyer, and Nathalie Tzourio-Mazoyer. Disentangling the brain networks supporting affective speech comprehension.

NeuroImage ,61(4):1255–1267, 2012.[39] Amit Etkin and Tor D Wager. Functional neuroimaging of anxiety: a meta-analysis of emotional processing inptsd, social anxiety disorder, and speciﬁc phobia.

American Journal of Psychiatry , 164(10):1476–1488, 2007.[40] DE Welchew, GD Honey, Tonmoy Sharma, TW Robbins, and ET Bullmore. Multidimensional scaling of inte-grated neurocognitive function and schizophrenia as a disconnexion disorder.

NeuroImage , 17(3):1227–1239,2002.[41] David E Welchew, Chris Ashwin, Karim Berkouk, Raymond Salvador, John Suckling, Simon Baron-Cohen,and Ed Bullmore. Functional disconnectivity of the medial temporal lobe in asperger’s syndrome.

Biologicalpsychiatry , 57(9):991–998, 2005.[42] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. science ,290(5500):2323–2326, 2000.[43] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric framework for nonlinear dimen-sionality reduction. science , 290(5500):2319–2323, 2000.[44] Ronald R Coifman and Stéphane Lafon. Diffusion maps.

Applied and computational harmonic analysis , 21(1):5–30, 2006.[45] Anqi Qiu, Annie Lee, Mingzhen Tan, and Moo K Chung. Manifold learning on brain functional networks inaging.

Medical image analysis , 20(1):52–60, 2015.[46] Hui Shen, Lubin Wang, Yadong Liu, and Dewen Hu. Discriminative analysis of resting-state functional con-nectivity patterns of schizophrenia using low dimensional embedding of fmri.

Neuroimage , 49(4):3110–3121,2010. 18

PREPRINT - M AY

27, 2020[47] Peter Mannfolk, Ronnie Wirestam, Markus Nilsson, Freddy Ståhlberg, and Johan Olsrud. Dimensionality reduc-tion of fmri time series data using locally linear embedding.

Magnetic Resonance Materials in Physics, Biologyand Medicine , 23(5-6):327–338, 2010.[48] Ariana Anderson and Mark Steven Cohen. Decreased small-world functional network connectivity and clusteringacross resting state networks in schizophrenia: an fmri classiﬁcation tutorial.

Frontiers in Human Neuroscience ,7:520, 2013.[49] Ariana Anderson, Ivo D Dinov, Jonathan E Sherin, Javier Quintana, Alan L Yuille, and Mark S Cohen. Classiﬁ-cation of spatially unaligned fmri scans. neuroimage , 49(3):2509–2519, 2010.[50] Koen V Haak, Andre F Marquand, and Christian F Beckmann. Connectopic mapping with resting-state fmri.

Neuroimage , 170:83–94, 2018.[51] Xilin Shen and François G Meyer. Analysis of event-related fmri data using diffusion maps. In

Biennial Interna-tional Conference on Information Processing in Medical Imaging , pages 652–663. Springer, 2005.[52] Tuomo Sipola, Fengyu Cong, Tapani Ristaniemi, Vinoo Alluri, Petri Toiviainen, Elvira Brattico, and Asoke KNandi. Diffusion map for clustering fmri spatial maps extracted by independent component analysis. In , pages 1–6. IEEE, 2013.[53] Wenzhao Lian, Ronen Talmon, Hitten Zaveri, Lawrence Carin, and Ronald Coifman. Multivariate time-seriesanalysis and diffusion maps.

Signal Processing , 116:13–28, 2015.[54] Dominique Duncan, Ronen Talmon, Hitten P Zaveri, and Ronald R Coifman. Identifying preseizure state inintracranial eeg data using diffusion kernels.

Math. Biosci. Eng , 10(3):579–590, 2013.[55] Constantinos Siettos and Jens Starke. Multiscale modeling of brain dynamics: from single neurons and networksto mathematical tools.

Wiley Interdisciplinary Reviews: Systems Biology and Medicine , 8(5):438–458, 2016.[56] Jonas Richiardi, Sophie Achard, Horst Bunke, and Dimitri Van De Ville. Machine learning with brain graphs:predictive modeling approaches for functional imaging in systems neuroscience.

IEEE Signal Processing Maga-zine , 30(3):58–70, 2013.[57] Vince D Calhoun, Jing Sui, Kent Kiehl, Jessica A Turner, Elena A Allen, and Godfrey Pearlson. Exploring thepsychosis functional connectome: aberrant intrinsic networks in schizophrenia and bipolar disorder.

Frontiers inpsychiatry , 2:75, 2012.[58] Andrew R Mayer, David Ruhl, Flannery Merideth, Josef Ling, Faith M Hanlon, Juan Bustillo, and Jose Cañive.Functional imaging of the hemodynamic sensory gating response in schizophrenia.

Human brain mapping ,34(9):2302–2312, 2013.[59] Muhammad Naveed Iqbal Qureshi, Jooyoung Oh, Dongrae Cho, Hang Joon Jo, and Boreom Lee. Multimodaldiscrimination of schizophrenia using hybrid weighted feature concatenation of brain functional connectivity andanatomical features with an extreme learning machine.

Frontiers in neuroinformatics , 11:59, 2017.[60] Mark Jenkinson, Peter Bannister, Michael Brady, and Stephen Smith. Improved optimization for the robust andaccurate linear registration and motion correction of brain images.

Neuroimage , 17(2):825–841, 2002.[61] Stephen M Smith. Fast robust automated brain extraction.

Human brain mapping , 17(3):143–155, 2002.[62] Christian F Beckmann and Stephen M Smith. Probabilistic independent component analysis for functional mag-netic resonance imaging.

IEEE transactions on medical imaging , 23(2):137–152, 2004.[63] Regina J Meszlényi, Petra Hermann, Krisztian Buza, Viktor Gál, and Zoltán Vidnyánszky. Resting state fmrifunctional connectivity analysis using dynamic time warping.

Frontiers in neuroscience , 11:75, 2017.[64] James S Hyde and Andrzej Jesmanowicz. Cross-correlation: an fmri signal-processing strategy.

NeuroImage ,62(2):848–851, 2012.[65] Archana Venkataraman, Koene RA Van Dijk, Randy L Buckner, and Polina Golland. Exploring functional con-nectivity in fmri via clustering. In

Proceedings of the... IEEE International Conference on Acoustics, Speech, andSignal Processing/sponsored by the Institute of Electrical and Electronics Engineers Signal Processing Society.ICASSP (Conference) , volume 2009, page 441. NIH Public Access, 2009.[66] Cyril Goutte, Peter Toft, Egill Rostrup, Finn Å Nielsen, and Lars Kai Hansen. On clustering fmri time series.

NeuroImage , 9(3):298–310, 1999.[67] Martijn P. van den Heuvel, Siemon C. de Lange, Andrew Zalesky, Caio Seguin, B.T. Thomas Yeo, and RubenSchmidt. Proportional thresholding in resting-state fMRI functional connectivity networks and consequences forpatient-control connectome studies: Issues and recommendations.

NeuroImage , 152:437–449, may 2017.19

PREPRINT - M AY

27, 2020[68] Rami F Algunaid, Ali H Algumaei, Muhammad A Rushdi, and Inas A Yassine. Schizophrenic patient iden-tiﬁcation using graph-theoretic features of resting-state fmri data.

Biomedical Signal Processing and Control ,43:289–299, 2018.[69] R Core Team.

R: A language and environment for statistical computing. Vienna, Austria: R Foundation forStatistical Computing; 2014 , 2014.[70] Edsger W Dijkstra. A note on two problems in connexion with graphs.

Numerische mathematik , 1(1):269–271,1959.[71] Jari Oksanen, Roeland Kindt, Pierre Legendre, Bob O’Hara, M Henry H Stevens, Maintainer Jari Oksanen, andMASS Suggests. The vegan package.

Community ecology package , 10:631–637, 2007.[72] Boaz Nadler, Stephane Lafon, Ioannis Kevrekidis, and Ronald R Coifman. Diffusion maps, spectral clusteringand eigenfunctions of fokker-planck operators. In

Advances in neural information processing systems , pages955–962, 2006.[73] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation.

Neural computation , 15(6):1373–1396, 2003.[74] Amit Singer, Radek Erban, Ioannis G Kevrekidis, and Ronald R Coifman. Detecting intrinsic slow variables instochastic dynamical systems by anisotropic diffusion maps.

Proceedings of the National Academy of Sciences ,106(38):16090–16095, 2009.[75] J De la Porte, BM Herbst, W Hereman, and SJ Van Der Walt. An introduction to diffusion maps. In

Proceedingsof the 19th Symposium of the Pattern Recognition Association of South Africa (PRASA 2008), Cape Town, SouthAfrica , pages 15–25, 2008.[76] Boaz Nadler, Stephane Lafon, Ronald Coifman, and Ioannis G Kevrekidis. Diffusion maps-a probabilistic in-terpretation for spectral embedding and clustering algorithms. In

Principal manifolds for data visualization anddimension reduction , pages 238–260. Springer, 2008.[77] Joseph Richards. Diffusion map.

R package version , pages 1–1, 2014.[78] Harry Strange and Reyer Zwiggelaar.

Open Problems in Spectral Dimensionality Reduction . Springer, 2014.[79] Lawrence K Saul, Kilian Q Weinberger, Jihun H Ham, Fei Sha, and Daniel D Lee. Spectral methods for dimen-sionality reduction.

Semisupervised learning , pages 293–308, 2006.[80] Cornelis J Stam and Jaap C Reijneveld. Graph theoretical analysis of complex networks in the brain.

Nonlinearbiomedical physics , 1(1):3, 2007.[81] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations.

Neuroimage , 52(3):1059–1069, 2010.[82] Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research.

InterJournal,Complex Systems , 1695(5):1–9, 2006.[83] Max Kuhn et al. Building predictive models in r using the caret package.

Journal of statistical software , 28(5):1–26, 2008.[84] Brian D Ripley.

Pattern recognition and neural networks . Cambridge university press, 2007.[85] Robert Hecht-Nielsen. Theory of the backpropagation neural network. In

Neural networks for perception , pages65–93. Elsevier, 1992.[86] Anders Krogh and John A Hertz. A simple weight decay can improve generalization. In

Advances in neuralinformation processing systems , pages 950–957, 1992.[87] Brian Ripley and William Venables. nnet: Feed-forward neural networks and multinomial log-linear models.