[PDF] Statistical comparison of (brain) networks

Abstract

The study of random networks in a neuroscientific context has developed extensively over the last couple of decades. By contrast, techniques for the statistical analysis of these networks are less developed. In this paper, we focus on the statistical comparison of brain networks in a nonparametric framework and discuss the associated detection and identification problems. We tested network differences between groups with an analysis of variance (ANOVA) test we developed specifically for networks. We also propose and analyse the behaviour of a new statistical procedure designed to identify different subnetworks. As an example, we show the application of this tool in resting-state fMRI data obtained from the Human Connectome Project. Finally, we discuss the potential bias in neuroimaging findings that is generated by some behavioural and brain structure variables. Our method can also be applied to other kind of networks such as protein interaction networks, gene networks or social networks.

Full PDF

SStatistical comparison of brain networks

Daniel Fraiman , and Ricardo Fraiman Departamento de Matem´atica y Ciencias, Universidad de San Andr´es, Buenos Aires,Argentina.e-mail: [email protected] Consejo Nacional de Investigaciones Cient´ıﬁcas y Tecnol´ogicas, Buenos Aires, Argentina. Centro de Matem´atica, Facultad de Ciencias, Universidad de la Rep´ublica, Uruguay.e-mail: [email protected]

Abstract:

The study of brain networks has developed extensively over thelast couple of decades. By contrast, techniques for the statistical analysis ofthese networks are less developed. In this paper, we focus on the statisticalcomparison of brain networks in a nonparametric framework and discuss theassociated detection and identiﬁcation problems. We tested network diﬀer-ences between groups with an analysis of variance (ANOVA) test we devel-oped speciﬁcally for networks. We also propose and analyse the behaviourof a new statistical procedure designed to identify diﬀerent subnetworks.As an example, we show the application of this tool in resting-state fMRIdata obtained from the Human Connectome Project. Finally, we discussthe potential bias in neuroimaging ﬁndings that is generated by some be-havioural and brain structure variables. Our method can also be applied toother kind of networks such as protein interaction networks, gene networksor social networks.

Keywords and phrases:

Brain networks, Statistics of networks, ANOVAtest, Detection and identiﬁcation.

1. Introduction

Understanding how individual neurons, groups of neurons and brain regionsconnect is a fundamental issue in neuroscience. Imaging and electrophysiologyhave allowed researchers to investigate this issue at diﬀerent brain scales. Atthe macroscale, the study of brain connectivity is dominated by MRI, whichis the main technique used to study how diﬀerent brain regions connect andcommunicate. Researchers use diﬀerent experimental protocols in an attemptto describe the true brain networks of individuals with disorders as well asthose of healthy individuals. Understanding resting state networks is crucialfor understanding modiﬁed networks, such as those involved in emotion, pain,motor learning, memory, reward processing, and cognitive development, etc.Comparing brain networks accurately can also lead to the precise early diagnosisof neuropsychiatric and neurological disorders [1, 2]. Rigorous mathematicalmethods are needed to conduct such comparisons.Currently, the two main techniques used to measure brain networks at thewhole brain scale are Diﬀusion Tensor Imaging (DTI) and resting-state func-tional magnetic resonance imaging (rs-fMRI). In DTI, large white-matter ﬁbres a r X i v : . [ q - b i o . N C ] J u l raiman & Fraiman/Statistics of brain networks are measured to create a connectional neuroanatomy brain network, while in rs-fMRI, functional connections are inferred by measuring the BOLD activity ateach voxel and creating a whole brain functional network based on functionally-connected voxels (i.e., those with similar behaviour). Despite technical limita-tions, both techniques are routinely used to provide a structural and dynamicexplanation for some aspects of human brain function. These magnetic reso-nance neuroimages are typically analysed by applying network theory [3, 4],which has gained considerable attention for the analysis of brain data over thelast 10 years.The space of networks with as few as 10 nodes (brain regions) contains asmany as 10 diﬀerent networks. Thus, one can imagine the number of networksif one analyses brain network populations (e.g. healthy and unhealthy) with, say,50 nodes. However, most studies currently report data with few subjects, andthe neuroscience community has recently begun to address this issue [5, 6, 7] andquestion the reproducibility of such ﬁndings [8, 9, 10]. In this work, we presenta novel tool for comparing samples of brain networks. This study contributes toa fast-growing area of research: network statistics of network samples [11, 12,13, 14].We organized the paper as follows: In Section I, we present the method forcomparing brain networks and identifying network diﬀerences that works welleven with small samples. In Section II, we present an example that illustrates ingreater detail the concept of comparing networks. Next, we apply the method toresting-state fMRI data from the Human Connectome Project and discuss thepotential biases generated by some behavioural and brain structural variables.Finally, in Section III, we discuss possible improvements, the impact of samplesize, and the eﬀects of confounding variables.

2. Network Theory Framework

A network (or graph), denoted by G = ( V, E ), is an object described by a set V of nodes (vertices) and a set E ⊂ V × V of links (edges) between them. In whatfollows, we consider families of networks deﬁned over the same ﬁxed ﬁnite setof n nodes (brain regions). A network is completely described by its adjacencymatrix A ∈ { , } n × n , where A ( i, j ) = 1 if and only if the link ( i, j ) ∈ E . Ifthe matrix A is symmetric, then the graph is undirected; otherwise, we have adirected graph.Let us consider the brain networks of two speciﬁc individuals who will mostlikely diﬀer from each other to some degree. If we randomly choose a personfrom any given population, what we obtain is a random network. This randomnetwork, G , will have a given probability of being network G , another proba-bility of being network G , and so on until G ˜ n . Therefore, a random network iscompletely characterized by its probability law, p k := P ( G = G k ) for all k ∈ { , . . . , ˜ n } . Likewise, a random variable is also completely characterized by its probabilitylaw. In this case, the most common test for comparing many subpopulations raiman & Fraiman/Statistics of brain networks is the analysis of variance test (ANOVA). This test rejects the null hypothesisof equal means if the averages are statistically diﬀerent. Here, we propose anANOVA test designed speciﬁcally to compare networks.The ﬁrst step for comparing networks is to deﬁne a distance or metric betweenthem. Given two networks G , G we consider the most classical distance, theedit distance [15] deﬁned as d ( G , G ) = (cid:88) i

Given a sample of networks { G , . . . , G l } (a) The average distance around a graph H is deﬁned as d G ( H ) := (cid:96) (cid:96) (cid:80) k =1 d ( G k , H ) , which corresponds to the mean population distance ˜ d G ( H ) = ˜ n (cid:80) k =1 d ( G k , H ) p k . (b) The average weighted “network” M has adjacency matrix A M ( i, j ) = l (cid:96) (cid:80) k =1 A G k ( i, j ),which in terms of the population version corresponds to the mean matrix ˜ M ( i, j ) = E ( A G ( i, j )) =: p ij .With these deﬁnitions in mind, the natural way to deﬁne a measure of networkvariability is σ := d G ( M ) , ˜ σ = ˜ d G ( ˜ M ) , which measures the average distance (variability) of the networks around theaverage weighted network. Detection.

Now we address the testing problem. Let G , G , . . . , G n denotethe networks from subpopulation 1, G , G , . . . , G n the ones from subpopu-lation 2, and so on until G m , G , . . . , G mn m the networks of subpopulation m .Let G , G , . . . , G n denote, without superscript, the complete pooled sample ofnetworks, where n = (cid:80) mi =1 n i . And ﬁnally, let M i and σ i denote the averagenetwork and the variability of the i -subpopulation of networks. We want to test(H ) H : ˜ M = ˜ M = · · · = ˜ M m (2.2)that all the subpopulations have the same mean network, under the alternativethat at least one subpopulation has a diﬀerent mean network. raiman & Fraiman/Statistics of brain networks It is interesting to note that for objects that are networks, the average network( M ) and the variability ( σ ) are not independent summary measures. In fact,the relationship between them is given by σ = 2 (cid:88) i

Under the null hypothesis, the T statistic fulﬁlls i) and ii), while T is sensitive to the alternative hypothesis, and iii) holds true.i) E ( T ) = 0 .ii) T is asymptotically ( K := min { n , n , .., n m } → ∞ ) Normal(0,1).iii) Under the alternative hypothesis, T will be smaller than any negative valueif K is large enough (The test is consistent). This theorem provides a procedure for testing whether two or more groupsof networks are diﬀerent. Although having a procedure like the one describedis important, we not only want to detect network diﬀerences, we also want toidentify the speciﬁc network changes or diﬀerences. We discuss this issue next.

Identiﬁcation.

Let us suppose that the ANOVA test for networks rejectsthe null hypothesis, and now the main goal is to identify network diﬀerences.Two main objectives are discussed:(a) Identiﬁcation of all the links that show statistical diﬀerences betweengroups.(b) Identiﬁcation of a set of nodes (a subnetwork) that present the highestnetwork diﬀerences between groups.The identiﬁcation procedure we describe below aims to eliminate the noise(links or nodes without diﬀerences between subpopulations) while keeping thesignal (links or nodes with diﬀerences between subpopulations).Given a network G = ( V, E ) and a subset of links ˜ E ⊂ E , let us genericallydenote G ˜ E the subnetwork with the same nodes but with links identiﬁed bythe set ˜ E . The rest of the links are erased. Given a subset of nodes ˜ V ⊂ V let raiman & Fraiman/Statistics of brain networks us denote G ˜ V the subnetwork that only has the nodes (with the links betweenthem) identiﬁed by the set ˜ V . The T statistic for the sample of networks withonly the set of ˜ E links is denoted by T ˜ E , and the T statistic computed for allthe sample networks with only the nodes that belong to ˜ V is denoted by T ˜ V .The procedure we propose for identifying all the links that show statisticaldiﬀerences between groups is based on the minimization for ˜ E ⊂ E of T ˜ E . Theset of links, ¯ E , deﬁned by ¯ E ≡ arg min ˜ E ⊂ E T ˜ E (2.4)contain all the links that show statistical diﬀerences between subpopulations.One limitation of this identiﬁcation procedure is that the space E is huge ( E =2 n ( n − / where n is the number of nodes) and an eﬃcient algorithm is neededto ﬁnd the minimum. That is why we focus on identifying a group of nodes (ora subnetwork) expressing the largest diﬀerences.The procedure proposed for identifying the subnetwork with the highest sta-tistical diﬀerences between groups is similar to the previous one. It is based onthe minimization of T ˜ V . The set of nodes, N , deﬁned by N ≡ arg min ˜ V ∈ V T ˜ V , (2.5)contains all relevant nodes. These nodes make up the subnetwork with thelargest diﬀerence between groups. In this case, the complexity is smaller, sincethe space V is not so big ( V = 2 n − n − j := N of the number ofnodes in the true subnetwork is a diﬃcult problem due to possible overestimationof noisy data. The advantage of knowing ˜ j is that it reduces the computationalcomplexity for ﬁnding the minimum to an order of n ˜ j instead of 2 n if we have tolook for all possible sizes. However, the problem in our setup is less severe thanother cases since the objective function ( T ˜ V ) is not monotonic when the size ofthe space increases. To solve this problem, we suggest the following algorithm.Let V { j } be the space of networks with j distinguishable nodes, j ∈ { , , . . . , n } and V = ∪ j V { j } . The nodes N j N j ≡ arg min ˜ V ∈ V { j } T ˜ V , with T j ≡ min ˜ V ∈ V { j } T ˜ V (2.6)deﬁne a subnetwork. In order to ﬁnd the true subnetwork with diﬀerences be-tween the groups, we now study the sequence T , T , . . . , T n . We continue withthe search (increasing j ) until we ﬁnd ˜ j fulﬁlling˜ j ≡ max { j ∈ { , , . . . , n } : T j − T j − < − g (sample size) } , where g is a positive function that decreases together with the sample size (inpractice, a real value). N ˜ j are the nodes that make up the subnetwork with thelargest diﬀerences among the groups or subpopulations studied. raiman & Fraiman/Statistics of brain networks It is important to mention that the procedures described above do not imposeany assumption regarding the real connectivity diﬀerences between the popula-tions. With additional hypothesis the procedure can be improved. For instance,in [14, 25] the authors proposed a methodology for the edge-identiﬁcation prob-lem that is powerful when the real diﬀerence connection between the populationsform large unique connected component.

3. Examples and Applications

Example 1.

Let us suppose we have three groups of subjects with equal sam-ple size, K , and the brain network of each subject is studied using 16 regions(electrodes or voxels). Studies show connectivity between certain brain regions isdiﬀerent in certain neuropathologies, in aging, under the inﬂuence of psychedelicdrugs, and more recently, in motor learning [19, 20]. Recently, we have shownthat a simple way to study connectivity is by what the physics community calls“the correlation function” [21]. This function describes the correlation betweenregions as a function of the distance between them. Although there exist longrange connections, on average, regions (voxels or electrodes) closer to each otherinteract strongly, while distant ones interact more weakly. We have shown thatthe way in which this function decays with distance is a marker of certain dis-eases [22, 23, 24]. For example, patients with a traumatic brachial plexus lesionwith root avulsions revealed a faster correlation decay as a function of distancein the primary motor cortex region corresponding to the arm [23].Next we present a toy model for analyzing the performance of the methodol-ogy. In a network context, the behaviour described above can be modeled in thefollowing way: since the probability that two regions are connected is a mono-tonic function of the correlation between them (i.e. on average, distant regionsshare fewer links than nearby regions) we decided to skip the correlations andmodel directly the link probability as an exponential function that decays withdistance. We assume that the probability that region i is connected with j isdeﬁned as P ( i ↔ j ) = e − λ d ( i,j ) , where d ( i, j ) is the distance between regions i and j . For the alternative hypoth-esis, we consider that there are six frontal brain regions (see Fig. 1 Panel A) thatinteract with a diﬀerent decay rate in each of the three subpopulations. Figure1 panel (A) shows the 16 regions analysed on an x-y scale. Panel (B) showsthe link probability function for all electrodes and for each subpopulation. Asshown, there is a slight diﬀerence between the decay of the interactions betweenthe frontal electrodes in each subpopulation ( λ = 1, λ = 0 . λ = 0 . K ). K networks were computed for each of the raiman & Fraiman/Statistics of brain networks Fig 1:

Detection problem (A) Diagram of the brain regions on an x-y scaleand the link probability. The three groups conﬁrm the equation P ( ◦ ↔ • ) = P ( • ↔ • ) = e − d . (B) Link probability of frontal electrodes, P ( ◦ ↔ ◦ ), as afunction of the distance for the three subpopulations. (C) Power of the tests asa function of sample size, K . Both tests are presented. raiman & Fraiman/Statistics of brain networks Fig 2:

Null hypothesis. (A) Histogram of T statistics for K = 30. (B) Percent-age of Type I Error as a function of sample size, K . Both tests are presented.three subpopulations and the T statistic was computed for each of 10,000 repli-cates. The proportion of replicates with a T value smaller than -1.65 is anestimation of the power of the test for a signiﬁcance level of 0.05 (unilateral hy-pothesis testing). Star symbols in Figure 1 C represent the power of the test forthe diﬀerent sample sizes. For example, for a sample size of 100, the test detectsthis small diﬀerence between the networks 100% of the time. As expected, thetest has less power for small sample sizes, and if we change the values λ and λ in the model to 0.66 and 0.5, respectively, power increases. In this last case,the power changed from 64% to 96% for a sample size of 30 (see Appendix, Fig.A1 for the complete behaviour).To the best of our knowledge, the T statistic is the ﬁrst proposal of anANOVA test for networks. Thus, here we compare it with a naive test whereeach individual link is compared among the subpopulations. The procedureis as follows: for each link, we calculate a test for equal proportions betweenthe three groups to obtain a p-value for each link. Since we are conductingmultiple comparisons, we apply the Benjamini-Hochberg procedure controllingat a signiﬁcance level of α = 0 .

05. The procedure is as follows:1. Compute the p-value of each link comparison, pv , pv , . . . , pv m .2. Find the j largest p-value such that pv ( j ) ≤ jm α.

3. Declare that the link probability is diﬀerent for all links that have a p-value ≤ pv ( j ) .This procedure detects diﬀerences in the individual links while controlling formultiple comparisons. Finally, we consider the networks as being diﬀerent ifat least one link (of the 15 that have real diﬀerences) was detected to havesigniﬁcant diﬀerences. We will call this procedure the “Links Test”. Crosses inFig. 1 C correspond to the power of this test as a function of the sample size.As can be observed, the test proposed for testing equal mean networks is muchmore powerful than the previous test.Theorem 1 states that T is asymptotically (sample size → ∞ ) Normal(0,1) raiman & Fraiman/Statistics of brain networks under the Null hypothesis. Next we investigated how large the sample size mustbe to obtain a good approximation. Moreover, we applied Theorem 1 in thesimulations above for K = { , , , } , but we did not show that the ap-proximation is valid for K = 30, for example. Here, we show that the normalapproximation is valid even for K = 30 in the case of 16-node networks. Wesimulated 10,000 replicates of the model considering that all three groups haveexactly the same probability law given by group 1, i.e. all brain connectionsconﬁrm the equation P ( i ↔ j ) = e − λ d ( i,j ) for the three groups (H hypothe-sis). The T value is computed for each replicate of sample size K = 30, and thedistribution is shown in Fig. 2 (A). The histogram shows that the distributionis very close to normal. Moreover, the Kolmogorov-Smirnov test against a nor-mal distribution did not reject the hypothesis of a normal distribution for theT statistic (p-value=0.52). For sample sizes smaller than 30, the distributionhas more variance. For example, for K = 10, the standard deviation of T is 1.1instead of 1 (see Appendix, Fig. A2). This deviation from a normal distribu-tion can also be observed in panel B where we show the percentage of Type Ierrors as a function of the sample size ( K ). For sample sizes smaller than 30,this percentage is slightly greater than 5%, which is consistent with a varianceis greater than 1. The Links test procedure yielded a Type I error percentagesmaller than 5% for small sample sizes.Finally, we applied the subnetwork identiﬁcation procedure described beforeto this example. Fifty simulations were performed for the model with a samplesize of K = 100. For each replication, the minimum statistic T j was studied asa function of the number of j nodes in the subnetwork. Figures 3 A and B showtwo of the 50 simulation outcomes for the T j function of ( j ) number of nodes.Panel A shows that as nodes are incorporated into the subnetwork, the statisticsharply decreases to six nodes, and further incorporating nodes produces a verysmall decay in T j in the region between six and nine nodes. Finally, adding evenmore nodes results in a statistical increase. A similar behaviour is observed inthe simulation shown in panel B, but the “change point” appears for a numberof nodes equal to ﬁve. If we deﬁne that the number of nodes with diﬀerences, ˜ j ,conﬁrms ˜ j ≡ max { j ∈ { , , . . . , n } : T j − T j − < − . } , we obtain the values circled. For each of the 50 simulations, we studied the value˜ j and a histogram of the results is shown in Panel C. With the criteria deﬁned,most of the simulations (85%) result in a subnetwork of 6 nodes, as expected.Moreover, these 6 nodes correspond to the real subnetwork with diﬀerencesbetween subpopulations (white nodes in Fig. 1 A). This was observed in 100%of simulations with ˜ j =6 (blue circles in Panel D). In the simulations where thisvalue was 5, ﬁve of the six true nodes were identiﬁed, and ﬁve of the six nodeswith diﬀerences vary between simulations (represented with grey circles in PanelD). For the simulations where ˜ j =7, all six real nodes were identiﬁed and a falsenode (grey circle) that changed between simulations was identiﬁed as being partof the subnetwork with diﬀerences.The identiﬁcation procedure was also studied for a smaller sample size of raiman & Fraiman/Statistics of brain networks Fig 3:

Identiﬁcation problem (A-B) Statistic T j as a function of the number ofnodes of the subnetwork ( j ) for two simulations. Blue circles represent the value˜ j following the criteria described in the text. (C) Histogram of the number ofsubnetwork nodes showing diﬀerences, ˜ j . (D) Identiﬁcation of the nodes. Blueand grey circles represent the nodes identiﬁed from the set N ˜ j . Circled bluenodes are those identiﬁed 100% of the time. Grey circles represent nodes thatare identiﬁed some of the time. On the left, grey circles alternate between the sixwhite nodes. On the right, the grey circle alternates between the black nodes. raiman & Fraiman/Statistics of brain networks K = 30, and in this case, the real subnetwork was identiﬁed only 28% of thetime (see Appendix Fig. A3 for more details). Identifying the correct subnetworkis more diﬃcult (larger sample sizes are needed) than detecting global diﬀerencesbetween group networks. In this section, we analysed resting-state fMRI data from the 900 participants inthe 2015 Human Connectome Project (HCP [26]). We included data from the820 healthy participants who had four complete 15-minute rs-fMRI runs, fora total of one hour of brain activity. We partitioned the 820 participants intothree subgroups and studied the diﬀerences between the brain groups. Clearlyif the participants are randomly partitioned no brain subgroup diﬀerences is ex-pected, but if the participants are partitioned in an intentional way diﬀerencescan appear. For example, if we partitioned the 820 by the amount of hours spentsleep the night previous to the scan ( G less than 6 hours, G between 6 and7 hours, and G more than 7) it is expected [27, 28] to observe diﬀerences intheir brain connectivity in the day of the scan. Moreover, as by-product, we ob-tain that this variable is an important factoring variable to be controlled beforethe scan. Fortunately, HCP provides interesting individual socio-demographic,behavioural and structural brain data to make possible this analysis. Moreover,using a previous release of the HCP data (461 subjects), Smith et al. [29] usingmultivariate analysis (canonical correlation) showed that a linear combination ofdemographics and behavior variables highly correlates with a linear combinationof functional interactions between brain parcellations (obtained by IndependentComponent Analysis). Our approach has the same spirit, but has some diﬀer-ences. In our case the main objective is to identify non-imaging variables that“explain” (that are dependent with) the individual brain network. We do notimpose a linear relationship between non-imaging and imaging variables, andwe study the brain network as a whole object without diﬀerent “loads” in eachedge.Data were processed by HCP [30, 31, 32], yielding the following outputs:1. Group-average brain regional parcellations obtained by means of group-Independent Component Analysis (ICA [33]). Fifteen components are de-scribed.2. Subject-speciﬁc time series per ICA component.Figure 4 (A) shows three of the 15 ICA components with the speciﬁc onehour time series for a particular subject. These signals were used to construct anassociation matrix between pairs of ICA components per subject. This matrixrepresents the strength of the association between each pair of components,which can be quantiﬁed by diﬀerent functional coupling metrics, such as thePearson correlation coeﬃcient between the signals of the component, which weadopted in the present study (panel (B)). For each of the 820 subjects, westudied functional connectivity by transforming each correlation matrix, Σ, into raiman & Fraiman/Statistics of brain networks Fig 4: (A) ICA components and their corresponding time series. (B) Correlationmatrix of the time series. (C) Network representation. The links correspond tothe nine highest correlations. raiman & Fraiman/Statistics of brain networks binary matrices or networks, G , (panel (C)). Two criteria for this transformationwere used [34, 35, 36]: a ﬁxed correlation threshold and a ﬁxed number of linkscriterion. In the ﬁrst criterion, the matrix was thresholded by a value ρ aﬀordingnetworks with varying numbers of links. In the second, a ﬁxed number of linkcriteria were established and a speciﬁc threshold was chosen for each subject.Additionally, the HCP provides interesting individual socio-demographic, be-havioural and structural brain data. Variables are grouped into seven main cat-egories: alertness, motor response, cognition, emotion, personality, sensory, andbrain anatomy. Volume, thickness and areas of diﬀerent brain regions were com-puted using the T1-weighted images of each subject in Free Surfer [37]. Thus,for each subject, we obtained a brain functional network, G , and a multivariatevector X that contains this last piece of information.The main focus of this section is to analyse the “impact” of each of thesevariables ( X ) on the brain networks (i.e., on brain activity). To this end, weﬁrst selected a variable such as k , X k , and grouped each subject according to itsvalue into only one of three categories (Low, Medium, or High) just by placingthe values in order and using the 33.3% percentile. In this way, we obtained threegroups of subjects, each identiﬁed by its correlation matrix Σ L , Σ L , . . . , Σ Ln L , ,Σ M , Σ M , . . . , Σ Mn M , , and Σ H , Σ H , . . . , Σ Hn H , or by their corresponding network(once the criteria and the parameter are chosen) G L , G L , . . . , G Ln L , G M , G M , . . . , G Mn M ,and G H , G H , . . . , G Hn H . The sample size of each group ( n L , n M , and n H ) is ap-proximately 1/3 of 820, except in cases where there were ties. Once we obtainedthese three sets of networks, we applied the developed test. If diﬀerences existbetween all three groups, then we are conﬁrming an interdependence betweenthe factoring variable and the functional networks. However, we cannot yet elu-cidate directionality (i.e., diﬀerent networks lead to diﬀerent sleeping patternsor vice versa?).After ﬁltering the data, we identiﬁed 221 variables with 100% complete in-formation for the 820 subjects. We applied the network ANOVA test for each ofthese 221 variables and report the T statistic. Figure 5 (A) shows the T statisticfor the variable Thickness of the right Inferior Parietal region. All values of the T statistic were veriﬁed to be between -2 and 2, which is the case for all ρ valuesusing the ﬁxed correlation criterion (left panel) for constructing the networks.The same occurs when a ﬁxed number of link criteria are used (right panel).According to Theorem 1, when there are no diﬀerences between groups, T isasymptotically normal (0,1), and therefore a value smaller than -3 is very un-likely (p-value = 0.00135). In panel (B), we show the T statistic for the variable Amount of hours spent sleep the night previous to the scan which correspondsto the alertness category. As one can see, most T values are much lower than -3.Importantly, this shows that the number of hours a person sleeps before the scanis associated with their brain functional networks (or brain activity). However,as explained above, we do not know whether the number of hours slept thenight before the scan represent these individuals’ habitual sleeping patterns,complicating any eﬀort to infer causation. In other words, six hours of sleepfor an individual who habitually sleeps six hours may not produce the samenetwork pattern as six hours in an individual who normally sleeps eight hours raiman & Fraiman/Statistics of brain networks (and is likely tired during the scan). Alternatively, diﬀerent activity observedduring waking hours may “produce” diﬀerent sleep behaviours. Nevertheless,we know that the amount of hours slept the night before the scan should bemeasured when scanning a subject. In Panel (C), we show that brain volumetricvariables can also inﬂuence resting-state fMRI networks. In this panel we showthe T value for the variable Area of the left Middle temporal region. Signiﬁcantdiﬀerences for both network criteria are also observed for this variable.Under the hypothesis of equal mean networks between groups, we expectednot to obtain a T statistic less than -3 when comparing the sample networks.However, we tested several diﬀerent thresholds and numbers of links, generatingsets of networks that are dependent on each criterion and between criteria. Thisfact makes the statistical inference more diﬃcult. To address this problem, wedecided to deﬁne a new statistic based on T , W , and study its distributionusing the bootstrap resampling technique. The new statistic is deﬁned as, W = min { ∆ ρ + , ∆ ρ − , ∆ L + , ∆ L − } , where ∆ is the number of values of T that are lower than -3 for the resolution(grid of thresholds) studied. The supraindex in ∆ indicates the criteria (cor-relation threshold, ρ or number of links ﬁxed, L ) and the subindex indicateswhether it is for positive or negative parameter values ( ρ or number of links).For example, Fig. 5 (C) reveals that the variable Area of the left Middle tem-poral conﬁrms having ∆ ρ + = 10, ∆ ρ − = 10, ∆ L + = 9, and ∆ L − = 9, and therefore W = 9. The distribution of W under the null hypothesis is studied numerically.Ten thousand random resamplings of the real networks were selected and the W statistic was computed for each one. Figure 5 (D) shows the W empiricaldistribution (under the null hypothesis) with black bars. Most W values arezero, as expected. In this ﬁgure, the W values of the three variables describedare also represented by dots. The extreme values of W for the variables Amountof Sleep and Middle Temporal Area L conﬁrm that these diﬀerences are not amatter of chance. Both variables are related to brain network connectivity.So far we have shown, among other things, that functional networks diﬀerbetween individuals who get more or fewer hours of sleep, but how do thesenetworks diﬀer exactly? Figure 6 (A) shows the average networks for the threegroups of subjects. There are diﬀerences in connectivity strength between someof the nodes (ICA components). These diﬀerences are more evident in panel (B),which presents a weighted network Ψ with links showing the variability amongthe subpopulation’s average networks. This weighted network is deﬁned asΨ( i, j ) = 13 (cid:88) s =1 |M grp s( i, j ) − M ( i, j ) | , where M ( i, j ) =

13 3 (cid:80) s =1 M grp s. The role of Ψ is to highlight the diﬀerencesbetween the mean networks. The greatest diﬀerence is observed between nodes1 and 11. Individuals that sleep 6.5 hours or less show the strongest connection raiman & Fraiman/Statistics of brain networks Fig 5: (A-C) T –statistics as a function of (left panel) ρ and (right panel) thenumber of links for three variables: (A) Right Inferioparietal Thickness, (B)Number of hours slept the night prior to the scan. (C) Left Middle temporalArea. (D) W -statistic distribution (black bars) based on a bootstrap strategy.The W -statistic of the three variables studied is depicted with dots. raiman & Fraiman/Statistics of brain networks Fig 6: (A) Average network for each subgroup deﬁned by hours of sleep (B)Weighted network with links that represent the diﬀerences among the subpop-ulation mean networks. (C) T j -statistic as a function of the number of nodes ineach subnetwork ( j ). The nodes identiﬁed by the minimum T j are presented inthe boxes, while the number of nodes identiﬁed by the procedure are representedwith a red circle. (D) Nodes from the identiﬁed subnetwork are circled in blue.The nodes identiﬁed in (D) correspond to those in panel (B). raiman & Fraiman/Statistics of brain networks between ICA component number 1 (which corresponds to the occipital pole andthe cuneal cortex in the occipital lobe) and ICA component number 11 (whichincludes the middle and superior frontal gyri in the frontal lobe, the superiorparietal lobule and the angular gyrus in the parietal lobe). Another importantconnection that diﬀers between groups is the one between ICA components 1and 8, which corresponds to the anterior and posterior lobes of the cerebellum.Using the subnetwork identiﬁcation procedure previously described (see Fig.6 C) we identiﬁed a 7-node subnetwork as the most signiﬁcant for networkdiﬀerences. The nodes that make up that network are presented in panel D.The results described above refer to only three of the 221 variables we anal-ysed. In terms of the remaining variables, we observed more variables that par-titioned the subjects into groups presenting statistical diﬀerences between thecorresponding brain networks. The complete list of these variables is shown inSI Tables 1 and 2. For each variable, we calculated W as well as W and W ,which counts the number of T statistics lower than -4 and -5, respectively. Usinga resampled bootstrap, we obtained empirical probabilities P ( W ≥

1) = 0 (lessthan 1/10000) and P ( W = 1) = 1 / W values of 2 orgreater are shown in SI Table 1 and variables with W values of 1 are shownin SI Table 2. We identiﬁed 30 diﬀerent brain volumetric variables as havingdiﬀerent brain functional networks. However, it is important to note that thesevariables are largely dependent on each other; for example, individuals withlarger inferior-temporal areas often have a greater supratentorial volume, andso on (see SI Fig. A4). The variable that diﬀered most between groups with thehighest diﬀerence in brain networks is right Inferiortemporal area .

4. Discussion

Performing statistical inference on brain networks is important in neuroimag-ing. In this paper, we presented a new method for comparing anatomical andfunctional brain networks of two or more subgroups of subjects. Two problemswere studied: the detection of diﬀerences between the groups and the identiﬁ-cation of the speciﬁc network diﬀerences. For the ﬁrst problem, we developedan ANOVA test based on the distance between networks. This test performedwell in terms of detecting existing diﬀerences (high statistical power). Finally,based on the statistics developed for the testing problem, we proposed a way ofsolving the identiﬁcation problem. On the following, we discuss our ﬁndings.

Based on the minimization of the T statistic, we propose a method for identifyingthe subnetwork that diﬀers among the subgroups. This subnetwork is extremelyuseful. On the one hand, it allows us to understand which brain regions areinvolved in the speciﬁc comparison study, and on the other, it allows us toidentify/diagnose new subjects with greater accuracy. raiman & Fraiman/Statistics of brain networks The relationship between the minimum T value for a ﬁxed number of nodesas a function of the number of nodes ( T j vs. j ) is very informative. A largedecrease in T j incorporating a new node into the subnetwork ( T j +1 << T j )means that the new node and its connections explain much of the diﬀerencebetween groups. A very small decrease shows that the new node explains onlysome of the diﬀerence because either the subgroup diﬀerence is small for theconnections of the new node, or because there is a problem of overestimation.The correct number of nodes in each subnetwork must verify˜ j ≡ max { j ∈ { , , . . . , n } : T j − T j − < − g (sample size) } In this paper, we present ad hoc criteria in each example (a certain constantfor g (sample size)) and we do not give a general formula for g (sample size).We believe that this could be improved in theory, but in practice, one canpropose a natural way to deﬁne the upper bound and subsequently identifythe subnetwork, as we showed in the two examples and in the application byobserving T j as a function of j . What is the adequate sample size for comparing brain networks? This is typi-cally the ﬁrst question in any comparison study. Clearly, the response dependson the magnitude of the network diﬀerences between the groups. If the subpop-ulations diﬀer greatly, then 30 networks in each group is enough. On the otherhand, if the diﬀerences are not very big, then a larger sample size is requiredto have a reasonable power of detection. The problem gets more complicatedwhen it comes to identiﬁcation. We showed in Example 1 that we obtain a goodidentiﬁcation rate when a sample size of 100 networks is selected from eachsubgroup. Thus, the rate of correct identiﬁcation is small for a sample size of30. Once again, identiﬁcation depends on the magnitude of the diﬀerence insubnetwork size (number of edges or nodes) between subpopulations.

Humans are highly variable in their brain activity, which can be inﬂuenced, inturn, by their level of alertness, mood, motivation, health and many other fac-tors. Even the amount of coﬀee drunk prior to the scan can greatly inﬂuenceresting-state neural activity. What variables must be controlled to have a faircomparison between two or more groups? Certainly age, gender, and educationare among those variables, and in this study we found that another relevantvariable is the amount of hours slept the night prior to the scan. Although thismight seem pretty obvious, to the best of our knowledge, most studies do notcontrol for this variable. Another variable that we identiﬁed as being relevant isright inferior-temporal area volume. However, we also identiﬁed 29 other vari-ables, which, as we have shown, are highly interdependent. In principle, the role raiman & Fraiman/Statistics of brain networks of these variables is not surprising, since comparing brain activity between in-dividuals requires one to pre-process the images by realigning and normalizingthem to a standard brain. In other words, the relevance of speciﬁc area volumesmay simply be a by-product of the standardization process. However, if ourﬁnding that brain volumetric variables aﬀect functional networks is replicatedin other studies, this poses a problem for future experimental designs. Speciﬁ-cally, groups will not only have to be matched by variables such as age, genderand education level, but also in terms of volumetric variables, which can only beobserved in the scanner. Therefore, several individuals would have to be scannedbefore selecting the ﬁnal study groups.In sum, at least 60 subjects in each group must be tested to obtain highlyreproducible ﬁndings when analysing resting-state data with network method-ologies. Also, whenever possible, the same participants should be tested both ascontrols and as the treatment group (paired samples). In the case of comparinghealthy subjects with patients, many patients will need to be tested until a largeenough sample size is reached.

5. Acknowledgments

Data were provided by the Human Connectome Project, WU-Minn Consortium(Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657)funded by the 16 NIH Institutes and Centers that support the NIH Blueprint forNeuroscience Research; and by the McDonnell Center for Systems Neuroscienceat Washington University. This work was partially supported by PAI UdeSAand FAPESP grant 2013/07699-0.

References [1] Deco G, and Kringelbach ML (2014). Great expectations: using whole-brain computational connectomics for understanding neuropsychiatric dis-orders.

Neuron , 84, 892-905.[2] Stephan KE, Iglesias S, Heinzle J, and Diaconescu AO (2015). Transla-tional perspectives for computational neuroimaging.

Neuron , 87, 716-732.[3] Bullmore E. and Sporns O (2009). Complex brain networks: networktheoretical analysis of structural and functional systems.

Nature ReviewsNeuroscience , 10, 186-196.[4] Fornito A, Zalesky A, Bullmore E (2017). Fundamentals of Brain NetworkAnalysis.

Elsevier .[5] Anonymous (2017). Focus on human brain mapping.

Nat. Neurosci.

Nat. Rev. Neurosci.

14, 365-376.[7] Poldrack R (2017). Scanning the horizon: towards transparent and repro-ducible neuroimaging research.

Nat. Rev. Neurosci.

18, 115-126. raiman & Fraiman/Statistics of brain networks [8] Nichols TE, et al. (2016). Best Practices in Data Analysis and Sharing inNeuroimaging using MRI. Nat. Neurosci. , 20, 299-303.[9] Bennett CM, Miller MB (2010). How reliable are the results from func-tional magnetic resonance imaging?

Annals of the New York Academy ofSciences

Proc Natl Acad Sci USA ,114, E3368-E3369.[11] Fraiman D, Fraiman N, Fraiman R (2017). Non Parametric Statis-tics of Dynamic Networks with distinguishable nodes.

Test , DOI:10.1007/s11749-017-0524-8.[12] Cerqueira A, Fraiman D, Vargas C, Leonardi F (2017). Atest of hypotheses for random graph distributions built from EEGdata.

IEEE Transactions on Network Science and Engineering , DOI:10.1109/TNSE.2017.2674026[13] Kolar M, Song L, Ahmed A, Xing E (2010) Estimating Time-varyingnetworks.

Ann. Appl. Stat.

Estimating Time-varying networks. 4, 94-123.[14] Zalesky A, Fornito A, Bullmore E (2010). Network-based statistic: iden-tifying diﬀerences in brain networks.

Neuroimage , 53, 1197-1207.[15] Sanfeliu A, Fu K (1983). A distance measure between attributed relationalgraphs.

IEEE T. Sys. Man. Cyb. , 13, 353-363.[16] Schieber T, Carpi L, D´ıaz-Guilera A, Pardalos P, Masoller, C and RavettiM (2017). Quantiﬁcation of network structural dissimilarities.

Naturecommunications , 8, 13928.[17] Shimada Y, Hirata Y, Ikeguchi T, Aihara K (2016). Graph distance forcomplex networks.

Scientiﬁc reports , 6.[18] Gao X, Xiao B, Tao D, Li X (2010). A survey of graph edit distance.

Pattern Anal Appl , 13, 113-129.[19] Della-Maggiore V, Villalta JI, Kovacevic N, McIntosh AR (2015). Func-tional Evidence for Memory Stabilization in Sensorimotor Adaptation: A24-h Resting-State fMRI Study.

Cerebral Cortex bhv289.[20] Mawase F, Bar-Haim S, Shmuelof L (2017). Formation of Long-TermLocomotor Memories Is Associated with Functional Connectivity Changesin the Cerebellar?Thalamic?Cortical Network.

Journal of Neuroscience

37, 349-361.[21] Fraiman D, Chialvo D (2012). What kind of noise is brain noise: anomalousscaling behavior of the resting brain activity ﬂuctuations.

Frontiers inPhysiology , 3, 307.[22] Garcia-Cordero I, et al. (2015). Stroke and neurodegeneration inducediﬀerent connectivity aberrations in the insula.

Stroke , 46, 2673-2677.[23] Fraiman D, et al. (2016). Reduced functional connectivity within theprimary motor cortex of patients with brachial plexus injury.

NeuroimageClinical , 12, 277-284.[24] Dottori M, et al. (2017). Towards aﬀordable biomarkers of frontotem-poral dementia: A classiﬁcation study via network’s information sharing.

Scientiﬁc Reports , 7, 3822. raiman & Fraiman/Statistics of brain networks [25] Zalesky A, Cocchi L, Fornito A, Murray M, Bullmore E (2012). Connec-tivity diﬀerences in brain networks. Neuroimage

NeuroImage , 127, 324-332.[28] Krause A, et al. (2017). The sleep-deprived human brain.

Nature ReviewsNeuroscience , 18, 404-418.[29] Smith S, et al. (2015). A positive-negative mode of population covariationlinks brain connectivity, demographics and behavior.

Nature neuroscience ,18, 1565-1567.[30] Human Connectome Project (2015). WU-Minn HCP 900 Subjects DataRelease: Reference Manual. 67-87.[31] Griﬀanti L ,et al. (2014). ICA-based artefact removal and acceleratedfMRI acquisition for improved resting state network imaging.

Neuroimage ,95, 232-247.[32] Smith SM, et al. (2013). Resting-state fMRI in the Human ConnectomeProject.

Neuroimage , 80, 144-168.[33] Beckmann C, DeLuca M, Devlin J, and Smith S (2005). Investigations intoresting-state connectivity using independent component analysis.

Philo-sophical Transactions of the Royal Society of London B: Biological Sci-ences , 360, 1001-1013.[34] Fraiman D, Saunier G, Martins E, Vargas C (2014). Biological MotionCoding in the Brain: Analysis of Visually Driven EEG Functional Net-works.

PloS One , 0084612.[35] Amoruso L, Ib´a˜nez A, Fonseca B, Gadea S, Sede˜no L, Sigman M, Garc´ıaA, Fraiman R, Fraiman D (2017). Brain network organization predictsstyle-speciﬁc expertise during Tango dance observation.

Neuroimage , 146,690-700.[36] van den Heuvel MP, de Lange SC, Zalesky A, Seguin C, Yeo T, Schmidt R(2017). Proportional thresholding in resting-state fMRI functional connec-tivity networks and consequences for patient-control connectome studies:Issues and recommendations

Neuroimage , 152, 437-449.[37] Fischl, B. (2012). FreeSurfer.

Neuroimage , 62, 774-781. raiman & Fraiman/Statistics of brain networks Appendix

A1- Proof of Theorem 1 T := √ ma m (cid:88) i =1 √ n i (cid:18) n i n i − d G i ( M i ) − nn − d G ( M i ) (cid:19) (5.1) Theorem 2.

Under the null hypothesis the statistic T veriﬁes i) and ii), while T is sensitive to the alternative hypothesis, verifying iii).i) E ( T ) = 0 .ii) T is asymptotically ( K := min { n , n , .., n m } → ∞ ) Normal(0,1).iii) Under the alternative hypothesis T will be smaller than any negative valuefor K large enough (The test is consistent).Notation Let T = √ ma Z with- Z = (cid:80) mi =1 W i - W i = b i D G i ( M i ) − c i D G ( M i )- b i = √ n i n i − - c i = √ n i n − - D G i ( M i ) = n i d G i ( M i )- D G ( M i ) = nd G ( M i ). A1.1- Proof i : E ( T ) = 0 . Let denote by G , G , . . . , G n the sample networks from subpopulation 1, G , G , . . . , G n the ones from subpopulation 2, and so on until G m , G , . . . , G mn m the networksfrom subpopulation m . Let denote without superscript G , G , . . . , G n the com-plete pooled sample of networks where n = (cid:80) mi =1 n i . And let G k ⊕ , G k ⊕ , . . . , G k ⊕ n ⊕ be the pooled sample of networks without the sample k where n k ⊕ = (cid:80) mh (cid:54) = k n h .The sum of the distance from the pooled sample to the average network ofsample k ( M k ) can be decomposed in the following way, D G ( M k ) = D G k ( M k ) + D G k ⊕ ( M k ) . Where D G k ( M k ) = n k (cid:88) i

V ar ( D G k ⊕ ( M k )) = V ar ( 1 n k (cid:88) i

Cov ( X ki,j , X k ⊕ i,j X ki,j ) = M ( X ki,j ) M ( X k ⊕ i,j )) − ( M ( X ki,j , ) M ( X k ⊕ i,j )) . The third term on the right on eq. 4,

Cov ( D G k ( M k ) , D G k ⊕ ( M k )) = Cov ( 2 n k (cid:88) i

Cov ( D G k ( M k ) , D G ( M k )) Cov ( D G k ( M k ) , D G ( M k )) = Cov ( D G k ( M k ) , D G k ( M k ) + D G k ⊕ ( M k ))) == V ar ( D G k ( M k )) + Cov ( D G k ( M k ) , D G k ⊕ ( M k )) . The two terms have been previously calculated.

Cov ( D G r ( M r ) , D G t ( M t )) with r (cid:54) = tCov ( D G r ( M r ) , D G t ( M t )) = 0 , raiman & Fraiman/Statistics of brain networks since D G r ( M r ) and D G t ( M t ) are independent random variables Cov ( D G r ( M r ) , D G ( M t )) with r (cid:54) = tCov ( D G r ( M r ) , D G ( M t )) = Cov ( D G r ( M r ) , D G r ( M t ) + D G r ⊕ ( M t )) == Cov ( D G r ( M r ) , D G r ( M t )) + Cov ( D G r ( M r ) , D G r ⊕ ( M t ))Now using that D G r ( M t ) = n t (cid:80) i

1) toboth sides of the found solution for E ( T ) = 0 (the line x=y) and see if the signof E ( T ) change. If the sign changes then E ( T ) is an hyperbolic paraboloid, ifnot E ( T ) is a parabolic cylinder.We will study E ( T ) for ( x , y ) = (1 / , / (cid:15) ) and for ( x , y ) = (1 / , / − (cid:15) ) with (cid:15) >

0. For simplicity we will study lim n →∞ √ n E ( T ) which is enough forproof .It is straightforward to see that for both ( x , y ) and ( x , y )lim n →∞ √ n E ( T ) = − − c m ) √ c m (cid:15) , which is negative value since 0 < c m <

1, conﬁrming that E ( T ) is a paraboliccylinder that goes than, i.e. if q ( i ∗ , j ∗ ) (cid:54) = p ( i ∗ , j ∗ ) thenlim n →∞ E ( T ) = −∞ . To ﬁnish the proof we say that any other alternative hypothesis can be provedfrom this particular alternative scenario. For example, if there exist another( i ∗∗ , j ∗∗ ) with p ( i ∗∗ , j ∗∗ ) (cid:54) = q ( i ∗∗ , j ∗∗ ) then E ( T ) = E (cid:16) T i ∗ ,j ∗ (cid:17) + E (cid:16) T i ∗∗ ,j ∗∗ (cid:17) and we apply the same proof for each term. Another alternative hypothesismight be that there exist a unique ( i ∗ , j ∗ ) where p r ( i ∗ , j ∗ ) (cid:54) = p s ( i ∗ , j ∗ ) for r (cid:54) = s being p r ( i ∗ , j ∗ ) the probability of observing link ( i ∗ , j ∗ ) in subpopulation r . Inthis case E ( T ) is a quadric expression in p ( i ∗ , j ∗ ), p ( i ∗ , j ∗ ), ..., and p m ( i ∗ , j ∗ ) . And the same argument can be used obtaining the same result, under the alter-native hypothesis lim n →∞ E ( T ) = −∞ . Based on T it is easy to see that the rate of convergence √ n raiman & Fraiman/Statistics of brain networks A2- Example 1

20 40 60 80 100 K P o w e r Network TestLinks Test

Fig A1: Power of the tests as a function of the sample size for the model withparameters λ = 0 . λ = 2 /

3, and λ = 0 . l l l

10 15 20 25 30 . . . . . K V a r( T ) Fig A2:

Null hypothesis.

Variance of T statistics as a function of the samplesize, K for the model with parameters λ = 0 . λ = 0 .

8, and λ = 0 . raiman & Fraiman/Statistics of brain networks Number of nodes of subnetwork ( j ) P e r c en t age Fig A3: Histogram of number of nodes determine by identiﬁcation procedure.These results correspond to the model with parameters λ = 0 . λ = 0 . λ = 0 . K = 30. raiman & Fraiman/Statistics of brain networks A3- HCP Resting-state fMRI functional networks

Variable W W W PSQI–AmtSleep 9 6 4FS–R–Inferiortemporal–Area 5 5 3FS–SupraTentorial–Vol 5 5 3FS–R–WM–Vol 5 4 3FS–R–Cort–GM–Vol 6 3 3FS–BrainSeg–Vol 5 3 3FS–Tot–WM–Vol 5 3 3FS–Mask–Vol 5 3 3FS–L–Middletemporal–Area 9 4 2FS–R–Cuneus–Area 5 4 2FS–L–Lateraloccipital–Area 4 4 2FS–R–Superiorfrontal–Area 5 3 2FS–BrainSeg–Vol–No–Vent 5 3 2FS–L–Superiorfrontal–Area 4 3 2FS–L–WM–Vol 4 3 2FS–R–Precuneus–Area 3 3 2FS–R–VentDC–Vol 3 3 2FS–TotCort–GM–Vol 3 3 2FS–SubCort–GM–Vol 3 3 2FS–R–Fusiform–Area 5 2 2FS–OpticChiasm–Vol 4 2 2FS–L–Fusiform–Area 3 2 2FS–L–Pallidum–Vol 3 2 2FS–R–Putamen–Vol 3 2 2FS–L–Fusiform–Area 3 2 2

Table 1

Variables that partitioned the subjects in groups that present very high statistical diﬀerencesbetween the corresponding brain networks. Except the ﬁrst variable, the rest correspond tobrain volumetric variables. Only variables with W ≥ are included.raiman & Fraiman/Statistics of brain networks W W W FS–L–Supramarginal–Area 5 3 1FS–BrainStem–Vol 5 2 1FS–R–Precentral–Area 5 2 1FS–R–Rostralmiddlefrontal–Area 4 2 1FS–Total–GM–Vol 3 2 1FS–R–Hippo–Vol 3 2 1

Table 2

Variables that partitioned the subjects in groups that present high statistical diﬀerencesbetween the corresponding brain networks. Except the ﬁrst variable, the rest correspond tobrain volumetric variables. Only variables with W = 1 are included. Note that the total greymatter volume is part of these relevant variables.are included. Note that the total greymatter volume is part of these relevant variables.