Identification of brain states, transitions, and communities using functional MRI

Lingbin Bian a,b,*, Tiangang Cui a, B.T. Thomas Yeo c, Alex Fornito b, Adeel Razi b,d,+,*, and Jonathan Keith a,+

a School of Mathematics, Monash University, Australia
b Turner Institute for Brain and Mental Health, School of Psychological Sciences and Monash Biomedical Imaging, Monash University, Australia
c Department of Electrical and Computer Engineering, National University of Singapore, Singapore
d Wellcome Centre for Human Neuroimaging, University College London, United Kingdom
+ Joint senior authors
* Corresponding authors: Lingbin Bian ([email protected]) and Adeel Razi ([email protected])
Abstract
Brain function relies on a precisely coordinated and dynamic balance between the functional integration and segregation of distinct neural systems. Characterizing the way in which neural systems reconfigure their interactions to give rise to distinct but hidden brain states remains an open challenge. In this paper, we propose a Bayesian model-based characterization of latent brain states and showcase a novel method based on posterior predictive discrepancy using the latent block model to detect transitions between latent brain states in blood oxygen level-dependent (BOLD) time series. The set of estimated parameters in the model includes a latent label vector that assigns network nodes to communities, and also block model parameters that reflect the weighted connectivity within and between communities. Besides extensive in-silico model evaluation, we also provide empirical validation (and replication) using the Human Connectome Project (HCP) dataset of 100 healthy adults. Our results obtained through an analysis of task-fMRI data during working memory performance show appropriate lags between external task demands and change-points between brain states, with distinctive community patterns distinguishing fixation, low-demand, and high-demand task conditions.

Identifying changes in brain connectivity over time can provide insight into fundamental properties of human brain dynamics. However, the definition of discrete brain states and the method of identifying those states have not been commonly agreed upon [1]. Experiments targeting unconstrained spontaneous ‘resting-state’ neural dynamics [2, 3, 4, 5, 6, 7, 8, 9, 10] have limited ability to infer latent brain states or determine how the brain segues from one state to another, because it is not clear whether changes in brain connectivity are induced by variations in neural activity (for example, induced by cognitive or vigilance states) or by fluctuations in non-neuronal noise [11, 12, 13].
A recent study with naturalistic movie stimuli used a hidden Markov model to explore dynamic jumps between discrete brain states and found that variations in the sensory and narrative properties of the movie can evoke discrete brain processes [14]. Task-fMRI studies with more restrictive constraints on stimuli have demonstrated that functional connectivity exhibits variation during motor learning [15] and anxiety-inducing speech preparation [16]. Although task-based fMRI experiments can, to some extent, delineate the external stimuli (for example, the onset and duration of stimuli in block-designed experiments), which constitute reference points against which to identify changes in the observed signal, this information does not precisely determine the timing and duration of the latent brain state relative to psychological processes or neural activity. For example, an emotional stimulus may trigger a neural response which is delayed relative to stimulus onset and which persists for some time after stimulus offset. Moreover, the dynamics of brain states and functional networks are not induced only by external stimuli, but also by unknown intrinsic latent mental processes [17]. Therefore, the development of non-invasive methods for identifying transitions of latent brain states during both task performance and task-free conditions is necessary for characterizing the spatiotemporal dynamics of brain networks.

Change-point detection in multivariate time series is a statistical problem that has clear relevance to identifying transitions in brain states, particularly in the absence of knowledge regarding the experimental design. Several change-point detection methods based on spectral clustering [18, 19] and dynamic connectivity regression (DCR) [16] have been previously developed and applied to the study of fMRI time series, and these have enhanced our understanding of brain dynamics.
However, change-point detection with spectral clustering only evaluates changes to the component eigenstructures of the networks but neglects the weighted connectivity between nodes, while the DCR method only focuses on the sparse graph but ignores the modules of the brain networks. Other change-point detection strategies include a frequency-specific method [20], applying a multivariate cumulative sum procedure to detect change-points using EEG data, and methods which focus on large-scale network estimation in fMRI time series [21, 22, 23, 24]. Many fMRI studies use sliding window methods for characterizing time-varying functional connectivity in time series analysis [2, 8, 25, 26, 27, 28, 29]. Methods based on hidden Markov models (HMM) are also widely used to analyze transient brain states [30, 31, 32].

A community is defined as a collection of nodes that are densely connected in a network. The problem of community detection is a topical area of network science [33, 34]. How communities change, and how the nodes in a network are assigned to specific communities, is an important problem in the characterization of networks. Although many community detection problems in network neuroscience are based on modularity [15, 35, 36], recently a hidden Markov stochastic block model combined with a non-overlapping sliding window was applied to infer dynamic functional connectivity for networks, where edge weights were only binary and the candidate time points evaluated were not consecutive [37, 38]. More general weighted stochastic block models [39] have been used to infer structural connectivity for human lifespan analysis [40] and to infer functional connectivity in the mesoscale architecture of drosophila, mouse, rat, macaque, and human connectomes [41].
However, these studies using the weighted stochastic block model only explore the brain network over the whole time course of the experiment and neglect the dynamic properties of networks. Weighted stochastic block models [39] are described in terms of exponential families (parameterized probability distributions), with the estimation of parameters performed using variational inference [42, 43]. Another relevant statistical approach introduces a fully Bayesian latent block model [2, 3], which includes both a binary latent block model and a Gaussian latent block model as special cases. The Gaussian latent block model is similar to the weighted stochastic block model, but different methods have been used for parameter estimation, including Markov chain Monte Carlo (MCMC) sampling.

Although there is a broad literature exploring change-point detection, and also many papers that discuss community detection, relatively few papers combine these approaches, particularly from a Bayesian perspective. In this paper, we develop Bayesian model-based methods which unify change-point detection and community detection to explore when and how the community structure of a discrete brain state changes under different external task demands at different time points using functional MRI. There are several advantages of our approach compared to existing change-point detection methods. Compared to data-driven methods like spectral clustering [18, 19] and DCR [16], which ignore either the weighted connectivity or the community patterns, the fully Bayesian framework and Markov chain Monte Carlo method provide flexible and powerful strategies that have been under-used for characterizing the latent properties of brain networks, including the dynamics of both the community memberships and the weighted connectivity properties of the nodal community structures.
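To make the Gaussian latent block model concrete, the sketch below evaluates the likelihood of an observed correlation matrix given a latent label vector z and block parameters: each off-diagonal entry x_ij is modelled as Gaussian with mean and variance determined by the block (z_i, z_j). This is a minimal illustration under assumed variable names, not the full model specified in the Methods section.

```python
import numpy as np
from scipy.stats import norm

def gaussian_lbm_loglik(x, z, mu, var):
    """Log-likelihood of a Gaussian latent block model (minimal sketch).

    x   : (N, N) observed correlation (adjacency) matrix
    z   : (N,) integer community labels in {0, ..., K-1}
    mu  : (K, K) block mean matrix
    var : (K, K) block variance matrix
    Each upper-triangular entry x[i, j] is modelled as
    Normal(mu[z[i], z[j]], var[z[i], z[j]]).
    """
    iu = np.triu_indices_from(x, k=1)          # upper triangle, excluding the diagonal
    block_mu = mu[z[iu[0]], z[iu[1]]]          # per-edge block means
    block_var = var[z[iu[0]], z[iu[1]]]        # per-edge block variances
    return norm.logpdf(x[iu], loc=block_mu, scale=np.sqrt(block_var)).sum()
```

Within a Bayesian sampler, this likelihood would be combined with priors over z and the block parameters; here it only illustrates how the labels index the blocks.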
Existing change-point detection methods based on the stochastic block model all use non-overlapping sliding windows and were applied only to binary brain networks [37, 38]. In contrast to the stochastic block model used in time-varying functional connectivity studies, the Gaussian latent block model used in our work considers the correlation matrix as an observation without imposing any arbitrary thresholds, so that all the information contained in the time series is preserved, resulting in more accurate detection of change-points. Moreover, unlike methods based on fixed community memberships over the time course [38], our methods consider both the community memberships and the parameters related to the weighted connectivity to be time-varying, which results in more flexible estimation of both community structure and connectivity patterns. Furthermore, our Bayesian change-point detection method uses overlapping sliding windows that assess all of the potential candidate change-points over the time course, which increases the resolution of the detected change-points compared to methods using non-overlapping windows [37, 38].
Finally, the proposed Bayesian change-point detection method is computationally efficient, scaling to whole-brain networks potentially covering hundreds of nodes within a reasonable time frame, on the order of tens of minutes.

Our paper presents four main contributions, namely: (i) we quantitatively characterize discrete brain states with weighted connectivity and time-dependent community memberships, using the latent block model within a temporal interval between two consecutive change-points; (ii) we propose a new Bayesian change-point detection method called posterior predictive discrepancy (PPD) to estimate transition locations between brain states, using a Bayesian model fitness assessment; (iii) in addition to the locations of change-points, we also infer the community architecture of discrete brain states, which we show are distinctive of 2-back, 0-back, and fixation conditions in a working-memory task-based fMRI experiment; and (iv) we further empirically find that the estimated change-points between brain states show appropriate lags compared to the external working memory task conditions.

Results
Our proposed method is capable of identifying transitions between discrete brain states and inferring the patterns of connectivity between brain regions that underlie those brain states, by modelling time-varying dynamics in the BOLD signal under different stimuli. In this section, we validate our proposed methodology by applying Bayesian change-point detection and network estimation to both synthetic data and real fMRI data. The Bayesian change-point detection method is described in Fig. 1, and the mathematical formulation and detailed descriptions are in the Methods section (also see Supplementary information). We first use synthetic multivariate Gaussian data for extensive validation and critically evaluate the performance of our change-point detection and sampling algorithms. For real data analysis, we use working memory task fMRI (WM-tfMRI) data from the Human Connectome Project (HCP) [46]. We extracted the time series of 35 nodes whose MNI coordinates were determined by significant activations obtained via clusterwise inference.

Fig. 1:
The framework for identifying brain states, transitions, and communities: a Schematic of the proposed Bayesian change-point detection method. Three different background colors represent three brain states of individual subjects with different community architectures. The colors of the nodes represent community memberships. A sliding window of width W centered at t is applied to the time series. The different colored time series correspond to BOLD time series for each node. The sample correlation matrix x_t (i.e., an observation for our Bayesian model) is calculated from the sample data Y_t within the sliding window. We use the Gaussian latent block model to fit the observations and evaluate model fits to the observations to obtain the posterior predictive discrepancy index (PPDI). We then calculate the cumulative discrepancy energy (CDE) from the PPDI and use the CDE as a scoring criterion to estimate the change-points of the community architectures. b Dynamic community memberships of networks with N = 16 nodes. A latent label vector z contains the labels (k) of specific communities for the nodes. Nodes of the same color are located in the same community. The dashed lines represent the (weighted) connectivity between communities and the solid lines represent the (weighted) connectivity within communities. c Model fitness assessment. The observation is the realized adjacency matrix; different colors in the latent block model represent different blocks, with the diagonal blocks representing the connectivity within a community and the off-diagonal blocks representing the connectivity between communities. To demonstrate distinct blocks of the latent block model, in this schematic we group the nodes in the same community adjacently and the communities are sorted. In reality, the labels of the nodes are mixed with respect to an adjacency matrix. The term π_kl represents the model parameters in block kl.

Fig. 2: Results of method validation using synthetic data: a CDE of the multivariate Gaussian data with SNR=5dB using different models (K = 6, 5, 4, and 3). The sliding window size for converting from the time series to the sequence of correlation matrices is W = 20, whereas (for smoothing) the sliding window size for converting from PPDI to CDE is W_s = 10. The vertical dashed lines are the locations of the true change-points (t = 20, 50, 80, 100, 130, and 160). The colored scatterplots in the figures are the CDEs of individual (virtual) subjects and the black curve is the group CDE (averaged CDE over 100 subjects). The red points are the local maxima and the blue points are the local minima. b Local fitting with different models (from K = 3 to 18) for synthetic data (SNR=5dB). Different colors represent the PPDI values of different states with the true number of communities K_true. c The estimation of community constituents for SNR=5dB at each discrete state (t = 36, 66, 91, 116, and 146 for brain states 1 to 5, respectively). The estimates of the latent label vectors (Estimation) and the label vectors (True) that determine the covariance matrix in the generative model are shown as bar graphs. The strength and variation of the connectivity within and between communities are represented by the block mean and variance matrices within each panel.
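The first step of this pipeline, converting the node time series into a sequence of windowed sample correlation matrices, can be sketched as follows (a minimal sketch with assumed variable names, not the exact implementation used in the paper):

```python
import numpy as np

def sliding_window_correlations(Y, W):
    """Sequence of sample correlation matrices from windowed time series.

    Y : (T, N) array of T time points for N nodes
    W : width of the rectangular sliding window; a window centred at t
        spans [t - W//2, t + W//2), so valid centres run from W//2
        to T - W//2.
    Returns an (n_windows, N, N) array of correlation matrices, one
    observation x_t per window centre.
    """
    T, N = Y.shape
    half = W // 2
    mats = []
    for t in range(half, T - half):
        # np.corrcoef treats rows as variables, so transpose the window
        mats.append(np.corrcoef(Y[t - half:t + half].T))
    return np.array(mats)
```

Because consecutive windows overlap, every candidate time point over the interior of the time course yields an observation, which is what gives the method its temporal resolution relative to non-overlapping schemes.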
Method validation using synthetic data
To validate the Bayesian change-point detection algorithm, we first use synthetic data with a signal-to-noise ratio (SNR) of 5dB. The simulated states of segments between two true change-points in the synthetic data could be repeating or all different, depending on the settings of the parameters in the generative model. A detailed description of the generative model and the parameter settings for simulating the synthetic data is provided in Supplementary Section 1. Further simulation results with different levels of SNR (SNR=10dB, SNR=0dB, and SNR=-5dB) are provided in Supplementary Figures 1, 2, and 3. The resulting group-level cumulative discrepancy energy (CDE) scores (see Methods section for how we define CDE) using models with different values of K, where K is the number of communities, are shown in Fig. 2a. We use a latent block model to fit the adjacency matrix at consecutive time points for change-point detection, which we call global fitting. The local maxima (red dots) of the group CDE indicate locations of change-points and the local minima (blue dots) correspond to distinct states that differ in their community architecture. We find that the local maxima (red dots) are located very close to the true change-points in all of the graphs (in Fig. 2a), which means that the global fitting has good performance. Here we clarify that global fitting is used to estimate the locations of the change-points or transitions of brain states, and local fitting is used to select a latent block model to estimate the community structures of discrete brain states (see Methods section for a detailed explanation of global and local fitting).

Next, using the global fitting results, with K = 6 and W = 20, where W is the width of the sliding (rectangular) window, we find the local minima (blue dot) locations to be t = {36, 66, 91, 116, 146}, where each location corresponds to a discrete state. Next, we use local fitting to select a model (i.e., K for local inference) to infer the community membership and model parameters relating to the connectivity of the discrete states. For local inference, the group averaged adjacency matrix is considered as the observation. We assess the goodness of fit between the observation and a latent block model with various values of K (from K = 3 to 18) using posterior predictive discrepancy for each local minimum, as shown in Fig. 2b. We selected the value of K at which the curve starts to flatten as the preferred model. We find that the model assessment curves for states 1, 2, 4, and 5 flatten at K = 4, whereas the model assessment curve for state 3 is flat over the entire range (from K = 3 and up). Therefore the selected models are K = {4, 4, 3, 4, 4} for states 1 to 5, respectively.

To validate the MCMC sampling of the density p(z | x, K), we compare the estimate of the latent label vector to the ground truth of the node memberships. Fig. 2c shows the inferred community architectures of the discrete states, including the estimated latent label vectors and the model parameters of block mean and variance. The true label vectors that determine the covariance matrix in the generative models are also included in this figure. We use the most frequent latent label vector in the Markov chain after the burn-in steps as the estimate. Note that label switching occurs in the MCMC sampling, which is a well-known problem in Bayesian estimation of mixture models [1]. In the results presented here, the node memberships have been relabelled to correct for label switching. The algorithm used for this purpose is described in Supplementary Section 2. We find that the estimated latent label vectors are (largely) consistent with the ground truth of labels that determined the covariance matrix. The discrepant ‘True’ and ‘Estimation’ patterns with respect to states 2 and 4 are due to the bias induced by the selected model (K = 5 for the ground truth and K = 4 for the selected model). Although the colors of the labels in the ‘True’ and ‘Estimation’ patterns are discrepant, we can see that the values of the labels are largely consistent, with some labels of k = 5 missing in the ‘Estimation’ pattern compared to the ‘True’ pattern.

Given the estimated latent label vector, we then draw samples of the block mean and variance from the posterior p(π | x, z) conditional on the estimated latent label vector z. However, there is no ground truth for the block mean and variance when we generate the synthetic data. The validation of the sampling of model parameters is illustrated in Supplementary Figure 4.

Method validation using working memory (WM) task-fMRI data
In this analysis, we used preprocessed working memory (WM) tfMRI data obtained from 100 unrelated healthy adult subjects under a block-designed paradigm, available from the Human Connectome Project (HCP) [46]. We mainly focused on the working memory load contrasts of 2-back vs fixation, 0-back vs fixation, and 2-back vs 0-back, and determined the brain regions of interest from the GLM analysis. After group-level GLM analysis, we obtained cluster activations with locally maximum Z statistics for the different contrasts. The results in the form of thresholded local maximum Z statistic (Z-MAX) maps are shown in Supplementary Figure 5. The light box views of the thresholded local maximum Z statistic with different contrasts are provided in Supplementary Figure 6. Significant activations obtained by clusterwise inference and the corresponding MNI coordinates with region names are shown in Table 1. We finally extracted the time series of 35 brain regions corresponding to the MNI coordinates. Refer to the Methods section for the details of the experimental design, GLM analysis, and time series extraction.
Change-point detection for tfMRI time series
In the main text, we illustrate the results using the HCP working memory data of session 1, i.e., with the polarity of Left to Right (LR). The replication of results obtained by using session 2 (RL) is demonstrated in Supplementary Figures 10 to 15 and
Supplementary Table 1. We compare the brain states of different working memory loads for a specific kind of picture (tool, body, face, and place) involved in the experiments. As there is no repetition of task conditions in a single session, the estimated patterns of brain states do not recur in the LR session. One can compare the LR and RL sessions for the recurrence of a specific task condition. To detect change-points in the extracted time series, we first converted each time series into a sequence of correlation matrices for each subject. We then modeled this sequence of correlation matrices for each subject using the latent block model and evaluated posterior predictive discrepancy (PPD) to assess the model fitness. Next, we converted the resulting PPD index (PPDI) to a CDE score for each subject. For group-level analysis, we averaged the resulting individual CDE scores over 100 subjects to obtain a sequence of group CDE scores, as shown in Fig. 3 with different window sizes W = 22, 26, 30, 34 (Fig. 3a-d). We chose the window size for converting from PPDI to CDE to be a constant W_s = 10 for all of the assessments. The multi-colored scatterplots in the figures are the individual CDE scores. Although there are some false positives in terms of both local maxima and local minima (here false positives are defined as multiple points of local minima or local maxima that should be discarded in a single task block), we note that the onsets of the stimuli precede the inferred local maxima, and the local minima also show appropriate lags (for example, about 10 frames, or 7 seconds, as shown in Fig. 3c) compared to the mid-points of the working memory blocks. For fixation blocks, the local maxima show lags compared to the mid-points of the blocks. These lags are likely due to the delay in the haemodynamic response of brain activation.

Table 1: Significant activations of clusterwise inference (cluster-corrected Z>3.1, P<0.05): Activations are described in terms of the local maximum Z (Z-MAX) statistic within each cluster, including the activations of all contrast maps among 2-back, 0-back, and fixation.

Node | Z-MAX | MNI coordinates (x, y, z) | Voxel locations (x, y, z) | Region name
1  | 4.97 | 48, -58, 22   | 21, 34, 47 | Angular Gyrus
2  | 9.61 | 36, 8, 12     | 27, 67, 42 | Central Opercular Cortex
3  | 8.27 | -36, 4, 12    | 63, 65, 42 | Central Opercular Cortex
4  | 6.48 | 40, 34, -14   | 25, 80, 29 | Frontal Orbital Cortex
5  | 7.83 | -12, 46, 46   | 51, 86, 59 | Frontal Pole
6  | 4.84 | 54, 32, -4    | 18, 79, 34 | Inferior Frontal Gyrus
7  | 6.00 | 52, 38, 10    | 19, 82, 41 | Inferior Frontal Gyrus
8  | 4.38 | -52, 40, 6    | 71, 83, 39 | Inferior Frontal Gyrus
9  | 6.05 | 52, -70, 36   | 19, 28, 54 | Inferior Parietal Lobule PGp R
10 | 7.26 | -48, -68, 34  | 69, 29, 53 | Inferior Parietal Lobule PGp L
11 | 6.18 | 44, -24, -20  | 23, 51, 26 | Inferior Temporal Gyrus
12 | 9.54 | 36, -86, 16   | 27, 20, 44 | Lateral Occipital Cortex
13 | 8.04 | -30, -80, -34 | 60, 23, 19 | Left Crus I
14 | 7.60 | -8, -58, -52  | 49, 34, 10 | Left IX
15 | 6.90 | -22, -48, -52 | 56, 39, 10 | Left VIIIb
16 | 14.5 | 6, -90, -10   | 42, 18, 31 | Lingual Gyrus
17 | 10.3 | 30, 10, 58    | 30, 68, 65 | Middle Frontal Gyrus
18 | 6.61 | 66, -30, -12  | 12, 48, 30 | Middle Temporal Gyrus
19 | 4.53 | -68, -34, -4  | 79, 46, 34 | Middle Temporal Gyrus
20 | 14.5 | 18, -88, -8   | 36, 19, 32 | Occipital Fusiform Gyrus
21 | 5.06 | -12, -92, -2  | 51, 17, 35 | Occipital Pole
22 | 9.87 | 6, 40, -6     | 42, 83, 33 | Paracingulate Gyrus
23 | 12.0 | 42, -16, -2   | 24, 55, 35 | Planum Polare
24 | 11.3 | -40, -22, 0   | 65, 52, 36 | Planum Polare
25 | 9.03 | 38, -26, 66   | 26, 50, 69 | Postcentral Gyrus
26 | 8.31 | -10, -60, 14  | 50, 33, 43 | Precuneus Cortex
27 | 5.70 | 46, -60, -42  | 22, 33, 15 | Right Crus I
28 | 8.34 | 32, -80, -34  | 29, 23, 19 | Right Crus I
29 | 10.9 | 32, -58, -34  | 29, 34, 19 | Right Crus I
30 | 6.41 | 10, -8, -14   | 40, 59, 29 | Right Hippocampus
31 | 6.19 | 32, -52, 2    | 29, 37, 37 | Right Lateral Ventricle
32 | 7.69 | 24, -46, 16   | 33, 40, 44 | Right Lateral Ventricle
33 | 6.13 | 0, 10, -14    | 45, 68, 29 | Subcallosal Cortex
34 | 10.7 | 48, -44, 46   | 21, 41, 59 | Supramarginal Gyrus
35 | 4.23 | -50, -46, 10  | 70, 40, 41 | Supramarginal Gyrus
With the same number of communities, K = 3, we found more false positives with window size W = 22 than with W = 26, W = 30, and W = 34. This is because fewer sample data are contained in the sliding window when the window size is smaller. We also tried different models with K = 4 and K = 5, and found that there are more false positives with larger values of K. Larger values of K imply more blocks in the model, which gives rise to relatively better model fitness. In this situation, there is less distinction between relatively static brain states and transition states with change-points in the window. The false positives among the local minima and local maxima are also influenced by the window size W_s used for transforming from PPDI to CDE. A larger window size (for example, W_s = 30) reduces the accuracy of the estimates and results in false negatives, while too small a value of W_s increases the false positive rate. We found that W_s = 10 works well for all of the real data analyses. The time taken to run the posterior predictive assessment on each subject (T = 405 frames, posterior predictive replication number S = 50, K = 3, and window size W = 30) using a 2.6 GHz Intel Core i7 processor unit was about 10 minutes.

Fig. 3: The results of Bayesian change-point detection for working memory tfMRI data (session 1, LR): Cumulative discrepancy energy (CDE) with different sliding window sizes (W = 22, 26, 30, and 34; a-d, under the model K = 3) and different models (K = 3, 4, and 5; c, e, and f, using a sliding window of W = 30). W_s is the width of the sliding window used for converting from PPDI to CDE. The vertical dashed lines are the times of onset of the stimuli, which are provided in the EV.txt files in the released data. The multi-color scatterplots in the figures represent the CDEs of individual subjects and the black curve is the group CDE (averaged CDE over 100 subjects). The red dots are the local maxima, which are taken to be the locations of change-points, and the blue dots are the local minima, which are used for local inference of the discrete brain states.

Fig. 4: Detected change-points and the locations of brain states matching the task blocks for working memory tfMRI data (session 1, LR) with K = 3 and W = 30. The numbers in the small rectangular frames are the boundaries of the external task demands, the background colors in the large rectangle are the different task conditions, and the blue and red bars with specified numbers are the estimated locations of discrete brain states and change-points.

Fig. 5: Local fitting between the averaged adjacency matrix and models from K = 3 to 18. Different colors represent the PPDI values of different brain states.

The ‘local inference’ is defined as a way to estimate the discrete brain state corresponding to each task condition via Bayesian modelling. The group averaged dynamic functional networks were analyzed by performing ‘local inference’ as follows. In this experiment, we used the results obtained for K = 3 and W = 30 (see Fig. 3c). We first listed all of the local maxima and minima, where time points with distances smaller than 8 are grouped as vectors. Maxima and minima deemed to be false positives were discarded. The time points corresponding to the local minimum values of the group CDE are (41, 46, 48, 50, 54) and (140, 147, 152). These were reduced to single time points corresponding to discrete brain states, specifically 41 and 140 respectively, with all the other elements in the vectors presumed to be false positives and discarded. Time points with CDE value differences smaller than 0.002 were also discarded (points (191, 192) and (290, 292)). The seven resulting estimated change-point locations (maxima, shown in Fig. 4) were then obtained, and the estimated time points of the discrete brain states (minima) are {41, 76, 140, 175, 239, 278, 334, 375}. A comparison of the detected change-points to the task blocks for working memory tfMRI data is shown in Fig. 4.

Local inference for discrete brain states
For ‘local inference’, we first calculated the group averaged adjacency matrix with a window of W_g = 20 for all brain states. The center of the window is located at the time point of the local minimum value. We evaluated the goodness of fit for models with different values of K (Fig. 5). The results demonstrate that the goodness of fit flattens at K = 6. To avoid empty communities, K = 6 is then selected as the number of communities in local inference. Note that the value of K is unchanged in the Markov chain Monte Carlo estimation, but an empty community containing no labels may occur. In the remainder of this section, we used the model with K = 6 for all brain states. The times taken to run the estimation of the latent label vector and the model parameters for a single discrete brain state (MCMC sampling number S_s = 200, K = 6, and window size W = 20) using a 2.6 GHz Intel Core i7 processor unit were about 1.85 and 1.25 seconds, respectively.

The inferred community structures are visualized using BrainNet Viewer [49] and Circos maps [50], as shown in Fig. 6. Estimated latent label vectors are visualized using different colors to represent different communities. The nodes are connected by weighted links at a sparsity level of 10% (we also visualized the brain states with sparsity levels of 20% and 30%: Supplementary Figures 7 and 8). The density and variation of connectivity within and between communities are characterized by the estimated block mean matrix and block variance matrix in Supplementary Figures 9 and 15. We first describe the working memory tasks involving the 2-back tool (Fig. 6a), 0-back tool (Fig. 6e), and fixation (Fig. 6c, f, i). The locations of fixation states are considered as the locations of the change-points at 107, 206, and 306 (we consider the fixation state as a transition buffer between two working memory blocks). We found that the connectivity between the inferior parietal lobule (node 9) and middle frontal gyrus (node 17), and the connectivity between the inferior parietal lobule (node 9) and supramarginal gyrus … in communities k = 3, 4, and 6 and weak connections in communities k = 1, 2, and 5 for a majority of the states. The Circos map provides a different perspective on the community pattern of the brain state. We summarise the common community pattern for each specific working memory load or fixation in Table 1.

Fig. 6: Community structure of the discrete brain states: The figures with blue frames represent brain states corresponding to working memory tasks (2-back tool at t = 41; 0-back body at t = 76; 2-back face at t = 140; 0-back tool at t = 175; 2-back body at t = 239; 2-back place at t = 278; 0-back face at t = 334; and 0-back place at t = 375 in a-k) and those with red frames represent brain states corresponding to fixation (fixation at t = 107, 206, and 306 in c, f, and i). Each brain state shows connectivity at a sparsity level of 10%. The different colors of the labels represent community memberships. The strength of the connectivity is represented by the colors shown in the bar at the right of each frame. In the Circos maps, nodes in the same community are adjacent and have the same color. Node numbers and abbreviations of the corresponding brain regions are shown around the circles. In each frame, different colors represent different community numbers. The connectivity above the sparsity level is represented by arcs. The blue links represent connectivity within communities and the red links represent connectivity between communities.

Discussion
We proposed a model-based method for identifying transitions and characterising the brain states between two consecutive transitions. The transitions between brain states identified by the Bayesian change-point detection method exhibit appropriate lags relative to the external task demands. This indicates a significant difference between the temporal boundaries of the external task demands and the transitions of the latent brain states. We also estimated the community membership of the brain regions that interact with each other to give rise to the brain states. Furthermore, we showed that the estimated community architectures form distinct networks for 2-back working memory load, 0-back working memory load, and fixation.

We first focus on the brain states inferred from the WM-tfMRI data and discuss the estimated patterns of connectivity for the different blocks of working memory tasks after local inference. We find distinct connectivity differences between the 2-back, 0-back, and fixation conditions. We first compare the working memory and fixation conditions, with particular reference to the middle frontal gyrus (node 17) and the inferior parietal lobule (node 9), which includes the angular gyrus (node 1) and supramarginal gyrus (node 34). The middle frontal gyrus is associated with manipulation, distractor resistance, refreshing, selection for action, and monitoring, while the inferior parietal lobule is associated with focus recognition and long-term recollection [51]. In our results, we find that the connectivity between the middle frontal gyrus and inferior parietal lobule is increased in the working memory tasks compared to the fixation state.
The connectivity between the lateral occipital cortex (node 12) and occipital fusiform cortex (node 21) is strong and stable during fixation compared to the working memory tasks, and a higher working memory load may increase the instability of this connectivity.

Regarding the difference between the 2-back and 0-back working memory tasks, we focus on the angular gyrus and supramarginal gyrus. In our experimental results, we find increased connectivity between the angular gyrus (node 1) and supramarginal gyrus (node 34) in 2-back compared to 0-back working memory task blocks. The angular gyrus is located in the posterior part of the inferior parietal lobule. The inferior parietal cortex, including the supramarginal gyrus and the angular gyrus, is part of a "bottom-up" attentional subsystem that mediates the automatic allocation of attention to task-relevant information [52]. Previous work has shown that activation of the inferior parietal lobe is involved in shifting attention towards particular stimuli [53], and that the right inferior parietal lobule, including the angular gyrus, is involved in maintaining attention and encoding salient events in the environment [54]. These findings are consistent with and support our results.

Next, we focus on the methodology. We introduced posterior predictive discrepancy (PPD), a novel method based on model fitness assessment combined with sliding-window analysis, to detect change-points in functional brain networks and to infer the dynamics of brain state changes. Posterior predictive assessment is a method based on Bayesian model comparison. Other Bayesian model comparison methods, including Bayes factors [55, 56], the Bayesian information criterion (BIC) [57], and Kullback–Leibler divergence [58], are also widely used in mixture modelling.
Table 2: Nodes commonly located in a specific community k across all picture types in the working memory tasks.

Community | 2-back       | 0-back       | Fixation
k=1       |              |              |
k=2       | 15, 30       |              | 15, 31, 32
k=3       | 16, 20       | 16, 20, 21   | 12, 16, 20, 21
k=4       | 1, 9, 17, 34 |              | 1, 9, 25
k=5       | 11, 14       |              | 3, 11, 14
k=6       | 8, 19, 35    | 19           | 5, 8, 10, 19

One advantage of posterior predictive assessment is that the computation required is a straightforward byproduct of the posterior sampling already performed in conventional Bayesian estimation.

We defined a new criterion, the cumulative discrepancy energy (CDE), to estimate the locations of these change-points or transitions. The main idea underlying this strategy is that the goodness-of-fit between model and observation is reduced if a change-point is located within the current sliding window (the sample data in the window can then be considered as being generated from two latent brain network architectures), resulting in a significant increase in CDE. We use overlapping, rectangular sliding windows so that all time points are included.

The dynamics of the brain states are induced not only by external stimuli, but also by latent mental processes such as motivation, alertness, fatigue, and momentary lapses [17]. Crucially, directly using the temporal boundaries (onsets and durations) associated with predefined task conditions to infer the functional networks may not be sufficiently rigorous or accurate: the boundaries of the task demands are not the timings and durations of the latent brain states. The estimated change-points in our experiments are consistent with the working memory task demands but show a delay relative to the onsets of the task blocks or the mid-points of fixation blocks. These results reflect the delay due to the haemodynamic response, as well as the delay between signal emission and reception when recording the data with the fMRI scanner.

The results of the task fMRI data analysis show that the change-point detection algorithm is sensitive to the choice of model. We found that a less complex model (with smaller K) for global fitting gave fewer false positives, and hence better change-point detection performance, than models with larger K. Selecting a suitable window size W is also very important for our method. Too small a window size results in too little information being extracted from the data within the window, causing the calculated CDE to fluctuate more and making it difficult to discriminate local maxima and minima in the CDE score time series.
Too large a window size (larger than the task block length) reduces the resolution at which change-points can be distinguished. In the working memory task fMRI data set, the length of a task block is around 34 frames and a fixation block is about 20 frames. We therefore made the window size at most 34 frames, to ensure that all potential change-points could be distinguished, and at least 20 frames, to ensure the effectiveness of the posterior predictive assessment. In our experiments, we used window sizes of 22, 26, 30, and 34, all larger than the length of the fixation block. This means it was not possible to detect the two change-points at either end of a fixation block, so we consider the whole fixation block as a single change-point (i.e., a buffer between two task blocks).

The latent block model provides a flexible approach to modeling and estimating the dynamic assignment of nodes to communities. Note that the latent block model was fitted to the adjacency matrix of each individual subject in global fitting, and to the group-averaged adjacency matrix in local fitting. Different choices of π generate different connection patterns in the adjacency matrix. The likelihood is Gaussian and the connectivity is weighted, both of which allow the correlation matrix to be treated as an observation without losing much relevant information from the time series.

We treat both the latent label vector and the block model parameters as quantities to be estimated. Changes in the community memberships of the nodes are reflected in changes in the latent labels, and changes in the densities and variances of functional connectivity are reflected in changes in the model parameters.

Empirical fMRI datasets have no ground truth regarding the locations of latent transitions between brain states and network architectures. Although the task-data experiments include the timings of stimuli, the exact change-points between discrete latent brain states are uncertain.
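As a quick arithmetic check on the frame counts used in the window-size argument above, the block lengths follow directly from the task timing (TR = 0.72 s, 25 s task blocks, 15 s fixation blocks; see Methods). This is an illustrative sketch, not part of the released analysis code:

```python
# Frame counts implied by the HCP working-memory timing:
# TR = 0.72 s, 25 s task blocks, 15 s fixation blocks.
TR = 0.72
task_frames = 25.0 / TR   # ~34.7 frames, hence the 34-frame upper bound on W
fix_frames = 15.0 / TR    # ~20.8 frames, hence the ~20-frame fixation blocks

print(round(task_frames, 1), round(fix_frames, 1))  # 34.7 20.8
```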
Here, we used a multivariate Gaussian model to generate synthetic data (with known ground truth) to validate the proposed algorithms by comparing the ground truth to the estimated change-points and latent labels. In extensive experiments with synthetic data, we demonstrated the very high accuracy of our method. The multivariate Gaussian generative model can characterize community patterns by determining the memberships of the elements of the covariance matrix, but it remains an unrealistic benchmark. In the future, we will integrate the clustering method into dynamic causal modelling [59, 60] to simulate more biologically realistic synthetic data for validating the algorithm.

There are still some limitations of the MCMC allocation sampler [2, 3] that we use to infer the latent label vectors. When Markov chains are generated by the MCMC algorithm, the latent label vectors typically get stuck in local modes. This is partly because the Gibbs moves in the allocation sampler update only one element of the latent label vector at a time. Although the M3 move (see Supplementary Section 7 for details) updates multiple elements of the latent label vector, the update is conditional on the probability ratio of a single reassignment, which leads to problems similar to those of the Gibbs move. Improving the MCMC allocation sampler so that it can jump between local modes, without changing the value of K, is a topic worth exploring. Currently, we use an MCMC sampler with a Gibbs move and an M3 move for local inference as well, keeping K constant. In future work, we will extend the sampler with an absorption/ejection move, which is capable of sampling K along with the latent labels directly from the posterior distribution.

The label switching phenomenon (see Supplementary Section 2) does not occur frequently when the chain is stuck in a local mode. However, the estimated labels in the latent label vector do switch in some experiments.
To correct for label switching, we permute the labels in a post-processing stage.

In this paper, we treat the group-averaged adjacency matrix as an observation in local inference, which neglects variation between subjects [61]. In the future, we propose to use hierarchical Bayesian modeling to estimate the community architecture at the group level. In local inference, we will model the individual adjacency matrices using the latent block model and infer the number of communities along with the latent label vectors via an absorption/ejection strategy. At the group level, we will model the estimated numbers of communities across subjects using a Poisson-Gamma conjugate pair, and the estimated latent label vectors using a Categorical-Dirichlet pair. The posterior distribution of the number of communities will then be a Gamma distribution, and the posterior distribution of the latent label vector a Dirichlet distribution. The estimated rate of the Poisson posterior and the estimated label assignment probability matrix of the Dirichlet posterior will characterize the brain networks at the group level.

The change-point detection method described in this paper can be applied to locate the relatively static brain states occurring in block-designed task fMRI data. In future work, we aim to apply the method to explore the dynamic characteristics of event-related task fMRI, where a sliding-window approach may be difficult to apply because the state changes will be pulse-like. We are also interested in applying change-point detection to resting-state fMRI data, which is challenging given that no stimulus timing is available and the switching between brain states is less distinct.
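The label-switching correction described above (permuting labels in post-processing) can be sketched as follows. This is a minimal illustration, not the released MATLAB code: sampled labels are matched to a reference labelling by solving an assignment problem, with our own function name and NumPy/SciPy assumed.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_labels(z_ref, z, K):
    """Relabel z so its communities best match z_ref (label-switching fix).

    Builds a K-by-K agreement matrix between the two labelings and solves
    the assignment problem for the label permutation that maximizes
    node-wise agreement.
    """
    cost = np.zeros((K, K))
    for a in range(K):
        for b in range(K):
            # negated count of nodes labelled a in z and b in z_ref,
            # since linear_sum_assignment minimizes total cost
            cost[a, b] = -np.sum((z == a) & (z_ref == b))
    row, col = linear_sum_assignment(cost)
    mapping = dict(zip(row, col))
    return np.array([mapping[label] for label in z])

z_ref = np.array([0, 0, 1, 1, 2, 2])
z_perm = np.array([2, 2, 0, 0, 1, 1])   # same partition, permuted labels
print(align_labels(z_ref, z_perm, 3))   # -> [0 0 1 1 2 2]
```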
Methods
Working memory task fMRI data processing
WM-tfMRI paradigms
The original experiment involved a version of an N-back task, used to assess working memory and cognitive control. In the working memory task, each block consisted of trials with pictures of faces, places, tools, or body parts, with a single stimulus type presented in each block within each run. In 2-back blocks, subjects judged whether the current stimulus was the same as the stimulus presented two trials earlier. In 0-back blocks, subjects were shown a target cue at the beginning of the block and judged whether each stimulus in that block matched the target cue. The time course comprised 405 frames (repetition time, TR = 0.72 s), with four blocks of 2-back working memory tasks (25 s each), four blocks of 0-back working memory tasks (25 s each), and four fixation blocks (15 s each).

tfMRI data acquisition
Whole-brain echo-planar imaging (EPI) was acquired with a 32-channel head coil on a modified 3T Siemens Skyra with TR = 0.72 s, TE = 33.1 ms, flip angle = 52 degrees, BW = 2290 Hz/Px, in-plane FOV = × mm, and 72 slices with 2 mm isotropic voxels, using a multi-band acceleration factor of 8. Two runs of the tfMRI were acquired (one right-to-left, the other left-to-right).

tfMRI data preprocessing

The tfMRI data in the HCP are minimally preprocessed, including gradient unwarping, motion correction, fieldmap-based EPI distortion correction, brain-boundary-based registration of EPI to the structural T1-weighted scan, non-linear (FNIRT) registration into MNI152 space, and grand-mean intensity normalization. The data analysis pipeline is based on FSL (FMRIB's Software Library) [47]. Further smoothing is conducted via volume-based and grayordinates-based analyses, the details of which are given in the corresponding sections of [46].
GLM analysis
The general linear model (GLM) analysis in this work includes 1st-level (individual scan run), 2nd-level (combining multiple scan runs for an individual participant), and 3rd-level (group analysis across multiple participants) analyses [7, 8]. At the 1st level, fixed-effects analyses estimate the average effect size of runs within sessions, where the variation contains only the within-subject variance. At the 2nd level, we also use fixed-effects analysis, averaging the two sessions within individuals. At the 3rd level, mixed-effects analyses are conducted, with the subject effect size treated as random; the estimated mean effect size is across the population, and the between-subject variance is included at the group level of the GLM. Different contrasts can be set up to compare activation with respect to memory load or stimulus type.
Time series extraction
We created binary spherical masks of radius 6 mm (the center of each sphere corresponded to the coordinates of a local maximum of the z-statistic, with the voxel locations of the centers converted from MNI coordinates in FSLeyes) and extracted the eigen time series of 35 regions of interest from the 4-D functional images. We obtained 100 sets of time series from 100 unrelated subjects using the same masks.
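The spherical ROI masks can be illustrated with a small sketch (ours, not the pipeline's actual code): a binary sphere of radius 6 mm on a voxel grid, assuming the 2 mm isotropic voxels of the acquisition. The MNI-to-voxel coordinate conversion and the eigen time series extraction are omitted.

```python
import numpy as np

def sphere_mask(shape, center_vox, radius_mm, voxel_mm=2.0):
    """Binary sphere mask in voxel space, with the radius given in mm.

    A sketch of the ROI masks used for time series extraction; the 2 mm
    voxel size matches the acquisition described above.
    """
    grid = np.indices(shape)                                      # (3, X, Y, Z)
    dist2_mm = sum((grid[d] - center_vox[d]) ** 2 for d in range(3)) * voxel_mm**2
    return dist2_mm <= radius_mm**2

mask = sphere_mask((20, 20, 20), (10, 10, 10), radius_mm=6.0)
print(mask.sum())  # number of voxels inside the 6 mm sphere
```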
The framework of Bayesian change-point detection
An overview of the Bayesian change-point detection framework is shown in Fig. 1a. We consider a collection of N nodes {v_1, ..., v_N} representing brain regions for a single subject, and suppose that we observe a collection of N time series Y ∈ ℝ^(N×T), where Y = (y_1, y_2, ..., y_T) and T is the number of time points. Different background colors represent different latent network community architectures. The nodes in the networks are assumed to be clustered into communities, and the different colors of the nodes represent the different community memberships. A more detailed example of changes in network architectures with 16 nodes is shown in Fig. 1b, where the community memberships are encoded in a latent label vector z and K is the number of communities. A transition, or change-point, is defined as a time point at which the community structure changes. Correlations between time series suggest interactions between the corresponding brain regions; we therefore first process the time series to construct a sequence of graphs in which temporal correlations between time series are represented by edges connecting the corresponding nodes.

We apply a sliding window of width W (an even number) to the time series, as shown in Fig. 1a. The sliding windows overlap, and the centers of the windows are located at consecutive time points. Change-points may occur only at times t ∈ {W/2 + 1, ..., T − W/2}, where W/2 is a margin size used to avoid computational and statistical complications. The advantage of using overlapping windows is that transitions in network architecture can potentially be detected at any time during the time course (outside the margins). For each time point t ∈ {W/2 + 1, ..., T − W/2}, we define Y_t = {y_(t−W/2), ..., y_t, ..., y_(t+W/2−1)} as the data in the sliding window at time t and calculate a sample correlation matrix x_t within this window. We interpret this correlation matrix as a weighted adjacency matrix.
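The windowing step can be sketched as follows; this is an illustrative NumPy reimplementation with our own function name, assuming 0-based frame indices rather than the paper's exact code.

```python
import numpy as np

def window_adjacency(Y, t, W):
    """Sample correlation matrix within the width-W window centred at t.

    Y is an N-by-T array of node time series; the window covers frames
    t - W/2 .. t + W/2 - 1 (0-based), mirroring the windowing described
    above. Returns the N-by-N weighted adjacency matrix x_t.
    """
    half = W // 2
    segment = Y[:, t - half: t + half]   # N x W data block
    return np.corrcoef(segment)          # sample correlations as edge weights

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 100))        # toy data: 5 nodes, 100 frames
x_t = window_adjacency(Y, t=50, W=20)
print(x_t.shape)                         # (5, 5)
```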
Thus, for each t, we obtain a sample adjacency matrix x_t. Subsequently, instead of the time series Y, we use the sample adjacency matrix x_t as the realized observation at time t.

Fig. 1c provides a schematic illustrating the posterior predictive model fitness assessment. Specifically, we propose to use the Gaussian latent block model [3] to quantify the likelihood of a network, and the MCMC allocation sampler (with the Gibbs move and the M3 move) [2, 3] to infer a latent label vector z from a collapsed posterior distribution p(z | x, K) derived from this model. The model parameters π for each block are sampled from a posterior distribution p(π | x, z), conditional on the sampled latent label vector z. The proposed model fitness procedure draws parameters (both latent label vectors and model parameters) from their posterior distributions and uses them to generate a replicated adjacency matrix x_rep. It then calculates a disagreement index to quantify the difference between the replicated adjacency matrix x_rep and the realized adjacency matrix x. To evaluate model fitness, we use the parameter-dependent statistic PPDI, obtained by averaging the disagreement index.

The latent block model
The latent block model (LBM) [3] is a random process generating networks on a fixed number of nodes N. The model has an integer parameter K, the number of communities. Identifying a suitable value of K is a model fitting problem discussed in a later section; here we assume K is given. A schematic of a latent block model is shown in the brown box on the right side of Fig. 1c. A defining feature of the model is that nodes are partitioned into K communities, with interactions between nodes in the same community having a different (usually higher) probability than interactions between nodes in different communities. The latent block model first assigns the N nodes to the K communities, resulting in K symmetric blocks, and then generates edges with probabilities determined by the community memberships. The diagonal blocks represent the connectivity within communities, and the off-diagonal blocks represent the connectivity between communities.

In this paper, we consider the edges between nodes to be weighted, so the model parameter matrix π consists of the means and variances that determine the connectivity in the blocks. We treat the correlation matrix as an observation, thus preserving more information from the BOLD time series than binary edges would. Given a sampled z, we can draw π from the posterior directly. For the mathematical specification of the latent block model, see Supplementary Sections 3.1 and 3.2. Methods for sampling the latent vector z are discussed in later sections.

Sampling from the posterior
The posterior predictive method we outline below involves sampling parameters from the posterior distribution. The sampled parameters are the latent label vector z and the model parameter matrix π. Several methods for estimating the latent labels and model parameters of a latent block model have been described in the literature. One method evaluated the model parameters by point estimation but treated the latent labels in z as having a distribution [64], making this approach similar to an EM algorithm. Another method used point estimation for both the model parameters and the latent labels [65]. We sample the latent label vector z from the collapsed posterior p(z | x, K) (see the detailed derivation of p(z | x, K) in Supplementary Section 3.3). We use Markov chain Monte Carlo (MCMC) [6] to sample the latent label vector from this posterior, using Gibbs moves and M3 moves [2] to update z. The details of the MCMC allocation sampler and its computational complexity are given in Supplementary Section 3.4. After sampling the latent label vector z, we separately sample π from the density p(π | x, z) (see Supplementary Section 3.2 for details).
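To make the Gaussian latent block model concrete before the fitting sections below, here is a minimal generative sketch. It uses our own illustrative parameterization of π as block means and standard deviations; it is not the paper's exact prior or likelihood specification.

```python
import numpy as np

def sample_lbm(z, mu, sigma, rng):
    """Draw a symmetric weighted adjacency matrix from a Gaussian latent
    block model: edge (i, j) ~ Normal(mu[z_i, z_j], sigma[z_i, z_j]^2).

    mu and sigma are K-by-K block-parameter matrices (an illustrative
    stand-in for the pi of the text); diagonal blocks govern
    within-community weights, off-diagonal blocks between-community weights.
    """
    N = len(z)
    x = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            w = rng.normal(mu[z[i], z[j]], sigma[z[i], z[j]])
            x[i, j] = x[j, i] = w        # weighted, undirected edge
    return x

rng = np.random.default_rng(1)
z = np.array([0, 0, 0, 1, 1, 1])          # two communities of three nodes
mu = np.array([[0.8, 0.1], [0.1, 0.8]])   # stronger within-community means
sigma = np.full((2, 2), 0.05)
x = sample_lbm(z, mu, sigma, rng)
print(x.shape)                             # (6, 6)
```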
Model fitting
Global fitting
Global fitting uses a model with a constant number of communities K to fit consecutive individual adjacency matrices within a sliding window, for all time frames. For global fitting, we consider K in our latent block model to be fixed over the time course. We detect the change-points based on Bayesian model comparison using posterior predictive discrepancy, which does not determine whether the model is 'true', but rather quantifies the preference for the model given the data. One can imagine the model as a ruler moving with the sliding window, and the observation at each time step as the object to be measured. The discrepancy increases significantly if there is a change-point located within the window. Although K is constant in global fitting, different values of K can be used by selecting different models. The evaluation of K can be treated as a Bayesian model comparison problem: we repeat the inference with different values of K and compare the change-point detection performance to identify an appropriate value.

Local fitting
Local fitting involves first selecting a model (i.e., choosing a value of K) that best fits the group-averaged adjacency matrix for a discrete brain state. Subsequently, the data between change-points are used to estimate the community membership that constitutes that discrete brain state. We treat K as constant for this local inference. The number of communities K could potentially be inferred using the absorption/ejection move [2] in the allocation sampler, an innovation that will be explored in future research.

Posterior predictive discrepancy
Given inferred values of z and π under the model with K communities, one can draw a replicated adjacency matrix x_rep from the predictive distribution P(x_rep | z, π, K), as shown in Fig. 1c. Note that the realized adjacency matrix (i.e., the observation) and the replicated adjacency matrix are conditionally independent:

P(x, x_rep | z, π, K) = P(x_rep | z, π, K) P(x | z, π, K).   (0.1)

Multiplying both sides of this equality by P(z, π | x, K) / P(x | z, π, K) gives

P(x_rep, z, π | x, K) = P(x_rep | z, π, K) P(z, π | x, K).   (0.2)

Here we use a replicated adjacency matrix in the context of posterior predictive assessment [67] to evaluate the fit of a posited latent block model to a realized adjacency matrix. We generate a replicated adjacency matrix by first drawing samples (z, π) from the joint posterior P(z, π | x, K): specifically, we sample the latent label vector z from p(z | x, K) and the model parameters π from p(π | x, z), and then draw a replicated adjacency matrix from P(x_rep | z, π, K). We compute a discrepancy function to assess the average difference between the replicated adjacency matrix x_rep and the realized adjacency matrix x, as a measure of model fitness.

In [67], the χ² function is used as the discrepancy measure, where the observation is treated as a vector. In the latent block model, however, the observation is a weighted adjacency matrix and the sizes of the sub-matrices can vary. We therefore propose a new discrepancy index to compare the adjacency matrices x_rep and x: a disagreement index that evaluates the difference between the realized and replicated adjacency matrices. This disagreement index is denoted γ(x_rep; x) and can be considered a parameter-dependent statistic.
In mathematical notation, the disagreement index γ is defined as

γ(x_rep; x) = (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} |x_ij − x_rep,ij|.   (0.3)

For the evaluation of model fitness, we generate S replicated adjacency matrices and define the posterior predictive discrepancy index (PPDI) γ̄ as

γ̄ = (1/S) Σ_{s=1}^{S} γ(x_rep,s; x).   (0.4)

The computational cost of the posterior predictive discrepancy procedure depends mainly on two aspects. The first is the iterated Gibbs and M3 moves used to update the latent label vectors, whose cost was discussed in previous sections. The second is the number of replications S needed for the predictive process. Posterior predictive assessment is not sensitive to the replication number S, but S impacts the computational cost linearly; that is, the computational complexity of the model fitness assessment is O(S). There is a natural trade-off between increasing the replication number and reducing the computation time.

Cumulative discrepancy energy

Our proposed strategy to detect network community change-points is to assess the fitness of a latent block model by computing the posterior predictive discrepancy index γ̄_t for each t ∈ {W/2 + 1, ..., T − W/2}. The key insight here is that the fit of the model is relatively worse when there is a change-point within the window used to compute x_t: the data in the left and right segments of the window are then generated by different network architectures, resulting in poor model fit and a correspondingly high posterior predictive discrepancy index.

In practice, we find that the PPDI fluctuates severely. To identify the most plausible position of a change-point, we use a second window of size W_s to accumulate the PPDI time series, yielding the cumulative discrepancy energy (CDE) E_t, given by

E_t = Σ_{i=t−W_s}^{t+W_s−1} γ̄_i.   (0.5)

We take the locations of change-points to be the local maxima of the cumulative discrepancy energy, where those maxima rise sufficiently high above the surrounding sequence. The change-point detection algorithm is summarized in Supplementary Section 8.

Note that the posterior predictive discrepancy index and cumulative discrepancy energy for change-point detection are calculated under the conditions of global fitting. For group analysis, we average the CDEs across subjects to obtain the group CDE. After discarding false positives, the change-points are taken to be the local maxima, and the discrete states are inferred at the local minima.
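The discrepancy computations of Eqs. (0.3)-(0.5) can be sketched as follows. This is an illustrative reimplementation, not the released MATLAB code: the replicated matrices are assumed to be supplied by the posterior predictive sampler, and normalisation of γ by the number of matrix entries (N²) is assumed from the matrix form of Eq. (0.3).

```python
import numpy as np

def disagreement(x_rep, x):
    """Disagreement index gamma (Eq. 0.3): mean absolute difference between
    the replicated and realized adjacency matrices (N^2 entries assumed)."""
    N = x.shape[0]
    return np.abs(x - x_rep).sum() / N**2

def ppdi(x, replicates):
    """Posterior predictive discrepancy index (Eq. 0.4): average
    disagreement over S replicated adjacency matrices."""
    return np.mean([disagreement(r, x) for r in replicates])

def cde(gamma_bar, t, Ws):
    """Cumulative discrepancy energy (Eq. 0.5): sum of the PPDI series
    over the second window, frames t - Ws .. t + Ws - 1."""
    return gamma_bar[t - Ws: t + Ws].sum()

# Toy check: the discrepancy is zero when replicate and observation agree.
x = np.eye(4)
print(disagreement(x, x))  # 0.0
```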
Local inference
We estimate the community structure of the brain states via local inference. For local inference, we first calculate the group-averaged adjacency matrix of the 100 subjects using the data between two estimated change-points and treat this as an observation of each discrete brain state; we then use local fitting to select a value of K and apply the latent block model for Bayesian estimation of the community structure of each brain state.

Code availability
The code for the GLM analysis (shell script), Bayesian change-point detection (MATLAB), and brain network visualization (MATLAB, Perl) is available at: https://github.com/LingbinBian/BCPD1.0.
References

[1] Morten L. Kringelbach and Gustavo Deco. Brain states and transitions: Insights from computational neuroscience. Cell Reports, 32(10):108128, 2020.
[2] Daniel J. Lurie, Daniel Kessler, Danielle S. Bassett, Richard F. Betzel, Michael Breakspear, Shella Keilholz, Aaron Kucyi, Raphaël Liégeois, Martin A. Lindquist, Anthony Randal McIntosh, Russell A. Poldrack, James M. Shine, William Hedley Thompson, Natalia Z. Bielczyk, Linda Douw, Dominik Kraft, Robyn L. Miller, Muthuraman Muthuraman, Lorenzo Pasquini, Adeel Razi, Diego Vidaurre, Hua Xie, and Vince D. Calhoun. Questions and controversies in the study of time-varying functional connectivity in resting fMRI. Network Neuroscience, 4(1):30–69, 2020.
[3] Adeel Razi and Karl J. Friston. The connected brain: Causality, models, and intrinsic dynamics. IEEE Signal Processing Magazine, 33(3):14–35, 2016.
[4] Adeel Razi, Mohamed L. Seghier, Yuan Zhou, Peter McColgan, Peter Zeidman, Hae-Jeong Park, Olaf Sporns, Geraint Rees, and Karl J. Friston. Large-scale DCMs for resting-state fMRI. Network Neuroscience, 1(3):222–241, 2017.
[5] Adeel Razi, Joshua Kahan, Geraint Rees, and Karl J. Friston. Construct validation of a DCM for resting state fMRI. NeuroImage, 106:1–14, 2015.
[6] Karl J. Friston, Joshua Kahan, Bharat Biswal, and Adeel Razi. A DCM for resting state fMRI. NeuroImage, 94:396–407, 2014.
[7] Karl J. Friston, Erik D. Fagerholm, Tahereh S. Zarghami, Thomas Parr, Inês Hipólito, Loïc Magrou, and Adeel Razi. Parcels and particles: Markov blankets in the brain. Network Neuroscience, 0(ja):1–76, 2021.
[8] Elena A. Allen, Eswar Damaraju, Sergey M. Plis, Erik B. Erhardt, Tom Eichele, and Vince D. Calhoun. Tracking whole-brain connectivity dynamics in the resting state. Cerebral Cortex, 24:663–676, 2014.
[9] R. Matthew Hutchison, Thilo Womelsdorf, Elena A. Allen, Peter A. Bandettini, Vince D. Calhoun, Maurizio Corbetta, Stefania Della Penna, Jeff H. Duyn, Gary H. Glover, Javier Gonzalez-Castillo, Daniel A. Handwerker, Shella Keilholz, Vesa Kiviniemi, David A. Leopold, Francesco De Pasquale, Olaf Sporns, Martin Walter, and Catie Chang. Dynamic functional connectivity: Promise, issues, and interpretations. NeuroImage, 80:360–378, 2013.
[10] Vince D. Calhoun, Robyn Miller, Godfrey Pearlson, and Tulay Adali. The chronnectome: Time-varying connectivity networks as the next frontier in fMRI data discovery. Neuron, 84(2):262–274, 2014.
[11] Jonathan D. Power, Mark Plitt, Timothy O. Laumann, and Alex Martin. Sources and implications of whole-brain fMRI signals in humans. NeuroImage, 146:609–625, 2017.
[12] Linden Parkes, Ben Fulcher, Murat Yücel, and Alex Fornito. An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage, 171:415–436, 2018.
[13] Kevin M. Aquino, Ben D. Fulcher, Linden Parkes, Kristina Sabaroedin, and Alex Fornito. Identifying and removing widespread signal deflections from fMRI data: Rethinking the global signal regression problem. NeuroImage, 212:116614, 2020.
[14] Johan N. van der Meer, Michael Breakspear, Luke J. Chang, Saurabh Sonkusare, and Luca Cocchi. Movie viewing elicits rich and reliable brain state dynamics. Nature Communications, 11(1):1–14, 2020.
[15] Danielle S. Bassett, Nicholas F. Wymbs, Mason A. Porter, Peter J. Mucha, Jean M. Carlson, and Scott T. Grafton. Dynamic reconfiguration of human brain networks during learning. PNAS, 108(18):7641–7646, 2011.
[16] Ivor Cribben, Ragnheidur Haraldsdottir, Lauren Y. Atlas, Tor D. Wager, and Martin A. Lindquist. Dynamic connectivity regression: Determining state-related changes in brain connectivity. NeuroImage, 61:907–920, 2012.
[17] Jalil Taghia, Weidong Cai, Srikanth Ryali, John Kochalka, Jonathan Nicholas, Tianwen Chen, and Vinod Menon. Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nature Communications, 9(1), 2018.
[18] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17:395–416, 2007.
[19] Ivor Cribben and Yi Yu. Estimating whole-brain dynamics by using spectral clustering. Journal of the Royal Statistical Society: Series C (Applied Statistics), 66:607–627, 2017.
[20] Anna Louise Schröder and Hernando Ombao. FreSpeD: Frequency-specific change-point detection in epileptic seizure multi-channel EEG data. Journal of the American Statistical Association, 114(525):115–128, 2019.
[21] Klaus Frick, Axel Munk, and Hannes Sieling. Multiscale change point inference (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76:495–580, 2014.
[22] Haeran Cho and Piotr Fryzlewicz. Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77:475–507, 2015.
[23] Tengyao Wang and Richard J. Samworth. High-dimensional change point estimation via sparse projection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(1):57–83, 2017.
[24] Hae-Jeong Park, Karl J. Friston, Chongwon Pae, Bumhee Park, and Adeel Razi. Dynamic effective connectivity in resting state fMRI. NeuroImage, 180:594–608, 2018.
[25] Catie Chang and Gary H. Glover. Time-frequency dynamics of resting-state brain connectivity measured with fMRI. NeuroImage, 50:81–98, 2010.
[26] Daniel A. Handwerker, Vinai Roopchansingh, Javier Gonzalez-Castillo, and Peter A. Bandettini. Periodic changes in fMRI connectivity. NeuroImage, 63:1712–1719, 2012.
[27] Andrew Zalesky, Alex Fornito, Luca Cocchi, Leonardo L. Gollo, and Michael Breakspear. Time-resolved resting-state brain networks. PNAS, 111(28):10341–10346, 2014.
[28] Ricardo Pio Monti, Peter Hellyer, David Sharp, Robert Leech, Christoforos Anagnostopoulos, and Giovanni Montana. Estimating time-varying brain connectivity networks from functional MRI time series. NeuroImage, 103:427–443, 2014.
[29] Seok-Oh Jeong, Chongwon Pae, and Hae-Jeong Park. Connectivity-based change point detection for large-size functional networks. NeuroImage, 143:353–363, 2016.
[30] Diego Vidaurre, Andrew J. Quinn, Adam P. Baker, David Dupret, Alvaro Tejero-Cantero, and Mark W. Woolrich. Spectrally resolved fast transient brain states in electrophysiological data. NeuroImage, 126:81–95, 2016.
[31] Diego Vidaurre, Stephen M. Smith, and Mark W. Woolrich. Brain network dynamics are hierarchically organized in time.
Neu-roImage , 126:81–95, 2016.[31] Diego Vidaurre, Stephen M. Smith, and Mark W. Woolrich. Brainnetwork dynamics are hierarchically organized in time.
PNAS ,114(48):12827–12832, 2017.[32] Diego Vidaurre, Romesh Abeysuriya, Robert Becker, Andrew J.Quinn, Fidel Alfaro-Almagro, Stephen M. Smith, and Mark W.Woolrich. Discovering dynamic brain networks from big data inrest and task.
NeuroImage , 180(June 2017):646–656, 2018.[33] Y. X. Rachel Wang and Peter J. Bickel. Likelihood-based modelselection for stochastic block models.
The Annals of Statistics ,45(2):500–528, 2017.[34] Jiashun Jin. Fast community detection by SCORE.
The Annalsof Statistics , 43(1):57–89, 2015.[35] M. E. J. Newman. Modularity and community structure in net-works.
PNAS , 103(23):8577–8582, 2006.[36] Danielle S. Bassett, Mason A. Porter, Nicholas F. Wymbs,Scott T. Grafton, Jean M. Carlson, and Peter J. Mucha. Robustdetection of dynamic community structure in networks.
CHAOS ,23:13142, 2013.[37] Lucy F. Robinson, Lauren Y. Atlas, and Tor D. Wager. Dy-namic functional connectivity using state-based dynamic commu-nity structure: Method and application to opioid analgesia.
Neu-roImage , 108:274–291, 2015.[38] Chee-Ming Ting, S. Balqis Samdin, Meini Tang, and HernandoOmbao. Detecting Dynamic Community Structure in FunctionalBrain Networks Across Individuals: A Multilayer Approach. 2020. [39] Christopher Aicher, Abigail Z. Jacobs, and Aaron Clauset. Learn-ing latent block structure in weighted networks.
Journal of Com-plex Networks , 3(2):221–248, 2015.[40] Joshua Faskowitz, Xiaoran Yan, Xi-nian Zuo, and Olaf Sporns.Weighted stochastic block models of the human connectome acrossthe life span.
Scientific Reports , 8:1–16, 2018.[41] Richard F. Betzel, John D. Medaglia, and Danielle S. Bassett. Di-versity of meso-scale architecture in human and non-human con-nectomes.
Nature Communications , 9(1), 2018.[42] Matthew D. Hoffman, David M. Blei, Chong Wang, and JohnPaisley. Stochastic variational inference.
Journal of MachineLearning Research , 14:1303–1347, 2013.[43] David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variationalinference: A review for statisticians.
Journal of the AmericanStatistical Association , 112(518):859–877, 2017.[44] Agostino Nobile and Alastair T. Fearnside. Bayesian finite mix-tures with an unknown number of components: The allocationsampler.
Statistics and Computing , 17:147–162, 2007.[45] Jason Wyse and Nial Friel. Block clustering with collapsed latentblock models.
Statistics and Computing , 22:415–428, 2012.[46] Deanna M. Barch, Gregory C. Burgess, Michael P. Harms,Steven E. Petersen, Bradley L. Schlaggar, Maurizio Corbetta,Matthew F. Glasser, Sandra Curtiss, Sachin Dixit, Cindy Feldt,Dan Nolan, Edward Bryant, Tucker Hartley, Owen Footer,James M. Bjork, Russ Poldrack, Steve Smith, Heidi Johnsen-Berg, Abraham Z. Snyder, DDavid C. Van Essen, and for theWU-Minn HCP Consortium. Function in the human connectome:Task-fMRI and individual differences in behavior.
NeuroImage ,80:169–189, 2013.[47] Stephen M. Smith, Mark Jenkinson, Mark W. Woolrich, Chris-tian F. Beckmann, Timothy E.J. Behrens, Heidi Johansen-Berg,Peter R. Bannister, Marilena De Luca, Ivana Drobnjak, David E.Flitney, Rami K. Niazy, James Saunders, John Vickers, YongyueZhang, Nicola De Stefano, J. Michael Brady, and Paul M.Matthews. Advances in functional and structural MR image anal-ysis and implementation as FSL.
NeuroImage , 23:S208–S219,2004.[48] Matthew Stephens. Dealing with label switching in mixture mod-els.
Journal of the Royal Statistical Society. Series B (StatisticalMethodology) , 62(4):795–809, 2000.[49] Mingrui Xia, Jinhui Wang, and Yong He. BrainNet viewer: Anetwork visualization tool for human brain connectomics.
PLoSONE , 8(7), 2013.[50] Martin Krzywinski, Jacqueline Schein, İnanç Birol, Joseph Con-nors, Randy Gascoyne, Doug Horsman, Steven J Jones, andMarco A Marra. Circos : An information aesthetic for compara-tive genomics.
Genome Research , 19(604):1639–1645, 2009.[51] Derek Evan Nee, Joshua W. Brown, Mary K. Askren, Marc G.Berman, Emre Demiralp, Adam Krawitz, and John Jonides. Ameta-Analysis of executive components of working memory.
Cere-bral Cortex , 23(2):264–282, 2013.[52] Mohamed L. Seghier. The angular gyrus: Multiple functions andmultiple subdivisions.
Neuroscientist , 19(1):43–61, 2013.[53] Jacqueline Gottlieb. From thought to action: The parietal cortexas a bridge between perception, action, and cognition.
Neuron ,53(1):9–16, 2007.[54] Victoria Singh-Curry and Masud Husain. The functional roleof the inferior parietal lobe in the dorsal and ventral stream di-chotomy.
Neuropsychologia , 47(6):1434–1448, 2009.[55] Robert E. Kass and Adrian E. Raftery. Bayes factors.
Journal ofthe American Statistical Association , 90(430):773–795, 1995.
56] Mike West. Bayesian Model Monitoring.
Journal of the RoyalStatistical Society. Series B (Methodological) , 48(1):70–78, 1986.[57] Andrew A. Neath and Joseph E. Cavanaugh. The Bayesian infor-mation criterion: Background, derivation, and applications.
WileyInterdisciplinary Reviews: Computational Statistics , 4(2):199–203, 2012.[58] Solomon Kullback and Richard Leibler. On information and suffi-ciency.
The Annals of Mathematical Statistics , 22(1):79–86, 1951.[59] K.J. Friston, L. Harrison, and W. Penny. Dynamic Causal Mod-elling.
Human Brain Function: Second Edition , 0:1063–1090,2003.[60] Karl J. Friston, Katrin H. Preller, Chris Mathys, Hayriye Cagnan,Jakob Heinzle, Adeel Razi, and Peter Zeidman. Dynamic causalmodelling revisited.
NeuroImage , 199:730 – 744, 2019.[61] Karl J. Friston, Vladimir Litvak, Ashwini Oswal, Adeel Razi,Klaas E. Stephan, Bernadette C.M. van Wijk, Gabriel Ziegler, andPeter Zeidman. Bayesian model reduction and empirical bayes forgroup (dcm) studies.
NeuroImage , 128:413 – 431, 2016.[62] Mark W. Woolrich, Brian D. Ripley, Michael Brady, andStephen M. Smith. Temporal autocorrelation in univariate lin-ear modeling of FMRI data.
NeuroImage , 14(6):1370–1386, 2001.[63] Mark W. Woolrich, Timothy E.J. Behrens, Christian F. Beck-mann, Mark Jenkinson, and Stephen M. Smith. Multilevel linearmodelling for FMRI group analysis using Bayesian inference.
Neu-roImage , 21(4):1732–1747, 2004.[64] J.-J. Daudin, F. Picard, and S. Robin. A mixture model for ran-dom graphs.
Statistics and Computing , 18:173–183, 2008.[65] Hugo Zanghi, Christophe Ambroise, and Vincent Miele. Fast on-line graph clustering via Erdös-Rényi mixture.
Pattern Recogni-tion , 41:3592–3599, 2008.[66] W.K. Hastings. Monte Carlo sampling methods using Markovchains and their applications.
Biometrika , 57(1):97–109, 2016.[67] Andrew Gelman, Xiao-Li Meng, and Hal Stern. Posterior predic-tive assessment of model fitness via realized discrepancies.
Statistica Sinica, 6(4):733–760, 1996.

Supplementary information

To validate our Bayesian change-point detection algorithm, we use the multivariate Gaussian generative model to simulate synthetic data. Specifically, we generate D segments of Gaussian time series from D different network architectures, so that the synthetic data contain D − 1 ground-truth change-points over the time course. The positions of the true change-points are denoted as a row vector p = [p_1, ..., p_{D−1}]. Within each of the D segments, we suppose nodes are assigned to K_true communities, where K_true differs between segments; the true numbers of communities in the segments are denoted as a vector K_true = [K_true,1, ..., K_true,D]. We set the label vectors that determine the form of the covariance matrices in the generative model to be {z_1, z_2, ..., z_D}. These label vectors are generated using the Dirichlet-Categorical conjugate pair: the component weights {r_1, r_2, ..., r_D} are first drawn from a uniform distribution on the K_true-simplex, and nodes are then assigned to communities by drawing from the corresponding Categorical distributions. Time series data in ℝ^N are then simulated from

    Y_t = f(z, a, b) + ε    (1.1)

for t = 1, ..., T by drawing f(z, a, b) ∼ N(0, Σ(z, a, b)), with

    Σ_ij = 1 if i = j;  Σ_ij = a if i ≠ j and z_i = z_j;  Σ_ij = b if i ≠ j and z_i ≠ z_j,    (1.2)

where a ∼ U(0.5, 1) and b ∼ U(0, 0.5) are uniformly distributed, and ε ∼ N(0, σ²I) is additive Gaussian noise. The resulting covariance matrices for the D segments are denoted {Σ_1, Σ_2, ..., Σ_D}, and the simulated data Y ∈ ℝ^{N×T} can be separated into the D segments {Y_1, Y_2, ..., Y_D}.

For validation, we first generate 100 instances (as virtual subjects) of synthetic multivariate time series for a network with N = 35 nodes and T = 180 time points, to imitate the scenario of the real data.
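As an illustration, one segment of the generative model (1.1)-(1.2) can be sketched in a few lines of NumPy. The uniform supports for a and b, the segment length, and the decibel convention for the SNR (SNR = Σ_ii/σ² with unit diagonal variance) are stand-ins consistent with this supplement, not a verbatim reproduction of the paper's code:

```python
import numpy as np

def simulate_segment(z, a, b, T_seg, sigma_noise, rng):
    """One segment of the generative model (1.1)-(1.2):
    Y_t = f(z, a, b) + eps, with f ~ N(0, Sigma(z, a, b))."""
    N = len(z)
    same = (z[:, None] == z[None, :])
    Sigma = np.where(same, a, b)        # a within communities, b between
    np.fill_diagonal(Sigma, 1.0)        # unit variance on the diagonal
    f = rng.multivariate_normal(np.zeros(N), Sigma, size=T_seg)
    eps = rng.normal(0.0, sigma_noise, size=(T_seg, N))
    return (f + eps).T                  # N x T_seg segment of Y

rng = np.random.default_rng(0)
N, K_true, snr_db = 35, 4, 10
r = rng.dirichlet(np.ones(K_true))      # weights on the K_true-simplex
z = rng.choice(K_true, size=N, p=r)     # categorical community labels
a, b = rng.uniform(0.5, 1.0), rng.uniform(0.0, 0.5)      # assumed supports
sigma_noise = float(np.sqrt(1.0 / 10 ** (snr_db / 10)))  # SNR in dB, Sigma_ii = 1
Y_seg = simulate_segment(z, a, b, T_seg=60, sigma_noise=sigma_noise, rng=rng)
```

Concatenating D such segments, each with its own label vector z, yields a time series with D − 1 ground-truth change-points.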
We set the true change-points at t = {20, 50, 80, 100, 130, 160} and the numbers of communities in the segments to be { , , , , , , }. Here we define the signal-to-noise ratio (SNR) as Σ_ii/σ², and set different values of σ to control the SNR (σ = 0.316 for SNR = 10 dB, σ = 0.562 for SNR = 5 dB, σ = 1 for SNR = 0 dB, and σ = 1.778 for SNR = −5 dB). For global fitting, the number of posterior predictive replications is set to S = 50 for all of our experiments. For local inference, we draw S_s = 200 samples from the posterior densities of both the latent label vectors and the model parameters. We set the prior to be NIG(ξ, κσ²_kl, ν/2, ρ/2) with ξ = 0, κ = 1, ν = 3, and a small ρ, which is non-informative.

For the latent block model, we set α_k = 1 for k = 1, ..., K, and use constant values of ξ, κ, ν, and ρ for all blocks kl, so the prior is symmetric with respect to permutations of community labels. Permutations of community labels do not change the likelihood, which means that the distributions with respect to blocks are not identifiable; consequently, the posterior is also invariant to permutations of community labels. In the Markov chain, the labels of the latent label vector therefore switch occasionally: this effect is known as the label switching phenomenon [1, 2, 3]. For global fitting, label switching does not affect the results of posterior predictive discrepancy. However, for local inference, we need to assign labels to the communities unequivocally in order to estimate the memberships of the nodes.

We define a distance indicating the difference between two latent label vectors z and z′,

    D(z, z′) = Σ_{i=1}^{N} I(z_i ≠ z′_i),    (2.1)

where I is the indicator function. We define

    σ = {σ(1), ..., σ(k), ..., σ(K)}    (2.2)

as a permutation of a labelling {1, ..., k, ..., K}. Let Q = {z_j(σ_j), j = 1, ..., J} be a collection of latent label vectors with respect to a sequence of permutations {σ_j, j = 1, ..., J}.
We want to minimize the sum of all pairwise distances between the vectors,

    Σ_{j=1}^{J−1} Σ_{l=j+1}^{J} D(z_j(σ_j), z_l(σ_l)).    (2.3)

This minimization can be treated as a sequential square assignment problem. For each vector z_j, if the vectors that have already been processed (relabelled) up to j − 1 are {z_t, t = 1, ..., j − 1}, we define the elements of a cost matrix

    C(k_1, k_2) = Σ_{t=1}^{j−1} Σ_{i=1}^{N} I(z_{t,i} ≠ k_1, z_{j,i} = k_2).    (2.4)

We use the square assignment algorithm [4], which returns a permutation σ_j minimizing the total cost Σ_{k=1}^{K} C(k, σ(k)) for each z_j. Finally, we permute the labels in the vector z_j according to σ_j.

Mathematically, we denote the community memberships (also called the latent labels) of the nodes as a vector z = (z_1, ..., z_N) such that z_i ∈ {1, ..., K} denotes the community containing node i. Each z_i independently follows the categorical (one-trial multinomial) distribution

    z_i ∼ Categorical(1; r = {r_1, ..., r_K}),    (3.1)

where r_k is the probability of a node being assigned to community k and Σ_{k=1}^{K} r_k = 1. The categorical probability can be expressed using the indicator function I_k(z_i) as

    p(z_i | r, K) = Π_{k=1}^{K} r_k^{I_k(z_i)}, where I_k(z_i) = 1 if z_i = k, and 0 if z_i ≠ k.    (3.2)

This implies that the N-dimensional vector z is generated with probability

    p(z | r, K) = Π_{k=1}^{K} r_k^{m_k(z)},    (3.3)

where m_k(z) = Σ_{i=1}^{N} I_k(z_i). The latent allocation parameter vector r = (r_1, ..., r_K) is assumed to have a K-dimensional Dirichlet prior with density

    p(r | K) = N(α) Π_{k=1}^{K} r_k^{α_k − 1},    (3.4)

where the normalization factor is N(α) = Γ(Σ_{k=1}^{K} α_k) / Π_{k=1}^{K} Γ(α_k). In this work we suppose α_k = 1 for k = 1, ..., K, so that the prior for r is uniform on the K-simplex.
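The sequential relabelling of Supplementary Section 2, the cost matrix (2.4) followed by square assignment, can be sketched with SciPy's Hungarian-algorithm solver; the toy label vectors below are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel(z_prev_list, z_j, K):
    """Permute the labels of z_j to best agree with the already-processed
    vectors, via the cost matrix (2.4) and square assignment."""
    Z = np.asarray(z_prev_list)        # (j - 1) x N processed label vectors
    z_j = np.asarray(z_j)
    C = np.zeros((K, K))
    for k1 in range(K):
        for k2 in range(K):
            # cost of relabelling entries currently labelled k2 as k1
            C[k1, k2] = np.sum((Z != k1) & (z_j[None, :] == k2))
    rows, cols = linear_sum_assignment(C)   # minimises sum_k C(k, sigma(k))
    new_label = {int(c): int(r) for r, c in zip(rows, cols)}
    return np.array([new_label[int(k)] for k in z_j])

z_ref = np.array([0, 0, 1, 1, 2, 2])
z_switched = np.array([2, 2, 0, 0, 1, 1])   # same partition, labels permuted
print(relabel([z_ref], z_switched, K=3))    # -> [0 0 1 1 2 2]
```

Because label switching only permutes community indices, a zero-cost assignment exists whenever two label vectors encode the same partition.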
Edges between nodes are represented using an adjacency matrix x ∈ ℝ^{N×N}. We define a block x_kl comprising the weighted edges connecting the nodes in community k to the nodes in community l. The likelihood of the latent block model can be expressed as

    p(x | π, z, K) = Π_{k,l} p(x_kl | π_kl, z, K),    (3.5)

and the likelihood of a specific block can be expanded as

    p(x_kl | π_kl, z, K) = Π_{i: z_i = k} Π_{j: z_j = l} p(x_ij | π_kl, z, K),    (3.6)

where π = {π_kl} is a K × K matrix of model parameters. The block model parameter in block kl is π_kl = (μ_kl, σ²_kl), and each x_ij in block kl follows a Gaussian distribution conditional on z under the model K, that is, x_ij | π_kl, z, K ∼ N(μ_kl, σ²_kl). The parameter vectors π_kl = (μ_kl, σ²_kl) are assumed to independently follow the conjugate Normal-Inverse-Gamma (NIG) prior π_kl ∼ NIG(ξ, κσ²_kl, ν/2, ρ/2); that is, μ_kl ∼ N(ξ, κσ²_kl) and σ²_kl ∼ IG(ν/2, ρ/2). The density of the Inverse-Gamma distribution IG(α, β) has the general form p(x) = [β^α / Γ(α)] x^{−(α+1)} e^{−β/x}, where α and β are hyper-parameters.

We define s_kl(x) to be the sum of the edge weights in block kl and q_kl(x) to be the sum of squares, as follows:

    s_kl(x) = Σ_{i: z_i = k} Σ_{j: z_j = l} x_ij,    (3.7)

and

    q_kl(x) = Σ_{i: z_i = k} Σ_{j: z_j = l} x²_ij.    (3.8)

We also define w_kl(z) = m_k(z) m_l(z) to be the number of elements in the block, where m_k and m_l are the numbers of nodes in communities k and l respectively. The prior and the likelihood in the above expressions form the NIG-Gaussian conjugate pair. With this conjugate pair, we can calculate the posterior distribution for each block of the model, which is also a Normal-Inverse-Gamma distribution, μ_kl ∼ N(ξ_n, κ_n σ²_kl) and σ²_kl ∼ IG(ν_n/2, ρ_n/2), where

    ν_n = ν + w_kl,    (3.9)
    κ_n = κ / (w_kl κ + 1),    (3.10)
    ξ_n = (ξ + s_kl κ) / (w_kl κ + 1),    (3.11)
    ρ_n = ξ²/κ + q_kl + ρ − (ξ/κ + s_kl)² / (1/κ + w_kl).    (3.12)

Details of the derivation of this NIG(ξ_n, κ_n σ²_kl, ν_n/2, ρ_n/2) distribution are provided in Supplementary Section 4. The posterior density of the whole model is a product of such terms over all blocks:

    p(π | x, z) = Π_{k,l} p(π_kl | x_kl, z).    (3.13)

Given a sampled z, we can draw π from the above posterior directly. Methods for sampling the latent vector z are discussed later in the paper.

3.3 The collapsed posterior of the latent label vector

In this model, a change-point corresponds to a change in community architecture, i.e., a change in the latent label vector z and the parameter matrix π. For the sake of computational efficiency, it is convenient to construct the collapsed posterior distribution p(z | x, K), obtained by integrating out the nuisance parameters [3, 5]. In this section, we discuss the details of collapsing the latent block model when the edge weights are continuously valued.

Given K, the joint density of x, π, z, and r is

    p(x, π, z, r | K) = p(z, r | K) p(x, π | z).    (3.14)

The parameters r and π can be integrated out (collapsed) to obtain the marginal density p(x, z | K),

    p(z, x | K) = ∫ p(z, r | K) dr ∫ p(x, π | z) dπ,    (3.15)

so that the posterior for the block-wise model can be expressed as

    p(z | x, K) ∝ p(z, x | K) = ∫ p(z, r | K) dr Π_{k,l} ∫ p(x_kl, π_kl | z) dπ_kl.    (3.16)

The first integral p(z | K) = ∫ p(z, r | K) dr, taken over the K-simplex, can be evaluated as

    ∫ p(z, r | K) dr = [Γ(Σ_{k=1}^{K} α_k) / Γ(Σ_{k=1}^{K} (α_k + m_k(z)))] × Π_{k=1}^{K} [Γ(α_k + m_k(z)) / Γ(α_k)].    (3.17-3.18)

The details of this derivation are given in Supplementary Section 5 below.
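The conjugate block update (3.9)-(3.12), followed by direct sampling of π_kl from the resulting NIG posterior, can be sketched as follows. The hyperparameter values, including ρ = 0.02, are illustrative stand-ins rather than the paper's exact settings:

```python
import numpy as np

def nig_posterior(x_block, xi, kappa, nu, rho):
    """Conjugate NIG update (3.9)-(3.12) for one block of edge weights."""
    w = x_block.size                   # w_kl: number of elements in the block
    s = float(x_block.sum())           # s_kl: sum of edge weights
    q = float(np.sum(x_block ** 2))    # q_kl: sum of squared edge weights
    nu_n = nu + w
    kappa_n = kappa / (w * kappa + 1.0)
    xi_n = (xi + s * kappa) / (w * kappa + 1.0)
    rho_n = xi ** 2 / kappa + q + rho - (xi / kappa + s) ** 2 / (1.0 / kappa + w)
    return xi_n, kappa_n, nu_n, rho_n

def sample_pi(xi_n, kappa_n, nu_n, rho_n, rng):
    """Draw (mu_kl, sigma2_kl) from NIG(xi_n, kappa_n*sigma2, nu_n/2, rho_n/2)."""
    sigma2 = 1.0 / rng.gamma(shape=nu_n / 2.0, scale=2.0 / rho_n)
    mu = rng.normal(xi_n, np.sqrt(kappa_n * sigma2))
    return mu, sigma2

rng = np.random.default_rng(1)
x_block = rng.normal(0.3, 0.2, size=(6, 5))   # toy block of edge weights
post = nig_posterior(x_block, xi=0.0, kappa=1.0, nu=3.0, rho=0.02)
mu, sigma2 = sample_pi(*post, rng)
```

With ξ = 0 and κ = 1, the posterior mean ξ_n reduces to s_kl/(w_kl + 1), a shrunken version of the empirical block mean, and ρ_n stays positive by the Cauchy-Schwarz inequality.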
The integrals of the form ∫ p(x_kl, π_kl | z) dπ_kl can be evaluated as

    ∫ p(x_kl, π_kl | z) dπ_kl = [ρ^{ν/2} Γ((w_kl + ν)/2)] / [π^{w_kl/2} Γ(ν/2) (w_kl κ + 1)^{1/2}] × ( −κ(s_kl + ξ/κ)² / (w_kl κ + 1) + ξ²/κ + q_kl + ρ )^{−(w_kl + ν)/2}.    (3.19-3.21)

The derivation is given in Supplementary Section 6.

We use a Markov chain Monte Carlo (MCMC) method to sample the latent label vector from the posterior, with proposal moves p(z → z*) similar to those of the allocation sampler [2] to update z. In the Metropolis-Hastings algorithm [6], a candidate latent label vector z* is accepted with probability min{1, r}, where

    r = [p(K, z*, x) p(z* → z)] / [p(K, z, x) p(z → z*)].    (3.22)

In each iteration of the sampler, we perform either a Gibbs move or an M3 move, with equal probability (0.5) of each. Each Gibbs move updates the latent label vector z by drawing from the collapsed posterior p(z | x, K). At each iteration, one entry z_i is randomly selected and updated by drawing from

    p(z*_i | z_{−i}, x, K) = (1/C) p(z_1, ..., z_{i−1}, z*_i = k, z_{i+1}, ..., z_N | x),    (3.23)

where k ∈ {1, ..., K}, z_{−i} represents the elements of z apart from z_i, and the normalization term is

    C = p(z_{−i} | x, K) = Σ_{k=1}^{K} p(z_1, ..., z_{i−1}, z*_i = k, z_{i+1}, ..., z_N | x).    (3.24)

For a Gibbs move within a Metropolis-Hastings sampler, the ratio r always equals one. The computational complexity of a Gibbs move depends on the cost of calculating the probability of reassigning a specific entry. Each such probability takes O(K + N) time to calculate, and there are K possible reassignments, so each Gibbs move takes O(K² + KN) time.

The details of the M3 move are provided in Supplementary Section 7. The computational complexity of the M3 move depends on the cost of calculating the ratio of the posterior densities and the ratio of the proposal densities.
The time cost of calculating the posterior ratio is O(K + N), and calculating the proposal ratio takes O(N + L) time, so the M3 move takes O(K + N + L) time.

Likelihood: The likelihood of block kl with weighted edges is

    p(x_kl | π_kl, z, K) = Π_{i: z_i = k} Π_{j: z_j = l} p(x_ij | μ_kl, σ²_kl, z, K)
        = (2πσ²_kl)^{−w_kl/2} exp{ −(1/(2σ²_kl)) Σ_{i: z_i = k} Σ_{j: z_j = l} (x_ij − μ_kl)² }
        = (2πσ²_kl)^{−w_kl/2} exp{ −(1/(2σ²_kl)) ( Σ_{i: z_i = k} Σ_{j: z_j = l} x²_ij − 2μ_kl Σ_{i: z_i = k} Σ_{j: z_j = l} x_ij + w_kl μ²_kl ) }
        = (2πσ²_kl)^{−w_kl/2} exp{ −(1/(2σ²_kl)) (q_kl − 2μ_kl s_kl + w_kl μ²_kl) },    (4.1)

where w_kl is the number of elements in block kl, s_kl is the sum of the weights, and q_kl is the sum of squares of the weights in block kl.

Posterior: We derive the posterior of the model parameter π_kl with prior μ_kl ∼ N(ξ, κσ²_kl) and σ²_kl ∼ IG(ν/2, ρ/2) as follows:

    p(π_kl | x_kl, z, K) ∝ p(π_kl) p(x_kl | π_kl, z, K)
        = p(μ_kl) p(σ²_kl) Π_{i: z_i = k} Π_{j: z_j = l} p(x_ij | μ_kl, σ²_kl, z, K)
        = (2πκσ²_kl)^{−1/2} exp{ −(μ_kl − ξ)² / (2κσ²_kl) }
          × [(ρ/2)^{ν/2} / Γ(ν/2)] (σ²_kl)^{−ν/2 − 1} exp{ −ρ / (2σ²_kl) }
          × (2πσ²_kl)^{−w_kl/2} exp{ −(1/(2σ²_kl)) (q_kl − 2μ_kl s_kl + w_kl μ²_kl) }
        = [(ρ/2)^{ν/2} / Γ(ν/2)] (2πκ)^{−1/2} (2π)^{−w_kl/2} (σ²_kl)^{−1/2} (σ²_kl)^{−(ν + w_kl)/2 − 1}
          × exp{ −(1/(2σ²_kl)) [ (1/κ + w_kl) μ²_kl − 2(ξ/κ + s_kl) μ_kl + ξ²/κ + q_kl + ρ ] }.    (4.2)

The posterior of the Gaussian model is also a Normal-Inverse-Gamma distribution, which can be denoted as μ_kl ∼ N(ξ_n, κ_n σ²_kl) and σ²_kl ∼ IG(ν_n/2, ρ_n/2). The posterior density can be expressed as

    p(π_kl | x_kl, z, K) = (2πκ_n σ²_kl)^{−1/2} exp{ −(μ_kl − ξ_n)² / (2κ_n σ²_kl) } × [(ρ_n/2)^{ν_n/2} / Γ(ν_n/2)] (σ²_kl)^{−ν_n/2 − 1} exp{ −ρ_n / (2σ²_kl) }
        = [(ρ_n/2)^{ν_n/2} / Γ(ν_n/2)] (2πκ_n)^{−1/2} (σ²_kl)^{−1/2} (σ²_kl)^{−ν_n/2 − 1} × exp{ −(1/(2σ²_kl)) [ (1/κ_n) μ²_kl − (2ξ_n/κ_n) μ_kl + ξ_n²/κ_n + ρ_n ] }.    (4.3)

Comparing the terms and coefficients with respect to μ²_kl, μ_kl, and σ²_kl,

    −ν_n/2 − 1 = −(ν + w_kl)/2 − 1,    (4.4)
    1/κ_n = 1/κ + w_kl,    (4.5)
    2ξ_n/κ_n = 2(ξ/κ + s_kl),    (4.6)
    ξ_n²/κ_n + ρ_n = ξ²/κ + q_kl + ρ.    (4.7)

In summary, the parameters of the posterior density are given by

    ν_n = ν + w_kl,    (4.8)
    κ_n = κ / (w_kl κ + 1),    (4.9)
    ξ_n = (ξ + s_kl κ) / (w_kl κ + 1),    (4.10)
    ρ_n = ξ²/κ + q_kl + ρ − (ξ/κ + s_kl)² / (1/κ + w_kl).    (4.11)

We can directly sample π_kl from NIG(ξ_n, κ_n σ²_kl, ν_n/2, ρ_n/2).

We now show the calculation of p(z | K) = ∫ p(z, r | K) dr. Given the K-dimensional Dirichlet prior with density p(r | K) = N(α) Π_{k=1}^{K} r_k^{α_k − 1}, where α = {α_1, ..., α_K} and N(α) = Γ(Σ_{k=1}^{K} α_k) / Π_{k=1}^{K} Γ(α_k), and the likelihood p(z | r, K) = Π_{k=1}^{K} r_k^{m_k(z)}, we can collapse r as follows:

    p(z | K) = ∫ p(z, r | K) dr = ∫ p(r | K) p(z | r, K) dr
        = ∫ [Γ(Σ_k α_k) / Π_k Γ(α_k)] Π_{k=1}^{K} r_k^{α_k − 1} Π_{k=1}^{K} r_k^{m_k} dr
        = [Γ(Σ_k α_k) / Π_k Γ(α_k)] [Π_k Γ(α_k + m_k) / Γ(Σ_k (α_k + m_k))] × ∫ [Γ(Σ_k (α_k + m_k)) / Π_k Γ(α_k + m_k)] Π_{k=1}^{K} r_k^{α_k + m_k − 1} dr
        = [Γ(Σ_k α_k) / Γ(Σ_k (α_k + m_k))] Π_{k=1}^{K} [Γ(α_k + m_k) / Γ(α_k)],    (5.1)

where the remaining integral equals one because its integrand is a normalized Dirichlet density.

Collapsing π_kl in the latent block model with weighted edges

The collapsed posterior of the latent block model is described in the work of [3], but the details of the collapsing procedure are not given there. Here we elaborate the collapsing procedure for the Gaussian latent block model, integrating out μ_kl and σ²_kl in turn.
    ∫ p(x_kl, π_kl | z) dπ_kl = ∫∫ p(x_kl, μ_kl, σ²_kl | z) dμ_kl dσ²_kl = ∫∫ p(μ_kl) p(σ²_kl) p(x_kl | μ_kl, σ²_kl, z) dμ_kl dσ²_kl.    (6.1)

To facilitate integrating with respect to μ_kl, we denote

    I_{μ_kl} = ∫ p(x_kl, μ_kl, σ²_kl | z) dμ_kl;    (6.2)

then

    I_{μ_kl} = [(ρ/2)^{ν/2} / Γ(ν/2)] (2πκ)^{−1/2} (2π)^{−w_kl/2} (σ²_kl)^{−1/2} (σ²_kl)^{−(ν + w_kl)/2 − 1} × ∫ exp{ −(1/(2σ²_kl)) [ (1/κ + w_kl) μ²_kl − 2(ξ/κ + s_kl) μ_kl + ξ²/κ + q_kl + ρ ] } dμ_kl.    (6.3)

Let

    M = [(ρ/2)^{ν/2} / Γ(ν/2)] (2πκ)^{−1/2} (2π)^{−w_kl/2} (σ²_kl)^{−1/2} (σ²_kl)^{−(ν + w_kl)/2 − 1},    (6.4)

so that

    I_{μ_kl} = M × ∫ exp{ −(1/(2σ²_kl)) [ λ(μ_kl − m)² − λm² + ξ²/κ + q_kl + ρ ] } dμ_kl,    (6.5)

where

    λ = 1/κ + w_kl    (6.6)

and

    m = (ξ/κ + s_kl) / (1/κ + w_kl).    (6.7)

Then

    I_{μ_kl} = M × (2πσ²_kl/λ)^{1/2} ∫ (2πσ²_kl/λ)^{−1/2} exp{ −(λ/(2σ²_kl)) (μ_kl − m)² } dμ_kl × exp{ −(1/(2σ²_kl)) ( −λm² + ξ²/κ + q_kl + ρ ) }
        = M × (2πσ²_kl/λ)^{1/2} × exp{ −(1/(2σ²_kl)) ( −λm² + ξ²/κ + q_kl + ρ ) }
        = (2π)^{−w_kl/2} [(ρ/2)^{ν/2} / Γ(ν/2)] (σ²_kl)^{−(ν + w_kl)/2 − 1} (w_kl κ + 1)^{−1/2} × exp{ −(1/(2σ²_kl)) [ −(ξ/κ + s_kl)² / (1/κ + w_kl) + ξ²/κ + q_kl + ρ ] }.    (6.8)

To facilitate integration with respect to σ²_kl, we first rewrite I_{μ_kl} as

    I_{μ_kl} = (2π)^{−w_kl/2} [(ρ/2)^{ν/2} / Γ(ν/2)] (w_kl κ + 1)^{−1/2} [Γ(α)/β^α] [β^α/Γ(α)] (σ²_kl)^{−(α + 1)} e^{−β/σ²_kl},    (6.9)

where

    α = ν/2 + w_kl/2    (6.10)

and

    β = (1/2) [ −(ξ/κ + s_kl)² / (1/κ + w_kl) + ξ²/κ + q_kl + ρ ].    (6.11)

The factors [β^α/Γ(α)] (σ²_kl)^{−(α+1)} e^{−β/σ²_kl} form a normalized Inverse-Gamma density, so integration gives

    ∫ I_{μ_kl} dσ²_kl = (2π)^{−w_kl/2} [(ρ/2)^{ν/2} / Γ(ν/2)] (w_kl κ + 1)^{−1/2} Γ(α)/β^α
        = (2π)^{−w_kl/2} [(ρ/2)^{ν/2} / Γ(ν/2)] (w_kl κ + 1)^{−1/2} Γ((ν + w_kl)/2) ( (1/2) [ −(ξ/κ + s_kl)² / (1/κ + w_kl) + ξ²/κ + q_kl + ρ ] )^{−(ν + w_kl)/2}
        = ρ^{ν/2} Γ((w_kl + ν)/2) / [π^{w_kl/2} Γ(ν/2) (w_kl κ + 1)^{1/2}] × ( −κ(s_kl + ξ/κ)² / (w_kl κ + 1) + ξ²/κ + q_kl + ρ )^{−(w_kl + ν)/2}.    (6.12)

In summary,

    ∫ p(x_kl, π_kl | z) dπ_kl = ρ^{ν/2} Γ((w_kl + ν)/2) / [π^{w_kl/2} Γ(ν/2) (w_kl κ + 1)^{1/2}] × ( −κ(s_kl + ξ/κ)² / (w_kl κ + 1) + ξ²/κ + q_kl + ρ )^{−(w_kl + ν)/2}.    (6.13)

In a Gibbs move, only one entry of z is updated at each iteration. An alternative is the M3 move [2], which updates multiple entries of z simultaneously. In M3, two communities are randomly selected and denoted k_1 and k_2. Each element z_i in the selected communities is reassigned to k_1 or k_2 with probability P_{ik_1} or P_{ik_2} respectively, to form the updated z*. The collection of elements of z with labels k_1 or k_2 may be indexed by the set I = {i : z_i = k_1 or z_i = k_2}; let the number of such elements be L. The remaining elements of z are collected into a subvector denoted z~. For the update, one element z_i with i ∈ I is randomly selected and updated to z*_i according to a reassignment probability. The updated element is then added to z~, so the size of I becomes L − 1. This procedure is repeated until all elements of I have been processed (the length of I becomes 0), and the resulting vector z~ becomes the proposed move z*. We define a sub-adjacency matrix x~ as the observations corresponding to z~, and x*_i as the observations corresponding to the updated z*_i. The reassignment probabilities satisfy P_{ik_1} + P_{ik_2} = 1, with ratio

    P_{ik_1} / P_{ik_2} = p(z*_i = k_1 | z~, x~, x*_i, K) / p(z*_i = k_2 | z~, x~, x*_i, K)
        = p(z*_i = k_1, z~, x~, x*_i | K) / p(z*_i = k_2, z~, x~, x*_i | K)
        = [p(z*_i = k_1, z~ | K) / p(z*_i = k_2, z~ | K)] × [p(x~, x*_i | z*_i = k_1, z~, K) / p(x~, x*_i | z*_i = k_2, z~, K)].    (7.1)

The first factor of this ratio is given by

    p(z*_i = k_1, z~ | K) / p(z*_i = k_2, z~ | K) = [Γ(α_{k_1} + m~_{k_1}(z~) + 1) Γ(α_{k_2} + m~_{k_2}(z~))] / [Γ(α_{k_1} + m~_{k_1}(z~)) Γ(α_{k_2} + m~_{k_2}(z~) + 1)] = (α_{k_1} + m~_{k_1}(z~)) / (α_{k_2} + m~_{k_2}(z~)),    (7.2)

where m~_{k_1}(z~) and m~_{k_2}(z~) are the numbers of nodes in communities k_1 and k_2 in z~. The second factor is given by

    p(x~, x*_i | z*_i = k_1, z~, K) / p(x~, x*_i | z*_i = k_2, z~, K)
        = p(x*_i | x~, z*_i = k_1, z~, K) / p(x*_i | x~, z*_i = k_2, z~, K)
        = p(x*_i | x~_{k_1}, z*_i = k_1, z~, K) / p(x*_i | x~_{k_2}, z*_i = k_2, z~, K)
        = [p(x~_{k_1}, x*_i | z*_i = k_1, z~) p(x~_{k_2} | z*_i = k_2, z~)] / [p(x~_{k_1} | z*_i = k_1, z~) p(x~_{k_2}, x*_i | z*_i = k_2, z~)].    (7.3)

Finally, the reassignment probability is given by

    P_{ik_1} / (1 − P_{ik_1}) = [(α_{k_1} + m~_{k_1}(z~)) / (α_{k_2} + m~_{k_2}(z~))] × [p(x~_{k_1}, x*_i | z*_i = k_1, z~) p(x~_{k_2} | z*_i = k_2, z~)] / [p(x~_{k_1} | z*_i = k_1, z~) p(x~_{k_2}, x*_i | z*_i = k_2, z~)],    (7.4)

and the proposal ratio is given by

    p(z* → z) / p(z → z*) = Π_{i ∈ I} P_{i z_i} / P_{i z*_i}.    (7.5)

Summary of the algorithm
Algorithm 1
Bayesian change-point detection by posterior predictive discrepancy
Input:
Time series Y of one subject, length of time course T, window size W, number of communities K.
For t = W + 1, ..., T − W:
    Convert the windowed time series Y_t into an adjacency matrix x_t.
    Draw samples {z_i, π_i} (i = 1, ..., S) from the posterior p(z, π | x, K).
    Simulate replicated adjacency matrices x_i^rep from the predictive distribution p(x^rep | z, π, K).
    Calculate the disagreement index γ(x_i^rep; x).
    Calculate the posterior predictive discrepancy index γ_t = (1/S) Σ_{i=1}^{S} γ(x_i^rep; x).
End for
For t = W + W_s + 1, ..., T − W − W_s:
    Calculate the cumulative discrepancy energy E_t = Σ_{I = t − W_s}^{t + W_s − 1} γ_I.
End for
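Algorithm 1 can be sketched in NumPy with the model-specific steps (posterior sampling, predictive replication, and the disagreement index) passed in as callables. The half-window convention, computing x_t from Y[:, t − W : t + W], is an assumption about the windowing, and the stubs used in practice would be the MCMC sampler and predictive simulator described above:

```python
import numpy as np

def ppd_change_points(Y, W, Ws, K, sample_posterior, replicate, disagreement, S=50):
    """Sliding-window posterior predictive discrepancy (sketch of Algorithm 1)."""
    N, T = Y.shape
    gamma = np.full(T, np.nan)                 # PPDI per time point
    for t in range(W, T - W):
        x_t = np.corrcoef(Y[:, t - W:t + W])   # windowed adjacency matrix
        g = 0.0
        for _ in range(S):
            z, pi = sample_posterior(x_t, K)   # draw from p(z, pi | x, K)
            x_rep = replicate(z, pi)           # draw from p(x_rep | z, pi, K)
            g += disagreement(x_rep, x_t)      # disagreement index gamma(.;.)
        gamma[t] = g / S
    energy = np.full(T, np.nan)                # cumulative discrepancy energy
    for t in range(W + Ws, T - W - Ws):
        energy[t] = np.sum(gamma[t - Ws:t + Ws])
    return gamma, energy
```

Change-points are then read off as local maxima of the (group-averaged) cumulative discrepancy energy E_t.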
Supplementary Table 1:
Community   Node number      Community   Node number     Community   Node number
k=1         29               k=1                         k=1
k=2         11 31 32         k=2         31 32           k=2         11 30 31 32
k=3         21               k=3         16 20 27        k=3         12 16 20 21
k=4         1 9 17 34        k=4         1 9 17 34       k=4         1 7 9 17 34
k=5         2                k=5         3               k=5         24
k=6         35               k=6         5 10            k=6         8
Supplementary Table 1.
This table summarises the nodes commonly located in a specific community k for all picture types in the working memory tasks.

Supplementary Figure 1. Results of method validation using synthetic data with SNR = 10 dB: a CDE of the multivariate Gaussian data with SNR = 10 dB using different models (K = 6, 5, 4, and 3). The sliding window size for converting the time series into a sequence of correlation matrices is W = 20, whereas (for smoothing) the sliding window size for converting the PPDI into the CDE is W_s = 10. The vertical dashed lines mark the locations of the true change-points (t = 20, 50, 80, 100, 130, and 160). The colored scatterplots are the CDEs of individual (virtual) subjects, and the black curve is the group CDE (averaged over 100 subjects). The red points are the local maxima and the blue points are the local minima. b Local fitting with different models (from K = 3 to 18) for synthetic data (SNR = 10 dB). Different colors represent the PPDI values of different states with the true number of communities K_true. c The estimated community constituents for SNR = 10 dB at each discrete state (t = 36, 67, 91, 116, and 147 for brain states 1 to 5, respectively). The estimates of the latent label vectors (Estimation) and the label vectors (True) that determine the covariance matrix in the generative model are shown as bar graphs. The strength and variation of the connectivity within and between communities are represented by the block mean and variance matrices within each panel.

Supplementary Figure 2. Results of method validation using synthetic data with SNR = 0 dB: This figure is in the same format as Supplementary Figure 1 above, except that it is for SNR = 0 dB. b Local fitting with different models (from K = 3 to 18) for synthetic data (SNR = 0 dB). c The estimated community constituents for SNR = 0 dB at each discrete state (t = 36, 66, 92, 116, and 146 for brain states 1 to 5, respectively).

Supplementary Figure 3. CDE of the multivariate Gaussian data with SNR = −5 dB using different models (K = 6, 5, 4, and 3 in a to d). Change-point detection did not work in this case, hence the brain states cannot be identified here.

Supplementary Figure 4. Validation of sampling the model parameters: a histograms of the sampled block means; b histograms of the sampled block variances for the case K = 3. We denote the blocks kl sequentially (for example, the block for k = 2, l = 3 is denoted block 6; the block for k = 3, l = 3 is denoted block 9). The green dashed lines are the true values and the black dashed lines are the estimates. In order to validate the algorithm for sampling the model parameters, we simulate a synthetic adjacency matrix from a mixture of Gaussian distributions with ground truth K = 3, true latent label vector (3, 2, 1, 1, 2, 3, 3, 3, 2, 2, 1, 3, 1, 2, 2, 2, 1, 3, 3, 3, 3), true block mean matrix (0.22, 0.05, −0.06; 0.05, 0.30, 0.02; −0.06, 0.02, 0.18) and true block variance matrix (0.1, 0.02, 0.01; 0.002, 0.15, 0.03; 0.01, 0.03, 0.09). Given this generated adjacency matrix as an observation, we draw samples of the block mean and variance from the posterior p(π | x, z) conditional on z. The shape of the histogram of the mean is consistent with a Normal distribution, and the histogram of the variance is consistent with an Inverse-Gamma distribution. The figure shows that the estimates of the block mean and variance closely match the ground-truth values.

Supplementary Figure 5.
Task activation maps (thresholded Z-MAX maps) for group analysis: Contrasts of 2-back vs fixation, 0-back vs fixation, and 2-back vs 0-back for MNI coordinates (x = -50, y = -46, z = 10). For running the 1st-level GLM-based FEAT [7] in FSL, we added the confound predictor files released by HCP to the design matrix of the model for each individual. We then set up a 2nd-level design matrix for the contrasts of 2-back, 0-back, and fixation. For the 3rd level (the group-level GLM analysis [8]), we applied cluster-wise inference and set the cluster-defining threshold (CDT) to Z = 3. (P = 0.) to avoid the cluster failure problems described in [9], with a family-wise error-corrected threshold of P = 0.. Maps are viewed by looking upward from the feet of the subject, and the coordinate directions are denoted as Anterior (A), Posterior (P), Superior (S), Inferior (I), Left (L), and Right (R). Supplementary Figure 6. Thresholded activation maps. Supplementary Figure 7. Community structure of the discrete brain states with sparsity level of 20% (session 1: LR): The figures with blue frames represent brain states corresponding to working memory tasks (2-back tool at t = 41; 0-back body at t = 76; 2-back face at t = 140; 0-back tool at t = 175; 2-back body at t = 239; 2-back place at t = 278; 0-back face at t = 334; and 0-back place at t = 375 in a-k) and those with red frames represent brain states corresponding to fixation (fixation at t = 107, …, and … in c, f, and i). Each brain state shows connectivity at a sparsity level of 20%. The different colors of the labels represent community memberships. The strength of the connectivity is represented by the colors shown in the bar at the right of each frame. In the Circos maps, nodes in the same community are adjacent and have the same color. Node numbers and abbreviations of the corresponding brain regions are shown around the circles. In each frame, different colors represent different community numbers.
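The sparsity-level thresholding behind these visualizations can be sketched as follows. This is a hypothetical helper (the paper's own visualization code is not shown here), assuming a symmetric weighted adjacency matrix with a zero diagonal, where "20% sparsity" means keeping the strongest 20% of edges ranked by absolute weight:

```python
import numpy as np

def threshold_by_sparsity(adj, sparsity):
    """Zero out all but the strongest `sparsity` fraction of edges.

    Hypothetical helper: `adj` is a symmetric weighted adjacency matrix
    with a zero diagonal; edges are ranked by absolute weight.
    """
    n = adj.shape[0]
    iu = np.triu_indices(n, k=1)          # each undirected edge counted once
    weights = np.abs(adj[iu])
    n_keep = int(round(sparsity * len(weights)))
    if n_keep == 0:
        return np.zeros_like(adj)
    cutoff = np.sort(weights)[-n_keep]    # weakest weight that survives
    mask = np.abs(adj) >= cutoff
    np.fill_diagonal(mask, False)
    return np.where(mask, adj, 0.0)

# Example on a random symmetric matrix (35 nodes, illustrative size only)
rng = np.random.default_rng(1)
A = rng.normal(size=(35, 35))
A = (A + A.T) / 2                         # symmetrize
np.fill_diagonal(A, 0.0)
A20 = threshold_by_sparsity(A, 0.20)      # the 20% sparsity level of this figure
```

Edges surviving the threshold are those drawn as arcs in the Circos maps; a 10% level retains fewer, stronger connections than a 30% level.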
The connectivity above the sparsity level is represented by arcs. The blue links represent connectivity within communities and the red links represent connectivity between communities. Supplementary Figure 8. Community structure of the discrete brain states with sparsity level of 30% (session 1: LR): This figure is in the same format as
Supplementary Figure 7 above, except that it is for a sparsity level of 30%. Supplementary Figure 9. Estimated mean and variance matrices of the blocks for brain states (session 1: LR): The figures with blue frames represent brain states corresponding to working memory tasks (2-back tool at t = 41; 0-back body at t = 76; 2-back face at t = 140; 0-back tool at t = 175; 2-back body at t = 239; 2-back place at t = 278; 0-back face at t = 334; and 0-back place at t = 375 in a-k) and those with red frames represent brain states corresponding to fixation (fixation at t = 107, …, and … in c, f, and i). The different colors of the labels represent community memberships. The density and variation of connectivity within and between communities are shown in the estimated block mean matrix and block variance matrix. Supplementary Figure 10. Results of Bayesian change-point detection for working memory tfMRI data (session 2: RL): Cumulative discrepancy energy (CDE) with different sliding window sizes (a W = 22, b W = 26, c W = 30, and d W = 34 under the model K = 3) and different models (c K = 3; e K = 4; f K = 5, using a sliding window of W = 30). W_s is the sliding window size used for converting from PPDI to CDE. The vertical dashed lines are the times of onset of the stimuli, which are provided in the EV.txt files in the released data. The coloured scatter points in the figures represent the CDEs of individual subjects, and the black curve is the group CDE (averaged over 100 subjects). The red points are the local maxima, which are taken to be the locations of change-points, and the blue points are the local minima, which are used for local inference of the discrete brain states. Supplementary Figure 11. Local fitting (session 2: RL) between the averaged adjacency matrix and models from K = 3 to K = 18. Different colours represent the PPDI values of different brain states. Supplementary Figure 12.
Community structure of the discrete brain states with sparsity level of 10% (session 2: RL): The figures with blue frames represent brain states corresponding to working memory tasks (2-back tool at t = 41; 0-back body at t = 77; 2-back face at t = 139; 0-back tool at t = 175; 0-back body at t = 236; 2-back place at t = 275; 0-back face at t = 334; and 2-back place at t = 376 in a-k) and those with red frames represent brain states corresponding to fixation (fixation at t = 99, 209, and 306 in c, f, and i). Supplementary Figure 13. Community structure of the discrete brain states with sparsity level of 20% (session 2: RL): This figure is in the same format as
Supplementary Figure 12 above, except that it is for a sparsity level of 20%. Supplementary Figure 14. Community structure of the discrete brain states with sparsity level of 30% (session 2: RL): This figure is in the same format as
Supplementary Figure 12 above, except that it is for a sparsity level of 30%. Supplementary Figure 15. Estimated mean and variance matrices of the blocks for brain states (session 2: RL): This figure is in the same format as
Supplementary Figure 9. The figures with blue frames represent brain states corresponding to working memory tasks (2-back tool at t = 41; 0-back body at t = 77; 2-back face at t = 139; 0-back tool at t = 175; 0-back body at t = 236; 2-back place at t = 275; 0-back face at t = 334; and 2-back place at t = 365 in a-k) and those with red frames represent brain states corresponding to fixation (fixation at t = 99, 209, and 306 in c, f, and i).

References

[1] Matthew Stephens. Dealing with label switching in mixture models. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 62(4):795–809, 2000.
[2] Agostino Nobile and Alastair T. Fearnside. Bayesian finite mixtures with an unknown number of components: The allocation sampler.
Statistics and Computing, 17:147–162, 2007.
[3] Jason Wyse and Nial Friel. Block clustering with collapsed latent block models. Statistics and Computing, 22:415–428, 2012.
[4] Giorgio Carpaneto and Paolo Toth. Algorithm 548: Solution of the assignment problem [H]. ACM Transactions on Mathematical Software (TOMS), 6(1):104–111, 1980.
[5] Aaron F. McDaid, Thomas Brendan Murphy, Nial Friel, and Neil J. Hurley. Improved Bayesian inference for the stochastic block model with application to large networks. Computational Statistics and Data Analysis, 60:12–31, 2012.
[6] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.
[7] Mark W. Woolrich, Brian D. Ripley, Michael Brady, and Stephen M. Smith. Temporal autocorrelation in univariate linear modeling of FMRI data. NeuroImage, 14(6):1370–1386, 2001.
[8] Mark W. Woolrich, Timothy E.J. Behrens, Christian F. Beckmann, Mark Jenkinson, and Stephen M. Smith. Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage, 21(4):1732–1747, 2004.
[9] Anders Eklund, Thomas E. Nichols, and Hans Knutsson. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS, 113(28):7900–7905, 2016.