[PDF] Integrative Learning for Population of Dynamic Networks with Covariates

Abstract

Although there is a rapidly growing literature on dynamic connectivity methods, the primary focus has been on separate network estimation for each individual, which fails to leverage common patterns of information. We propose novel graph-theoretic approaches for estimating a population of dynamic networks that are able to borrow information across multiple heterogeneous samples in an unsupervised manner and guided by covariate information. Specifically, we develop a Bayesian product mixture model that imposes independent mixture priors at each time scan and uses covariates to model the mixture weights, which results in time-varying clusters of samples designed to pool information. The computation is carried out using an efficient Expectation-Maximization algorithm. Extensive simulation studies illustrate sharp gains in recovering the true dynamic network over existing dynamic connectivity methods. An analysis of fMRI block task data with behavioral interventions reveal sub-groups of individuals having similar dynamic connectivity, and identifies intervention-related dynamic network changes that are concentrated in biologically interpretable brain regions. In contrast, existing dynamic connectivity approaches are able to detect minimal or no changes in connectivity over time, which seems biologically unrealistic and highlights the challenges resulting from the inability to systematically borrow information across samples.

Full PDF

IIntegrative Learning for Population of DynamicNetworks with Covariates

Suprateek Kundu ∗ , Jin Ming † , Joe Nocera, and Keith M. McGregor Abstract:

Although there is a rapidly growing literature on dynamic connectivity methods,the primary focus has been on separate network estimation for each individual, which fails toleverage common patterns of information. We propose novel graph-theoretic approaches forestimating a population of dynamic networks that are able to borrow information across mul-tiple heterogeneous samples in an unsupervised manner and guided by covariate information.Speciﬁcally, we develop a Bayesian product mixture model that imposes independent mixturepriors at each time scan and uses covariates to model the mixture weights, which results intime-varying clusters of samples designed to pool information. The computation is carriedout using an eﬃcient Expectation-Maximization algorithm. Extensive simulation studiesillustrate sharp gains in recovering the true dynamic network over existing dynamic connec-tivity methods. An analysis of fMRI block task data with behavioral interventions revealsub-groups of individuals having similar dynamic connectivity, and identiﬁes intervention-related dynamic network changes that are concentrated in biologically interpretable brainregions. In contrast, existing dynamic connectivity approaches are able to detect minimal orno changes in connectivity over time, which seems biologically unrealistic and highlights thechallenges resulting from the inability to systematically borrow information across samples.

Keywords:

Dynamic networks; EM algorithm; integrative learning; mixture models. ∗ Corresponding author email: [email protected] and address: 1518 Clifton Road, Atlanta, GA30322, USA † equal contribution by SK and JM who are co-ﬁrst authors a r X i v : . [ s t a t . M E ] J a n Introduction

There has been a steady development of graph-theoretic approaches to compute dynamicfunctional connectivity (FC) that is fueled by an increasing agreement that the brain networkdoes not remain constant across time and instead undergoes temporal changes resulting fromendogenous and exogenous factors (Filippi et al., 2019). For example, task-related imagingstudies have shown that the brain networks will re-organize when the subjects undergodiﬀerent modulations of the experimental tasks during the scanning session (Chang andGlover, 2010; Lukemire et. al, 2020). Similarly, dynamic FC has also been observed duringresting-state experiments (Bullmore and Sporns, 2009). These, and other recent studies,have found increasing evidence of underlying neuronal bases for temporal variations in FCwhich is linked with changes in cognitive and disease states (Hutchinson et al., 2013).Dynamic connectivity approaches involve time-varying correlations derived via graph-theoretic methods, and may be broadly classiﬁed into the following categories: (i) changepoint methods (Cribben et al., 2013; Kundu et al., 2018) that assume stable phases of connec-tivity interspersed with connectivity jumps at unknown locations, which results in piecewiseconstant connectivity; (ii) Hidden Markov Models (HMMs) involving fast transient networksthat are reinforced or revisited over time, which have been applied to electrophysiologicaldata (Quinn et al., 2018) and more recently to fMRI data (Warnick et al., 2018); and (iii)sliding window approaches that enforce temporally smooth correlations (Chang and Glover,2010; Monti et al., 2014) based on the biologically plausible assumption of slowly varyingtemporal correlations with gradual changes in connectivity. Although sliding window meth-ods are arguably the most widely used, these approaches may be limited by practical issuessuch as the choice of the window length (Lindquist et. al, 2014).On the other hand, change point models and HMMs have the advantage of model par-simony by limiting the distinct number of parameters. However, the performance of thesemethods often depend on modeling assumptions, and temporal smoothness of connectiv-ity estimates can not be typically ensured. More importantly, since most of these existing2pproaches typically rely on single-subject data, they often face challenges in terms of de-tecting rapid changes in connectivity and may result in inaccurate estimates due to a limitedinformation from a single individual.Essentially, almost the entirety of the existing dynamic connectivity literature has fo-cused on data from single individuals, due to the fact that temporal changes in connectivityare expected to be subject-speciﬁc and may not be replicated across individuals. However,recent evidence suggests that combining information across individuals in a group providesmore accurate estimates for connectivity (Hindriks et al., 2016), which adheres to the com-monly used statistical principle of data aggregation using multiple samples to obtain morerobust estimates. Kundu et. al (2018) proposed a sub-sampling approach to compute timevarying dynamic connectivity networks using multi-subject fMRI data, which resulted inconsiderable gains in dynamic network estimation under limited heterogeneity across sam-ples, compared to a single-subject analyses. Unfortunately, the variations across samplesmay not be restricted in many practical settings. To our knowledge, there is a scarcityof carefully calibrated approaches for pooling information across heterogeneous samples inorder to accurately estimate a population of (single-subject) dynamic networks. This is per-haps not surprising, given that there are considerable challenges involved in developing suchmethods. From a methodological perspective, it is not immediately clear how to eﬀectivelyborrow information across individuals in a data-adaptive manner that also respects the in-herent connectivity diﬀerences between heterogeneous samples. Similarly when estimatingdynamic networks with V brain regions for N individuals each having T time scans, oneencounters computational challenges in terms of computing N T distinct V × V connectivitymatrices, which is not straightforward for high-dimensional fMRI applications.In this article, our goal is to develop a fundamentally novel hierarchical Bayesian prod-uct mixture modeling (BPMM) approach incorporating covariates (MacEachern, 1999) forestimating a population of dynamic networks corresponding to heterogeneous multi-subjectfMRI data. The importance of using covariates to model known stationary networks has3lready been illustrated in recent literature (Zhang et al., 2019; Sun and Li, 2017), wherethe networks are speciﬁed in advance. These methods suggest a strong justiﬁcation for incor-porating demographic, clinical, and behavioral covariates when modeling dynamic networksin order to obtain more accurate and reliable estimates (Shi and Guo, 2016). Motivated bythese existing studies, the proposed BPMM framework estimates unknown dynamic networksby leveraging covariate information in order to inform the clustering mechanism under themixture model, which is better designed to tackle heterogeneity across samples that ulti-mately results in more accurate network estimation. Under the proposed model, subgroupsof individuals with similar dynamic connectivity proﬁles are identiﬁed, where the subgroupmemberships are also inﬂuenced by covariate proﬁles and change over time in an unsuper-vised manner that is designed to pool information in order to estimate the dynamic networks.Another appealing feature of the proposed BPMM approach is the ability to report clusterlevel network summaries that are more robust to noise and heterogeneity in the data. Sincethe proposed approach clusters samples independently at each time scan guided by covariateinformation, it is clearly distinct from HMM approaches that instead cluster transient brainstates across time scans. To our knowledge, the proposed approach is one of the ﬁrst to es-timate a population of dynamic networks incorporating covariate knowledge by integratingheterogeneous multi-subject fMRI data, which represents considerable advances.In order to tackle the daunting task of estimating N T connectivity matrices, each ofdimension V × V , the proposed approach employs dimension reduction by clustering sam-ples under the mixture modeling framework that translates to considerable computationalgains. In particular, the BPMM approach induces model parsimony by reducing the numberof unique model parameters from NT × V ( V − to ( (cid:80) Tt =1 k t ) × V ( V − , where k t ( << N ) denotesthe number of clusters at the t -time point that is determined in an unsupervised manner.Temporal smoothness in connectivity for each network is also ensured via additional hierar-chical fused lasso priors on mixture atoms in the BPMM, which results in gradual changes inconnectivity that is biologically meaningful. In scenarios where sharp connectivity changes4re anticipated in certain localized time windows (due to changes in experimental designin a block task experiment, or other exogenous or endogenous factors), one may estimatethese connectivity change points via a post-processing step that involves applying the totalvariation penalty (Bleakley and Vert, 2010) to the dynamic connectivity estimates underthe proposed approach. Additional post-processing steps involving a K-means algorithm arealso proposed to identify subgroups of individuals with similar dynamic connectivity patternsconsolidated across time, which is particularly useful in terms of obtaining insights relatedto heterogeneity. Figure 1 provides a visual illustration of the proposed approach.The proposed BPMM is developed for dynamic pairwise correlations as well as dynamicprecision matrices, which provide complementary interpretations of dynamic connectivity.In particular, pairwise correlations encode connections between pairs of nodes without ac-counting for the eﬀects of third party nodes, whereas partial correlations report associationbetween nodes conditional on the eﬀects of the remaining network nodes. While our goals donot involve assessing the merits of one approach over the other (see Smith et al., 2013 for areview), the proposed development is designed to provide users with an option to implementeither approach as desired and suitable for respective applications. We develop an eﬃcientExpectation-Maximization (EM) algorithm to implement the dynamic pairwise correlationmethod separately for each edge, and another EM algorithm for dynamic precision matrixestimation that simultaneously involves all network nodes. We perform extensive simulationsto evaluate the performance of the proposed method in contrast to existing approaches thatinvolved a variety of dynamic network structures. The proposed methods were also used toinvestigate dynamic functional connectivity changes due to a high intensity, aerobic exercise’spin’ intervention when compared to a non-aerobic exercise, control intervention, which wereadministered to a heterogeneous group of sedentary adults who performed a fMRI block taskexperiment. Our goals are to provide connectivity insights that are complimentary to pre-vious activation-based ﬁndings from the same study (McGregor et al., 2018), but involvesanalytic challenges due to the short duration of the ﬁxation and task blocks that induce5apid connectivity changes which are usually diﬃcult to detect via existing methods.The rest of the article is structured as follows. Section 2 develops the proposed approachfor dynamic pairwise connectivity (denoted as integrative dynamic pairwise connectivity withcovariates or idPAC) and dynamic precision matrices (denoted as integrative dynamic preci-sion matrix with covariates or idPMAC), outlines a post-processing strategy for estimatingnetwork change points, as well as identifying clusters of samples with similar dynamic con-nectivity proﬁles. Section 3 develops a computationally eﬃcient EM algorithm to implementthe proposed approaches, and describes choices for tuning parameters. Section 4 reports re-sults from extensive simulation studies, and Section 5 reports our analysis and results fromthe block-task fMRI experiment. Additional discussions are provided in Section 6. Through-out the article, we will use BPMM to denote the overall Bayesian product mixture modelingframework used for developing the idPAC and idPMAC approaches, as appropriate. In this section, we propose a novel approach for estimating a population of dynamic networksusing heterogeneous multi-subject fMRI data with the same number of brain volumes acrossall individuals. For modeling purposes, we will assume that the demeaned fMRI measure-ments are normally distributed with zero mean (Kundu et al., 2018) at each time scan, andthat pre-whitening steps have been performed to minimize temporal autocorrelations. Wewill ﬁx some notations to begin with. Suppose fMRI data is collected for T scans and V nodes (voxels or regions of interest) for N individuals. Denote the fMRI measurements acrossall the nodes at time point t as y ( i ) t = ( y ( i )1 ,t , . . . , y ( i ) V,t ) (cid:48) , and denote the V × T matrix of fMRImeasurements for the i -th individual as Y ( i ) that has the t -th column as y ( i ) t , i = 1 , . . . , N .Further, denote the vector of q × x i for the i -th sample, and represent thecollection of fMRI data matrices across all individuals as Y .In what follows, we develop the idPAC method for pairwise correlations (Section 2.1)6igure 1: A schematic diagram illustrating the proposed dynamic pairwise correlation method. Amixture prior with H = 3 components is used to model dynamic correlations, where the mixtureweights are modeled using covariates. The resulting networks at each time scan for each sampleare allocated to one of the H clusters representing distinct network states that are represented byred, orange and blue cubes. Although the proposed method does not cluster transient states acrosstime, the simpliﬁed representation in the Figure illustrates the similarity of brain states contained inidentical colored cubes across the experimental session. Such temporal smoothness of the networkis imposed via hierarchical fused lasso priors on the mixture atoms. Once, the dynamic FC isestimated, a post-processing step using K-means (Section 2.3) is applied to compute sub-groupsof samples that exhibit similar dynamic connectivity patterns summarized across all time scans.The subgroups are represented by the circle, pyramid, triangle and inverted triangle shapes in theFigure and correspond to diﬀerent modes of dynamic connectivity with diﬀerent number of brainstates represented by diﬀerent patterns within each shape. The connectivity change points for eachindividual, as well as at a cluster level, are computed via another post-processing step that employsa group fused lasso penalty (Section 2.4). The method reports both individual and cluster-levelnetwork features. θ given data Y is deﬁned as P ( θ | Y ) = L ( Y | θ ) × π ( θ ) P ( Y ) using Bayes theorem,where L ( Y | θ ) denotes the data likelihood given the parameter value θ , π ( θ ) represents theprior on θ under the Bayesian model, and P ( Y ) = (cid:82) L ( Y | θ ) π ( θ ) dθ is the marginal likelihoodafter integrating out all possible values of θ . Full details of the posterior distributions for theidPAC models in (1)-(2) and the idPMAC model in (3)-(4) are provided in the Appendix. Let the unknown dynamic functional connectivity (pairwise correlation) of individual i bedenoted as ρ ( i ) := { ρ ( i ) jl,t , j < l, j, l = 1 . . . V, t = 1 . . . T } , and the corresponding Fisher-transformed pairwise correlations be denoted as γ ( i ) jl,t = arctanh ( ρ ( i ) jl,t ). We propose a Bayesianhierarchical approach that models the dynamic correlations for one edge at a time, using datafrom multiple individuals. We propose the following model for edge ( j, l ), and t = 1 , . . . , T,  y ( i ) jt y ( i ) lt  ∼ N (cid:32)   , σ y  ρ ( i ) jl,t ρ ( i ) jl,t  (cid:33) , γ ( i ) jl,t ∼ H (cid:88) h =1 ξ h,jlt ( x i ) N ( γ ∗ h,jlt , σ γ,h ) , i = 1 , . . . , N,π ( γ ∗ h,jl , . . . γ ∗ h,jlT ) ∝ exp( − λ T − (cid:88) t =1 | γ ∗ h,jlt − γ ∗ h,jl,t − | ) , σ − γ,h ∼ Ga ( a σ , b σ ) , h = 2 , . . . , H, (1)where the Fisher-transformed correlations γ ( i ) jl,t are modeled under a mixture of Gaussiansprior having H components denoted as γ ∗ h,jlt , h = 1 , . . . , H, with the prior probability forthe h -th mixture component denoted as ξ h,jlt ( x i ) that depends on covariates, such that (cid:80) Hh =1 ξ h,jlt ( x i ) = 1 for all t = 1 , . . . , T , σ y denotes the residual variance in the likelihoodterm, σ γ,h captures the (unknown) variability of the pairwise correlations under the mixtureprior speciﬁcation, | · | denotes the L norm, and N v ( µ, Σ) denotes a multivariate Gaussiandistribution with mean µ and V × V covariance matrix Σ. Under a hierarchical Bayesian8peciﬁcation, σ − γ,h is estimated under the conjugate Gamma prior with shape and scale pa-rameters a σ , b σ , respectively. The mixture prior speciﬁes that for any given time scan t ,the functional connectivity for each individual can take values revolving around any oneof the H mixture atoms denoted by ( γ ∗ ,jlt , . . . , γ ∗ H,jlt ) with respective prior probabilities( ξ ,jlt ( x i ) , . . . , ξ H,jlt ( x i )). These mixture probabilities and atoms are unknown and learntadaptively from the data via posterior distributions under the proposed idPAC approach.Modeling mixture atoms via fused lasso: The mixture atoms are modeled under a fused lassoprior in (1) that encourages temporal smoothness of pairwise correlations by assigning smallprior probabilities for large changes in the values between consecutive time scans. Althoughtemporal smoothness in correlations is encouraged, the Bayesian approach is still equippedto accommodate sharp jumps in connectivity that may arise due to changes in experimentaldesign or other factors. Such connectivity jumps are detected using a post-processing step(see Section 2.4) applied to the estimated dynamic connectivity under the proposed model.Modeling mixture weights via covariates: In order to eﬀectively tackle heterogeneity, weincorporate supplementary covariate information when modeling the mixture weights underour mixture modeling framework in (1). By incorporating covariate information, the model isdesigned to achieve more accurate identiﬁcation of clusters, which then naturally translatesto improved estimates for dynamic FC at the level of each individual. In particular, wemodel ( ξ ( i )1 ,jl , . . . , ξ ( i ) H,jl ) via a Multinomial Logistic regression (Engel, 1988) as ξ ( i ) h,jlt ( x i ) = exp ( x T i β h,jlt )1 + (cid:80) H − h =1 exp ( x T i β h,jlt ) , β h,jlt ∼ N ( , Σ β ) , t = 1 , . . . , T, h = 1 , . . . , H − , (2)where β H,jlt = 0 , t = 1 , . . . , T, is ﬁxed as the reference group, and β h,jlt , represent the vec-tor of unknown regression coeﬃcients that control the contribution of the covariates to themixture probabilities for the h -th component ( h = 1 , . . . , H − H -thcomponent. These regression coeﬃcients are assigned a Gaussian prior with mean zero andprior covariance Σ β under a hierarchical Bayesian speciﬁcation. A large value of these regres-9ion coeﬃcients implies increased importance of the corresponding covariate with respect tomodeling a particular edge under consideration, whereas β ,jlt ≈ . . . ≈ β H − ,jlt ≈ for all t = 1 , . . . , T, indicates spurious covariates unrelated to the dynamic pairwise correlations.Model (2) suggests that the log-odds for each component ξ ( i ) h,jlt ( x i ) /ξ ( i ) H,jlt ( x i ) , h = 1 , H − , can be expressed as a linear combination of covariates. When two or more samples havesimilar covariate information, the prior speciﬁcation in (2) will encourage similar mixturecomponents to characterise the dynamic connectivity for all these samples that will result inanalogous connectivity patterns. However the posterior distribution (that is used to deriveparameter estimates) should be ﬂexible enough to accurately estimate varying connectivitypatterns between individuals even when they share similar covariate values, by leveraginginformation present in the data (as evident from extensive numerical studies in Section 4).Role of clustering in tackling heterogeneity and pooling information: Under model (1), eachsample will be assigned to one of the H clusters at each time scan in an unsupervised mannerand guided by their covariate proﬁles in order to model the edge-level dynamic connectivity.Due to independent clustering at each time scan, these cluster conﬁgurations change overthe experimental session in a data-adaptive manner to characterize connectivity ﬂuctuationsacross individuals. Such time scan speciﬁc clusters represent subgroups of individuals withsimilar connectivity proﬁles over a subset of time scans, which are learnt by pooling infor-mation across all samples within a cluster. Here, it is important to note that model (1) doesnot impose identical dynamic connectivity across all time scans between multiple individu-als (that is biologically unrealistic), but instead encourages common connectivity patternswithin subgroups of samples for a subset of time points that are learnt in a data-adaptivemanner. Hence, the proposed method is designed to result in more accurate estimation com-pared to a single subject analysis that is not equipped to pool information across samples ora group level analysis that does not account for within sample heterogeneity. We note thatalthough the estimation is performed separately for each edge, the connectivity estimatesacross all edges are consolidated to obtain connectivity change point estimates (Section 2.3)10r identify subgroups with common dynamic connectivity proﬁles (Section 2.4). We now propose a mixture model for dynamic precision matrix estimation that looks at thetotality of all nodes in the network, in contrast to the edge-wise analysis in Section 2.1. Whilethe proposed approach also uses a mixture modeling framework as in Section 2.1, the twomethods are fundamentally distinct in the manner in which the mixture prior is speciﬁed andin terms of how the network edges are constructed and interpreted. The proposed approachestimates the network by computing the V × V precision matrix involving V ( V − / V nodes at each timescan. The partial correlations measure interactions between pairs of regions after removingthe inﬂuence of third party nodes, which is successful in ﬁltering out spurious correlations.Hence a zero partial correlation between two nodes implies conditional independence. Theproposed idPMAC approach enables one to report graph-theoretic network summary mea-sures that capture important patterns of network information transmission (Lukemire et al.,2020), which are otherwise diﬃcult to report using pairwise correlations (Smith et al., 2012).Denote the V × V precision matrix over all nodes for the i -th individual at the t -thtime point as Ω ( i ) t =  ω ( i ) t, ω ( i )1 ,t ω ( i ) (cid:48) ,t Ω ( i )11 ,t  , and note that the partial correlation between nodes k and l is given directly as − ω kl / √ ω kk ω ll (ignoring the subject-speciﬁc and time-scan speciﬁcnotations). We propose a Gaussian graphical model involving product mixture priors as: y ( i ) t ∼ N (cid:34) , Ω ( i ) t (cid:35) , ω ( i ) v,t ∼ H (cid:88) h =1 ξ h,t ( x i ) N V − ( ω ∗ h,t , σ ω,h I V − ) , t = 1 , . . . , T, i = 1 , . . . , N,ω ( i ) t,vv ∼ E ( α , π ( ω ∗ h, , . . . , ω ∗ h,T ) ∝ exp( − λ T − (cid:88) t =1 | ω ∗ h,t − ω ∗ h,t − | ) , h = 1 , . . . , H, (3)where Ω ( i ) t ∈ M + V , the space of positive deﬁnite matrices, E ( α ) denotes the Exponential dis-tribution with scale parameter α , and ω ( i ) v,t denotes the vector of ( V −

1) oﬀ-diagonal elements11orresponding to the v -th row of Ω ( i ) t that are modeled using a mixture of multivariate Gaus-sians prior. Speciﬁcally, the dynamic connectivity at time scan t is likely to be characterisedvia the h th mixture component with prior probability ξ h,t ( x i ) depending on covariates, wherethe prior mean and variance for this unknown mixture component is given by ω ∗ h,t and σ ω,h respectively. The idPMAC approach in (3)-(4) speciﬁes independent mixture priors on theset of all edges related to each node and at each time scan, which ensures symmetric andpositive deﬁnite precision matrices that are necessary for obtaining valid partial correlationestimates. Full details for the computational steps are presented in Section 3.Modeling mixture atoms: Under a hierarchical Bayesian speciﬁcation, the mixture atomsor component-speciﬁc means ω ∗ h,t are themselves unknown and modeled via a fused lassoprior, which encourages temporal homogeneity of partial correlations by assigning small priorprobabilities for large changes in the values. We note that although the fused lasso priorencourages temporal smoothness in partial correlations, systematic changes in connectivityreﬂected by sharp jumps may be still identiﬁed via a post-processing step in Section 2.4.Modeling mixture weights via covariates: The node level mixture weights incorporating co-variates are modeled via a Multinomial Logistic regression that is deﬁned as: ξ ( i ) h,t ( x i ) = e x T i β h,t (cid:80) H − h =1 e x T i β h,t , β h,t ∼ N ( , Σ β ) , i = 1 , . . . , N, (4)where β h,t refers to the unknown regression coeﬃcients corresponding to time scan t andmixture component h that is assigned a Gaussian prior, and β H,t = 0 , t = 1 . . . T, is set as thereference group. The prior in (3)-(4) encourages similar clustering conﬁgurations resultingin analogous time-varying partial correlations for individuals with similar covariate proﬁles.However in the presence of heterogeneity, the posterior distribution under the idPMACmethod is able to identify divergent dynamic connectivity patterns even among individualswith similar covariate proﬁles (as evident from extensive numerical studies in Section 4).Role of clustering in tackling heterogeneity and pooling information: Under model (3), each12olumn of the precision matrix is assigned to one of the H clusters at each time scan in an un-supervised manner. Hence, the mixture modeling framework allows subsets of rows/columnsof Ω ( i ) t to have the same values depending on their clustering allocation at each given timescan, which is an unique feature under the idPMAC approach that is not shared by theidPAC method. This feature results in robust estimates by pooling information across nodesand samples to estimate common partial correlations, and is a necessary dimension reductionstep for scenarios involving large networks. For example, all weak or absent edges can besubsumed into one cluster which yields model parsimony. In addition, divergent connectivitypatterns are captured via distinct time-varying clustering conﬁgurations across individualsas derived from the posterior distribution, which accommodates heterogeneity. Hence, theclustering mechanism under the idPMAC method not only enables dimension reduction, butalso provides a desirable balance between leveraging common connectivity patterns withinand across networks and addressing inherent network diﬀerences across individuals. In practical neuroimaging applications, it is often of interest to detect dissimilar modes ofdynamic connectivity patterns that are embodied by distinct subgroups of individuals whoalso diﬀer in terms of demographic or clinical characteristics, or other factors. For examplein our fMRI task study, one of the objectives is to assess variations in dynamic connectivitywith respect to subgroups of samples that were assigned diﬀerent interventions, and who alsohad varying demographic characteristics. Instead of comparing network diﬀerences betweenpre-speciﬁed subgroups that are likely to contain individuals with heterogeneous connectivitypatterns, it is more appealing to develop a data-adaptive approach to identify subgroups thatcomprise individuals with homologous dynamic connectivity, and then examine connectivityvariations across such subgroups and how these variations are related to intervention andother factors of interest. When estimating these subgroups, we do not require identicaldynamic connectivity patterns for all individuals within subgroups, but rather expect them13o have limited network diﬀerences in terms of edge strengths and connectivity change points.An inherently appealing feature of subgroup detection is that is allows one to compute clusterlevel change points and other aggregate network features (see Section 2.4) which are morereproducible in the presence of noise and heterogeneity, compared to a single-subject analysis.Subgroup level network summaries may be particularly beneﬁcial in certain scenarios suchas fMRI block task experiments where it may be challenging for single-subject analyses todetect rapidly evolving network features induced via quick transitions between rest and taskblocks within the experimental design.We propose an approach that consolidates the time-varying clusters of samples underthe BPMM approach to detect subgroups which comprises samples with similar network-level dynamic connectivity patterns. In order to identify these subgroups, we ﬁrst createa N × N similarity matrix that measures the propensity of each pair of samples to belongto the same cluster over the experimental session. This matrix is created by examining theproportion of time scans during which a pair of samples belonged to the same cluster acrossthe experimental session, averaged across all edges. Once this similarity matrix has beencomputed, a K-means algorithm is applied to identify clusters of samples that exhibit similardynamic connectivity patterns across the experimental session. The number of clusters K isdetermined using some goodness of ﬁt score such as the elbow method (Thorndike, 1953), orit is ﬁxed as the maximum number of mixture components ( H ) under the BPMM approach.Finally, we note that the subgroup identiﬁcation step is not strictly needed under the pro-posed BPMM framework for dynamic network estimation, but it is an optional analysis thatcan be used to identify cluster-level network features in certain scenarios of interest. The estimated dynamic correlations in Sections 2.1-2.2 can be used to detect connectivitychange points in scenarios involving sharp changes in the network during the session, such asin fMRI task experiments. Our strategy involves computing change points for each individual14etwork (a) at the edge level that captures localized changes; and (b) at the global levelthat captures major disruptions in connectivity over the entire network. We compute thechange points using the total variation penalty (Bleakley and Vert, 2010) that was also usedin CCPD approach by Kundu et. al (2018). However the proposed idPAC and idPMACmethods are distinct from the two-stage CCPD approach; the latter estimates connectivitychange points based on empirical time-varying connectivity measures in the ﬁrst stage, andthen in the second stage, computes piecewise constant networks conditional on the estimatedchange points that represent connectivity jumps. In contrast, proposed idPAC and idPMACmethods pool information across samples in order to ﬁrst estimate dynamic correlations thatdoes not depend on change points and can vary continuously over time, and subsequently usesa post-processing step to compute connectivity change points without requiring piecewiseconstant connectivity assumptions. An appealing feature of the proposed mixture modelingframework guided by covariates is that it is more suitable for tackling divergent dynamicconnectivity across samples, in contrast to empirical correlations under the CCPD approach.Denote the vector of estimated (pairwise or partial) correlations over all edges for the i -th individual and at time scan t as (cid:98) r ( i ) t ∈ (cid:60) V ( V − / , t = 1 , . . . , T, i = 1 , . . . , N . Thenthe functional connectivity change points for the i -th individual may be estimated usingconnections across all edges via a total variation norm penalty that is deﬁned as || u ( i ) t +1 − u ( i ) t || = V ( V − / (cid:113)(cid:80) V ( V − / m =1 ( u ( i ) t +1 ,m − u ( i ) t,m ) . In particular, the following penalized criteriais used as in Kundu et al. (2018) for detecting network level connectivity change points: min u ∈(cid:60) V ( V − / T (cid:88) t =1 || (cid:98) r ( i ) t − u ( i ) t || + λ u T − (cid:88) t =1 || u ( i ) t +1 − u ( i ) t || , (5)where λ u represents the penalty parameter and u ( i ) t ∈ (cid:60) p ( p − / represents the piecewiseconstant approximation to the time series of correlations at time point t for the i -th individualthat also assumes the presence of an unknown number of connectivity jumps. The ﬁrst termin (5) measures the error between the observed correlations and the piece-wise constant15onnectivity, while the second term controls the temporal smoothness of correlations for V ( V − / || u ( i ) t +1 − u ( i ) t || in the second term becomes negligible whenthe multivariate time series does not change signiﬁcantly between times t and t + 1, but ittakes large values corresponding to signiﬁcant connectivity changes. The network changepoints computed via (5) represent global changes functional connectivity resulting from asubset of edges that exhibit large connectivity changes. It is important to note that not alledges are expected to exhibit changes at these estimated change points. When it is of interestto compute edge-level connectivity change points, one can simply use criteria (5) separatelyfor each edge, so that the total variation term translates to the L penalty. However, it isimportant to note that edge-level connectivity changes represent granular ﬂuctuations thatare typically more challenging to detect in the presence of noise in fMRI.The number of change points is determined by the penalty parameter λ u , with a smallervalue yielding a greater number of change points and vice-versa. Tibshirani and Wang (2007)proposed an estimate of λ u based on a pre-smoothed ﬁt of a univariate time series using alowess estimator (Becker et al., 1988). We adapt this approach for a multivariate time seriesto obtain an initial estimate for λ u , and then propose post-processing steps to tune thisestimate in order to obtain change points, as in the CCPD approach in Kundu et al. (2018).Full details for these steps are provided in Supplementary Materials.Cluster-level connectivity change point estimation: For fMRI task experiments involvingmultiple subjects, subgroups of individuals are expected to share analogous dynamic con-nectivity patterns with limited variations across samples, as discussed in Section 2.3. Theproposed total variation penalty norm in (5) is equipped to leverage information across sam-ples within a cluster for identifying cluster level change points, which reﬂect aggregateddynamic connectivity changes across all samples within a cluster at the global network level.These cluster level connectivity changes are obtained by aggregating the change points ob-tained via (5) applied separately to each sample within the cluster, and then choosing thosechange points that show up repeatedly within the cluster. One can deﬁne a threshold such16hat all change points that appear with a high frequency (above the chosen threshold) acrosssamples within the cluster are determined to represent cluster level change points (Kunduet al., 2018). We note that under the proposed method, it is entirely possible for individualswithin a cluster to have unique connectivity changes in addition to the common cluster levelchange points, which reﬂect within sample heterogeneity. In our experience, this methodtypically works well in accurately recovering aggregated cluster-level connectivity changes,in certain scenarios such as block task experiments, or more generally in the presence ofsubgroups of individuals with similar dynamic connectivity patterns. Although one can use Markov chain Monte Carlo (MCMC) to sample the parameters from theposterior distribution, we use a maximum-a-posteriori or MAP estimators for our purposesin this article that bypasses the computational burden under a MCMC implementation.The MAP estimators are obtained by maximizing the posterior distribution for the modelparameters and are derived via the Expectation-Maximization or EM algorithm. The EMalgorithm is scalable to high-dimensional fMRI applications of interest that requires one tocompute N × T distinct dynamic networks each involving V × V connectivity matrices. EM Algorithm:

Denote the matrix containing the fMRI time series for the l th node as Y l = ( y ,l , . . . , y T,l ) where y t,l = ( y (1) l,t , . . . , y ( N ) l,t ) (cid:48) represents the fMRI observations acrossall samples for node l and time scan t . Further, denote ∆ h as a latent indicator vari-able for the h th mixture component (that is not observed and is imputed in the proposedEM algorithm) and ﬁnally, denote by Θ jl the collection of all model parameters underthe speciﬁcation (1)-(2) corresponding to edge ( j, l ). Note that under the proposed model(2), one has an equivalent speciﬁcation under the binary latent variables distributed as17 ∆ ( i )1 ,jlt , . . . , ∆ ( i ) H,jlt ) ∼ M N (cid:0) , ( ξ ,jlt ( x i ; β h,jlt ) , . . . , ξ H,jlt ( x i ; β h,jlt )) (cid:1) , where M N (1; p ) denotes amultinomial distribution with probability vector p , B jlt = ( β ,jlt , . . . , β H − ,jlt ) and onecan marginalize out (∆ ( i )1 ,jlt , . . . , ∆ ( i ) H,jlt ) to recover the prior in (2). The EM algorithm usesthe augmented log-posterior derived in the Appendix involving the above latent mixtureindicators, to computer MAP estimates for the model parameters by iteratively apply-ing the Expectation (E) and Maximization (M) steps. The latent indicators { ∆ ( i ) h,jlt , h =2 , . . . , H, t = 1 , . . . , T, i = 1 , . . . , N } are imputed via the E-Step by using the posteriorprobability of γ ( i ) jl,t taking values from the h -th mixture component, which is denoted by ψ ( i ) h,jlt = P r (∆ ( i ) h,jlt = 1 | − ) and updated as: E-step : Compute the posterior expectation for the latent cluster membership indicators asˆ ψ ( i ) h,jlt = (cid:8) (cid:80) Hr =1 ξ r,jlt ( x i ; β h,jlt ) φ ( γ ( i ) jl,t | γ ∗ r,jlt , σ γ,h ) (cid:9) − (cid:8) ξ h,jlt ( x i ; β h,jlt ) × φ ( γ ( i ) jl,t | γ ∗ h,jlt , σ γ,h ) (cid:9) ,where φ ( γ ( i ) jl,t | γ ∗ , σ γ ) denotes the normal density with mean γ ∗ and variance σ γ .The remaining parameters are updated via M-steps using closed form solutions except γ ( i ) jl,t that is updated using Newton-Raphson steps. These M-steps comprise several math-ematically involved derivations and are detailed in the Appendix. The E and M steps arerepeated till convergence, which occurs when the absolute change in the log-posterior betweensuccessive iterations falls below a certain threshold (we use 10 − in our implementation). Let us denote the collection of all the precision matrices as Θ , and y ( i ) (cid:48) t, − v as the ( V − t over all nodes except node v . Theprior on the precision matrix can be expressed as π (Ω ( i ) t ) = (cid:81) Vv =1 π ( ω ( i ) t,vv ) π ( ω ( i ) vt ), with thecorresponding prior distributions π ( · ) being deﬁned in (3). Denote by | · | , the element-wise L norm, denote κ ( i )1 ,t = ω ( i ) t, − ω ( i ) (cid:48) ,t Ω ( i ) − ,t ω ( i )1 ,t to represent the conditional variancecorresponding to the fMRI measurements for the v th node given all other nodes, and let ω ( i ) t,vv and ω ( i ) v,t respectively denote the diagonal and the vector of oﬀ-diagonal elements of the v th row in Ω ( i ) t . Moreover use det ( A ) to denote the determinant of the matrix A , and write18 ( i ) t = y ( i ) t y ( i ) (cid:48) t =  s ( i ) t, s ( i )1 ,t s ( i ) (cid:48) ,t S ( i )11 ,t  as the matrix of cross-products of the response variable, where s ( i ) vv,t and s ( i ) v,t denote the v -th diagonal element and the oﬀ-diagonal elements for the v -th rowrespectively. Introduce latent indicator variables (∆ ( i )1 ,vt , . . . , ∆ ( i ) H,vt ) that follow a multinomialdistribution with probability vector ( ξ ,t ( x i ) , . . . , ξ H,t ( x i )) such that (cid:80) Hh =1 ξ h,t ( x i ) = 1.Denote by Ω ( i ) vv,t , the ( V − × ( V −

1) obtained by deleting the v -th row and columnfrom Ω ( i ) t . The EM algorithm uses an E step for the latent mixture indicators, as well as aMonte Carlo E step that samples from the posterior distribution in order to obtain estimatesfor the precision matrix. These steps are described below: E-step for mixture component indicator : For v = 1 , . . . , V, use the expression: ˆ ψ ( i ) h,vt = (cid:8) (cid:80) Hr =1 ξ r,t (cid:0) x i ; β h,t (cid:1) φ V − (cid:0) ω ( i ) v,t | ω ∗ r,t , σ ω,r I V − (cid:1)(cid:9) − × (cid:8) ξ h,t (cid:0) x i ; β h,t (cid:1) φ V − (cid:0) ω ( i ) v,t | ω ∗ h,t , σ ω,h I V − (cid:1)(cid:9) ,where φ V − ( · | ω ∗ , Σ) denotes the probability density function for the ( V − ω ∗ , Σ) respectively.

Monte Carlo E-step for precision matrix:

We use an E-step to update the precisionmatrix that computes the posterior mean by averaging MCMC samples drawn from theposterior distribution, which is equivalent to a Monte Carlo EM method (Wei and Tanner,1990). We use this Monte Carlo approximation for the conditional expectation since itprovides a computationally eﬃcient approach to sample positive deﬁnite precision matricesvia closed form posterior distributions. The posterior distribution for the precision oﬀ-diagonal elements are given as π ( ˆ ω ( i ) vt | − ) ∼ N (cid:34) V ω vt (cid:18) (cid:80) Hh =1 ∆ ( i ) h,vt ω ∗ h,t σ ω,h + 2( s ( i ) v,t ) (cid:19) , V ω v,t (cid:35) , where V ω vt = (cid:18) σ ω,h I V − + ( s ( i ) vv,t + α )(Ω ( i ) − vv,t ) + (cid:80) Hh =1 ∆ ( i ) h,vt σ ω,h (cid:19) − is the posterior covariance. Moreover,writing ω ( i ) t,vv = κ ( i ) v,t + ω ( i ) (cid:48) v,t Ω ( i ) − vv,t ω ( i ) v,t , the diagonal precision matrix elements are updated viathe posterior κ ( i ) vt ∼ GA ( + 1 , s ( i ) vv,t + α ) where α is pre-speciﬁed. The above steps can bealternated to sample positive deﬁnite precision matrices as in Wang (2012), and we drawseveral MCMC samples and average over them to approximate the conditional expectation.The remaining parameters are updated via closed form expressions under the M step,which involve mathematically involved derivations and are detailed in the Appendix. The19lgorithm iterates through the E and M steps until convergence. Certain tuning parameters in the BPMM need to be selected properly or pre-speciﬁed, inorder to ensure optimal performance. For both dynamic pair-wise correlations and precisionmatrix estimation, λ is the tuning parameter used in fused lasso penalty for the mixtureatoms that controls the temporal smoothness of the dynamic connectivity. We choose anoptimal value for λ over a pre-speciﬁed grid of values, as the value of the tuning parameterthat minimizes the BIC score. In model (1) for the dynamic pairwise correlation, the σ y isalso pre-speciﬁed as the initial mean variance over all edges and across all samples. Moreoverwhen updating covariate eﬀects, Σ β is pre-ﬁxed as a diagonal matrix with the diagonal termsas 1, although it is possible to impose a hierarchical prior on Σ β and update it using theposterior distribution. Extensive simulation studies revealed that the proposed approach isnot sensitive to the choices of Σ β as long as the variances are not chosen to be exceedinglysmall. Other hyper-parameters in the hierarchical Bayesian speciﬁcation include α in theprior on the precision matrices (chosen as in Wang (2012)), and a σ = 0 . , b σ = 1 , that resultsin an uninformative prior on the mixture variance.The number of mixture components H also needs to be chosen appropriately. On theone hand, a large value of H may be used to address inherent heterogeneity, but it willalso increase the running time and may generate redundant clusters that overcompensatesfor the variations across samples. On the other hand, a small value of H may restrictthe approach to overlook connectivity variations across individuals, resulting in inaccurateestimates. One may use a data adaptive approach to select H in certain scenarios where itis reasonable to assume that the dynamic connectivity can be approximated by piecewiseconstant connectivity. In such cases that potentially involve block task experiments (Kunduet al., 2018), one can evaluate criteria (5) separately for each individual under diﬀerent valuesof H , and ﬁx the optimal choice as that which minimizes the average value of the criteria (5)20cross all individuals. Based on extensive empirical studies, we noticed the need for largervalues for H when ﬁtting the model for cases involving large number of nodes and samples. Data generation: We generate observations from Gaussian distributions with sparse andpiecewise constant precision matrices that change at a ﬁnite set of change points. Moreover,the network change points are generated based on covariate information where individualswith identical covariates have partially overlapping connectivity change points. Broadly, weuse the following few steps to generate the data, each of which is described in greater detail inthe sequel: (i) generate a given number of change points for each subject using correspondingcovariate information; (ii) conditional on the generated change points, piecewise constantnetworks are simulated such that the connectivity changes occur only at the given changepoints; (iii) conditional on the network for a given state phase, a corresponding positivedeﬁnite precision matrix is generated for each time scan where non-zero oﬀ-diagonal elementsrepresent edge strengths and zero oﬀ-diagonals represent absent edges; and (iv) the responsevariable for a given time point is generated from a Gaussian distribution having zero meanand the precision matrix in step (iii). Four clusters are created with 10 samples each, wherethe samples with each cluster have the same number of connectivity change points, commonstate phase speciﬁc networks and identical covariate values. However within each cluster,there are diﬀerences in locations of connectivity change points and the network edge strengthsare free to vary across individuals even when they share the same network structure. Allsamples in the ﬁrst two clusters have 3 connectivity change points each, whereas the samplesin the other two clusters have 4 change points, out of a total of T = 300 time scans.Conditional on the change points in step (i), several types of networks are constructedfor each state phase in step (ii) that include: (a) Erdos Renyi network where each edge21an randomly appear with a ﬁxed probability; (b) small-world network, where the meangeodesic distance between nodes are relatively small compared with the number of nodesand which mimics several practical brain network conﬁgurations; and (c) scale-free networkthat resembles a hub network where the degree of network follows a power distribution. Giventhese networks, the corresponding precision matrix was generated in step (iii) by assigningzeros to oﬀ-diagonals for absent edges, and randomly generating edge weights from uniform[-1,1] for all important edges. To ensure the positive deﬁniteness, the diagonal values of theprecision matrix were rescaled by adding the sum of the absolute values of all elements ineach row with one. Finally, the response variables were generated either (a) independentlyat each time point via a Gaussian graphical model, or (b) via a vector autoregressive (VAR)model where the response variables are autocorrelated across time. In both cases, sparsetime-varying precision matrices having dimensions V = 40 , , were used.We generated two binary features that resulted in four distinct covariate conﬁgurations,i.e. (0,0), (0,1), (1,0), (1,1), and all samples with identical covariates were allocated tothe same cluster. In addition, we also evaluated the performance of proposed method inthe presence of spurious covariates that are not related to dynamic connectivity patterns.Speciﬁcally, we introduced anywhere between 1 to 8 spurious covariates for each sample (inaddition to the two true covariates described earlier), which were randomly generated usinguniform as well as from random normal distributions. We then investigated the performanceof the proposed approach over varying number of spurious covariates. While the proposedapproach is expected to work best in practical experiments involving a carefully selected setof covariates that inﬂuence dynamic connectivity patterns, our goal was also to investigatethe change in performance as the number of spurious covariates increase.Competing methods: We perform extensive simulation studies to evaluate the performanceof the proposed approach, and compare the performance with (a) change point estimationapproaches such as the CCPD (Kundu et al., 2018) that can estimate single subject con-nectivity using multi-subject data in the presence of limited heterogeneity, and the dynamic22onnectivity regression (DCR) approach for single subjects proposed in Cribben et al. (2013);(b) an empirical sliding window based approach (SD) and the model-based SINGLE (Montiet al., 2014) method that uses sliding window correlations; and (c) a covariate-naive versionof the proposed approach using the methods in Sections 2.1 and 2.2 (denoted as BPMM-PCand BPMM-PR respectively) that employs a multinomial distribution to model the mixtureweights without covariates. While methods in (a) and (c) are designed to report connectivitychange points, we augmented the sliding window approaches in (b) using a post-processingstep similar to (5) to compute change points based on the estimated sliding window correla-tions. Moreover for the proposed approach, the data under the VAR case was prewhitenedvia an autoregressive integrated moving average (ARIMA) before ﬁtting the proposed mod-els. In particular, the ‘ auto.arima ’ in R was used to prewhiten the raw data, which yieldedresiduals that were subsequently used for analysis. We note that it was not possible to reportresults under SINGLE for V = 100 due to an infeasible computational burden.Performance metrics: We evaluate the performance of diﬀerent approaches in terms of diﬀer-ent metrics. First, we investigated the accuracy in recovering true connectivity change pointsat the network and edge level for each sample, using sensitivity (deﬁned as the proportionof truly detected change points or true positives), as well as the number of falsely detectedchange points or false positives. In addition, the performance of the network connectivitychange points at the cluster level was also evaluated by comparing the true connectivitychange points for each sample within the cluster with the aggregated cluster level changepoints. We note that since there were variations in connectivity change points within eachcluster, false positive change points are to be expected under any estimation approach; how-ever our goal is to evaluate how well these false positives are controlled and the sensitivityin detecting true change points under diﬀerent methods. In addition, we also evaluated ac-curacy in terms of estimating the strength of connections that is computed as a squared loss(MSE) between the estimated and the true edge-level pairwise correlations. The pairwisecorrelations corresponding to dynamic precision matrix approaches for computing MSE were23btained by inverting the respective precision matrices.In order to evaluate the accuracy in dynamic network estimation, we computed the F-1score deﬁned as 2( Precision × Recall ) / ( Precision + Recall ), where Precision=

T P/ ( T P + F P )is deﬁned as the true positive rate, and Recall=

T P/ ( T P + F N ) represents the sensitivityin estimating the edges in the network. Here,

T P, F P, F N, refer to the number of truepositive, false positive, and false negative edges that are obtained via binary adjacencymatrices derived by thresholding the estimated absolute partial correlations. We employedreasonable thresholds (0.05) that are commonly used in literature (Kundu et al., 2018). Incontrast, it was not immediately clear how to choose such thresholds for pairwise correlationsgiven the fact that they tend to be typically larger in magnitude and have greater variability.Hence, we did not report F-1 scores corresponding to pairwise correlations, although onecould do so in principle by choosing suitable thresholds to obtain binary adjacency matrices.Finally, we also evaluated the clustering performance in terms of the clustering error (CE) andVariation of Information (VI). CE (Patrikainen and Meila, 2006) is deﬁned as the maximumoverlap between the estimated clustering with the true clustering, whereas VI (Meilˇ a , 2007)calculates the entropy associated with diﬀerent clustering conﬁgurations. The performance in terms of recovering the true clusters of subjects is provided in Table1, in the presence of two covariates that are both related to the true connectivity changes.It is clear from the results that incorporating covariate information results in near perfectrecovery of the clusters, in contrast to the covariate-naive version of the method. For V =100, the dynamic pairwise correlation approach seems to have a slightly higher accuracy interms of cluster recovery compared to the dynamic precision matrix approach when data isgenerated from a VAR model. Table 2 reports the accuracy in recovering the true network-level change points under the proposed approaches at the level of the estimated clusters,as per discussions in Section 2.4. In this case, both idPAC and idPMAC methods are24dPAC BPMM-PCV=40 V=100 V=40 V=100CE VI CE VI CE VI CE VIGGM+Erdos-Renyi 0 0 0 0 0.64 1.93 0.62 2.19GGM+Small-world 0 0 0 0 0.57 1.92 0.71 2.23GGM+Scale-free 0 0 0 0 0.63 2.01 0.66 2.19VAR+Erdos-Renyi 0 0 0 0 0.61 1.93 0.67 1.97VAR+Small-World 0 0 0 0 0.59 1.88 0.61 1.90VAR+Scale-Free 0 0 0 0 0.61 1.78 0.61 1.93idPMAC BPMM-PRV=40 V=100 V=40 V=100GGM+Erdos-Renyi 0 0 0 0 0.43 1.41 0.54 1.59GGM+Small-world 0 0 0 0 0.41 1.41 0.51 1.68GGM+Scale-free 0 0 0 0 0.43 1.49 0.60 1.78VAR+Erdos-Renyi 0.08 0.25 0.04 0.17 0.54 1.51 0.66 1.88VAR+Small-World 0 0 0.03 0.14 0.48 1.47 0.58 1.91VAR+Scale-Free 0 0 0.04 0.11 0.49 1.42 0.63 1.75Table 1: Clustering performance under diﬀerent network types. GGM implies that theGaussian graphical model was used to generate temporally uncorrelated observations; VARimplies a vector autoregressive model that was used to generate temporally dependent ob-servations. For the VAR case, the observations were pre-whitened before ﬁtting the model.shown to have near perfect recovery of the true network connectivity change points whendata is generated under GGM, and high sensitivity when data is generated under VAR.Moreover when using data from a VAR model, the idPAC method has a comparable orhigher sensitivity but also higher false positives for V = 100 in terms of detecting connectivitychange points at the cluster level, compared to the idPMAC method. We note that althoughall samples within a cluster had identical covariate information, the proposed approach wasable to accommodate within cluster connectivity diﬀerences that is evident from low falsepositives and high sensitivity when estimating cluster level change points. Moreover asseen from Tables 3-4, the accuracy in recovering cluster level connectivity change points isconsiderably higher than the corresponding results at the level of individual networks. Theseresults indicate the usefulness of aggregating information when it is reasonable to assume theexistence of subgroups of individuals who share some similar facets of dynamic connectivity.Table 3 reports the performance under pair-wise correlation based approaches, i.e. idPAC,25dPAC idPMACV=40 V=100 V=40 V=100sens FP sens FP sens FP sens FPGGM+Erdos-Renyi 1 2.15 0.99 1.58 0.97 3.94 0.99 3.18GGM+Small-world 0.97 2.11 1 1.59 0.99 4.18 0.98 3.17GGM+Scale-free 0.99 2.09 1 1.37 1 3.91 0.97 3.09VAR+Erdos-Renyi 0.91 3.71 0.88 3.66 0.87 3.47 0.87 2.89VAR+Small-world 0.84 3.44 0.8 3.09 0.82 3.45 0.81 2.98VAR+Scale-free 0.88 3.29 0.84 3.68 0.85 3.3 0.81 3.01Table 2: Cluster-based network change point estimation under the proposed approaches,assuming that all samples within a particular cluster have the same number and similarlocation of change points, with a limited degree of heterogeneity in functional connectivity.If this assumption holds, then the cluster level network change point estimation providesgreater accuracy compared to the estimated change points at the level of individuals asreported in subsequent Tables.BPMM-PC, SD, and CCPD. It is clear for the results that the proposed idPAC method has anear perfect sensitivity when data is generated under GGM, and a suitably high sensitivityunder the VAR model, when estimating connectivity change points. The sensitivity fornetwork and edge change point estimation, along with the MSE in estimating the pairwisecorrelations are signiﬁcantly improved under idPAC compared to competing approaches inTable 3. The CCPD method is shown to have the lowest false positives when estimating thenetwork level change points, but otherwise has poor sensitivity for change point estimationand high MSE, which is potentially due to the assumption of piecewise constant connectivity.The approach based on sliding window correlations has the poorest performance across allthe reported metrics, which illustrates their drawback in estimating dynamic connectivity.Table 4 reports the performance under precision matrix based approaches, i.e. idPMAC,BPMM-PR, SINGLE, and DCR. The results under the SINGLE method is not reportedfor V = 100 due to infeasible computational burden. It is evident that the proposed idP-MAC method has near-perfect or high sensitivity for detecting network level change points,corresponding to data generated under GGM and VAR models respectively. It also has asuitably high sensitivity for detecting edge level connectivity change points under both cases.Similarly, the MSE for edge strength estimation and the F-1 scores for network estimation26 esults for V=40 Network CP Edge CP MSE Network CP Edge CP MSEsens FP sens FP MSE sens FP sens FP MSEBPMM-PC idPACGGM+Erdos-Renyi 0.91 7.31 0.50 1.12

GGM+Small-world 0.92 5.99 0.47 1.03 0.12

GGM+Scale-free 0.91 7.29 0.49 1.19 0.12 SD+GFL CCPDGGM+Erdos-Renyi 0.3 3.13 0.09 2.97 0.29 0.92 . . . VAR+Small-world 0.66 5.97 0.47 1.14 0.19

VAR+Scale-free 0.59 5.51 0.39 1.02 0.17

SD+GFL CCPDVAR+Erdos-Renyi 0.41 7.72 0.13 3.06 0.26 0.55

Results for V=100

Network CP Edge CP MSE Network CP Edge CP MSEsens FP sens FP MSE sens FP sens FP MSEBPMM-PC idPACGGM+Erdos-Renyi 0.92 4.77 0.51 1.31 0.11 VAR+Small-world 0.59 6.03 0.41 1.02 0.14

VAR+Scale-free 0.62 5.49 0.44 0.99 0.15

SD+GFL CCPDVAR+Erdos-Renyi 0.37 8.03 0.1 3.14 0.15 0.55 V = 40 , esults for V=40 Network CP Edge CP MSE F1 Network CP Edge CP MSE F1sens FP sens FP MSE F1 sens FP sens FP MSE F1BPMM-PM idPMACGGM+Erdos-Renyi 0.85 6.99 0.32 1.04 0.1 0.79

GGM+Small-world 0.88 7.14 0.33 1.16 0.08 0.77

GGM+Scale-free 0.87 7.36 0.33 1.19 0.08 0.71 .

97 5.6 0.77 0.92

DCR SINGLEGGM+Erdos-Renyi 0.22 16.15 0.41 9.39 0.27 0.59 0.35 6.49 0.1 2.84 0.08 0.71GGM+Small-world 0.19 11.83 0.49 9.66 0.22 0.61 0.32 6.55 0.09 2.88 0.07 0.77GGM+Scale-free 0.21 10.92 0.49 9.058 0.23 0.62 0.33 6.01 0.09 2.94 0.07 0.69BPMM-PM idPMACVAR+Erdos-Renyi 0.66

VAR+Small-world 0.59 5.12 0.27

VAR+Scale-free 0.61 4.77 0.31

DCR SINGLEVAR+Erdos-Renyi 0.22 9.83 0.4 3.35 0.24 0.64 0.42 7.35 0.13 3.11 0.27 0.66VAR+Small-world 0.24 10.14 0.33 3.61 0.23 0.63 0.44 7.12 0.17 3.04 0.26 0.62VAR+Scale-free 0.21 9.98 0.32 3.61 0.22 0.59 0.38 6.77 0.21 3.36 0.23 0.6

Results for V=100

Network CP Edge CP MSE F1 Network CP Edge CP MSE F1sens FP sens FP MSE F1 sens FP sens FP MSE F1BPMM-PM idPMACGGM+Erdos-Renyi

GGM+Small-world

GGM+Scale-free

DCR SINGLEGGM+Erdos-Renyi 0.33 16.14 0.41 9.39 0.22 0.63GGM+Small-world 0.31 15.88 0.4 9.66 0.27 0.59 NAGGM+Scale-free 0.34 16.82 0.39 10.08 0.27 0.64BPMM-PM idPMACVAR+Erdos-Renyi 0.73 4.41 0.29 1.18 0.14 0.77

VAR+Small-world 0.56 5.22 0.22

VAR+Scale-free 0.59 5.13 0.29 1.03 0.11 0.78

DCR SINGLEVAR+Erdos-Renyi 0.23 9.92 0.43 3.19 0.16 0.64VAR+Small-world 0.31 10.23 0.37 3.37 0.19 0.67 NAVAR+Scale-free 0.25 10.23 0.38 3.61 0.18 0.65Table 4: Results under the dynamic precision matrix estimation approaches for network andedge-level connectivity change-point estimation (Edge CP) accuracy and network change-point (Network CP) estimation accuracy for V = 40 , V = 20, and the run time was around 26minutes for this method with two covariates, with 40 individuals. Similarly, when V = 40and T = 300, the average computation time is around 80 minutes with 40 subjects withoutcovariates. The proposed method was scalable to V = 100 and T = 300, unlike the SINGLEapproach whose average computation time was around 6 hours. The total computationtime under BPMM is expected to increase with V, T, N, which is true for any method thatcomputes dynamic connectivity at the level of each individual.30igure 3: Performance of dynamic pairwise correlation (columns 1 and 2) and dynamicprecision matrix (columns 3 and 4) methods under diﬀerent number of spurious covariatesrepresented by the X-axis. Lines with diﬀerent color represent diﬀerent network structure:Green (Erdos Renyi), Red (Small World), Blue (Scale Free). The top row provides the in-formation of clustering performance (Clustering Error and Variation of Information), themiddle row demonstrates the performance of network level change points estimation (sensi-tivity and number of False Positive estimations), and the performance of edge level changepoint estimation was provided in the bottle row.31

Analysis of Task fMRI Data

We analyze a block task data involving a semantic verbal ﬂuency at Veterans Aﬀairs Centerfor Visual and Neurocognitive Rehabilitation, Atlanta. In a 12-week randomized controlledtrail, 33 elderly individuals (aged 60-80, 11 males, 22 females) were assigned to two inter-vention groups: spin aerobic exercise group (14 participants) and the non-aerobic exercisecontrol group (19 participants). During the intervention, individuals belonging to the aerobicspin group were required to do 20-45 minutes of spin aerobic exercise three times a week, ledby a qualiﬁed instructor. For control group, participants were asked to do the same amountof non-aerobic exercise per week, such as group balance and light muscle toning exercise. Amore detailed description of the data is available in Nocera et al. (2017).For each participant, fMRI scans were conducted with 6 blocks of semantic verbal ﬂuency(task) conditions with 8 scans, both pre- and post-intervention. The semantic verbal ﬂuencytask involved participants looking at diﬀerent categories (e.g. “colors”) at the center ofvideo screen and they were asked to generate and speak 8 diﬀerent objects associated withthat category (e.g. “blue”). After task block, a rest block with 3-5 TRs would appear andparticipants were required to read the word “rest” out loud. A total of 74 brain scans wereacquired using a 3T Siemens Trio scanner with a whole-brain, 1-shot gradient EPI scan(240mm FOV, 3.75 × We performed the analysis separately for the pre-intervention and post-intervention data,under both the dynamic pairwise correlations and dynamic precision matrix estimation meth-ods. We used age and gender as covariates for the pre-intervention dataset, while also usingthe type of intervention (spin or non-aerobic control) as an additional covariate for the post-intervention analysis. Our analysis is designed to: (i) investigate the clustering behavior and33nspect how these clusters diﬀer with respect to demographics and the intervention type;(ii) investigate the cluster-level network diﬀerences using network summary measures; (iii)estimate the connectivity change points and examine how well they align with the changesdictated by the block task experiment; (iv) infer nodes and edges in the network with sig-niﬁcantly diﬀerent connectivity patterns between pre- and post-intervention.Objective (i) enables us to characterize homogeneous dynamic connectivity patterns cor-responding to clusters of samples in terms of their demographic and clinical characteristics;aim (ii) will be instrumental in interpreting the cluster-level network diﬀerences that willshed light on network variations across transient network states; aim (iii) will provide insightsregarding the eﬀectiveness of the proposed approaches in terms of recovering connectivityjumps where these changes are inﬂuenced by, but often not fully aligned with, the changes inthe block task experimental design (Hindriks et al., 2016; Kundu et al., 2018); and aim (iv)will inform investigators regarding dynamic connectivity diﬀerences that are associated withthe type of intervention. For aim (ii), we were only able to report results under dynamicprecision matrix estimation, since a graph theoretic framework is necessary to compute thenetwork summary measures, which may not be feasible under a pairwise correlation analysis.

Cluster analysis: As seen from Table 6, the analysis under both idPAC and idPMAC methodsyielded 5 clusters consolidated over all time scans (using the K-means algorithm described inSection 2.3), although the size of the clusters were more equitable under the idPAC method.The pre-intervention analysis yielded clusters that were largely homogeneous with respectto gender. These clusters were also reasonably well-separated with respect to age under theidPAC analysis, whereas the age of the participants within clusters were more diverse underthe idPMAC analysis. The post-intervention analysis yielded more heterogeneous clusterswith respect to both age and gender, with only one cluster comprising all males under boththe idPAC and idPMAC analyses. This suggests a realignment of the dynamic connectivity34fter the intervention is administered, such that individuals with similar genders and age-groups have synchronous dynamic connectivity patterns pre-intervention as identiﬁed viasubgroups, but the subgroups and their composition with respect to age and gender changepost-intervention. Our post-intervention analysis also suggests that the variability acrossclusters under the idPAC method can be largely explained via the intervention type.Method idPAC idPMACCluster index 1 2 3 4 5 1 2 3 4 5Cluster features Pre-intervention Pre-interventionSize 8 6 8 7 4 3 5 17 6 2% of females 0 100 0 14 100 0 100 0 100 0Age (mean) 72.2 65.8 64.7 76.7 67.7 71.7 69 70.4 66.8 67Age(range) 69-73 60-72 60-68 74-80 66-69 63-78 62-80 60-80 60-72 66-68CP(Task-Rest) 6 3 4 4 4 4 5 5 4 3CP(Rest-Task) 3 5 2 3 4 4 4 2 4 3Post-intervention Post-interventionSize 8 4 7 11 3 3 4 9 11 6% of females 63 75 0 18 33 67 100 0 9 67Age (mean) 67.3 65 65.1 74.5. 73.7 73.7 69.3 68.6 73 62.7Age(range) 62-70 60-71 60-68 71-80 68-78 67-80 68-72 63-78 68-80 60-66CP(Task-Rest) 5 6 4 3 6 3 3 5 5 5CP(Rest-Task) 3 5 2 2 4 2 5 2 4 2Spin(%) 0 100 100 0 100 33 0 100 9 50Table 6: Results for analysis of block task fMRI experiments. Size refers to the numberof participants in each cluster, ‘CP(Task-Rest)’ and ‘CP(Rest-Task)’ denotes the clusterlevel connectivity change points that were detected within +/- 2 time scan of the change inexperimental design from task to ﬁxation, and from ﬁxation to task, respectively. ‘Spin’ refersto the percentage of individuals assigned the Spin intervention belonging to each cluster.Connectivity change point estimation: Table 6 illustrates the cluster level connectivitychange point estimation. We observed that under both the idPAC and idPMAC methods,the estimated change points were consistent with 4 or more (out of 6) changes in experimen-tal design when transitioning from task to rest, except one cluster where 3 of the connectivitychange points aligned with the experimental design. These patterns were consistent in boththe pre- and post-intervention analysis; however the number of connectivity change points35hat were strongly aligned with changes in the experimental design were (on average) greaterin the post-intervention analysis compared to the pre-intervention analysis. This suggestsa learning eﬀect of the task that was reﬂected in terms of higher concordance between theconnectivity change points and the experimental design post-intervention. On the otherhand, the cluster-level estimation of change points when transitioning from ﬁxation to taskwas (on average) less aligned with the experimental design compared to the change pointswhen transitioning from task to ﬁxation, as seen in Table 6. This is somewhat expectedsince there were only 3-5 time scans in each ﬁxation block, which made it extremely chal-lenging to detect connectivity changes when transitioning from ﬁxation to task. However,the proposed approach was still able to detect at least two, and often 3 or more connectivitychange points (out of 6) aligned with the experimental design that suggests a reasonableconcordance between connectivity jumps and experimental transitions from ﬁxation to task.In contrast, the CCPD approach detected at most one or two connectivity change points,while the DCR method was not able to detect connectivity change points at all, which makesthese results appear biologically impractical given the nature of the block task experiment.Although the changes in connectivity are not expected to be fully aligned with changes inthe experimental design (Hindriks et al., 2016), one expects a certain degree of synchronicitybetween the two. Our results indicate that this is not captured at all via existing change pointmethods especially when there are rapidly occurring transitions in the experimental design,which highlights their limitations. Hence, our analysis clearly illustrates the advantages ofpooling information across heterogeneous samples and incorporating covariate knowledge viaa mixture modeling framework, which is simply not possible using existing approaches thatrely on information from single subjects as in DCR, or that use empirical methods to poolinformation across individuals as in CCPD.Cluster level network diﬀerences: In order to investigate the diﬀerences between the networkscorresponding to the diﬀerent clusters, we examined variations in dynamic network metricsthat capture modes of information transmission in the brain. These network metrics include36he characteristic path length (CPL) that measures the length of connections between nodes,and the mean clustering coeﬃcient (MCC) that measures the clustering tendency averagedover all network nodes. Using permutation testing, we examined p-values to evaluate whichpairs of clusters exhibited signiﬁcantly diﬀerent network summary measures. None of theclusters had signiﬁcantly diﬀerent CPL values in the pre-intervention analysis, but severalpairs of clusters exhibited signiﬁcant CPL diﬀerences post-intervention. The CPL diﬀerenceswere particularly pronounced between the ﬁrst and remaining clusters, as well as the last andremaining clusters in the post-intervention analysis. These two clusters also demonstratedthe highest within cluster variability in CPL values amongst all clusters. Moreover, thenumber of pairs of clusters with signiﬁcantly diﬀerent MCC values increased from the pre-intervention to post-intervention analysis, with 8 out of 10 pairs of post-intervention clustersreporting signiﬁcantly diﬀerent MCC values compared to at least one other cluster. Hence,our results suggest greater variability in network organization between clusters in the post-intervention analysis compared to pre-intervention, which potentially reﬂects greater networkheterogeneity after the 12 week intervention was administered.Network diﬀerences pre- and post-intervention: We applied paired t-test with multiplicityadjustment in order to infer which edges were signiﬁcantly diﬀerent between pre- and post-intervention at 5% level of signiﬁcance, along with identifying which network nodes containedthe greatest number of diﬀerential edges. Since the magnitude of the pairwise correlationsand the corresponding edge strength diﬀerences were higher, we discovered higher numberof edges with diﬀerential edge strengths under the idPAC analysis. For both the idPACand idPMAC methods, the bulk of the pre- vs post-intervention connectivity diﬀerenceswere concentrated in individuals in the spin group exclusively that were not present in thecontrol group. We obtained 57 signiﬁcantly diﬀerent edges under the idPAC analysis, and38 signiﬁcantly diﬀerent edges under the idPMAC analysis, which were exclusive to the spingroup - see Figure 4. In contrast, the number of signiﬁcantly diﬀerent edges between thepre- and post-intervention networks under the idPAC analysis were 20 corresponding to both37he spin and control groups, and 7 corresponding to the control group only. Moreover theidPMAC analysis did not produce any signiﬁcant edge level diﬀerences between the pre-and post-intervention networks corresponding to both the intervention groups as well as forthe control group only. Our results suggest a considerably strong realignment in dynamicconnectivity after the 12-week intervention that were exclusive to the spin group, comparedto negligible changes in the control group.The changes between the pre-vs post intervention networks that occurred exclusivelyin the spin group under idPAC analysis were concentrated in the following brain regions:Right Angular Gyrus(8 edges), Left Precuneus(10 edges), Right Cerebellum(9 edges), RightMiddle Temporal Gyrus(11 edges), and Right Middle Temporal Gyrus(8 edges). Similarly thefollowing brain regions had the highest number of diﬀerential edges pre- vs post-interventionunder the idPMAC analysis: Right Middle Frontal Gyrus(16 edges), Right Cerebellum(6edges), Right Pars Triangularis/MFG(8 edges), and Right Middle Temporal Gyrus(7 edges).Two nodes, Right Cerebellum and Right Middle Temporal Gyrus had a large number ofsigniﬁcantly diﬀerential edges under both idPAC and idPMAC analyses, while the rightmiddle frontal gyrus had, by far, the largest number of diﬀerential edges (16) under thedynamic precision matrix analysis. In addition, we also observe that more nodes in righthemisphere of the brain have signiﬁcantly diﬀerential connectivity, which is to be expectedsince the majority of the 18 brain regions being investigated lie in the right hemisphere.The large number of diﬀerential connections with respect to the right cerebellum is be-lieved to be attributable to the generation of internal models or context speciﬁc propertiesof an object (Moberget et al., 2014), and preferential activation during a semantic challenge(D’Mello et al., 2017). The connectivity between the right cerebellum and inferior frontalregions has been noted in earlier studies (Balsters et al., 2013), with the inferior frontalregions being responsible for ordering language and codifying the motor output for syntax(Balsters et al., 2013). Moreover, the diﬀerential connectivity in the right middle temporalgyrus is along the lines of earlier ﬁndings that illustrated the role of the left temporal gyrus38igure 4: Circle plots for the edges that are signiﬁcantly diﬀerent pre- and post-interventionin spin group but not in the control group. The top and bottom panel correspond tothe results under dynamic pairwise correlation and dynamic precision matrix estimationincorporating covariates, respectively. Red and blue lines correspond to lower or higher edgestrengths in the pre-intervention network compared to post-intervention. RC1 and RC2 referto the two brain regions in the right cerebellum; RMTG1-RMTG3 refer to the three brainregions in the right middle temporal gyrus; and LP1-LP2 refer to the two regions in the leftprecuneus. The MNI coordinates for these regions are provided in the Figure legend.39s a hub for integration of sensory input into a transformation to semantic forms (Daveyet al., 2016), and the corresponding connectivity diﬀerences in the right middle temporalgyrus may be attributable to a shift in laterality of involvement (Lacombe et al. 2015) dueto aging. Finally, the large number of diﬀerential edges corresponding to the right middlefrontal gyrus is potentially associated with semantic priming in older adults (Laufer et al.,2011). Given that this region is associated with executive function (Wang et al., 2019; Jolleset al., 2013) and is well characterized as being involved in working memory tasks, it is likelyfor connectivity diﬀerences to be focused on this region since the semantic task requires acontinuous reference to working memory.

In this article, we developed a novel approach that accurately estimates a population ofsubject-level dynamic networks by pooling information across multiple subjects in an un-supervised manner under a mixture modeling framework using covariates. The proposedapproach, which is one of the ﬁrst of its kind in dynamic connectivity literature, results insigniﬁcant gains in dynamic network estimation accuracy, as illustrated via extensive numer-ical studies. The gains under the proposed method become particularly appealing comparedto existing approaches in the presence of rapid transitions in connectivity as evident fromour fMRI block task analysis. The proposed approach works best in fMRI task experimentsinvolving a group of heterogeneous individuals executing the same task protocols, and in thepresence of a carefully chosen set of covariates that are related to the dynamic network.We also illustrate the robust performance of the proposed approach in the presence of alimited number of covariates that are not related to changes in connectivity, although theperformance deteriorates as the number of spurious covariates increase. In the presence of alarge number of features that may not be necessarily related to dynamic connectivity, one canperform a screening step to exclude unimportant predictors from the analysis. This step will40nvolve examining the associations between each covariate and the dynamic connectivity es-timates obtained from the covariate naive BPMM approach, and subsequently only retainingthe covariates with signiﬁcant associations for analysis using the full model. This approach isexpected to work well as long as the screening step does not exclude any important covariatesand manages to largely ﬁlter out spurious covariates that are unrelated to the network. Infuture work, we plan to extend the proposed approach to incorporate feature selection thatautomatically identiﬁes signiﬁcant covariates that are related to the dynamic networks, anddown-weights the contribution of unimportant covariates using Bayesian shrinkage priors.In addition to identifying important connectivity changes, during the fMRI block taskexperiment, our analysis conclusively established major changes between the pre- and post-intervention networks that were exclusive to the spin group. We note that existing literaturehas established the role of cardiovascular ﬁtness in regulating aging related declines in bothlanguage and motor control (McGregor et al., 2011, 2013). However, much less is knownabout the eﬀect of exercise intervention on dynamic connectivity, particularly in older adults.Because connectivity is a fundamental aspect of neuronal communication required for high-level cognitive processes, it is important to understand the potential impact of aging and/oraerobic exercise interventions in aging on changes in brain connectivity.Further, our analysis also discovered subgroups of individuals with homologous dynamicconnectivity, where the heterogeneity within these subgroups with respect to interventionwas higher under the idPMAC method compared to the idPAC analysis. This indicatesthat dynamic pairwise correlations were more accurate in classifying participants in termsof the intervention administered. It is important to note that the separation of clusters withrespect to intervention reﬂects the distinct patterns of dynamic connectivity between the 18brain regions speciﬁed in our study that are known to be diﬀerentially activated in spin andcontrol groups (Nocera et al., 2017). However, if additional regions are included that maynot be necessarily associated with intervention type, it is entirely possible to obtain moreheterogeneous clusters that have a more equitable composition with respect to intervention41roup. This is due to the presence of network edges between regions that are not necessarilyassociated with intervention and hence behave similarly in both the spin and control groups.Future work will focus on a more general analysis involving a larger number of cannonicalregions known to be associated with the semantic language function.

Supplementary Materials

The Supplementary Materials contain additional details corresponding to the M-steps fordynamic pairwise correlations and partial correlations, as well as details for selecting thetuning parameter in (5) for change point estimation corresponding to Section 2.4.

Acknowledgements

The views expressed in this work do not necessarily reﬂect those of the National Insti-tutes of Health, Department of Veterans Aﬀairs or the United States Government. Thework was supported by NIMH award number R01MH120299 (SK), and VA research awards:IK2RX000956 (KMM); IK2RX000744 (JN).

Data and Code Availability

A portion of the data presented in this work is property of the United States Department ofVeterans Aﬀairs. Copies of the de-identiﬁed data can be made available upon written requestto the corresponding author and Department of Veterans Aﬀairs. The code for implementingthe proposed approaches are available here: https://github.com/Emory-CBIS/BPMM thics Statement Study procedures were approved by the institutional review board of Emory University,informed consent was obtained for experimentation with human subjects, and procedureswere consistent with the Declaration of Helsinki.

Appendix

Posterior Distribution for Dynamic Pairwise Correlations

Here, we derive the log-posterior distribution that is used in the EM algorithm to deriveparameter estimates. The augmented log-posterior distribution for Θ jl under (1)-(2) is:log( π ( Θ jl | Y )) ∝ log (cid:18) P ( Θ jl ) P ( Y | Θ jl ) (cid:19) = H (cid:88) h =1 T (cid:88) t =1 log( π ( γ ∗ h,jlt )) + H (cid:88) h =1 log( π ( σ γ,h ))+ H − (cid:88) h =1 T (cid:88) t =1 log( π ( β h,jlt )) + (cid:88) i =1 ...N (cid:88) t =1 ...T log (cid:18) P ( y ( i ) jt , y ( i ) lt | γ ( i ) jl,t , σ y ) × π ( γ ( i ) jl,t ) (cid:19) ∝ N (cid:88) i =1 T (cid:88) t =1 (cid:20) −

12 log (cid:26) − (cid:18) exp(2 γ ( i ) jl,t ) − γ ( i ) jl,t ) + 1 (cid:19) (cid:27) − (cid:26) ( y ( i ) jt ) + ( y ( i ) lt ) − (cid:0) exp(2 γ ( i ) jl,t ) − γ ( i ) jl,t )+1 (cid:1) y ( i ) jt y ( i ) lt σ y (cid:18) − ( exp(2 γ ( i ) jl,t ) − γ ( i ) jl,t )+1 ) (cid:19) (cid:27) − H (cid:88) h =1 (cid:26) σ γ,h ∆ ( i ) h,jlt ( γ ( i ) jl,t − γ ∗ h,jlt ) + ∆ ( i ) h,jlt log( σ γ,h ) (cid:27) + H − (cid:88) h =1 ∆ ( i ) h,jlt x T i β h,jlt − log (cid:0) H − (cid:88) r =1 e x T i β r,jlt (cid:1)(cid:21) + H − (cid:88) h =1 T (cid:88) t =1 log( π ( β h,jlt )) + H (cid:88) h =2 (cid:26)(cid:18) − λ | γ ∗ h,jl,t − γ ∗ h,jl,t − | (cid:19) − ( a σ + 1) log( σ γ,h ) − b σ σ γ,h (cid:27) , (6)where log( π ( β h,jlt )) = − β Th,jlt Σ − β β h,jlt − log( det (Σ β )) represents the logarithm of the priordistribution on the covariate eﬀects. The detailed computational steps for deriving theMAP estimates corresponding to the above posterior distribution are discussed in Section 3.43 osterior Distribution for Dynamic Precision Matrices The augmented log-posterior distribution for the model parameters can be written as log( Θ | Y (1) , . . . , Y ( N ) ) ∝ N (cid:88) i =1 T (cid:88) t =1 log (cid:18) P ( y ( i ) t | Ω ( i ) t ) V (cid:89) v =1 π ( ω ( i ) t,vv | α ) π ( ω ( i ) vt | ω ∗ ,vt , . . . , ω ∗ H,vt , σ ω, , . . . , σ ω,H ) (cid:19) + H (cid:88) h =1 H (cid:88) v =1 T (cid:88) t =1 log( π ( ω ∗ h,vt )) + H (cid:88) h =1 log( π ( σ ω,h )) ∝ N (cid:88) i =1 T (cid:88) t =1 (cid:20) log det(Ω ( i )11 ,t ) − y ( i ) (cid:48) t, − Ω ( i )11 ,t y ( i ) t, − (cid:21) + N (cid:88) i =1 T (cid:88) t =1 (cid:20) − log κ ( i )1 ,t − (cid:0) s ( i )11 ,t + α (cid:1) κ ( i )1 ,t − ω ( i ) (cid:48) t (cid:18) σ ω,h I V − + ( s ( i )11 ,t + α )Ω ( i ) − ,t (cid:19) ω ( i )1 t + 2 s ( i ) (cid:48) ,t ω ( i )1 t (cid:21) − (cid:26) H (cid:88) h =1 V (cid:88) v =1 σ ω,h ∆ ( i ) h,vt ( ω ( i ) v,t − ω ∗ h,t ) (cid:48) ( ω ( i ) v,t − ω ∗ h,t ) (cid:27) − V ( V − (cid:26) H (cid:88) h =1 ∆ ( i ) h,vt log( σ ω,h ) (cid:27) + V (cid:88) v =1 (cid:26) H − (cid:88) h =1 ∆ ( i ) h,vt ( x T i β h,t ) − log (cid:18) H − (cid:88) r =1 exp ( x T i β r,t ) (cid:19)(cid:27)(cid:21) + T (cid:88) t =1 V (cid:88) v =1 H − (cid:88) h =1 log( π ( β h,t ))+ H (cid:88) h =1 (cid:26) T (cid:88) t =1 (cid:18) − λ | ω ∗ h,t − ω ∗ h,t − | (cid:19) − ( a σ + 1) log( σ ω,h ) − b σ σ ω,h (cid:27) , (7)where log( π ( β h,t )) = − β Th,t Σ − β β h,t − log( det (Σ β )) represents the logarithm of the prior distri-bution on the covariate eﬀects. The EM algorithm to derive the MAP estimators for modelparameters is based on the expression for the above log-posterior (see Section 3). M-steps for dynamic pairwise correlations

M-step for mixture atoms:

Denote γ h,jl = ( γ h,jl, , . . . , γ h,jl,T ), ¯ γ h,jl,t = (cid:80) Ni =1 ˆ ψ ( i ) h,jlt (cid:80) Ni =1 ˆ ψ ( i ) h,jlt γ ( i ) jl,t , w h,jlt = (cid:80) Ni =1 ˆ ψ ( i ) h,jlt σ γ,h , and ¯ γ ( w ) h,jl,t = √ w h,jlt ¯ γ h,jl,t . Further denote | · | as the element-wise L norm, and denote ˜ η h,jl = (˜ η h,jl, , ˜ η h,jl, , . . . , ˜ η h,jl,T − ), ˜ η h,jl, = γ ∗ h,jl, , ˜ η hjl,t − = γ ∗ h,jl,t − γ ∗ h,jl,t − . Then, using the derivations presented in the Supplementary Materials, (cid:98) γ ∗ h,jl =arg min (cid:80) Tt =1 ( √ w h,jlt ¯ γ h,jl,t − √ w h,jlt γ ∗ h,jl,t ) + λ (cid:80) T − t =1 | γ ∗ h,jlt − γ ∗ h,jl,t − | = arg min || ¯ γ ( w ) h,jl − M h,jl ˜ η h,jl || + λ (cid:80) T − t =0 | ˜ η h,jl,t | , where the T × T matrix ˜ M h,jl has the following form˜ M h,jl =  √ w h,jl, . . . √ w h,jl, √ w h,jl, . . . √ w h,jl, √ w h,jl, . . . √ w h,jl,T √ w h,jl,T √ w h,jl,T  . The solution can be obtained using a Lasso algorithm with the penalty parameter λ beingchosen using BIC. The solutions for ˜ η h,jl can be directly used to recover the estimates for γ ∗ h,jl = ( γ ∗ h,jl, , . . . , γ ∗ h,jl,T ), which in turn yields the dynamic connectivity estimates. M-step for mixture variance:

Use the closed form solution to estimate ( h = 1 , . . . , H ):ˆ σ γ,h = (cid:18) a σ + 0 . (cid:80) Tt =1 (cid:80) Ni =1 ˆ ψ ( i ) h,jlt − (cid:19) − (cid:18) b σ + 0 . (cid:80) Tt =1 (cid:80) Ni =1 ˆ ψ ( i ) h,jlt ( γ ( i ) jl,t − ˆ γ ∗ h,jlt ) ) (cid:19) . M-step for pair-wise correlations:

The update of γ ( i ) jl,t is performed via a Newton-Raphson step. Denote the parameter estimate at f -th iteration of Newton-Raphson as γ ( i )[ f ] jl,t , and use the update for the ( f + 1)-th iteration as γ ( i )[ f +1] jl,t = γ ( i )[ f ] jl,t − a ( γ ( i )[ f ] jl,t ) a ( γ ( i )[ f ] jl,t ) , where a ( γ ( i )[ f ] jl,t ) and a ( γ ( i )[ f ] jl,t ) are expressed as: a ( γ ( i )[ f ] jl,t ) = d ( i (Θ) [ f ] ) /d ( γ ( i )[ f ] jl,t ) = exp (2 γ ( i )[ f ] jl,t ) − exp (2 γ ( i )[ f ] jl,t ) + 1 − H (cid:88) h =1 ˆ ψ ( i ) h,jlt ( γ ( i )[ f ] jl,t − γ ∗ h,jlt ) σ γ,h,jlt − ( exp (2 γ ( i )[ f ] jl,t ) − y jt + y lt ) − y jt y lt ( exp (2 γ ( i )[ f ] jl,t ) + 1)4 σ y exp (2 γ ( i )[ f ] jl,t ) , and a ( γ ( i )[ f ] jl,t ) = d ( i (Θ) [ f ]2 ) /d ( γ ( i )[ f ] jl,t ) = 4 exp (2 γ ( i )[ f ] jl,t )( exp (2 γ ( i )[ f ] jl,t ) + 1) − H (cid:88) h =1 ˆ ψ ( i ) h,jlt σ γ,h,jlt − exp (2 γ ( i )[ f ] jl,t ) ( y jt + y lt − y jt y lt ) + ( y jt + y lt + y jt y lt )2 σ y exp (2 γ ( i )[ f ] jl,t )The above iterative steps are repeated until convergence, i.e. when | γ ( i )[ f +1] jl,t − γ ( i )[ f ] jl,t | < − .45 -step for covariate eﬀects: The log-posterior log( π ( β h,jlt | − )) ∝ (cid:26) − β Th,jlt Σ − β β h,jlt −

12 log( det (Σ β )) (cid:27) + N (cid:88) i =1 (cid:26) ∆ ( i ) h,jlt x T i β h,jlt − log (cid:0) H − (cid:88) r =1 exp ( x T i β r,jlt ) (cid:1)(cid:27) ≈ − N (cid:88) i =1 w h,jlt ( z h,jlt − x T i β h,jlt ) − β Th,jlt Σ − β β h,jlt , using the expression in (6), and a quadratic approximation as in (Friedman et al., 2010)for the last step, in order to facilitate closed form updates. In the above expression, z h,jlt = x T i ˜ β h,jlt + (cid:98) ∆ ( i ) h,jlt − ˜ p h,jlt ( x i )˜ p h,jlt ( x i )(1 − ˜ p h,jlt ( x i )) , w h,jlt = ˜ p h,jlt ( x i )(1 − ˜ p h,jlt ( x i )), ˜ p h,jlt = ˜ P (∆ ( i ) h,jlt = 1 | x i ) = exp ( x T i ˇ β h,jlt )1+ (cid:80) H − h =1 exp ( x T i ˇ β h,jlt ) represents the approximated probability under the quadratic ap-proximation, ˇ β h,jlt represents the estimate of β h,jlt at previous step, and (cid:98) ∆ ( i ) h,jlt represents ex-pected probability for the i -th subject as in the E-step. The above approximate log-posteriorcan be optimized to obtain a closed form expression as (cid:98) β h,jlt = arg max β log( π ( β h,jlt | − )) =(Σ − β + (cid:80) Ni =1 w h,jl x i x T i ) − ( (cid:80) Ni =1 w h,jl z h,jl x i ) , where the notations in the expression for (cid:98) β h,jlt has been deﬁned previously. M-steps for dynamic precision matrix estimation

M-step for mixture atoms:

Deﬁne e ∗ h,t = ( ω ∗ h,t − ω ∗ h,t − ) (cid:48) = ( e ∗ h, t , . . . , e ∗ h,V t ) , t = 1 , . . . , T − E ∗ h = ( ω ∗ h , e ∗ h, , . . . , e ∗ h,T − ) (cid:48) , { e ∗ h,v (cid:48) t } represents the elements in e ∗ h,t , ¯ W h,v is a T × V − t -th row as (cid:80) Ni =1 ∆ ( i )) h,vt ω ( i ) v,t σ ω,h , ¯ W h,v ( • , v (cid:48) ) and E ∗ h ( • , v (cid:48) ) represent the v (cid:48) th columnof ¯ W h,v and E ∗ h respectively, and |·| represents element-wise L norm. Similar to the steps fordynamic pairwise correlations, the estimate for mixture atom ω ∗ h,t , h = 1 , . . . , H, t = 1 , . . . , T, V (cid:88) v =1 {|| ¯ W h,v − M ∗ h,v E ∗ h || + λ T − (cid:88) t =1 | e ∗ h,t | } = V (cid:88) v =1 V − (cid:88) v (cid:48) =1 {|| ¯ W h,v ( • , v (cid:48) ) − M ∗ h,v E ∗ h ( • , v (cid:48) ) || + λ T − (cid:88) t =1 | e ∗ h,v (cid:48) t | } , where M ∗ h,v =  √ w h,v, . . . √ w h,v, √ w h,v, . . . √ w h,v, √ w h,v, √ w h,v, . . . √ w h,v,T √ w h,v,T √ w h,v,T . . . √ w h,v,T  . The above equation can be solved using a Lasso algorithm with the penalty parameter λ being chosen using BIC. The solutions for E ∗ h are then used to recover the estimates for ω ∗ h,t . M-step for mixture variance:

Use ˆ σ ω,h = b σ +0 . (cid:80) Ni =1 (cid:80) Tt =1 (cid:80) Vv =1 ∆ ( i ) h,vt ( ω ( i ) v,t − ω ∗ h,t ) (cid:48) ( ω ( i ) v,t − ω ∗ h,t ) a σ +1+0 . V ( V − (cid:80) Tt =1 (cid:80) Ni =1 ∆ ( i ) h,vt . M-step for covariate eﬀects:

Using similar arguments as in Section 3.1, one can approx-imate the posterior as: log( π ( β h,t | − )) ≈ − N (cid:88) i =1 V (cid:88) v =1 w h,t ( z h,vt − x T i β h,t ) − β Th,t Σ − β β h,t , where z h,vt = x T i ˜ β h,t + (cid:98) ψ ( i ) h,vt − ˜ p h,t ( x i )˜ p h,t ( x i )(1 − ˜ p h,t ( x i )) , w h,t = ˜ p h,t ( x i )(1 − ˜ p h,t ( x i )), ˜ p h,t = ˜ P (∆ ( i ) h,t = 1 | x i ) = exp ( x T i ˜ β h,t )1+ (cid:80) H − h =1 exp ( x T i ˜ β h,t ) represents the approximated probability under the quadratic ap-proximation, where ˜ β h,t denotes the estimate of β h,t at previous step, and (cid:98) ψ ( i ) h,vt repre-sents expected probability for subject i as calculated in the E-step. The above approx-imate log-likelihood can be optimized to obtain a closed form expression (cid:98) β h,t = (Σ − β + V (cid:80) Ni =1 w h,t x i x T i ) − ( (cid:80) Ni =1 (cid:80) Vv =1 w h,t z h,vt x i ) , where the notations in the expression for (cid:98) β h,vt has been deﬁned previously. 47 eferences

1. Allen, E. A., Damaraju, E., Plis, S. M., Erhardt, E. B., Eichele, T., and Calhoun, V.D. (2014). Tracking whole-brain connectivity dynamics in the resting state. Cerebralcortex, 24(3), 663-676.2. Balsters, J. H., Whelan, C. D., Robertson, I. H., and Ramnani, N. (2013). Cerebellumand cognition: evidence for the encoding of higher order rules. Cerebral Cortex, 23(6),1433-1443.3. Becker, R. A., Chambers, J. M., and Wilks, A. R. (1988). The New S Language.Wadsworth & Brooks. Cole.[Google Scholar].4. Bullmore, E., and Sporns, O. (2009). Complex brain networks: graph theoreticalanalysis of structural and functional systems. Nature reviews neuroscience, 10(3),186-198.5. Chang, C., & Glover, G. H. (2010). Time–frequency dynamics of resting-state brainconnectivity measured with fMRI. Neuroimage, 50(1), 81-98.6. Cribben, I., Wager, T., and Lindquist, M. (2013). Detecting functional connectivitychange points for single-subject fMRI data. Frontiers in computational neuroscience,7, 143.7. Davey, J., Thompson, H. E., Hallam, G., Karapanagiotidis, T., Murphy, C., De Caso,I., ... and Jeﬀeries, E. (2016). Exploring the role of the posterior middle temporal gyrusin semantic cognition: Integration of anterior temporal lobe with executive processes.Neuroimage, 137, 165-177.8. D’Mello AM, Turkeltaub PE, and Stoodley CJ. (2017). Cerebellar tDCS Modu-lates Neural Circuits during Semantic Prediction: A Combined tDCS-fMRI Study.J Neuroscience;37(6):1604-1613. 48. Durante, D., Dunson, D. B., and Vogelstein, J. T. (2017), “Nonparametric Bayesmodeling of populations of networks,” Journal of the American Statistical Association,112, 1516–1530.10. Engel, J. (1988), Polytomous logistic regression. Statistica Neerlandica, 42: 233-252.11. Filippi, M., Spinelli, E. G., Cividini, C., and Agosta, F. (2019). Resting state dy-namic functional connectivity in neurodegenerative conditions: a review of magneticresonance imaging ﬁndings. Frontiers in neuroscience, 13, 657.12. Hidot, S., and Saint-Jean, C. (2010). An Expectation–Maximization algorithm for theWishart mixture model: Application to movement clustering. Pattern RecognitionLetters, 31(14), 2318-2324.13. Hindriks, R., Adhikari, M. H., Murayama, Y., Ganzetti, M., Mantini, D., Logothetis,N. K., & Deco, G. (2016). Can sliding-window correlations reveal dynamic functionalconnectivity in resting-state fMRI?. Neuroimage, 127, 242-256.14. Hutchison, R. M., Womelsdorf, T., Allen, E. A., Bandettini, P. A., Calhoun, V. D.,Corbetta, M., ... and Handwerker, D. A. (2013). Dynamic functional connectivity:promise, issues, and interpretations. Neuroimage, 80, 360-378.15. Jolles, D. D., van Buchem, M. A., Crone, E. A., & Rombouts, S. A. (2013). Func-tional brain connectivity at rest changes after working memory training. Human brainmapping, 34(2), 396-406.16. Kundu, S., Ming, J., Pierce, J., McDowell, J., & Guo, Y. (2018). Estimating dynamicbrain functional networks using multi-subject fMRI data. NeuroImage, 183, 635-649.17. Lacombe, J., Jolicoeur, P., Grimault, S., Pineault, J., and Joubert, S. (2015). Neuralchanges associated with semantic processing in healthy aging despite intact behavioralperformance. Brain and language, 149, 118-127.498. Laufer, I., Negishi, M., Lacadie, C. M., Papademetris, X., and Constable, R. T. (2011).Dissociation between the activity of the right middle frontal gyrus and the middletemporal gyrus in processing semantic priming. PloS one, 6(8), e22368.19. Lindquist, M. A., Xu, Y., Nebel, M. B., & Caﬀo, B. S. (2014). Evaluating dynamicbivariate correlations in resting-state fMRI: a comparison study and a new approach.NeuroImage, 101, 531-546.20. Lukemire, J., Kundu, S., Pagnoni, G., & Guo, Y. (2020). Bayesian joint modeling ofmultiple brain functional networks. Journal of the American Statistical Association,1-13.21. MacEachern, S. N. (1999). Dependent nonparametric processes. In ASA Proceedingsof the Section on Bayesian Statistical Science, Alexandria, VA. American StatisticalAssociation.22. McGregor, K. M., Zlatar, Z., Kleim, E., Sudhyadhom, A., Bauer, A., Phan, S., et al.(2011). Physical activity and neural correlates of aging: a combined TMS/fMRI study.Behav. Brain Res. 222, 158–168.23. Meilˇ aa