Deep sr-DDL: Deep Structurally Regularized Dynamic Dictionary Learning to Integrate Multimodal and Dynamic Functional Connectomics data for Multidimensional Clinical Characterizations
Niharika Shimona D'Souza, Mary Beth Nebel, Deana Crocetti, Nicholas Wymbs, Joshua Robinson, Stewart H. Mostofsky, Archana Venkataraman
N.S. D’Souza (a,∗), M.B. Nebel (b,c), D. Crocetti (b), J. Robinson (b), N. Wymbs (b,c), S.H. Mostofsky (b,c,d), A. Venkataraman (a)
(a) Department of Electrical and Computer Engineering, Johns Hopkins University, USA
(b) Center for Neurodevelopmental & Imaging Research, Kennedy Krieger Institute, USA
(c) Department of Neurology, Johns Hopkins School of Medicine, USA
(d) Department of Psychiatry and Behavioral Science, Johns Hopkins School of Medicine, USA
Abstract
We propose a novel integrated framework that jointly models complementary information from resting-state functional MRI (rs-fMRI) connectivity and diffusion tensor imaging (DTI) tractography to extract biomarkers of brain connectivity predictive of behavior. Our framework couples a generative model of the connectomics data with a deep network that predicts behavioral scores. The generative component is a structurally-regularized Dynamic Dictionary Learning (sr-DDL) model that decomposes the dynamic rs-fMRI correlation matrices into a collection of shared basis networks and time-varying subject-specific loadings. We use the DTI tractography to regularize this matrix factorization and learn anatomically informed functional connectivity profiles. The deep component of our framework is an LSTM-ANN block, which uses the temporal evolution of the subject-specific sr-DDL loadings to predict multidimensional clinical characterizations. Our joint optimization strategy collectively estimates the basis networks, the subject-specific time-varying loadings, and the neural network weights. We validate our framework on a dataset of neurotypical individuals from the Human Connectome Project (HCP) database to map to cognition, and on a separate multi-score prediction task on individuals diagnosed with Autism Spectrum Disorder (ASD) in a five-fold cross-validation setting. Our hybrid model outperforms several state-of-the-art approaches at clinical outcome prediction and learns interpretable multimodal neural signatures of brain organization.
Keywords:
Dynamic Dictionary Learning, Structural Regularization, Multimodal Integration, Functional Magnetic Resonance Imaging, Diffusion Tensor Imaging, Clinical Severity
1. Introduction
Functional magnetic resonance imaging (fMRI) quantifies the changes in blood flow and oxygenation in the regions associated with neuronal activity. More specifically, resting-state fMRI (rs-fMRI) is acquired in the absence of a task paradigm, thus allowing us to probe the spontaneous co-activation patterns in the brain. It is believed that these co-activations reflect the intrinsic functional connectivity between brain regions [Fox and Raichle (2007)]. In contrast to fMRI, Diffusion Tensor Imaging (DTI) [Assaf and Pasternak (2008)] assesses structural connectivity by measuring the diffusion of water molecules across neuronal fibres in the brain. Going one step further, we can use tractography to construct detailed 3D maps of anatomical pathways within the brain based on the diffusion tensors. There is strong evidence in the literature of the correspondence between functional and structural pathways within the brain [Skudlarski, Jagannathan, Calhoun, Hampson, Skudlarska and Pearlson (2008)], with several studies suggesting that this functional connectivity may be mediated by either direct or indirect anatomical connections [Atasoy, Donnelly and Pearson (2016); Bowman, Zhang, Derado and Chen (2012); Fukushima, Betzel, He, van den Heuvel, Zuo and Sporns (2018); Honey, Sporns, Cammoun, Gigandet, Thiran, Meuli and Hagmann (2009)]. Thus, rs-fMRI and DTI data provide complementary information about function and structure, respectively, which when integrated can be used to construct a more comprehensive view of brain organization in both health and disease. As a result, multimodal integration has become an important topic of study for the characterization of neuropsychiatric disorders such as Autism Spectrum Disorder (ASD) [Vissers, Cohen and Geurts (2012)], Attention Deficit Hyperactivity Disorder (ADHD) [Weyandt, Swentosky and Gudmundsdottir (2013)], and Schizophrenia [Niznikiewicz, Kubicki and Shenton (2003)].

∗ Corresponding author. Email address: [email protected] (N.S. D’Souza)

Preprint submitted to NeuroImage, August 31, 2020.

Figure 1: Top: For the fMRI data, we group voxels in the brain into ROIs defined by a standard atlas and compute the average time courses for each ROI. The correlation matrix captures the synchrony in the average time courses. Bottom: Tractography is performed on the raw DWI data to track the path of neuronal fibers in the brain. Based on the parcellation scheme, we construct a map of the fibre tracts between ROIs in the brain. The same parcellation scheme is used for both modalities.

Traditional multimodal analyses of rs-fMRI and DTI data have largely focused on post-hoc statistical comparisons of features extracted from the data. For example, simple statistical differences in rs-fMRI and DTI connectivity between subjects have been used to discover disrupted patterns of brain organization in Alzheimer’s disease [Hahn, Myers, Prigarin, Rodenacker, Kurz, Förstl, Zimmer, Wohlschläger and Sorg (2013)] and Progressive Supranuclear Palsy (PSP) [Whitwell, Avula, Master, Vemuri, Senjem, Jones, Jack Jr and Josephs (2011)]. On a population level, classical multivariate analysis [Goble, Coxon, Van Impe, Geurts, Van Hecke, Sunaert, Wenderoth and Swinnen (2012); Andrews-Hanna, Snyder, Vincent, Lustig, Head, Raichle and Buckner (2007)] or random effects models [Propper, O'Donnell, Whalen, Tie, Norton, Suarez, Zollei, Radmanesh and Golby (2010)] are employed to independently compute and then combine features from both modalities. Despite their past success at biomarker discovery, these techniques often fail to generalize at a patient-specific level. Furthermore, they often ignore higher-order interactions between multiple subsystems in the brain, which are known to be critical for understanding complex neuropsychiatric disorders [Kaiser, Hudac, Shultz, Lee, Cheung, Berken, Deen, Pitskel, Sugrue, Voos et al. (2010); Koshino, Carpenter, Minshew, Cherkassky, Keller and Just (2005)]. These shortcomings have paved the way for the development of the network-based view of brain connectivity, which simultaneously accounts for both inter-subject and intra-subject variability. In the case of fMRI, network-based models often group voxels in the brain into regions of interest (ROIs) using a standard anatomical or functional atlas.
Next, the functional relationships between these regions are determined based on the synchrony between representative (often average) regional time series. This information is typically represented in terms of a static functional connectivity matrix, as shown in Fig. 1 (top). In the case of DTI, tractography is used to estimate the fiber tracts between the ROIs in the brain from the voxel-level diffusion tensors, from which features such as the anisotropy or the number of fibers can be extracted. Similar to the functional connectome, the structural connectivity matrix captures the strength of the pairwise anatomical connection between different ROIs, as seen in Fig. 1 (bottom). Some of the simplest approaches to analyzing network properties borrow heavily from the field of graph theory. For example, the works of [Bullmore and Sporns (2009); Rubinov and Sporns (2010); Sporns, Chialvo, Kaiser and Hilgetag (2004)] use aggregate network measures, such as node degree, betweenness centrality, and eigenvector centrality, to study the organization of the brain. These measures compactly summarize the connectivity information onto a restricted set of nodes that can be mapped back to the brain. A more global network property is small-worldedness [Bassett and Bullmore (2006)], which describes an architecture of sparsely connected clusters of nodes. Complementary changes in small-worldedness in both anatomical and functional networks have been well documented across the literature [Park, Kim, Kim and Kim (2008); Sun, Yin, Fang, Yan, Wang, Bezerianos, Tang, Miao and Sun (2014)], with concurrent disruptions of functional networks [Wang, Kalmar, He, Jackowski, Chepenik, Edmiston, Tie, Gong, Shah, Jones et al. (2009)] or structural networks [Wang, Su, Zhou, Chou, Chen, Jiang and Lin (2012)] implicated in neuropsychiatric disorders such as schizophrenia.
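The aggregate graph measures mentioned above can be computed directly from a connectivity matrix. The following is a minimal NumPy sketch, using a toy symmetric adjacency matrix rather than real connectome data, of two such measures, node degree and eigenvector centrality:

```python
import numpy as np

# Toy symmetric adjacency matrix for a 4-node graph (hypothetical data).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# Node degree: number of edges incident on each node.
degree = A.sum(axis=1)

# Eigenvector centrality: leading eigenvector of A, normalized to unit
# sum; larger entries mark nodes that are connected to other well-connected nodes.
eigvals, eigvecs = np.linalg.eigh(A)
v = np.abs(eigvecs[:, np.argmax(eigvals)])
centrality = v / v.sum()

print(degree)
print(centrality.round(3))
```

Betweenness centrality and small-worldedness additionally require shortest-path computations, which dedicated graph libraries such as NetworkX provide.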
The main limitation of these approaches is that they independently analyze the fMRI and DTI data, and as such, draw heuristic conclusions about the relationship between the two modalities. Community detection techniques have been widely used for understanding the organization of complex systems such as the brain [Bardella, Bifone, Gabrielli, Gozzi and Squartini (2016)]. Other examples include the work of [Venkataraman, Kubicki and Golland (2013)], which identifies abnormal connectivity in schizophrenia, and [Venkataraman, Yang, Pelphrey and Duncan (2016)], which characterizes the social and communicative deficits associated with autism. An alternative network topology is the hub-spoke model, used by [Venkataraman et al. (2013); Venkataraman, Kubicki and Golland (2012); Venkataraman, Duncan, Yang and Pelphrey (2015)], which targets regions associated with a large number of altered rs-fMRI connections. These methods, however, exclusively focus on functional connectivity and do not incorporate structure. In this light, the work of [Venkataraman, Rathi, Kubicki, Westin and Golland (2011)] proposes a probabilistic framework that jointly models latent anatomical and functional connectivity to discover population-level differences in schizophrenia. Similarly, the work of [Higgins, Kundu and Guo (2018)] uses a unified Bayesian framework to identify gender differences in multimodal connectivity patterns across different age groups. While successful at combining multimodal information for group differentiation, these techniques do not directly address inter-individual variability. Data-driven methods integrating structural and functional connectivity focus heavily on groupwise discrimination from the static connectomes. These methods usually follow a two-step approach, in which feature selectors and discriminators are trained sequentially in a pipeline.
For example, the authors in [Wee, Yap, Zhang, Denny, Browndyke, Potter, Welsh-Bohmer, Wang and Shen (2012)] combine graph-theoretic features computed from rs-fMRI and DTI graphs with Support Vector Machines (SVMs) to identify individuals with Mild Cognitive Impairment. Another example is the work of [Sui, He, Yu, Rogers, Pearlson, Mayer, Bustillo, Canive, Calhoun et al. (2013)], which employs a pipeline consisting of joint Independent Component Analysis (j-ICA) on the two modalities, followed by Canonical Correlation Analysis (CCA) to combine them and distinguish schizophrenia patients from controls. In contrast to the pipelined approaches, end-to-end deep learning methods combining feature selection and prediction are becoming ubiquitous in neuroimaging studies. These are highly successful due to their ability to learn complex abstractions directly from input data. As an example, the work of [Aghdam, Sharifi and Pedram (2018)] uses a Deep Belief Network (DBN) on multimodal data to disambiguate patients with Autism Spectrum Disorder from healthy controls. However, none of the above methods tackle continuous-valued prediction, for example, quantifying a continuous level of deficit. In the continuous prediction realm, the authors of [Kawahara, Brown, Miller, Booth, Chau, Grunau, Zwicker and Hamarneh (2017)] developed an end-to-end convolutional neural network to predict cognitive outcomes from DTI connectomes. On the other hand, the authors of [D'Souza, Nebel, Wymbs, Mostofsky and Venkataraman (2019b)] combine dictionary learning on the rs-fMRI correlations with an Artificial Neural Network (ANN) to predict clinical severity in ASD patients.
While promising, these methods focus on a single neuroimaging modality and do not exploit complementary interactions between structural and functional connectivity. There is now growing evidence that functional connectivity is a dynamic process that toggles between different intrinsic states evolving over a static structural connectome [Cabral, Kringelbach and Deco (2017)]. These states manifest over short time windows, typically on the order of tens of seconds to a few minutes. Several studies, such as [Price, Wee, Gao and Shen (2014); Rashid, Damaraju, Pearlson and Calhoun (2014)], indicate the importance of modeling this evolution for characterizing neuropsychiatric disorders such as schizophrenia and Autism Spectrum Disorder (ASD). The dynamic connectivity among ROIs in the brain is typically captured via a sliding window protocol, defined by the window length and stride, as illustrated in Fig. 2. The window length defines the length of the time sequence considered by each dynamic correlation matrix, while the stride controls the overlap between successive sliding windows. Recently, model-based alternatives that detect dynamic changes in correlation between large-scale brain networks, such as the Default Mode Network, Somatosensory Network, etc., have been developed. An example is the Dynamic Conditional Correlation (DCC) protocol, which was initially developed in the econometrics and finance literature [Engle (2002)] and later adapted to the study of brain organization using rs-fMRI [Lindquist (2016)]. It poses a time-varying matrix estimation problem to explicitly model the evolution of connectivity patterns in the brain, and has shown robustness in the test-retest setting [Lindquist, Xu, Nebel and Caffo (2014)] with rs-fMRI.
Unfortunately, this method is unstable when scaled up [Aielli (2013); Caporin and McAleer (2013)], for example to a whole-brain ROI-level analysis of dynamic connectivity, likely due to ill-conditioning of the correlation matrices in the absence of additional regularization. Consequently, most dynamic connectivity studies continue to rely on sliding-window correlations as inputs. Examples include [Cai, Zille, Stephen, Wilson, Calhoun and Wang (2017)], where the authors use a sparse decomposition of the rs-fMRI connectomes, or [Rabany, Brocke, Calhoun, Pittman, Corbera, Wexler, Bell, Pelphrey, Pearlson and Assaf (2019)], which employs temporal clustering for ASD/control discrimination. Nevertheless, these approaches focus exclusively on rs-fMRI and completely ignore structural information.

We propose a deep-generative hybrid model, i.e. the deep sr-DDL, that integrates structural and dynamic functional connectivity with behavior into a unified optimization framework. Our deep sr-DDL framework has two main components: (1) a generative dictionary learning component to represent the multimodal data, and (2) a deep network to predict behavioral scores.

Figure 2: First, the ROIs defined by a standard atlas are used to compute regional time series. Then, a sliding window protocol defined by window length and stride is applied to extract the dynamic patient correlation matrices. As in the static case, the dynamic matrices measure the synchrony between regional time series, but as a function of time.

Our generative component is a structurally regularized Dynamic Dictionary Learning (sr-DDL) model, which uses a DTI tractography prior to regularize a matrix factorization of the dynamic rs-fMRI correlation matrices.
Specifically, we decompose the dynamic rs-fMRI correlation matrices into a collection of shared bases and time-varying subject-specific loadings, similar to the static setup introduced in [Eavani, Satterthwaite, Filipovych, Gur, Gur and Davatzikos (2015)] and extended by [D'Souza, Nebel, Wymbs, Mostofsky and Venkataraman (2018); D'Souza, Nebel, Wymbs, Mostofsky and Venkataraman (2019a); D'Souza et al. (2019b)]. Simultaneously, these loadings are input to a deep network comprised of a Long-Short Term Memory (LSTM) module to model temporal trends and an ANN that predicts clinical scores. Our end-to-end optimization procedure jointly estimates the bases, loadings, and neural network weights most predictive of the clinical profile, as opposed to a modular, pipelined approach. A preliminary version of our work will appear in MICCAI 2020 and is currently available on arXiv [D'Souza, Nebel, Crocetti, Wymbs, Robinson, Mostofsky and Venkataraman (2020)]. Here, we provide a detailed analysis of our framework, which we validate on two separate real-world datasets. The first of these includes a subset of healthy adults from the publicly available Human Connectome Project (HCP) [Van Essen, Ugurbil, Auerbach, Barch, Behrens, Bucholz, Chang, Chen, Corbetta, Curtiss et al. (2012)]. This helps us evaluate the efficacy of our framework at predicting cognitive outcomes from the rs-fMRI and DTI scans. Next, we examine a clinical dataset consisting of children diagnosed with Autism Spectrum Disorder (ASD). The presentation of ASD is known to be heterogeneous, with individuals exhibiting a wide spectrum of behavioral impairments in terms of social reciprocity, communicative functioning, and repetitive/restrictive behaviours [Spitzer and Williams (1980)], quantified via clinical severity measures.
We observed that our method outperforms several state-of-the-art approaches at predicting behavioral performance in unseen individuals from their connectomics data on both datasets. In summary, our joint objective balances generalizability with interpretability, bridging the representational gap between structure, function, and behavior. Our experiments highlight the potential of our deep sr-DDL framework for providing a more holistic view of neuropsychiatric diseases.
2. Materials and Methods
Fig. 3 presents a graphical overview of our framework. We have three sets of inputs to the model for each individual: the dynamic individual-specific correlation matrices, the DTI structural connectome graph (upper left), and the set of scalar clinical scores (bottom right). We use the sliding window approach in Fig. 2 to extract dynamic rs-fMRI correlation matrices, and tractography to extract the DTI connectomes, as shown in Fig. 1. The DTI input to our model is the Graph Laplacian obtained from a binary DTI adjacency matrix capturing the presence/absence of a fiber between regions. Finally, the behavioral scores for each individual are obtained from an expert assessment. This score can correspond to either cognitive outcomes or severity of symptoms in the case of neurodevelopmental diseases.

The green box in Fig. 3 describes the generative component of our framework. Here, the dynamic rs-fMRI correlation matrices are decomposed using structurally regularized dynamic dictionary learning (sr-DDL). The columns of the basis capture representative subnetwork patterns common to the cohort. The loading coefficients differ across subjects and evolve over time. At each time point/observation, they determine the contribution of each basis to the dynamic functional connectivity profile of the individual. Finally, the DTI Graph Laplacians re-weight the decomposition to focus on the functional connectivity between anatomically linked regions. The gray box denotes the deep network part of our model. This network combines a Long Short Term Memory (LSTM) module with an Artificial Neural Network (ANN) to predict multiple behavioral scores. The LSTM models the temporal trends in the subject-specific loading coefficients, giving rise to a hidden representation. The ANN then uses this representation to predict the corresponding behavioral outcomes.

Figure 3: Framework to integrate structural and dynamic functional connectivity for clinical severity prediction. Green Box: The generative sr-DDL module. The rs-fMRI dynamic correlation matrices are decomposed into the subnetwork basis and time-varying subject-specific loadings. The DTI connectivity regularizes this decomposition. Gray Box: Deep LSTM-ANN module for multi-score prediction. The sr-DDL coefficients are input into the LSTM to generate a hidden representation. The predictor ANN (P-ANN) generates a time-varying estimate for the scores, while the attention ANN (A-ANN) weights the predictions across time to generate the final clinical severity estimate.
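As an illustration of the input pipeline described above, the following sketch computes sliding-window correlation matrices and a normalized graph Laplacian from synthetic data. The sizes, window length, and stride are arbitrary stand-ins, not the values used in this study:

```python
import numpy as np

rng = np.random.default_rng(0)
P, T = 10, 120                       # ROIs, time points (toy sizes)
ts = rng.standard_normal((T, P))     # synthetic regional time series

# Sliding-window dynamic correlations (window length W, stride S).
W, S = 45, 15
windows = [ts[s:s + W] for s in range(0, T - W + 1, S)]
corrs = [np.corrcoef(w, rowvar=False) for w in windows]   # each P x P

# Random binary DTI adjacency -> symmetric normalized graph Laplacian.
A = (rng.random((P, P)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T       # symmetric, no self-loops
deg = A.sum(axis=1)
deg[deg == 0] = 1.0                  # guard against isolated nodes
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = D_inv_sqrt @ (np.diag(deg) - A) @ D_inv_sqrt

print(len(corrs), corrs[0].shape, L.shape)
```

The eigenvalues of the normalized Laplacian lie in [0, 2], which is the spectral property that makes it attractive as a regularizer.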
Dynamic Dictionary Learning for rs-fMRI data. We denote the set of time-varying functional correlation matrices for individual n by the set {Γ_n^t}_{t=1}^{T_n} ∈ R^{P×P}. Here, T_n denotes the number of sliding windows applied to the rs-fMRI scan, and P is the number of ROIs in the parcellation scheme. As seen in Fig. 3 (green box), we model this information using a group-average basis and subject-specific temporal loadings. The dictionary B ∈ R^{P×K} is a concatenation of K elemental basis vectors b_k ∈ R^{P×1}, i.e. B := [b_1 b_2 ... b_K], where K ≪ P. This basis captures representative brain states which each subject cycles through over the course of the scan. We further constrain the basis vectors to be orthogonal to each other. This constraint acts as an implicit regularizer, ensuring that the learned subnetworks are uncorrelated, yet explain the rs-fMRI data well. While the bases are shared across the cohort, the strength of their combination differs across individuals and varies over time. These loadings are denoted by the set {c_n^t}_{t=1}^{T_n} and combine the basis subnetworks uniquely to best explain each subject's functional connectivity. We introduce an explicit non-negativity constraint on c_{nk}^t to ensure that the positive semi-definiteness of Γ_n^t is preserved. The complete rs-fMRI data representation takes the following form:

Γ_n^t ≈ Σ_k c_{nk}^t b_k b_k^T   s.t.   c_{nk}^t ≥ 0,   B^T B = I_K,        (1)

where I_K is the K × K identity matrix. As seen in Eq. (1), the subject-specific loading vector at time t, c_n^t := [c_{n1}^t ... c_{nK}^t]^T ∈ R^{K×1}, models the heterogeneity in the cohort. Denoting diag(c_n^t) as a diagonal matrix with the K subject-specific coefficients on the diagonal and off-diagonal terms set to zero, Eq. (1) can be re-written in the following matrix form:

Γ_n^t ≈ B diag(c_n^t) B^T   s.t.   c_{nk}^t ≥ 0,   B^T B = I_K        (2)

Finally, this matrix factorization serves to reduce the dimensionality of the rs-fMRI data, while simultaneously modeling group-level and subject-specific information.

Structural Regularization from DTI data. We denote the structural connectome graph for individual n by G_n(V, E, A_n). Here, V are the vertices defined on the P ROIs, and E are the graph edges defined by the binary adjacency matrix A_n ∈ R^{P×P}. We compute the corresponding Normalized Graph Laplacian [Banerjee and Jost (2008)] as L_n = V_n^{-1/2}(V_n − A_n)V_n^{-1/2}, where V_n = diag(A_n 1) is the degree matrix and 1 is the vector of all ones. In the past, the spectral properties of the Graph Laplacian have made it a popular choice as a spatial regularizer in computer vision [Liu, Liang, Zhou, He, Hao, Song, Yu, Liu, Liu and Jiang (2008)], genetics [Feng, Gao, Liu, Zheng and Yu (2017)], and neuroimaging [Atasoy et al. (2016); Cuingnet, Glaunès, Chupin, Benali and Colliot (2012)]. We extend this concept to regularizing our functional matrix decomposition by substituting the ℓ2 penalty in Eq. (2) with the weighted Frobenius norm ||·||_{L_n} [Manton, Mahony and Hua (2003); Schnabel and Toint (1983)]. Mathematically, given this structural regularization, the approximation error of Eq. (2) takes the following form:

||Γ_n^t − B diag(c_n^t) B^T||²_{L_n} = Tr[(Γ_n^t − B diag(c_n^t) B^T) L_n (Γ_n^t − B diag(c_n^t) B^T)]        (3)

Here, Tr[M] is the trace operator, which sums the diagonal elements of the argument matrix M. Essentially, the matrix L_n refocuses the factorization such that region pairs with an underlying anatomical connection have a greater contribution to the approximation error than region pairs without an anatomical connection. Based on the formulation in Eq. (3), the final sr-DDL objective D(·) can be expressed as follows:

D(B, {c_n^t}; {Γ_n^t}, L_n) = (1/T_n) Σ_t ||Γ_n^t − B diag(c_n^t) B^T||²_{L_n}   s.t.   c_{nk}^t ≥ 0,   B^T B = I_K        (4)

Deep Multiscore Prediction. As seen in the gray box in Fig. 3, the subject-specific coefficients {c_n^t} are input to an LSTM-ANN to predict the clinical scores, as parametrized by the weights Θ. The M clinical scores for each individual are concatenated into a vector y_n := [y_{n1} ... y_{nM}]^T ∈ R^{M×1}. The LSTM models the temporal variations in the coefficients {c_n^t} to generate a hidden representation {h_n^t}_{t=1}^{T_n}. From here, the Predictor ANN (P-ANN) generates time-varying estimates of the scores {ŷ_n^t}_{t=1}^{T_n} ∈ R^{M×1}. At the same time, the Attention ANN (A-ANN) generates T_n scalars from the hidden representation. These are then softmaxed across time to obtain the attention weights {a_n^t}_{t=1}^{T_n}. The final prediction is an attention-weighted average across the time estimates, which takes the following form:

ŷ_n = Σ_t ŷ_n^t a_n^t        (5)

Effectively, the attention weights determine which time points for each subject are most relevant for behavioral prediction. Additionally, they allow us to handle rs-fMRI scans of varying durations. Mathematically, we compute the multi-score prediction error L(·) using the Mean Squared Error (MSE) loss function as follows:

L({c_n^t}, y_n; Θ) = ||ŷ_n − y_n||²_F = || Σ_{t=1}^{T_n} ŷ_n^t a_n^t − y_n ||²_F        (6)

At a high level, the deep network distills the temporal information to best predict each subject's clinical profile.

Joint Objective for Multimodal Integration. We combine the complementary viewpoints in Eq. (4) and Eq. (6) into a single joint objective:

J(B, {c_n^t}, Θ; {Γ_n^t}, L_n, {y_n}) = Σ_n D(B, {c_n^t}; {Γ_n^t}, L_n)  [sr-DDL loss]  +  λ Σ_n L(Θ, {c_n^t}; y_n)  [deep network loss]
  = Σ_n (1/T_n) Σ_t ||Γ_n^t − B diag(c_n^t) B^T||²_{L_n} + λ Σ_n L(Θ, {c_n^t}; y_n)   s.t.   c_{nk}^t ≥ 0,   B^T B = I_K        (7)

Here, λ is a hyperparameter that balances the tradeoff between the representation loss D(·) and the prediction loss L(·). {B, {c_n^t}, Θ} are the variables to optimize.

Architectural Details. Our proposed ANN architecture is highlighted in the white box at the bottom left of Fig. 3. Our modeling choices carefully control for the representational capacity and convergence of our coupled optimization procedure. Since the input to the network, i.e. the coefficient vector c_n^t, is essentially low dimensional, we opt for a two-layered LSTM with a hidden layer width of 40. Both the P-ANN and the A-ANN are fully connected neural networks with two hidden layers of width 40. Since the A-ANN outputs a scalar, the width of its output layer is one, while that of the P-ANN is of size M, i.e. the number of behavioral scores. We use a Rectified Linear Unit (ReLU) as the activation function for each hidden layer, as we found that this choice is robust to issues with vanishing gradients and saturation that commonly confound the training of deep neural networks [Glorot, Bordes and Bengio (2011)].

We employ the alternating minimization technique in order to infer the set of hidden variables {B, {c_n^t}, Θ}. Namely, we optimize Eq. (7) for each output variable, while holding the other unknowns constant. We utilize the fact that there is a closed-form Procrustes solution for quadratic objectives of the form ||M − B||²_F [Everson (1998)]. However, Eq. (7) is bi-quadratic in B, so it cannot be directly applied.
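To make the structurally weighted reconstruction error concrete, the sketch below evaluates the trace form of Eq. (3) on random stand-ins for one subject and window; with the Laplacian replaced by the identity, it reduces to the ordinary squared Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(1)
P, K = 8, 3

# Random stand-ins: basis B with orthonormal columns (B^T B = I_K),
# non-negative loadings c, symmetric target Gamma, Laplacian weight Lw.
B, _ = np.linalg.qr(rng.standard_normal((P, K)))
c = np.abs(rng.standard_normal(K))
Gamma = rng.standard_normal((P, P)); Gamma = (Gamma + Gamma.T) / 2
Lw = np.eye(P)                     # identity weight for the sanity check

R = Gamma - B @ np.diag(c) @ B.T   # residual of the factorization
weighted_err = np.trace(R @ Lw @ R)   # Eq. (3): Tr[R L R], R symmetric

# With Lw = I, the weighted norm equals the squared Frobenius norm.
print(np.isclose(weighted_err, np.linalg.norm(R, 'fro') ** 2))
```

Replacing `Lw` with a graph Laplacian up-weights residual entries on anatomically connected region pairs, which is exactly the role L_n plays in the sr-DDL objective.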
Therefore, we adopt the strategy in [D'Souza, Nebel, Wymbs, Mostofsky and Venkataraman (2020); D'Souza et al. (2019a,b)] of introducing Σ_n T_n constraints of the form D_n^t = B diag(c_n^t). These constraints are enforced via the Augmented Lagrangian algorithm with corresponding constraint variables {Λ_n^t}. Thus, our objective from Eq. (7) now becomes:

J_c = Σ_{n,t} (1/T_n) ||Γ_n^t − D_n^t B^T||²_{L_n} + λ Σ_n L(Θ, {c_n^t}; y_n) + Σ_{n,t} (γ/T_n) Tr[(Λ_n^t)^T (D_n^t − B diag(c_n^t))] + Σ_{n,t} (γ/T_n) ||D_n^t − B diag(c_n^t)||²_F   s.t.   c_{nk}^t ≥ 0,   B^T B = I_K        (8)

The Frobenius norm terms ||D_n^t − B diag(c_n^t)||²_F regularize the trace constraints during the optimization. Observe that Eq. (8) is convex in the set {D_n^t}, which allows us to optimize this variable via standard procedures. The constraint parameter is fixed at γ = 20, based on guidelines in the literature [Nocedal and Wright (2006)]. Fig. 4 depicts our alternating minimization strategy. We describe each individual block in detail below:

Step 1: Closed-form solution for B. Notice that Eq. (8) reduces to the following quadratic form in B:

B* = argmin_{B : B^T B = I_K} ||M − B||²_F        (9)

where M is computed as:

M = Σ_n (1/T_n) Σ_t [ (Γ_n^t L_n + L_n Γ_n^t) D_n^t + γ D_n^t diag(c_n^t) + γ Λ_n^t diag(c_n^t) ]        (10)

We know that B has a closed-form Procrustes solution [Everson (1998)], computed as follows: given the singular value decomposition M = USV^T, we have B* = UV^T. In essence, B spans the anatomically weighted space of subject-specific dynamic correlation matrices.

Step 2: Updating the sr-DDL loadings {c_n^t}. The objective J_c in Eq. (8) decouples across subjects. We can also incorporate the non-negativity constraint c_{nk}^t ≥ 0 by generating c_n^t from an unconstrained variable ĉ_n^t through a ReLU. Thus:

c_n^t = ReLU(ĉ_n^t)        (11)

The ReLU pre-filtering allows us to optimize an unconstrained version of Eq. (8), as follows:

J_ĉ = λ Σ_n L(Θ, {c_n^t}; y_n) + Σ_{n,t} (γ/T_n) Tr[(Λ_n^t)^T (D_n^t − B diag(c_n^t))] + Σ_{n,t} (γ/T_n) ||D_n^t − B diag(c_n^t)||²_F        (12)

This optimization can be performed via the stochastic ADAM algorithm [Kingma and Ba (2015)] by backpropagating the gradients from the loss in Eq. (12) up to the inputs {ĉ_n^t}. Experimentally, we set the initial learning rate to be 0.01, scaled down over the course of training. After convergence, the thresholded loadings c_n^t = ReLU(ĉ_n^t) are used in the subsequent steps of the minimization.

Step 3: Updating the Deep Network weights Θ. We use backpropagation on the loss L(·) to solve for the unknowns Θ. Notice that we can handle missing clinical data by dropping the contributions of the unknown values of y_{nm} to the network loss during backpropagation. Again, we use the ADAM optimizer [Kingma and Ba (2015)] with random initialization at the first main iteration of alternating minimization. We employ a learning rate of 10^{−}, scaled by 0.95 every 5 epochs, and a batch size of 1. Additionally, we train the network only for 50 epochs to avoid overfitting.
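The attention-weighted prediction of Eq. (5) and the masked loss used to drop missing clinical scores can be sketched as follows; all arrays are random stand-ins for the P-ANN and A-ANN outputs, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
T, M = 6, 3                          # time windows, clinical scores (toy sizes)

# Stand-ins for the P-ANN per-window estimates and the raw A-ANN scalars.
y_hat_t = rng.standard_normal((T, M))   # one row of score estimates per window
raw_attn = rng.standard_normal(T)       # unnormalized A-ANN outputs

# Softmax across time yields the attention weights a_n^t (Eq. 5).
a = np.exp(raw_attn - raw_attn.max())
a /= a.sum()
y_hat = a @ y_hat_t                     # attention-weighted final prediction

# Masked MSE: contributions of missing scores (NaN) are dropped (Step 3).
y_true = np.array([0.5, np.nan, -1.0])
mask = ~np.isnan(y_true)
loss = np.sum((y_hat[mask] - y_true[mask]) ** 2)

print(a.round(3), y_hat.shape, float(loss))
```

Because the weights are normalized across however many windows a scan contains, the same mechanism accommodates rs-fMRI scans of varying durations.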
Step 4: Updating the Constraint Variables {D_n^t, Λ_n^t}. Each of the primal variables {D_n^t} has a closed-form solution given by:

[D_n^t]_{k+1} = F K,  where K = Γ_n^t L_n B + L_n Γ_n^t B + γ B diag(c_n^t) − γ Λ_n^t and F = (γ I + 2 L_n)⁻¹    (13)

We update the dual variables {Λ_n^t} via gradient ascent:

[Λ_n^t]_{k+1} = [Λ_n^t]_k + η_k ([D_n^t]_{k+1} − B diag(c_n^t))    (14)

We cycle through the primal-dual updates for {D_n^t} and {Λ_n^t} in Eqs. (13-14) to ensure that the constraints D_n^t = B diag(c_n^t) are satisfied with increasing certainty at each iteration. The learning rate parameter η_k for the gradient ascent step is selected to guarantee a sufficient decrease in the objective at every iteration of alternating minimization. In practice, we initialize η to a small value and scale it by 0.75 at each iteration k.

Figure 4: Alternating minimization strategy for the joint optimization of Eq. (8).

Step 5: Prediction on Unseen Data. In our cross-validated setting, we must compute the sr-DDL loadings {c̄_t}, t = 1, ..., T̄, for a new subject based on the basis B* obtained from the training procedure and the new rs-fMRI correlation matrices {Γ̄_t} and DTI Laplacian L̄. As we do not know the score ȳ for this individual, we remove the contribution of L(·) from Eq. (8) and assume that the constraints D̄_t = B* diag(c̄_t) are satisfied with equality. This effectively eliminates the Lagrangian terms. Essentially, the optimization for {c̄_t} now reduces to T̄ decoupled quadratic programming (QP) objectives Q_t:

c̄*_t = argmin_{c̄_t} (1/2) c̄_tᵀ H̄ c̄_t + f̄ᵀ c̄_t   s.t.  Ā c̄_t ≤ b̄
H̄ = 2 (B*ᵀ L̄ B*);  f̄ = −[I_K ∘ (B*ᵀ (Γ̄_t L̄ + L̄ Γ̄_t) B*)];  Ā = −I_K;  b̄ = 0    (15)

where ∘ is the elementwise Hadamard product. Notice that decoupling the objective across time allows us to parallelize this computation. Additionally, since H̄ is positive semi-definite, the formulation in Eq. (15) is convex, leading to an efficient QP solution. Finally, we estimate ȳ via a forward pass through the LSTM-ANN.

Parameter Settings: Our deep-generative hybrid has two free parameters: the penalty λ, which controls the tradeoff between data representation and clinical prediction, and K, the number of networks. For our experiments, we chose K = 15 for both datasets based on the knee point of the eigenspectrum of the correlation matrices Γ_n^t (see Fig. 5). The tradeoff parameter is set to λ = 3 for both datasets, as we empirically found that this choice gives good performance on the test data without overfitting during training. We discuss the sensitivity to this parameter in Section 4.1.

Figure 5: Scree plot of the correlation matrices to corroborate the selected value of K. (L) KKI dataset, (R) HCP dataset. The thick line denotes the mean eigenvalue, while the shaded area indicates the standard deviation across subjects and time points.

Initialization: Our coupled optimization strategy requires us to initialize the basis B, the coefficients {c_n^t}, the deep network weights Θ, and the constraint variable pairs {D_n^t, Λ_n^t}. We randomly initialize the deep network weights at the first main iteration. We employ a soft initialization for {B, {c_n^t}} by solving the dictionary objective in Eq. (4) without the LSTM-ANN loss terms for 20 iterations. We then initialize D_n^t = B diag(c_n^t) and Λ_n^t = 0, which lie in the feasible set for our constraints. We empirically observed that this soft initialization helps stabilize the optimization, providing improved predictive performance in fewer main iterations than a completely random initialization.

Finally, the meta-data and code used in this study are available in a public repository hosted on GitHub.

We evaluate the performance of our framework against three different classes of baselines, each highlighting the benefit of specific modeling choices made by our method. Our first baseline class is a two-stage configuration, as illustrated in Fig. 6, that combines feature extraction on the dynamic rs-fMRI and DTI data with a deep learning predictor.
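For concreteness, each decoupled problem in Eq. (15) is a nonnegative QP: the constraint Ā c̄ ≤ b̄ with Ā = −I_K and b̄ = 0 is simply c̄ ≥ 0. Any off-the-shelf QP routine applies; the sketch below uses a plain projected-gradient iteration in numpy on a randomly generated positive semi-definite H̄ (a stand-in, not the actual B*, L̄, Γ̄ quantities):

```python
import numpy as np

def solve_loading_qp(H, f, iters=5000):
    """Projected gradient descent for: min_c 0.5 c^T H c + f^T c  s.t.  c >= 0."""
    step = 1.0 / (np.linalg.eigvalsh(H)[-1] + 1e-12)  # 1/L, L = Lipschitz const. of gradient
    c = np.zeros(f.shape[0])
    for _ in range(iters):
        c = np.maximum(c - step * (H @ c + f), 0.0)   # gradient step, then project onto c >= 0
    return c

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 4))
H = M.T @ M                         # positive semi-definite, like H-bar in Eq. (15)
f = rng.standard_normal(4)
c_star = solve_loading_qp(H, f)     # one loading vector; each time point t solves independently
```

Because the T̄ problems are decoupled, they can be dispatched in parallel across time points, exactly as noted above.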
These feature engineering techniques are drawn from a set of well-established statistical (Independent Component Analysis in Subsection 2.3.2) and graph-theoretic techniques (Betweenness Centrality in Subsection 2.3.1), known to provide rich feature representations. The learned features are then input to the same deep LSTM-ANN network used by our method. This network is trained separately to predict the clinical outcomes. Note that these baselines incorporate multimodal and dynamic information, but do not directly operate on the network structure of the connectomes. Our second baseline class omits the two-stage approach in lieu of an end-to-end convolutional neural network based on the work of [Kawahara et al. (2017)]. We train this model on the static rs-fMRI and DTI connectomes in tandem to predict the clinical scores. This baseline operates directly on the correlation and connectivity matrices, but ignores the dynamic evolution of functional connectivity. Next, we present a comparison against our deep sr-DDL with the structural regularization omitted. This helps us evaluate the benefit provided by the multimodal integration of DTI and rs-fMRI data.

Code repository: https://github.com/Niharika-SD/Deep-sr-DDL

Figure 6: A typical two-stage baseline. We input the dynamic correlation matrices and DTI connectomes to Stage 1, which performs feature extraction. This step could be a technique from machine learning, graph theory, or a statistical measure. Stage 2 is a deep network that predicts the clinical scores.

Our final baseline highlights the benefit of our joint optimization procedure. In this experiment, we decouple the optimization of the dynamic matrix factorization and the deep network in Fig. 3, similar to the two-stage pipelines.
Notice that the subject-specific rs-fMRI correlation matrices {Γ_n^t} and the corresponding binary DTI adjacency matrices A_n indicate the time-varying functional and the anatomical connectivity between the ROIs, respectively. Therefore, we multiply the two to generate time-varying multimodal graphs whose nodes are the brain ROIs and whose edges are defined by the temporal connectivity between these ROIs. We denote the corresponding adjacency matrices for these graphs by {Ψ_n^t = A_n ∘ Γ_n^t ∈ R^{P×P}}, where we threshold each Ψ_n^t to remove negative values. Each element [Ψ_n^t]_ij gives the strength of association between two communicating sub-regions i and j in individual n at time t. We summarize the topology of these graphs via Betweenness Centrality (C_B) to obtain a time-varying estimate of brain connectivity for each ROI [Bassett and Bullmore (2006); Sporns et al. (2004)]. C_B(v) for region v at time t is calculated as:

C_B^t(v) = Σ_{s≠v≠u∈V} σ_su^t(v) / σ_su^t    (16)

where σ_su^t is the total number of shortest paths from node s to node u at time t, and σ_su^t(v) is the number of those paths that pass through v. This measure quantifies the number of times a node acts as a bridge along the shortest path between two other nodes and has found wide usage in characterizing small-world networks in brain connectivity [Sporns et al. (2004)]. It also effectively reduces the dimensionality of the connectivity features. Again, the collection of features {C_B^t} is used to train the LSTM-ANN predictor from Fig. 3, here with two hidden layers of width 200 due to the higher input feature dimensionality. This baseline employs
Independent Component Analysis (ICA) combined with the LSTM-ANN predictor. ICA is a statistical technique that extracts representative spatial patterns from the rs-fMRI time series. It has become ubiquitous in fMRI analysis for its ability to identify group-level differences as well as to model individual-specific connectivity signatures. Essentially, ICA decomposes multivariate signals into 'independent' non-Gaussian components based on the data statistics. This algorithm can be extended to the multi-subject analysis setting via Group ICA (G-ICA). Specifically, we extract independent spatial patterns common across patients by combining the contributions of the individual time courses. For this baseline, we first perform G-ICA using the GIFT toolbox [Calhoun, Liu and Adalı (2009)] and derive independent spatial maps for each subject from their raw rs-fMRI scans. We then compute the average time course for each spatial map over its constituent voxels. This provides us with a feature representation of reduced dimension, equal to the number of specified maps (d ≪ L), for each individual. For our experiments, we extract 15 ICA components. These time courses are input into the LSTM-ANN network in Fig. 3, with two hidden layers of width 40, to predict the clinical outcomes.
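At its core, ICA finds an unmixing of the observed signals into maximally non-Gaussian components. The following is a minimal, self-contained sketch of a symmetric FastICA iteration with a tanh contrast, run on toy mixed signals; it is a drastic simplification of what GIFT's G-ICA pipeline implements, shown only to make the decomposition concrete:

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh contrast on X of shape (channels, samples).
    Returns the estimated independent components (up to sign/permutation/scale)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)
    # whiten via the eigendecomposition of the covariance matrix
    d, E = np.linalg.eigh(np.cov(X))
    Xw = (E @ np.diag(d ** -0.5) @ E.T) @ X
    n = X.shape[0]
    W = rng.standard_normal((n, n))
    for _ in range(n_iter):
        WX = W @ Xw
        g, g_prime = np.tanh(WX), 1.0 - np.tanh(WX) ** 2
        W_new = (g @ Xw.T) / Xw.shape[1] - np.diag(g_prime.mean(axis=1)) @ W
        # symmetric decorrelation: W <- (W W^T)^(-1/2) W, via the SVD of W
        U, _, Vt = np.linalg.svd(W_new)
        W = U @ Vt
    return W @ Xw

# toy demo: unmix a sine wave and a square wave from two linear mixtures
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
X = np.array([[1.0, 0.5], [0.5, 1.0]]) @ S
Y = fastica(X)
```

The recovered rows of `Y` match the original sources up to sign, permutation, and scale, which is all ICA can guarantee.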
The BrainNet CNN [Kawahara et al. (2017)] relies on specialized fully convolutional layers for feature extraction and was originally used to predict cognitive and motor outcomes from DTI connectomes. Fig. 7 provides a pictorial overview of the original architecture adapted for clinical outcome prediction from multimodal data. Each branch of the network accepts as input a P × P connectome, to which it applies a cascade of two edge-to-edge (E-E) convolutional operations. This E-E operation combines individual convolutions acting on the row and column to which the input element belongs. It is followed by a series of edge-to-node (E-N) blocks that reduce the dimensionality of the intermediate outputs, followed by a node-to-graph (N-G) operation for pooling. Finally, the output clinical scores are predicted via a fully connected artificial neural network for regression.

We feed the static rs-fMRI connectomes (Γ̂_n) and DTI Laplacians L_n into two disjoint fully convolutional branches with the architecture described above. We integrate the learned features via concatenation and input them into the fully connected layers described in Fig. 7, but with the number of outputs equal to the dimensionality of the clinical severity vector y_n. We set the learning rate, momentum, and weight decay parameters according to the guidelines in [Kawahara et al. (2017)].

Figure 7: The BrainNet CNN baseline [Kawahara et al. (2017)] for severity prediction from multimodal data.

In this baseline, we examine the effect of excluding the structural regularization provided by the DTI data from the joint objective in Eq. (7). The resulting objective function takes the following form:

J_w(B, {c_n^t}, Θ; {Γ_n^t}, {y_n}) = Σ_n Σ_t (1/T_n) ‖Γ_n^t − B diag(c_n^t) Bᵀ‖_F² + λ Σ_n L(Θ, {c_n^t}; y_n)
s.t.  c_nk^t ≥ 0,  BᵀB = I_K    (17)

Notice that this amounts to replacing the weighted Frobenius norm formulation with a regular ℓ₂ penalty.
This allows us to adopt the alternating minimization procedure in Section 2.2 to optimize Eq. (17), with a few minor modifications. Specifically, instead of T_n constraints per subject, we use a single constraint of the form D = B, enforced via a single Augmented Lagrangian Λ. This effectively ensures that the new objective has a quadratic form in B, along with a closed-form update for D. As before, we cycle through four individual steps, namely:

• Closed-form Procrustes solution for the basis B
• Updating the temporal loadings {c_n^t} (ADAM)
• Updating the neural network parameters Θ (ADAM)
• Augmented Lagrangian updates for the constraint variables {D, Λ}

Similar to the deep sr-DDL, we use K = 15 networks as inputs to the LSTM-ANN network, with two hidden layers of width 40, to predict the clinical outcomes.

Our final baseline examines the efficacy of our coupled optimization procedure in Section 2.2 with regard to generalization to unseen subjects. Here, we first run the feature extraction using the sr-DDL optimization to extract the basis B and the temporal loadings {c_n^t}. We then use the {c_n^t} as inputs to train the LSTM-ANN network in Fig. 3 to predict the scores y_n. This is akin to the two-stage baselines delineated in Fig. 6. Again, we use K = 15 networks with a two-layer LSTM-ANN having hidden layer width 40.
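The first bullet above is a standard orthogonal Procrustes step: minimizing ‖M − B‖_F over orthonormal B has the closed-form solution B = U Vᵀ, where M = U S Vᵀ is the singular value decomposition of the target matrix assembled from the remaining variables. A minimal sketch (the target M here is random, purely for illustration):

```python
import numpy as np

def procrustes_basis(M):
    """Closed-form solution of argmin_B ||M - B||_F  s.t.  B^T B = I:
    take the thin SVD M = U S V^T and return B = U V^T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((10, 4))   # hypothetical P x K target for the basis update
B = procrustes_basis(M)            # orthonormal columns, closest to M in Frobenius norm
```

The orthonormality constraint BᵀB = I_K is thus enforced exactly at every iteration rather than through a penalty.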
3. Experimental Results:
As a sanity check, we first validate our optimization in Section 2.2 on synthetic data generated from the equivalent generative process, as captured by the graphical model in Fig. 8. This experiment allows us to assess the behavior of our algorithm under various noise scenarios. As described in Section 2.2, the observed variables are the temporal correlation matrices {Γ_n^t}, the DTI Laplacians L_n, and the clinical scores {y_n}, while the latent variables are the basis B, the coefficients {c_n^t}, and the neural network weights Θ. Note that the dynamic correlation matrices {Γ_n^t} are completely described by the basis B, the coefficients {c_n^t}, and the Laplacian weighting L_n. We further observe that the rs-fMRI data decompositions for each subject couple only through the shared basis, and the clinical predictions only through the shared network weights Θ. Conditioned on these variables, {{Γ_n^t}, L_n, {c_n^t}, Θ, y_n} are independent across subjects. Fig. 8 captures these conditional relationships.

Figure 8: The graphical model for generating synthetic data. We fix the model parameters σ_c = 4, the number of subjects N at 60, and the number of networks K at 4. The dimensionality of y_n is M = 3 and the length of the scan is T_n = 30 for each subject. The shaded circles denote observed variables, while the clear circles indicate latent variables.

We start by generating a basis matrix B̂ ∈ R^{P×K} by drawing its entries independently from a zero-mean Gaussian with variance one. We then use the Gram-Schmidt procedure to compute an orthogonal basis B_o = orth(B̂). Finally, we simulate corruptions to this basis via additive Gaussian noise: B = B_o + N(0, σ_B). Effectively, the value of σ_B quantifies the deviation of B from orthogonality, which is an assumption of our model. Note that the coefficient values in c_n are independent across networks and subjects, but not across time.
Thus, for each subject, we generate the temporal coefficients using an isotropic Gaussian process with zero mean and variance σ_c. These values are clipped at 0 to reflect the non-negativity of the coefficients. The variance parameter σ_c defines the scale of the coefficients. Next, we simulate the graph Laplacians L_n for each subject based on structural connectivity priors computed from real-world data. Specifically, for each region pair, we first create a histogram of connectivity using binary adjacency matrices from the HCP database. With π_L denoting the probability of a connection between ROI pairs, we sample a symmetric graph adjacency matrix A_n per subject via a Bernoulli distribution with parameter π_L. We then compute the corresponding Laplacians L_n from A_n. This choice of prior helps us generate realistic structural connectivity profiles.

Now, recall that our model seeks to approximate the rs-fMRI dynamic correlation matrices by Γ_n^t ≈ B diag(c_n^t) Bᵀ. Additionally, this decomposition is regularized by the individual Laplacians L_n. Since we wish to evaluate the quality of this approximation, our generative model simulates Γ_n^t by adding structured noise (parametrized by L_n) to B diag(c_n^t) Bᵀ. Specifically, we use the eigenbasis X of L_n to generate additive noise N = σ_Γ X Xᵀ. We then compute the correlation matrices as Γ_n^t = B diag(c_n^t) Bᵀ + N. Note that this procedure preserves the positive semi-definiteness of the decomposition. Effectively, the parameter σ_Γ controls the level of corruption in the observed dynamic correlation matrices. Finally, the observed variable {y_n} follows a Gaussian with mean µ_yn = F_Θ({c_n^t}) ∈ R^M and variance σ_yn I_M. The function mapping F_Θ refers to the LSTM-ANN network with parameters Θ, which we randomly initialize. This is again folded to reflect positive values of y_n.
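The generative process above can be sketched in a few lines of numpy. The sizes are illustrative; a clipped Gaussian random walk stands in for the Gaussian-process draw, and a fixed Bernoulli probability stands in for the HCP-derived connection histogram (both are our simplifications, not the paper's exact choices):

```python
import numpy as np

rng = np.random.default_rng(0)
P, K, T = 20, 4, 30                      # regions, networks, time points (illustrative)
sigma_B, sigma_c, sigma_G = 0.1, 4.0, 0.1

# orthogonal basis, perturbed away from orthogonality (sigma_B controls the deviation)
B0, _ = np.linalg.qr(rng.standard_normal((P, K)))
B = B0 + sigma_B * rng.standard_normal((P, K))

# temporally smooth, non-negative loadings: a clipped Gaussian random walk as a
# simple stand-in for the zero-mean Gaussian-process draw described in the text
c = np.maximum(np.cumsum(rng.normal(0.0, sigma_c / np.sqrt(T), (T, K)), axis=0), 0.0)

# Bernoulli structural graph (fixed connection probability here) and its Laplacian
A = np.triu((rng.random((P, P)) < 0.3).astype(float), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

# structured noise built from the eigenbasis X of L; since X is orthogonal,
# X @ X.T is the identity, so PSD-ness of the decomposition is preserved
_, X = np.linalg.eigh(L)
Gamma = np.stack([B @ np.diag(c[t]) @ B.T + sigma_G * (X @ X.T) for t in range(T)])
```

Each Γ_t is symmetric positive semi-definite by construction, mirroring real correlation matrices.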
Here, σ_y controls the noise in the clinical scores. There are thus two sources of noise for the observed variables. The first is error in the correlation matrices Γ_n^t, controlled by varying σ_Γ. The second is error in the clinical scores y_n, quantified by the parameter σ_y. Additionally, we are also interested in evaluating the performance under varying levels of deviation of the basis from orthogonality, which is controlled by the parameter σ_B.

We evaluate the efficacy of our algorithm using two separate metrics. The first is an average inner-product measure of similarity S between each recovered network, b̄_k, and its corresponding best-matched ground truth network, b_k, normalizing the latter to unit norm, that is:

S = (1/K) Σ_k |b_kᵀ b̄_k| / ‖b_k‖    (18)

The second metric is the Median Absolute Error (MAE) between the output of the trained LSTM-ANN, ŷ_n, and the true scores y_n, computed for score m as:

MAE = median(|ŷ_{:,m} − y_{:,m}|)    (19)

Fig. 9 depicts the performance of the algorithm in these three cases. In each subplot, the x-axis corresponds to increasing levels of noise. In the first two subplots, the y-axis indicates the similarity metric S computed for the particular setting, while in the rightmost subplot, we plot the MAE for predicting the three scores. All numerical results have been aggregated over 50 independent trials.

In the leftmost plot, an x-axis value close to 0 indicates a low level of deviation of B from orthogonality, while increasing values correspond to more severe deviations from the modeling assumptions. During this experiment, the values of the other free parameters in Fig. 8 were held constant. We observed that the MAE of the three scores remains roughly constant for all noise settings (approximately 1, 1, and 3 for the three scores, respectively). The middle plot varies the level of noise in the correlation matrices, i.e., σ_Γ is increased. The x-axis reports normalized values of σ_Γ, while the remaining free parameters were held constant. Similar to the previous scenario, the MAE remains roughly constant across noise settings (approximately 1, 1, and 2 for the three scores). The rightmost plot varies the level of noise in the scores y_n. Again, normalized σ_y values are reported on the x-axis. For this experiment, we observed that the similarity S remains nearly constant (standard error of about 0.05) across noise levels.

Figure 9: Performance on synthetic experiments. (L) Varying the level of deviation from orthogonality; (M) varying the level of noise in Γ (σ_y = 0.2); (R) varying the level of noise in y_n (σ_Γ = 0.2). Values on the x-axis have been normalized to a [0-1] range by dividing by the maximum value of the variable. We report deviations from the mean for the recovered similarity/MAE at each parameter setting in terms of a standard error value. The reported x-axis range reflects the regimes within which the algorithm converges to a local solution.

As expected, increased noise in the correlation matrices and deviations from orthogonality worsen the recovery performance of the algorithm. This is reflected by the decay in the similarity measure with increasing noise parameters. Since the parameter σ_y is held constant, we do not observe much variation in the MAE values upon increasing the noise. Lastly, we notice that the algorithm performs better when the level of noise in the scores is lower, as indicated by the increasing values of MAE in the right subplot of Fig. 9. Since σ_B is held constant for this experiment, the metric S remains fairly constant even upon increasing the noise in the scores.

Taken together, our simulations indicate that the optimization procedure is robust in the noise regime (up to roughly 0.2) estimated from the real-world rs-fMRI data. In addition, these experiments help us identify stable parameter settings (λ = 1-10) and set appropriate learning rates for the algorithm, which guide our real-world experiments.
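The evaluation metrics in Eqs. (18) and (19) are straightforward to compute; a minimal numpy sketch (we normalize both the ground-truth and recovered networks and use a greedy best match over columns, which is one reasonable reading of Eq. (18)):

```python
import numpy as np

def basis_similarity(B_true, B_rec):
    """Average cosine similarity S (Eq. 18): each ground-truth network b_k is
    matched to the recovered network with the largest |inner product|."""
    Bt = B_true / np.linalg.norm(B_true, axis=0, keepdims=True)
    Br = B_rec / np.linalg.norm(B_rec, axis=0, keepdims=True)
    return np.abs(Bt.T @ Br).max(axis=1).mean()

def median_abs_error(y_true, y_pred, m):
    """Median Absolute Error (Eq. 19) for score m, columns indexed as y[:, m]."""
    return np.median(np.abs(y_pred[:, m] - y_true[:, m]))
```

Note that the absolute value in Eq. (18) makes S invariant to the sign ambiguity of the recovered networks, and the median in Eq. (19) is robust to a few badly predicted subjects.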
We evaluate our deep-generative hybrid on two separate cohorts. The first dataset is a cohort of 93 healthy individuals from the Human Connectome Project (HCP) database [Van Essen et al. (2013)] having both rs-fMRI and DTI scans. We refer to this as the HCP dataset. Cognitive outcomes such as fluid intelligence are believed to be closely connected to structural (SC) and functional connectivity (FC) in the human brain [Zimmermann, Griffiths and McIntosh (2018)]. Thus, jointly modeling multimodal neuroimaging and cognitive data helps exploit this fundamental interweave and uncover the neural underpinnings of cognition. Finally, we chose to focus on a small dataset (N = 93) to demonstrate that our framework is suitable for clinical rs-fMRI applications, many of which have limited sample sizes.

Our second dataset consists of 57 children with high-functioning Autism Spectrum Disorder (ASD) acquired at the Kennedy Krieger Institute in Baltimore, USA. Henceforth, we refer to this as the KKI dataset. The subjects in this cohort have a mean age of approximately 10 years and a mean IQ of approximately 110. Social and communicative deficits in ASD are believed to arise from aberrant interactions between regions of the brain that are linked by structural and functional connectivity [Rudie et al. (2013)]. Thus, identifying these patterns plays a crucial role in illuminating the etiological basis of the disorder.
Neuroimaging Data. As described in [Van Essen et al. (2013)], the HCP S1200 data was acquired on a Siemens 3T scanner (TR/TE = 0.72 s/33.1 ms, spatial resolution 2 mm isotropic). The KKI rs-fMRI data was acquired on a 3T Philips Achieva scanner using a single-shot, partially parallel gradient-recalled EPI sequence with TR = 2500 ms and approximately 3 mm resolution; the corresponding DTI data was acquired with b = 700 s/mm². The data was pre-processed using the standard FDT pipeline in FSL [Jenkinson et al. (2012)], consisting of susceptibility distortion correction, followed by corrections for eddy currents, motion, and outliers. From here, tensor model fitting was performed to generate the transformation matrices and extract atlas-based metrics. We used the BEDPOSTx tool in FSL [Behrens et al. (2007)] to perform a Bayesian estimation of the diffusion parameters at each voxel, followed by tractography using PROBTRACKx [Behrens et al. (2007)].

Our experiments rely on the Automatic Anatomical Labelling (AAL) atlas [Tzourio-Mazoyer et al. (2002)] parcellation for the rs-fMRI and DTI data. AAL consists of 116 cortical, subcortical, and cerebellar regions. We employ a sliding window protocol, as shown in Fig. 2. Due to the different TRs, we set the sliding window parameters to a window length of 156 and a stride of 17 for the HCP dataset, and a window length of 45 and a stride of 5 for the KKI dataset, to extract dynamic correlation matrices from the 116 average time courses. We discuss the sensitivity to this choice in Section 4.1. Thus, for each individual, we have correlation matrices of size 116 × 116, based on the Pearson's correlation coefficient between the average regional time series. Empirically, we observed a consistent noise component with nearly unchanging contribution from all brain regions and low predictive power for both datasets. Therefore, we subtracted out the first eigenvector contribution from each of the correlation matrices and used the residuals as the inputs {Γ_n^t} to the algorithm and the baselines.

Each DTI connectivity matrix A_n is binary, where [A_n]_ij = 1 corresponds to the presence of at least one tract between regions i and j (116 regions in total for AAL). For the KKI dataset, we impute the DTI connectivity for the 11 individuals who do not have DTI scans based on the training data in each cross-validation fold.

Behavioral Data. For the HCP database, we examine the Cognitive Fluid Intelligence Score (CFIS) [Bilker et al. (2012); Duncan (2005)], adjusted for age. This is scored based on a battery of tests measuring cognitive reasoning and is considered a nonverbal estimate of fluid intelligence. The dynamic range of this score starts at 70. For the KKI dataset, we consider three clinical measures. The first, the Autism Diagnostic Observation Schedule (ADOS), is scored by a trained clinician and captures overall symptom severity on a range whose upper end is 30, with higher scores indicating greater impairment.

The SRS scale quantifies the level of social responsiveness of a subject [Bölte, Poustka and Constantino (2008)]. Typically, these attributes are scored by a parent/caregiver or teacher who completes a standardized questionnaire assessing various aspects of the child's behavior. Consequently, SRS reporting tends to be more variable across subjects than ADOS, since the responses are heavily biased by the parent/teacher attitudes. The SRS dynamic range is 70-200 for ASD subjects, with higher values corresponding to greater severity in terms of social responsiveness.

Finally, Praxis is assessed using the Florida Apraxia Battery (modified for children) [Mostofsky et al. (2006)]. It assesses the ability to perform skilled motor gestures on command, by imitation, and with actual tool use. Several studies [Mostofsky et al. (2006); Dziuk et al. (2007); Dowell, Mahone and Mostofsky (2009); Nebel et al. (2016)] reveal that children with ASD show marked impairments in Praxis, a.k.a. developmental dyspraxia, and that impaired Praxis correlates with impairments in core autism social-communicative and behavioral features. Performance is videotaped and later scored by two trained research-reliable raters, with the total percentage of correctly performed gestures as the dependent variable of interest. Scores therefore range from 0-100.

Figure 10: A five-fold cross validation for evaluating performance.

We characterize the performance of each method using a five-fold cross validation strategy, as illustrated in Fig. 10. We first randomly split the dataset into five training and test folds. For each fold, we train our framework and the baselines on an 80 percent training split of the data. Then, we use the trained models to predict the clinical scores on the held-out 20 percent, which constitutes the testing set for that fold. Each example is part of the test set in exactly one of the five folds.

We report two quantitative measures of performance. The first is the Median Absolute Error (MAE), defined in Eq. (19), which quantifies the absolute distance between the measured and predicted scores across individuals. Lower MAE indicates better testing performance. The second metric is the Normalized Mutual Information (NMI), which assesses the similarity between the distributions of the predicted and observed scores across subjects.
NMI for the score m is computed as:

NMI(y_{:,m}, ŷ_{:,m}) = [H(y_{:,m}) + H(ŷ_{:,m}) − H(y_{:,m}, ŷ_{:,m})] / min{H(y_{:,m}), H(ŷ_{:,m})}

Here, H(y_{:,m}) is the entropy of y_{:,m} and H(y_{:,m}, ŷ_{:,m}) is the joint entropy between y_{:,m} and ŷ_{:,m}. NMI ranges between 0 and 1, with higher values indicating better agreement between the predicted and observed score distributions.

Fig. 11 illustrates the performance comparison of our deep sr-DDL framework against the baselines in Section 2.3 on the HCP dataset for predicting CFIS. Fig. 12 presents the same comparison on the KKI dataset for multi-score prediction. In each figure, the scores predicted by the algorithm are plotted on the y-axis against the measured ground truth scores on the x-axis. The bold x = y line represents ideal performance. The red points represent the training data, while the blue points indicate the held-out testing data across all cross-validation folds.

We observe that the training performance of the baselines is good (i.e., the red points follow the x = y line) in all cases for both datasets. However, in terms of testing performance, our method outperforms the baselines in all cases. This performance gain is particularly pronounced in the case of multi-score prediction (KKI dataset). Empirically, we are able to tune the baseline hyperparameters to obtain good testing performance on the KKI dataset for a single score (ADOS), but the prediction of the remaining scores (SRS and Praxis) suffers. Notice that the baseline predictions for SRS and Praxis (KKI dataset) and CFIS (HCP dataset) hover around the population mean of the score in almost all cases. Finally, we notice that omitting the structural regularization from the deep sr-DDL performs worse than our method. In contrast to the baselines, the testing predictions of our framework follow the x = y line more closely.
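The NMI measure above can be computed from histogram estimates of the entropies; a minimal sketch (the number of bins is our own choice, not specified in the text):

```python
import numpy as np

def nmi(y, y_hat, bins=10):
    """Normalized Mutual Information between observed and predicted scores,
    with the (joint) entropies estimated from histograms."""
    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()
    joint, _, _ = np.histogram2d(y, y_hat, bins=bins)
    joint = joint / joint.sum()                 # joint distribution estimate
    h_y = entropy(joint.sum(axis=1))            # marginal entropy of y
    h_yh = entropy(joint.sum(axis=0))           # marginal entropy of y_hat
    h_joint = entropy(joint.ravel())            # joint entropy
    return (h_y + h_yh - h_joint) / min(h_y, h_yh)
```

Since the mutual information is bounded above by the smaller marginal entropy, this normalization keeps the measure in [0, 1].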
The machine learning, statistical, and graph-theoretic techniques we selected for comparison are well known in the literature for robustly providing compact characterizations of high-dimensional datasets. However, we see that ICA is unable to estimate a reliable projection of the data that is particularly useful for behavioral prediction. Similarly, the betweenness centrality measure is unable to extract informative topologies for brain-behavior integration. We conjecture that the aggregate nature of this measure is useful for capturing group-level commonalities, but falls short of modeling subject-specific differences. Furthermore, even the BrainNet CNN, which directly exploits the graph structure of the connectomes, falls short of generalizing to multi-score prediction; additionally, it ignores the dynamic information in the rs-fMRI data. In the case of the baseline where we omit the structural regularization, i.e., the deep sr-DDL without DTI, we notice that the method learns a representation of the rs-fMRI data that generalizes beyond the training set, but still falls short of the performance when anatomical information is included. This clearly demonstrates the benefit of supplementing the functional data with structural priors. Finally, the failure of the decoupled dynamic matrix factorization and deep network makes a strong case for jointly optimizing the neuroimaging and behavioral representations. The basis estimated independently of behavior is not indicative of clinical outcomes, due to which the regression performance suffers. We also quantify the performance indicated in these figures in Table 1 (HCP dataset) and Table 2 (KKI dataset) based on the MAE and NMI.

Our deep sr-DDL framework explicitly optimizes for a viable tradeoff between the multimodal and dynamic connectivity structures and the behavioral data representations jointly.
The dynamic matrix decomposition simultaneously models the group information through the basis and the subject-specific differences through the time-varying coefficients. The DTI Laplacians streamline this decomposition to focus on anatomically informed functional pathways. The LSTM-ANN directly models the temporal variation in the coefficients, with its weights encoding representations closely interlinked with behavior. The limited number of basis elements helps provide compact representations that explain the connectivity information well. The regularization and constraints ensure that the problem is well posed, yet extracts clinically meaningful representations.

Figure 11: HCP dataset: prediction performance for the Cognitive Fluid Intelligence Score. Red box: Deep sr-DDL. Black box: Deep sr-DDL model without DTI regularization. Light blue box: Betweenness Centrality on DTI + dynamic rs-fMRI multimodal graphs, followed by the LSTM-ANN predictor. Green box: ICA timeseries followed by the LSTM-ANN predictor. Purple box: Branched BrainNet CNN [Kawahara et al. (2017)] on DTI and rs-fMRI static graphs. Blue box: Decoupled DDL factorization followed by the LSTM-ANN predictor.

Table 1: HCP dataset: performance evaluation using the Median Absolute Error (MAE) and Normalized Mutual Information (NMI) fit, for both training and testing. Lower MAE and higher NMI indicate better performance. We have highlighted the best performance in bold.

Score  Method                      MAE Train  MAE Test  NMI Train  NMI Test
CFIS   BC & LSTM-ANN               4.12       16.89     0.80       0.57
       ICA & LSTM-ANN              4.54       20.02     0.82       0.70
       BrainNet CNN                0.54       16.36     0.99       0.54
       Decoupled                   3.31       17.21     0.80       0.71
       Without DTI regularization  0.72       16.41     0.98       –
Subnetwork Identification . Fig. 13 and Fig. 14 illus-trate the 15 subnetworks in B trained on the HCP and theKKI dataset respectively. Each column of the basis con-sists of a set of co-activated subregions. We plot the valuesstored in these columns onto the corresponding ROIs in theAAL atlas. The colorbar in the figure indicates subnetworkcontribution to the AAL regions. Regions storing negativevalues (cold colors) are anticorrelated with regions storingpositive ones (hot colors).Examining the subnetworks in Fig. 13, we notice thatSubnetworks 9 and 3 exhibit positive and competing con-tributions from regions of the Default Mode Network(DMN), which has been widely inferred in the resting stateliterature [Raichle (2015)] and is believed to play a critical role in consolidating memory [Sestieri, Corbetta, Romaniand Shulman (2011)], as also in self-referencing and inthe theory of mind [Andrews-Hanna (2012)]. At the sametime, Subnetworks 3 and 4 have contributions from regionsin the Frontoparietal Network (FPN). The FPN is knownto be involved in executive function and goal-oriented, cog-nitively demanding tasks [Uddin, Yeo and Spreng (2019)].Subnetworks 1, 10, and 15 are comprised of regions fromthe Medial Frontal Network (MFN), while Subnetworks 12and 6 exhibit competing contributions from these regions.The MFN and FPN are known to play a key role in de-cision making, attention and working memory [Euston,Gruber and McNaughton (2012); Menon (2011)], whichare directly associated with cognitive intelligence. Subnet-works 2, 8, 12, 6 and 3 include subcortical and cerebellarregions, while subnetworks 3 and 4 include contributionsfrom the Somatomotor Network (SMN). Taken together,these networks are believed to be important functional15 SRS D ee p s r - DD L I C A + L S T M - A NN Praxis
Figure 12: KKI dataset: Multiscore prediction performance for the (L) ADOS, (M) SRS, and (R) Praxis. Red Box: Deep sr-DDL; Black Box: model without DTI regularization; Light Blue Box: Betweenness Centrality on DTI + dynamic rs-fMRI multimodal graphs followed by the LSTM-ANN predictor; Green Box: ICA timeseries followed by the LSTM-ANN predictor; Purple Box: branched BrainNetCNN [Kawahara et al. (2017)] on DTI Laplacian and rs-fMRI static graphs; Blue Box: decoupled DDL factorization followed by the LSTM-ANN predictor.

Score    Method                       MAE Train   MAE Test   NMI Train   NMI Test
ADOS     BC & LSTM-ANN                1.53        3.24       0.36        0.20
         ICA & LSTM-ANN               1.21        3.30       0.42        0.32
         BrainNet CNN                 1.90        3.50       0.96        0.25
         Decoupled                    1.34        3.93       0.68        0.29
         Without DTI regularization   0.13        3.27       0.99        0.26
         Deep sr-DDL                  0.08        2.84       0.99
SRS      Deep sr-DDL
Praxis   BC & LSTM-ANN                8.10        21.10      0.53        0.79
         ICA & LSTM-ANN               5.20        22.02      0.76        0.49
         BrainNet CNN                 3.78        15.15      0.95        0.19
         Decoupled                    1.57        21.67      0.75        0.25
         Without DTI regularization   1.09        17.34      0.99        0.49
         Deep sr-DDL                  0.13        13.50      0.99        0.85

Table 2: KKI Dataset: Performance evaluation using Median Absolute Error (MAE) and Normalized Mutual Information (NMI) fit, for both testing and training. Lower MAE and higher NMI scores indicate better performance. We have highlighted the best performance in bold. Near misses have been underlined.

connectivity biomarkers of cognitive intelligence and consistently appear in previous literature on the HCP dataset [Chén, Cao, Reinen, Qian, Gou, Phan, De Vos and Cannon (2019); Hearne, Mattingley and Cocchi (2016)].
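For reference, the two reported metrics can be sketched as below. The NMI computation shown is a generic histogram-based discretization for continuous score vectors; the paper's exact "NMI fit" definition may differ:

```python
import numpy as np

def median_abs_error(y_true, y_pred):
    """Median Absolute Error, the MAE reported in the tables."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.median(np.abs(y_true - y_pred)))

def normalized_mutual_info(y_true, y_pred, bins=10):
    """Histogram-based NMI between two continuous score vectors
    (a generic discretized formulation, not necessarily the paper's)."""
    c_xy, _, _ = np.histogram2d(y_true, y_pred, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    nz = p_xy > 0
    mi = np.sum(p_xy[nz] * np.log(p_xy[nz] / np.outer(p_x, p_y)[nz]))
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return float(mi / np.sqrt(ent(p_x) * ent(p_y)))

scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
assert median_abs_error(scores, scores + 0.5) == 0.5
assert abs(normalized_mutual_info(scores, scores) - 1.0) < 1e-9
```

The median (rather than mean) absolute error makes the reported train/test gaps less sensitive to a few poorly predicted outlier subjects.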
Figure 13: Complete set of subnetworks identified by the deep sr-DDL model for the HCP database. The red and orange regions are anti-correlated with the blue and green regions.

Figure 14: Complete set of subnetworks identified by the deep sr-DDL model for the KKI database. The red and orange regions are anti-correlated with the blue and green regions.

For the KKI dataset, in Fig. 14, Subnetwork 1 includes regions from the DMN and the SMN. Similarly, Subnetwork 4 includes competing contributions from the SMN and DMN regions. Aberrant connectivity within the DMN and SMN regions has previously been reported in ASD [Lynch, Uddin, Supekar, Khouzam, Phillips and Menon (2013); Nebel et al. (2016)]. Subnetworks 2 and 12 exhibit contributions from higher-order visual processing areas in the occipital and temporal lobes along with sensorimotor regions. At the same time, Subnetworks 7 and 14 exhibit competing contributions from these areas. These findings concur with behavioral reports of reduced visual-motor integration in autism [Nebel et al. (2016)]. Subnetworks 3 and 4 exhibit anticorrelated contributions from the central executive control network (CEN) and insula. Subnetwork 6 also exhibits CEN contributions. These regions are believed to be essential for switching between goal-directed and self-referential behavior [Sridharan, Levitin and Menon (2008)]. Subnetworks 4 and 7 include prefrontal and DMN regions, along with subcortical areas such as the thalamus, amygdala and hippocampus. The hippocampus is known to play a crucial role in the consolidation of long- and short-term memory, along with spatial memory to aid navigation. Altered memory functioning has been shown to manifest in children diagnosed with ASD [Williams, Goldstein and Minshew (2006)].
The thalamus is responsible for relaying sensory and motor signals to the cerebral cortex and has been implicated in autism-associated sensory dysfunction, a core feature of ASD [Cascio, McGlone, Folger, Tannan, Baranek, Pelphrey and Essick (2008)]. Along with the amygdala, which is known to be associated with emotional responses, these areas may be crucial for social-emotional regulation in ASD [Pouw, Rieffe, Stockmann and Gadow (2013)]. Finally, we observed an average similarity of 0. ± . and . ± .06 for these subnetworks across their cross-validation runs on the HCP and KKI datasets respectively. This suggests that our deep-generative framework is able to capture stable underlying mechanisms which robustly explain the different sets of deficits in ASD, as well as robustly extract signatures of cognitive flexibility in neurotypical individuals.
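One way such a cross-fold similarity can be computed is by matching subnetworks between two learned bases (their column order is arbitrary across folds) and averaging the matched cosine similarities. The greedy-matching sketch below, with hypothetical dimensions, illustrates the idea; it is not necessarily the exact procedure used in the paper:

```python
import numpy as np

def basis_similarity(B1, B2):
    """Average absolute cosine similarity between two learned bases after
    greedily matching columns (subnetwork order is arbitrary across folds)."""
    U1 = B1 / np.linalg.norm(B1, axis=0)
    U2 = B2 / np.linalg.norm(B2, axis=0)
    S = np.abs(U1.T @ U2)                 # K x K pairwise cosine similarities
    sims, used = [], set()
    for k in range(S.shape[0]):
        j = max((j for j in range(S.shape[1]) if j not in used),
                key=lambda j: S[k, j])    # best still-unmatched partner
        used.add(j)
        sims.append(S[k, j])
    return float(np.mean(sims))

rng = np.random.default_rng(1)
B1 = rng.standard_normal((116, 15))
B2 = B1[:, rng.permutation(15)]           # same subnetworks, shuffled order
assert np.isclose(basis_similarity(B1, B2), 1.0)
```

The absolute value handles the sign ambiguity of each subnetwork (flipping hot and cold regions together leaves the outer product unchanged).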
Decoding rs-fMRI network dynamics. Our deep sr-DDL allows us to map the evolution of functional networks in the brain by probing the LSTM-ANN representation. Recall that our model does not require the rs-fMRI scans to be of equal length. Fig. 15 (left) illustrates the learned attentions output by the A-ANN during testing for the 93 subjects from the HCP dataset at the top and the 57 KKI subjects at the bottom. For the KKI dataset, the patients with shorter scans have been grouped at the top of the figure. These time-points have been blackened at the beginning of the scan. The colorbar indicates the strength of the attention weights. Higher attention weights denote intervals of the scan considered especially relevant for prediction. Notice that the network highlights the start of the scan for several individuals, while it prefers focusing on the end of the scan for some others; this is especially pronounced in the case of the KKI dataset. The patterns are comparatively more diffuse for subjects in the HCP dataset, although several subjects manifest selectivity in terms of relevant attention weights. This is indicative of the underlying individual-level heterogeneity in both cohorts.

Figure 15: (Left) Learned attention weights. (Right) Variation of network strength over time on the (Top) HCP dataset and (Bottom) KKI dataset.

Next, we illustrate the variation of the network strength for a representative subject from the HCP dataset and the KKI dataset over the scan duration in Fig. 15 (right), at the top and bottom respectively. Each solid colored line corresponds to one of the 15 subnetworks in Fig. 14. Notice that, over the scan duration, each network cycles through phases of activity and relative inactivity. Consequently, only a few networks at each time step contribute to the patient's dynamic connectivity profile. This parallels the transient brain-states hypothesis in dynamic rs-fMRI connectivity [Allen, Damaraju, Plis, Erhardt, Eichele and Calhoun (2014)], with active states as corresponding subnetworks in the basis matrix B.
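A minimal sketch of how attention can be restricted to observed windows for variable-length scans (the padded time-points shown blackened in Fig. 15) is given below. This illustrates only the masking idea, not the A-ANN architecture itself; all dimensions are hypothetical:

```python
import numpy as np

def attention_weights(scores, lengths):
    """Softmax attention over time that masks padded windows, so shorter
    scans only attend to observed time-points. `scores` has shape (N, T)."""
    N, T = scores.shape
    mask = np.arange(T)[None, :] < np.asarray(lengths)[:, None]
    z = np.where(mask, scores, -np.inf)               # kill padded positions
    e = np.exp(z - z.max(axis=1, keepdims=True)) * mask
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
a = attention_weights(rng.standard_normal((3, 10)), lengths=[10, 6, 8])
assert np.allclose(a.sum(axis=1), 1.0)    # each scan's weights sum to one
assert np.all(a[1, 6:] == 0)              # padded windows get zero attention
```

Because each row renormalizes over only its valid windows, attention maps remain comparable across subjects with different scan durations.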
4. Discussion
Our deep-generative hybrid cleverly exploits the intrinsic structure of the rs-fMRI correlation matrices through the dynamic dictionary representation to simultaneously capture group-level and subject-specific information. At the same time, the LSTM-ANN network models the temporal evolution of the rs-fMRI data to predict behavior. The compactness of our representation serves as a dimensionality reduction step that is related to the clinical score of interest, unlike the pipelined treatment commonly found in the literature. Our structural regularization helps us fold in anatomical information to guide the functional decomposition. Overall, our framework outperforms a variety of state-of-the-art graph theoretic, statistical and deep learning baselines on two separate real-world datasets.
We conjecture that the baseline techniques fail to extract representative patterns from structural and functional data. These techniques are quite successful at modelling group-level information, but fail to generalize to the entire spectrum of cognitive, symptomatic or connectivity-level differences among subjects. Consequently, they overfit the training data. Further, we demonstrate that the model is fairly robust to the choice of hyperparameters, and provide guidelines to set these for future applications of our method.
Our deep sr-DDL framework has only two free hyperparameters. The first is the number of subnetworks in B. As described in Section 2.2.1, we use the eigen-spectrum of {Γ_tn} to fix this at 15 for both datasets. The second is the penalty parameter λ, which controls the trade-off between representation and prediction. In addition to the model, our sliding window protocol in Fig. 2 is defined by two parameters, i.e. the sliding window length and the stride. Together, these balance the context size and information overlap within the rs-fMRI correlation matrices {Γ_tn}.
In this section, we evaluate the performance of our framework under three scenarios. Specifically, we sweep λ, the window length and the stride parameter independently, keeping the other two values fixed. We use five-fold cross validation with the MAE metric to quantify the multi-score prediction performance, which, as shown in Section 3.2, is more challenging than single-score prediction. Fig. 16 plots the performance for the three scores on the KKI dataset, with the MAE value for each score on the y-axis and the parameter value on the x-axis. The operating point indicates the settings chosen in Section 3.4.
We observed that our method gives stable performance over fairly large ranges of each parameter. As expected, low values of λ (0.1−1) result in higher MAE values, likely due to underfitting. Similarly, higher values (> 6) result in overfitting to the training dataset, degrading the generalization performance. Additionally, lower values of the window length result in higher variance among the correlation values due to noise, and hence less reliable estimates of dynamic connectivity [Lindquist (2016)]. On the other hand, very large context windows tend to miss nuances in the dynamic evolution of the scan. Empirically, we observe that a mid-range window length of 100−120s is suitable for our application.
In summary, the guidelines we identified are λ ∈ (2−6) for the penalty, a window length of 100−120s, and a stride within the stable range highlighted in Fig. 16.

Figure 16: Performance of the Deep-Generative Hybrid upon varying (L) the penalty parameter λ, (M) the window length, and (R) the stride. The highlighted yellow sections indicate a stable operating range. Our operating point is indicated by the blue arrow.

The results of our method are reproducible across different populations. As seen in our experiments in Section 3.4, our method is able to extract key predictive resting-state biomarkers from healthy and autistic populations. This could potentially be useful for developing and testing the efficacy of behavioral therapies to improve treatment options for the 1 in every 68 children diagnosed with ASD. At the same time, our deep sr-DDL makes minimal assumptions. Provided we have access to a valid set of structural and functional connectivity measures and clinical scores, this analysis can be easily adapted to other neurological disorders and even predictive network models outside the medical realm. Overall, these findings greatly broaden the scope of our method for future applications.
We recognize that our model is simplistic in its assumptions, particularly in the formulation of the sr-DDL objective. More concretely, the DTI priors guide a data-driven classical rs-fMRI matrix decomposition in a regularization framework. This deliberate modelling choice conveniently preserves interpretability in the basis and simplifies the inference procedure, while making minimal assumptions about the underlying brain organization. In recent years, graph neural networks have shown great promise in brain connectivity research due to their ability to capture subtle interactions between communicating brain regions while exploiting the underlying hierarchy of brain organization.
Consequently, they are emerging as important tools to probe complex pathologies in brain functioning and to diagnose neurodevelopmental disorders [Anirudh and Thiagarajan (2019); Parisot, Ktena, Ferrante, Lee, Guerrero, Glocker and Rueckert (2018)]. In the future, we will explore end-to-end graph convolutional networks that model the evolution of rs-fMRI signals on the underlying anatomical DTI graphs. In light of our current and future explorations, we hope to inch closer to the loftier goal of improving personalized healthcare.
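As a concrete reference for the sliding-window protocol discussed above (Fig. 2), which produces the correlation matrices {Γ_t} from an ROI time-series, a minimal sketch with hypothetical dimensions follows; window and stride are in samples and would be converted from seconds via the scanner TR:

```python
import numpy as np

def sliding_window_corrs(X, win, stride):
    """Pearson correlation matrices {Gamma_t} from an (R, T) ROI time-series,
    with window length `win` and `stride` given in samples."""
    R, T = X.shape
    return [np.corrcoef(X[:, s:s + win]) for s in range(0, T - win + 1, stride)]

rng = np.random.default_rng(3)
X = rng.standard_normal((116, 600))       # hypothetical: 116 ROIs, 600 samples
G = sliding_window_corrs(X, win=100, stride=20)
assert len(G) == 26 and G[0].shape == (116, 116)
```

A stride smaller than the window length yields overlapping windows, which trades temporal resolution against the independence of successive correlation estimates, the balance swept in Fig. 16.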
5. Conclusion
We have introduced a novel deep-generative framework to integrate complementary information from the functional and structural neuroimaging domains, which simultaneously maps to behavior. Our unique structural regularization elegantly injects anatomical information into the rs-fMRI functional decomposition, thus providing us with an interpretable brain basis. Our deep network (LSTM-ANN) not only models the temporal variation among individuals, but also helps isolate key dynamic resting-state signatures indicative of clinical/cognitive impairments. Our coupled optimization procedure ensures that we learn effectively from limited training data while generalizing well to unseen subjects. Finally, our framework makes very few assumptions and can potentially be applied to study other neuropsychiatric disorders (e.g. ADHD, schizophrenia) as an effective diagnostic tool.
Acknowledgements. This work has generously been supported by the National Science Foundation CRCNS award 1822575 and CAREER award 1845430, the National Institute of Mental Health (R01 MH085328-09, R01 MH078160-07, K01 MH109766 and R01 MH106564), the National Institute of Neurological Disorders and Stroke (R01NS048527-08), and the Autism Speaks foundation.
References
Aghdam, M.A., Sharifi, A., Pedram, M.M., 2018. Combination ofrs-fmri and smri data to discriminate autism spectrum disordersin young children using deep belief network. Journal of digitalimaging 31, 895–903.Aielli, G.P., 2013. Dynamic conditional correlation: on propertiesand estimation. Journal of Business & Economic Statistics 31,282–299.Allen, E.A., Damaraju, E., Plis, S.M., Erhardt, E.B., Eichele, T.,Calhoun, V.D., 2014. Tracking whole-brain connectivity dynamicsin the resting state. Cerebral cortex 24, 663–676.Andrews-Hanna, J.R., 2012. The brains default network and itsadaptive role in internal mentation. The Neuroscientist 18, 251–270.Andrews-Hanna, J.R., Snyder, A.Z., Vincent, J.L., Lustig, C., Head,D., Raichle, M.E., Buckner, R.L., 2007. Disruption of large-scalebrain systems in advanced aging. Neuron 56, 924–935. nirudh, R., Thiagarajan, J.J., 2019. Bootstrapping graph convo-lutional neural networks for autism spectrum disorder classifica-tion, in: ICASSP 2019-2019 IEEE International Conference onAcoustics, Speech and Signal Processing (ICASSP), IEEE. pp.3197–3201.Assaf, Y., Pasternak, O., 2008. Diffusion tensor imaging (dti)-basedwhite matter mapping in brain research: a review. Journal ofmolecular neuroscience 34, 51–61.Atasoy, S., Donnelly, I., Pearson, J., 2016. Human brain networksfunction in connectome-specific harmonic waves. Nature commu-nications 7, 10340.Banerjee, A., Jost, J., 2008. On the spectrum of the normalized graphlaplacian. Linear algebra and its applications 428, 3015–3022.Bardella, G., Bifone, A., Gabrielli, A., Gozzi, A., Squartini, T., 2016.Hierarchical organization of functional connectivity in the mousebrain: a complex network approach. Scientific reports 6, 32060.Bassett, D.S., Bullmore, E., 2006. Small-world brain networks. Theneuroscientist 12, 512–523.Behrens, T.E., Berg, H.J., Jbabdi, S., Rushworth, M.F., Woolrich,M.W., 2007. Probabilistic diffusion tractography with multiplefibre orientations: What can we gain? 
Neuroimage 34, 144–155.Bilker, W.B., Hansen, J.A., Brensinger, C.M., Richard, J., Gur,R.E., Gur, R.C., 2012. Development of abbreviated nine-itemforms of the ravens standard progressive matrices test. Assess-ment 19, 354–369.B¨olte, S., Poustka, F., Constantino, J.N., 2008. Assessing autistictraits: cross-cultural validation of the social responsiveness scale(srs). Autism Research 1, 354–363.Bowman, F.D., Zhang, L., Derado, G., Chen, S., 2012. Determin-ing functional connectivity using fmri data with diffusion-basedanatomical weighting. NeuroImage 62, 1769–1779.Bullmore, E., Sporns, O., 2009. Complex brain networks: graphtheoretical analysis of structural and functional systems. NatureReviews Neuroscience 10, 186.Cabral, J., Kringelbach, M.L., Deco, G., 2017. Functional connec-tivity dynamically evolves on multiple time-scales over a staticstructural connectome: Models and mechanisms. NeuroImage160, 84–96.Cai, B., Zille, P., Stephen, J.M., Wilson, T.W., Calhoun, V.D.,Wang, Y.P., 2017. Estimation of dynamic sparse connectivitypatterns from resting state fmri. IEEE transactions on medicalimaging 37, 1224–1234.Calhoun, V.D., Liu, J., Adalı, T., 2009. A review of group ica forfmri data and ica for joint inference of imaging, genetic, and erpdata. Neuroimage 45, S163–S172.Caporin, M., McAleer, M., 2013. Ten things you should know aboutthe dynamic conditional correlation representation. Econometrics1, 115–126.Cascio, C., McGlone, F., Folger, S., Tannan, V., Baranek, G.,Pelphrey, K.A., Essick, G., 2008. Tactile perception in adultswith autism: a multidimensional psychophysical study. Journalof autism and developmental disorders 38, 127–137.Ch´en, O.Y., Cao, H., Reinen, J.M., Qian, T., Gou, J., Phan, H.,De Vos, M., Cannon, T.D., 2019. Resting-state brain informationflow predicts cognitive flexibility in humans. 
Scientific reports 9,1–16.Ciric, R., Rosen, A.F., Erus, G., Cieslak, M., Adebimpe, A., Cook,P.A., Bassett, D.S., Davatzikos, C., Wolf, D.H., Satterthwaite,T.D., 2018. Mitigating head motion artifact in functional connec-tivity mri. Nature protocols 13, 2801–2826.Cox, R.W., 1996. Afni: software for analysis and visualizationof functional magnetic resonance neuroimages. Computers andBiomedical research 29, 162–173.Cuingnet, R., Glaun`es, J.A., Chupin, M., Benali, H., Colliot, O.,2012. Spatial and anatomical regularization of svm: a generalframework for neuroimaging data. IEEE transactions on patternanalysis and machine intelligence 35, 682–696.Dowell, L.R., Mahone, E.M., Mostofsky, S.H., 2009. Associationsof postural knowledge and basic motor skill with dyspraxia inautism: implication for abnormalities in distributed connectivityand motor learning. Neuropsychology 23, 563. D’Souza, N.S., Nebel, M.B., Crocetti, D., Wymbs, N., Robinson,J., Mostofsky, S., Venkataraman, A., 2020. A deep-generativehybrid model to integrate multimodal and dynamic connectivityfor predicting spectrum-level deficits in autism. arXiv preprintarXiv:2007.01931 .D’Souza, N.S., Nebel, M.B., Wymbs, N., Mostofsky, S., Venkatara-man, A., 2018. A generative-discriminative basis learning frame-work to predict clinical severity from resting state functional mridata, in: International Conference on Medical Image Computingand Computer-Assisted Intervention, Springer. pp. 163–171.Duncan, J., 2005. Frontal lobe function and general intelligence:why it matters. Cortex: A Journal Devoted to the Study of theNervous System and Behavior .Dziuk, M., Larson, J.G., Apostu, A., Mahone, E., Denckla, M.,Mostofsky, S., 2007. Dyspraxia in autism: association with mo-tor, social, and communicative deficits. Developmental Medicine& Child Neurology 49, 734–739.DSouza, N., Nebel, M., Wymbs, N., Mostofsky, S., Venkataraman,A., 2020. 
A joint network optimization framework to predict clin-ical severity from resting state functional mri data. NeuroImage206, 116314.DSouza, N.S., Nebel, M.B., Wymbs, N., Mostofsky, S., Venkatara-man, A., 2019a. A coupled manifold optimization framework tojointly model the functional connectomics and behavioral dataspaces, in: International Conference on Information Processing inMedical Imaging, Springer. pp. 605–616.DSouza, N.S., Nebel, M.B., Wymbs, N., Mostofsky, S., Venkatara-man, A., 2019b. Integrating neural networks and dictionary learn-ing for multidimensional clinical characterizations from functionalconnectomics data, in: International Conference on Medical Im-age Computing and Computer-Assisted Intervention, Springer.pp. 709–717.Eavani, H., Satterthwaite, T.D., Filipovych, R., Gur, R.E., Gur,R.C., Davatzikos, C., 2015. Identifying sparse connectivity pat-terns in the brain using resting-state fmri. Neuroimage 105, 286–299.Engle, R., 2002. Dynamic conditional correlation: A simple class ofmultivariate generalized autoregressive conditional heteroskedas-ticity models. Journal of Business & Economic Statistics 20, 339–350.Euston, D.R., Gruber, A.J., McNaughton, B.L., 2012. The role ofmedial prefrontal cortex in memory and decision making. Neuron76, 1057–1070.Everson, R., 1998. Orthogonal, but not orthonormal, procrustesproblems. Advances in computational Mathematics 3.Feng, C.M., Gao, Y.L., Liu, J.X., Zheng, C.H., Yu, J., 2017. Pcabased on graph laplacian regularization and p-norm for gene se-lection and clustering. IEEE transactions on nanobioscience 16,257–265.Fox, M.D., Raichle, M.E., 2007. Spontaneous fluctuations in brainactivity observed with functional magnetic resonance imaging. Na-ture reviews neuroscience 8, 700.Fukushima, M., Betzel, R.F., He, Y., van den Heuvel, M.P., Zuo,X.N., Sporns, O., 2018. Structure–function relationships duringsegregated and integrated network states of human brain func-tional connectivity. 
Brain Structure and Function 223, 1091–1106.Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neuralnetworks, in: Proceedings of the fourteenth international confer-ence on artificial intelligence and statistics, pp. 315–323.Goble, D.J., Coxon, J.P., Van Impe, A., Geurts, M., Van Hecke,W., Sunaert, S., Wenderoth, N., Swinnen, S.P., 2012. The neuralbasis of central proprioceptive processing in older versus youngeradults: an important sensory role for right putamen. Human brainmapping 33, 895–908.Hahn, K., Myers, N., Prigarin, S., Rodenacker, K., Kurz, A., F¨orstl,H., Zimmer, C., Wohlschl¨ager, A.M., Sorg, C., 2013. Selectivelyand progressively disrupted structural connectivity of functionalbrain networks in alzheimer’s diseaserevealed by a novel frame-work to analyze edge distributions of networks detecting disrup-tions with strong statistical evidence. Neuroimage 81, 96–109.Hearne, L.J., Mattingley, J.B., Cocchi, L., 2016. Functional brain etworks related to individual differences in human intelligence atrest. Scientific reports 6, 32328.Higgins, I.A., Kundu, S., Guo, Y., 2018. Integrative bayesian analysisof brain functional networks incorporating anatomical knowledge.Neuroimage 181, 263–278.Honey, C., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J.P.,Meuli, R., Hagmann, P., 2009. Predicting human resting-statefunctional connectivity from structural connectivity. Proceedingsof the National Academy of Sciences 106, 2035–2040.Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W.,Smith, S.M., 2012. Fsl. Neuroimage 62, 782–790.Kaiser, M.D., Hudac, C.M., Shultz, S., Lee, S.M., Cheung, C.,Berken, A.M., Deen, B., Pitskel, N.B., Sugrue, D.R., Voos, A.C.,et al., 2010. Neural signatures of autism. Proceedings of the Na-tional Academy of Sciences , 201010412.Kawahara, J., Brown, C.J., Miller, S.P., Booth, B.G., Chau, V.,Grunau, R.E., Zwicker, J.G., Hamarneh, G., 2017. 
Brainnetcnn:Convolutional neural networks for brain networks; towards pre-dicting neurodevelopment. NeuroImage 146, 1038–1049.Kiar, G., Roncal, W.G., Mhembere, D., Bridgeford, E., Burns, R.,Vogelstein, J., 2016. ndmg: Neurodatas mri graphs pipeline. Zen-odo .Kingma, D.P., Ba, J.L., 2015. Adam: A method for stochastic opti-mization .Koshino, H., Carpenter, P.A., Minshew, N.J., Cherkassky, V.L.,Keller, T.A., Just, M.A., 2005. Functional connectivity in an fmriworking memory task in high-functioning autism. Neuroimage 24,810–821.Lindquist, M., 2016. Dynamic connectivity: Pitfalls and promises .Lindquist, M.A., Xu, Y., Nebel, M.B., Caffo, B.S., 2014. Evaluatingdynamic bivariate correlations in resting-state fmri: a comparisonstudy and a new approach. NeuroImage 101, 531–546.Liu, Y., Liang, M., Zhou, Y., He, Y., Hao, Y., Song, M., Yu, C., Liu,H., Liu, Z., Jiang, T., 2008. Disrupted small-world networks inschizophrenia. Brain 131, 945–961.Lord, C., Risi, S., Lambrecht, L., Cook, E.H., Leventhal, B.L., DiLa-vore, P.C., Pickles, A., Rutter, M., 2000. The autism diagnosticobservation schedule-generic: A standard measure of social andcommunication deficits associated with the spectrum of autism.Journal of autism and developmental disorders 30, 205–223.Lynch, C.J., Uddin, L.Q., Supekar, K., Khouzam, A., Phillips, J.,Menon, V., 2013. Default mode network in childhood autism:posteromedial cortex heterogeneity and relationship with socialdeficits. Biological psychiatry 74, 212–219.Manton, J.H., Mahony, R., Hua, Y., 2003. The geometry of weightedlow-rank approximations. IEEE Transactions on Signal Processing51, 500–514.Menon, V., 2011. Large-scale brain networks and psychopathology:a unifying triple network model. Trends in cognitive sciences 15,483–506.Mostofsky, S.H., Dubey, P., Jerath, V.K., Jansiewicz, E.M., Gold-berg, M.C., Denckla, M.B., 2006. 
Developmental dyspraxia is notlimited to imitation in children with autism spectrum disorders.Journal of the International Neuropsychological Society 12, 314–326.Muschelli, J., Nebel, M.B., Caffo, B.S., Barber, A.D., Pekar, J.J.,Mostofsky, S.H., 2014. Reduction of motion-related artifacts inresting state fmri using acompcor. Neuroimage 96, 22–35.Nebel, M.B., Eloyan, A., Nettles, C.A., Sweeney, K.L., Ament, K.,Ward, R.E., Choe, A.S., Barber, A.D., Pekar, J.J., Mostofsky,S.H., 2016. Intrinsic visual-motor synchrony correlates with socialdeficits in autism. Biological psychiatry 79, 633–641.Nebel, M.B., Joel, S.E., Muschelli, J., Barber, A.D., Caffo, B.S.,Pekar, J.J., Mostofsky, S.H., 2014. Disruption of functional orga-nization within the primary motor cortex in children with autism.Human brain mapping 35, 567–580.Niznikiewicz, M.A., Kubicki, M., Shenton, M.E., 2003. Recent struc-tural and functional imaging findings in schizophrenia. CurrentOpinion in Psychiatry 16, 123–147.Nocedal, J., Wright, S., 2006. Numerical optimization. SpringerScience & Business Media. Parisot, S., Ktena, S.I., Ferrante, E., Lee, M., Guerrero, R., Glocker,B., Rueckert, D., 2018. Disease prediction using graph convo-lutional networks: Application to autism spectrum disorder andalzheimers disease. Medical image analysis 48, 117–130.Park, C.h., Kim, S.Y., Kim, Y.H., Kim, K., 2008. Comparison of thesmall-world topology between anatomical and functional connec-tivity in the human brain. Physica A: statistical mechanics andits applications 387, 5958–5962.Penny, W.D., Friston, K.J., Ashburner, J.T., Kiebel, S.J., Nichols,T.E., 2011. Statistical parametric mapping: the analysis of func-tional brain images. Elsevier.Pouw, L.B., Rieffe, C., Stockmann, L., Gadow, K.D., 2013. The linkbetween emotion regulation, social functioning, and depression inboys with asd. Research in Autism Spectrum Disorders 7, 549–556.Price, T., Wee, C.Y., Gao, W., Shen, D., 2014. 
Multiple-networkclassification of childhood autism using functional connectivity dy-namics, in: International Conference on Medical Image Comput-ing and Computer-Assisted Intervention, Springer. pp. 177–184.Propper, R.E., ODonnell, L.J., Whalen, S., Tie, Y., Norton, I.H.,Suarez, R.O., Zollei, L., Radmanesh, A., Golby, A.J., 2010. Acombined fmri and dti examination of functional language later-alization and arcuate fasciculus structure: effects of degree versusdirection of hand preference. Brain and cognition 73, 85–92.Rabany, L., Brocke, S., Calhoun, V.D., Pittman, B., Corbera, S.,Wexler, B.E., Bell, M.D., Pelphrey, K., Pearlson, G.D., Assaf,M., 2019. Dynamic functional connectivity in schizophrenia andautism spectrum disorder: Convergence, divergence and classifi-cation. NeuroImage: Clinical 24, 101966.Raichle, M.E., 2015. The brain’s default mode network. Annualreview of neuroscience 38, 433–447.Rashid, B., Damaraju, E., Pearlson, G.D., Calhoun, V.D., 2014. Dy-namic connectivity states estimated from resting fmri identify dif-ferences among schizophrenia, bipolar disorder, and healthy con-trol subjects. Frontiers in human neuroscience 8, 897.Rubinov, M., Sporns, O., 2010. Complex network measures of brainconnectivity: uses and interpretations. Neuroimage 52, 1059–1069.Rudie, J.D., Brown, J., Beck-Pancer, D., Hernandez, L., Dennis,E., Thompson, P., Bookheimer, S., Dapretto, M., 2013. Alteredfunctional and structural brain network organization in autism.NeuroImage: clinical 2, 79–94.Schnabel, R.B., Toint, P.L., 1983. Forcing sparsity by projectingwith respect to a non-diagonally weighted frobenius norm. Math-ematical Programming 25, 125–129.Sestieri, C., Corbetta, M., Romani, G.L., Shulman, G.L., 2011.Episodic memory retrieval, parietal cortex, and the default modenetwork: functional and topographic analyses. Journal of Neuro-science 31, 4407–4420.Skudlarski, P., Jagannathan, K., Calhoun, V.D., Hampson, M.,Skudlarska, B.A., Pearlson, G., 2008. 
Measuring brain connectivity: diffusion tensor imaging validates resting state temporal correlations. Neuroimage 43, 554–561.
Smith, S.M., Beckmann, C.F., Andersson, J., Auerbach, E.J., Bijsterbosch, J., Douaud, G., Duff, E., Feinberg, D.A., Griffanti, L., Harms, M.P., et al., 2013. Resting-state fMRI in the Human Connectome Project. Neuroimage 80, 144–168.
Spitzer, R.L., Williams, J.B., 1980. Diagnostic and statistical manual of mental disorders, in: American Psychiatric Association, Citeseer.
Sporns, O., Chialvo, D.R., Kaiser, M., Hilgetag, C.C., 2004. Organization, development and function of complex brain networks. Trends in Cognitive Sciences 8, 418–425.
Sridharan, D., Levitin, D.J., Menon, V., 2008. A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proceedings of the National Academy of Sciences 105, 12569–12574.
Sui, J., He, H., Yu, Q., Rogers, J., Pearlson, G., Mayer, A.R., Bustillo, J., Canive, J., Calhoun, V.D., et al., 2013. Combination of resting state fMRI, DTI, and sMRI data to discriminate schizophrenia by N-way MCCA+jICA. Frontiers in Human Neuroscience 7, 235.
Sun, Y., Yin, Q., Fang, R., Yan, X., Wang, Y., Bezerianos, A., Tang, H., Miao, F., Sun, J., 2014. Disrupted functional brain connectivity and its association to structural connectivity in amnestic mild cognitive impairment and Alzheimer's disease. PLoS ONE 9.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M., 2002. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15, 273–289.
Uddin, L.Q., Yeo, B.T., Spreng, R.N., 2019. Towards a universal taxonomy of macro-scale functional human brain networks. Brain Topography, 1–17.
Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K., Consortium, W.M.H., et al., 2013. The WU-Minn Human Connectome Project: an overview. Neuroimage 80, 62–79.
Van Essen, D.C., Ugurbil, K., Auerbach, E., Barch, D., Behrens, T., Bucholz, R., Chang, A., Chen, L., Corbetta, M., Curtiss, S.W., et al., 2012. The Human Connectome Project: a data acquisition perspective. Neuroimage 62, 2222–2231.
Venkataraman, A., Duncan, J.S., Yang, D.Y.J., Pelphrey, K.A., 2015. An unbiased Bayesian approach to functional connectomics implicates social-communication networks in autism. NeuroImage: Clinical 8, 356–366.
Venkataraman, A., Kubicki, M., Golland, P., 2012. From brain connectivity models to identifying foci of a neurological disorder, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 715–722.
Venkataraman, A., Kubicki, M., Golland, P., 2013. From connectivity models to region labels: identifying foci of a neurological disorder. IEEE Transactions on Medical Imaging 32, 2078–2098.
Venkataraman, A., Rathi, Y., Kubicki, M., Westin, C.F., Golland, P., 2011. Joint modeling of anatomical and functional connectivity for population studies. IEEE Transactions on Medical Imaging 31, 164–182.
Venkataraman, A., Wymbs, N., Nebel, M.B., Mostofsky, S., 2017. A unified Bayesian approach to extract network-based functional differences from a heterogeneous patient cohort, in: International Workshop on Connectomics in Neuroimaging, Springer. pp. 60–69.
Venkataraman, A., Yang, D.Y.J., Pelphrey, K.A., Duncan, J.S., 2016. Bayesian community detection in the space of group-level functional differences. IEEE Transactions on Medical Imaging 35, 1866–1882.
Vissers, M.E., Cohen, M.X., Geurts, H.M., 2012. Brain connectivity and high functioning autism: a promising path of research that needs refined models, methodological convergence, and stronger behavioral links. Neuroscience & Biobehavioral Reviews 36, 604–625.
Wang, F., Kalmar, J.H., He, Y., Jackowski, M., Chepenik, L.G., Edmiston, E.E., Tie, K., Gong, G., Shah, M.P., Jones, M., et al., 2009. Functional and structural connectivity between the perigenual anterior cingulate and amygdala in bipolar disorder. Biological Psychiatry 66, 516–521.
Wang, Q., Su, T.P., Zhou, Y., Chou, K.H., Chen, I.Y., Jiang, T., Lin, C.P., 2012. Anatomical insights into disrupted small-world networks in schizophrenia. Neuroimage 59, 1085–1093.
Wee, C.Y., Yap, P.T., Zhang, D., Denny, K., Browndyke, J.N., Potter, G.G., Welsh-Bohmer, K.A., Wang, L., Shen, D., 2012. Identification of MCI individuals using structural and functional connectivity networks. Neuroimage 59, 2045–2056.
Weyandt, L., Swentosky, A., Gudmundsdottir, B.G., 2013. Neuroimaging and ADHD: fMRI, PET, DTI findings, and methodological limitations. Developmental Neuropsychology 38, 211–225.
Whitwell, J.L., Avula, R., Master, A., Vemuri, P., Senjem, M.L., Jones, D.T., Jack Jr, C.R., Josephs, K.A., 2011. Disrupted thalamocortical connectivity in PSP: a resting-state fMRI, DTI, and VBM study. Parkinsonism & Related Disorders 17, 599–605.
Williams, D.L., Goldstein, G., Minshew, N.J., 2006. The profile of memory function in children with autism. Neuropsychology 20, 21.
Zimmermann, J., Griffiths, J.D., McIntosh, A.R., 2018. Unique mapping of structural and functional connectivity on cognition. Journal of Neuroscience 38, 9658–9667.