Distance Correlation Based Brain Functional Connectivity Estimation and Non-Convex Multi-Task Learning for Developmental fMRI Studies

Abstract

Resting-state functional magnetic resonance imaging (rs-fMRI)-derived functional connectivity patterns have been extensively utilized to delineate global functional organization of the human brain in health, development, and neuropsychiatric disorders. In this paper, we investigate how functional connectivity in males and females differs in an age prediction framework. We first estimate functional connectivity between regions-of-interest (ROIs) using distance correlation instead of Pearson's correlation. Distance correlation, as a multivariate statistical method, explores spatial relations of voxel-wise time courses within individual ROIs and measures both linear and nonlinear dependence, capturing more complex information of between-ROI interactions. Then, a novel non-convex multi-task learning (NC-MTL) model is proposed to study age-related gender differences in functional connectivity, where age prediction for each gender group is viewed as one task. Specifically, in the proposed NC-MTL model, we introduce a composite regularizer with a combination of non-convex ℓ 2,1−2 and ℓ 1−2 regularization terms for selecting both common and task-specific features. Finally, we validate the proposed NC-MTL model along with distance correlation based functional connectivity on rs-fMRI of the Philadelphia Neurodevelopmental Cohort for predicting ages of both genders. The experimental results demonstrate that the proposed NC-MTL model outperforms other competing MTL models in age prediction, as well as characterizing developmental gender differences in functional connectivity patterns.


arXiv preprint [q-bio.QM]

Li Xiao, Biao Cai, Gang Qu, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Fellow, IEEE, and Yu-Ping Wang, Senior Member, IEEE

Index Terms: Brain development, distance correlation, feature selection, functional connectivity, multi-task learning.

This work was supported in part by NIH under Grants R01GM109068, R01MH104680, R01MH107354, R01AR059781, R01EB006841, R01EB005846, R01MH103220, R01MH116782, R01MH121101, P20GM130447, and P20GM103472, and in part by NSF under Grant 1539067.
L. Xiao, B. Cai, G. Qu, and Y.-P. Wang are with the Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118 (e-mail: [email protected]).
J. M. Stephen is with the Mind Research Network, Albuquerque, NM 87106.
T. W. Wilson is with the Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE 68198.
V. D. Calhoun is with the Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, GA 30030.
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

I. Introduction

Functional magnetic resonance imaging (fMRI) is a modern neuroimaging technique that characterizes brain function and organization through hemodynamic changes [1]–[3]. In recent decades, the fMRI-derived functional connectome has attracted a great deal of interest for providing new insights into individual variations in behavior and cognition [4]–[7]. The connectome is defined as a network architecture of functional connectivity between brain regions-of-interest (ROIs). It facilitates the understanding of fMRI brain activation patterns, and acts like a "fingerprint" to distinguish individuals from the population [8]–[10].

Recently, brain developmental fMRI studies have shown that the human brain undergoes important changes in its functional connectome across the lifespan [11]–[13]. For instance, Fair et al. [11] demonstrated that the organization of several functional modules shifts from a local anatomical emphasis in children to a more distributed architecture in young adults, which might be driven by an abundance of short-range functional connections that tend to weaken with age as well as long-range functional connections that tend to strengthen with age. Accordingly, there has been a surge in work focusing on predicting an individual's age from functional connectivity [14]–[16], in order to potentially aid in the diagnosis and prognosis of developmental disorders and neuropsychiatric diseases. However, because age-related changes in functional connectivity become complicated from childhood to senescence, understanding the developmental trajectories of brain function more accurately remains a challenge. In this paper, we address this challenge in two ways: 1) by refining the estimation of functional connectivity to explore the intrinsic relationships between ROIs; and 2) by developing an advanced machine learning model to handle very high-dimensional functional connectivity data.

The majority of previous developmental fMRI work is based on the conventional functional connectivity analysis, in which Pearson's correlation between two ROI-wise time courses is computed as the functional connectivity between the corresponding ROIs, and each ROI-wise time course is the average of the time courses of all constituent voxels within the ROI. Although this approach provides straightforward estimates of functional connectivity, only linear dependence between ROIs is detected, and important information on the underlying true connectivity may be lost when averaging all voxel-wise time courses within an ROI. Therefore, in this paper we utilize distance correlation [17], [18] to quantify functional connectivity, as also studied in [19], [20], to better uncover the complex interactions between ROIs. Unlike Pearson's correlation, distance correlation is a measure of both linear and nonlinear dependence between two random vectors of arbitrary dimensions. By regarding an ROI and its constituent voxels as a random vector and the components of the vector, respectively, we can operate directly on the voxel-wise time courses within each ROI to compute distance correlation between ROIs. In this way, distance correlation based functional connectivity can preserve spatial information of all voxel-wise time courses within each ROI and improve characterization of between-ROI interactions compared with Pearson's correlation. We tested the predictive power of both measures on resting-state fMRI (rs-fMRI) data from the Philadelphia Neurodevelopmental Cohort (PNC) [21] for each gender group separately. The experimental results demonstrate that distance correlation based functional connectivity better predicted ages of both males and females (aged 8–22 years) than Pearson's correlation based functional connectivity.

Furthermore, multiple studies have documented the presence of gender differences in brain development relevant to social and behavioral domains during childhood through adolescence [22]–[25]. For example, evidence suggests that females show better verbal working memory and social cognition than males, while males perform better than females on spatial orientation and motor coordination [26]–[28]. Inspired by these observations, in this paper we propose a novel non-convex multi-task learning (NC-MTL) model to investigate age-related gender differences in an age prediction framework, where age prediction tasks for both genders from functional connectivity are jointly analyzed. Specifically, we consider age prediction for each gender group as one task, and select age-related common and gender-specific functional connectivity features underlying brain development. To do so, we introduce a composite of the non-convex $\ell_{2,1-2}$ and $\ell_{1-2}$ regularizers in our NC-MTL model. The two regularizers have been recently used, respectively, in [29] and [30]–[32], and shown to be improved alternatives to the classical $\ell_{2,1}$ and $\ell_1$ regularizers widely used in previous MTL models [33]–[39]. Thus, the use of the $\ell_{2,1-2}$ term induces group sparsity for selecting common features shared by all tasks, and the use of the $\ell_{1-2}$ term enables us to select task-specific features. In addition, from a machine learning point of view, adding a proper regularization term in our NC-MTL model helps avoid over-fitting, especially in high-dimensional, low-sample-size scenarios. To validate the effectiveness of our NC-MTL model, we conducted multiple experiments to jointly predict ages of both genders using functional connectivity from rs-fMRI of the PNC [21]. The experimental results show that our NC-MTL model significantly outperformed other previous MTL models, and can characterize the developmental gender differences in functional connectivity patterns.

The remainder of this paper is organized as follows. In Section II, we first introduce distance correlation and apply it to measure functional connectivity. Then, we present the proposed NC-MTL model and its optimization algorithm. In Section III, we provide details of the experimental results and comparisons, followed by a discussion on the discovered gender differences in functional connectivity during brain development as well as the limitations and future research directions. In Section IV, we conclude this paper.

Throughout this paper, we use uppercase boldface, lowercase boldface, and normal italic letters to denote matrices, vectors, and scalars, respectively. The superscript $T$ denotes the matrix transpose. $\langle \mathbf{A}, \mathbf{B} \rangle$ stands for the inner product of two matrices $\mathbf{A}$ and $\mathbf{B}$, and equals the trace of $\mathbf{A}^T \mathbf{B}$. Let $\mathbb{R}$ denote the set of real numbers. For the sake of clarity, we summarize the frequently used notations and corresponding descriptions in Table I.

TABLE I: Notations and descriptions.

Notation    Description
$W_{ij}$    The $(i,j)$-th element of a matrix $\mathbf{W}$.
$\mathbf{w}_i$    The $i$-th column of a matrix $\mathbf{W}$.
$\mathbf{w}^i$    The $i$-th row of a matrix $\mathbf{W}$.
$w_i$    The $i$-th element of a vector $\mathbf{w}$.
$\partial f$    The set of sub-gradients of a function $f$.
$\nabla f$    The gradient of a differentiable function $f$.
$\ell_p$    $\|\mathbf{w}\|_p = (\sum_i |w_i|^p)^{1/p}$ or $\|\mathbf{W}\|_p = (\sum_{i,j} |W_{ij}|^p)^{1/p}$.
$\ell_{2,p}$    $\|\mathbf{W}\|_{2,p} = (\sum_i \|\mathbf{w}^i\|_2^p)^{1/p}$, and $\|\mathbf{W}\|_{2,2} = \|\mathbf{W}\|_2$.
$\|\mathbf{W}\|_F$    The Frobenius norm of a matrix $\mathbf{W}$, and $\|\mathbf{W}\|_F = \|\mathbf{W}\|_{2,2}$.
$\mathbf{W}^{(k)}, \mathbf{w}^{(k)}, w^{(k)}$    $\mathbf{W}$, $\mathbf{w}$, $w$ at the $k$-th iteration in an iterative algorithm.

II. Methods

In this section, we first briefly introduce distance correlation [17], [18], and compare it with Pearson's correlation in terms of application for measuring functional connectivity. Afterwards, we propose an innovative non-convex multi-task learning (NC-MTL) model as well as its optimization algorithm. Finally, we validate the proposed NC-MTL model on synthetic data.

A. Functional connectivity measured by distance correlation

In contrast with Pearson's correlation, which is a widely used measure of linear dependence between two random variables, distance correlation has recently been proposed for measuring and testing general (i.e., both linear and nonlinear) dependence between two random vectors of arbitrary dimensions. Two random vectors are independent if and only if the distance correlation between them is zero [17]. By contrast, two random variables with zero Pearson's correlation are not necessarily independent, because they may still be nonlinearly dependent. Hence, distance correlation can generally capture more complex relationships than Pearson's correlation.

Let $\{\mathbf{a}_i\}_{i=1}^n$ and $\{\mathbf{b}_i\}_{i=1}^n$ be $n$ paired samples from two random vectors $\mathbf{a} \in \mathbb{R}^p$ and $\mathbf{b} \in \mathbb{R}^q$, where the dimensions $p$ and $q$ are arbitrarily large and not necessarily required to be equal. The unbiased (sample) distance correlation between $\mathbf{a}$ and $\mathbf{b}$ is then defined as follows [18].

1) Calculate the Euclidean distance matrices $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{B} \in \mathbb{R}^{n \times n}$ whose elements are $A_{ij} = \|\mathbf{a}_i - \mathbf{a}_j\|_2$ and $B_{ij} = \|\mathbf{b}_i - \mathbf{b}_j\|_2$ for $1 \le i, j \le n$, respectively.

2) Calculate the U-centered distance matrices $\hat{\mathbf{A}} \in \mathbb{R}^{n \times n}$ with
$$\hat{A}_{ij} = \begin{cases} A_{ij} - \dfrac{\sum_{l=1}^{n} A_{il}}{n-2} - \dfrac{\sum_{k=1}^{n} A_{kj}}{n-2} + \dfrac{\sum_{k,l=1}^{n} A_{kl}}{(n-1)(n-2)}, & i \neq j, \\ 0, & i = j, \end{cases} \tag{1}$$
for $1 \le i, j \le n$, and $\hat{\mathbf{B}} \in \mathbb{R}^{n \times n}$ accordingly.

3) Define the distance covariance (dCov) by
$$\mathrm{dCov}(\mathbf{a}, \mathbf{b}) = \frac{\sum_{i \neq j} \hat{A}_{ij} \hat{B}_{ij}}{n(n-3)}. \tag{2}$$

4) Define the distance correlation (dCor) by
$$\mathrm{dCor}(\mathbf{a}, \mathbf{b}) = \sqrt{\frac{\mathrm{dCov}(\mathbf{a}, \mathbf{b})}{\sqrt{\mathrm{dCov}(\mathbf{a}, \mathbf{a})\,\mathrm{dCov}(\mathbf{b}, \mathbf{b})}}} \tag{3}$$
if $\mathrm{dCov}(\mathbf{a}, \mathbf{b}) > 0$, and otherwise 0.

Without loss of generality, by regarding $\mathbf{a}$ and $\mathbf{b}$ as a pair of ROIs consisting of $p$ and $q$ voxels, respectively, and $\{\mathbf{a}_i\}_{i=1}^n$ and $\{\mathbf{b}_i\}_{i=1}^n$ as the corresponding voxel-wise time courses within them over a total of $n$ time points, we can compute the distance correlation, i.e., $\mathrm{dCor}(\mathbf{a}, \mathbf{b})$, to quantify functional connectivity between them [19], [20]. As all voxel-wise time courses within an ROI are utilized by treating each voxel as one variable, dCor is a multivariate measure of functional connectivity. By comparison, Pearson's correlation (pCor) is a univariate measure of functional connectivity, where each ROI is first reduced to one dimension by averaging voxel-wise time courses within it to yield one ROI-wise time course, and then functional connectivity between a pair of ROIs is measured by the pCor between their ROI-wise time courses. The difference between the two functional connectivity methods is illustrated in Fig. 1. It has been demonstrated in [19], [20] that dCor based functional connectivity is capable of preserving the voxel-level information, resulting in improved characterization of between-ROI interactions, while averaging all voxel-wise time courses within each ROI in pCor based functional connectivity might lose important information on the underlying true connectivity. Of note, "univariate" and "multivariate" here are used to refer to the number of variables within an ROI [19].

Fig. 1: An illustration of the difference between dCor based functional connectivity and pCor based functional connectivity. At the top, each blue dot denotes an ROI; in the middle, each heatmap shows all voxel-wise time courses within the corresponding ROI; at the bottom, each line plot represents an ROI-wise time course calculated by averaging all voxel-wise time courses within the corresponding ROI.

B. Novel non-convex multi-task learning (NC-MTL)

We assume that there are $M$ learning tasks for the data in a $d$-dimensional feature space. In the $i$-th task for $1 \le i \le M$, we have a training dataset $\{\mathbf{X}_i, \mathbf{y}_i\}$, where $\mathbf{X}_i \in \mathbb{R}^{n_i \times d}$ is the data matrix with $n_i$ training subjects as row vectors, each consisting of $d$ features, and $\mathbf{y}_i \in \mathbb{R}^{n_i}$ is the corresponding label vector. Let $\mathbf{w}_i \in \mathbb{R}^d$ denote the weights of all features to linearly regress the labels $\mathbf{y}_i$ on $\mathbf{X}_i$ in the $i$-th task. Then, an MTL model for the data can be formulated by the following optimization problem:
$$\min_{\mathbf{W}} \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 + \alpha\, \Omega(\mathbf{W}), \tag{4}$$
where $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \cdots, \mathbf{w}_M] \in \mathbb{R}^{d \times M}$ is the weight matrix of features on all tasks, $\Omega(\mathbf{W})$ is the sparsity regularizer imposed for feature selection, and $\alpha > 0$ is the regularization parameter that controls the trade-off between residual error and sparsity. Through solving (4), we obtain a sparse weight matrix $\mathbf{W}^\ast$ to evaluate the relationship between features and labels, thereby selecting the most discriminative features across all tasks. Note that if the number of tasks equals 1, i.e., $M = 1$, then $\mathbf{W} = \mathbf{w} \in \mathbb{R}^d$ becomes the weight vector on one task, and (4) represents single-task learning (STL).

A classical MTL model is to select common features shared by all tasks based on a group sparsity regularizer, i.e., $\Omega(\mathbf{W}) = \|\mathbf{W}\|_{2,0}$, in (4). The $\ell_{2,0}$ regularizer, extending the $\ell_0$ regularizer in STL to MTL, penalizes every row of $\mathbf{W}$ as a whole, and enforces sparsity among the rows. As the $\ell_{2,0}$ regularizer leads to a combinatorially NP-hard optimization problem, several approximations of it, such as the $\ell_{2,p}$ regularizer ($\|\mathbf{W}\|_{2,p}$) with $0 < p \le 1$, have been studied. Remarkably, the $\ell_{2,1}$ regularizer has been proposed as a convex approximation to the $\ell_{2,0}$ regularizer [40]–[42], and MTL in (4) becomes
$$\min_{\mathbf{W}} \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 + \alpha \|\mathbf{W}\|_{2,1}, \tag{5}$$
which performs well and can be easily optimized. On the other hand, as $\ell_{2,p}$ with $0 < p < 1$ is closer to $\ell_{2,0}$ than $\ell_{2,1}$, the $\ell_{2,p}$ regularizer with $0 < p < 1$ can perform better than the $\ell_{2,1}$ regularizer for feature selection [43]–[45]. However, due to the non-convexity and non-Lipschitz continuity of the $\ell_{2,p}$ regularizer with $0 < p < 1$, it is more challenging to solve the optimization problem in MTL. To this end, the non-convex but Lipschitz continuous $\ell_{2,1-2}$ regularizer has recently been investigated in [29], which extends the $\ell_{1-2}$ regularizer in STL [30]–[32] to MTL, i.e.,
$$\min_{\mathbf{W}} \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 + \alpha \|\mathbf{W}\|_{2,1-2}, \tag{6}$$
where $\|\mathbf{W}\|_{2,1-2} \triangleq \|\mathbf{W}\|_{2,1} - \|\mathbf{W}\|_{2,2} = \|\mathbf{W}\|_{2,1} - \|\mathbf{W}\|_F$, and it is easy to verify that $\|\mathbf{W}\|_{2,1-2} \ge 0$ since $\|\mathbf{W}\|_F \le \|\mathbf{W}\|_{2,1}$. The $\ell_{2,1-2}$ regularizer has been shown to not only achieve better feature selection performance, but also result in an easier optimization problem because of its Lipschitz continuity.

As mentioned above, all of the $\ell_{2,p}$ with $0 < p \le 1$ and $\ell_{2,1-2}$ regularizers are approximations to the $\ell_{2,0}$ regularizer in MTL. So, they can achieve group sparsity and only select common features shared by all tasks, but fail to consider task-specific features (i.e., features shared by a subset of tasks). To extract both common and task-specific features in MTL, we introduce a composite of the $\ell_{2,1-2}$ and $\ell_{1-2}$ regularizers, and obtain the following NC-MTL model
$$\min_{\mathbf{W}} \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 + \alpha \|\mathbf{W}\|_{2,1-2} + \beta \|\mathbf{W}\|_{1-2}, \tag{7}$$
i.e.,
$$\min_{\mathbf{W}} \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 + \alpha \|\mathbf{W}\|_{2,1} + \beta \|\mathbf{W}\|_1 - (\alpha + \beta) \|\mathbf{W}\|_F, \tag{8}$$
where $\|\mathbf{W}\|_{1-2} \triangleq \|\mathbf{W}\|_1 - \|\mathbf{W}\|_F$ is used to enforce sparsity among all elements in $\mathbf{W}$, and we immediately have $\|\mathbf{W}\|_{1-2} \ge 0$ since $\|\mathbf{W}\|_F \le \|\mathbf{W}\|_1$. It is worth noting that the first term $\ell_{2,1-2}$ of the composite regularizer in (7) achieves group sparsity to select common features shared by all tasks, while the second term $\ell_{1-2}$ contributes to selecting task-specific features. The two terms are improved alternatives to $\ell_{2,1}$ and $\ell_1$ respectively, which have been used in several existing MTL models (see, e.g., [33]–[39]). Hyperparameters $\alpha, \beta > 0$ weight the two regularization terms.

Fig. 2: An illustration of the proposed NC-MTL model in (8). The left-hand side shows the input datasets $\{\mathbf{X}_i, \mathbf{y}_i\}_{i=1}^{M}$, and the right-hand side shows the sparsity pattern of the learned weight matrix $\mathbf{W}$.

C. Optimization algorithm for NC-MTL
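As a concrete reference for the quantity that the algorithm in this subsection minimizes, the objective in (8) can be evaluated directly. The NumPy sketch below is illustrative only: the function names are hypothetical, and the factor 1/2 on the squared loss is an assumption of this sketch rather than a confirmed detail of the authors' implementation.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (the l_{2,1} norm)."""
    return float(np.linalg.norm(W, axis=1).sum())


def nc_mtl_regularizer(W, alpha, beta):
    """Composite penalty alpha*||W||_{2,1-2} + beta*||W||_{1-2}, cf. (7)."""
    fro = float(np.linalg.norm(W))      # Frobenius norm
    l1 = float(np.abs(W).sum())         # element-wise l_1 norm
    return alpha * (l21_norm(W) - fro) + beta * (l1 - fro)


def nc_mtl_objective(Xs, ys, W, alpha, beta):
    """Squared loss summed over tasks plus the composite penalty, cf. (8).

    Xs, ys: lists of per-task data matrices (n_i x d) and label vectors (n_i,).
    W: d x M weight matrix whose i-th column holds the weights of task i.
    The 1/2 factor on the loss is an assumption of this sketch.
    """
    loss = 0.0
    for i, (X, y) in enumerate(zip(Xs, ys)):
        r = y - X @ W[:, i]
        loss += 0.5 * float(r @ r)
    return loss + nc_mtl_regularizer(W, alpha, beta)
```

Both penalty terms vanish for a matrix with a single nonzero entry, since the $\ell_{2,1}$, $\ell_1$, and Frobenius norms then coincide, which gives a quick sanity check of the non-negativity property stated for the two regularizers.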

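Let us consider the proposed NC-MTL model in (8), whose objective function, denoted as $h(\mathbf{W})$, is non-convex and is the difference of two convex functions $f(\mathbf{W})$ and $g(\mathbf{W})$, i.e.,
$$\min_{\mathbf{W}} h(\mathbf{W}) := f(\mathbf{W}) - g(\mathbf{W}) \tag{9}$$
with
$$f(\mathbf{W}) = \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 + \alpha \|\mathbf{W}\|_{2,1} + \beta \|\mathbf{W}\|_1 \quad \text{and} \tag{10}$$
$$g(\mathbf{W}) = (\alpha + \beta) \|\mathbf{W}\|_F. \tag{11}$$
A well-known scheme for addressing such a non-convex optimization problem is first to linearize $g(\mathbf{W})$ using its first-order Taylor-series expansion at the current solution $\mathbf{W}^{(k)}$, and then advance to a new one $\mathbf{W}^{(k+1)}$ by solving a convex optimization subproblem in the framework of the ConCave-Convex Procedure (CCCP) [46].

More specifically, the CCCP algorithm can solve the above problem (9) with the following iterations:
$$\mathbf{W}^{(k+1)} = \arg\min_{\mathbf{W}} f(\mathbf{W}) - \left( g(\mathbf{W}^{(k)}) + \langle \mathbf{W} - \mathbf{W}^{(k)}, \mathbf{S}^{(k)} \rangle \right) = \arg\min_{\mathbf{W}} f(\mathbf{W}) - \langle \mathbf{W}, \mathbf{S}^{(k)} \rangle, \tag{12}$$
where $\mathbf{S}^{(k)} \in \partial g(\mathbf{W}^{(k)})$. Following the definition of sub-gradient, i.e., for any $\mathbf{W}$, $g(\mathbf{W}) \ge g(\mathbf{W}^{(k)}) + \langle \mathbf{W} - \mathbf{W}^{(k)}, \mathbf{S}^{(k)} \rangle$, we obtain
$$\begin{aligned} h(\mathbf{W}^{(k)}) &= f(\mathbf{W}^{(k)}) - g(\mathbf{W}^{(k)}) \\ &\ge f(\mathbf{W}^{(k+1)}) - \left( g(\mathbf{W}^{(k)}) + \langle \mathbf{W}^{(k+1)} - \mathbf{W}^{(k)}, \mathbf{S}^{(k)} \rangle \right) \\ &\ge f(\mathbf{W}^{(k+1)}) - g(\mathbf{W}^{(k+1)}) = h(\mathbf{W}^{(k+1)}). \end{aligned} \tag{13}$$
Here the first inequality holds because $\mathbf{W}^{(k+1)}$ minimizes the objective in (12), at which $\mathbf{W}^{(k)}$ attains the value $f(\mathbf{W}^{(k)}) - g(\mathbf{W}^{(k)})$, and the second follows from the sub-gradient inequality. Therefore, the objective function values $\{h(\mathbf{W}^{(k)})\}_{k=0}^{\infty}$ are monotonically non-increasing. Moreover, from the formula of the objective function $h(\mathbf{W})$ in (8), $\{h(\mathbf{W}^{(k)})\}_{k=0}^{\infty}$ are bounded below by zero, and they thus converge. We can obtain a local optimum $\mathbf{W}^\star$ of (8) by iteratively solving (12); see Algorithm 1 for details.

Algorithm 1: CCCP for solving the proposed NC-MTL in (8)
Input: Datasets $\{\mathbf{X}_i, \mathbf{y}_i\}_{i=1}^{M}$; hyperparameters $\alpha, \beta > 0$.
Initialize $k = 0$, $\mathbf{W}^{(0)} = \mathbf{0}$;
repeat
    $$\mathbf{W}^{(k+1)} := \arg\min_{\mathbf{W}} \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 + \alpha \|\mathbf{W}\|_{2,1} + \beta \|\mathbf{W}\|_1 - \langle \mathbf{W}, \mathbf{S}^{(k)} \rangle, \tag{14}$$
    where $\mathbf{S}^{(k)} \in \partial g(\mathbf{W}^{(k)})$ is taken as
    $$\mathbf{S}^{(k)} = \begin{cases} (\alpha + \beta) \|\mathbf{W}^{(k)}\|_F^{-1} \mathbf{W}^{(k)}, & \mathbf{W}^{(k)} \neq \mathbf{0}, \\ \mathbf{0}, & \mathbf{W}^{(k)} = \mathbf{0}; \end{cases} \tag{15}$$
    $k := k + 1$;
until convergence.
Output: The optimal solution $\mathbf{W}^\star$.

We next use the accelerated proximal gradient (APG) algorithm [47] to solve the convex subproblem (12) or (14), whose objective function is the summation of two convex functions, i.e., $\phi(\mathbf{W})$ (differentiable) and $\varphi(\mathbf{W})$ (non-differentiable) with
$$\phi(\mathbf{W}) = \sum_{i=1}^{M} \frac{1}{2} \|\mathbf{y}_i - \mathbf{X}_i \mathbf{w}_i\|_2^2 - \langle \mathbf{W}, \mathbf{S}^{(k)} \rangle \quad \text{and} \tag{16}$$
$$\varphi(\mathbf{W}) = \alpha \|\mathbf{W}\|_{2,1} + \beta \|\mathbf{W}\|_1. \tag{17}$$
Specifically, we iteratively update $\mathbf{W}$ as follows:
$$\mathbf{W}^{(t+1)} = \arg\min_{\mathbf{W}} \Lambda_l(\mathbf{W}, \mathbf{W}^{(t)}), \tag{18}$$
where $\Lambda_l(\mathbf{W}, \mathbf{W}^{(t)}) = \phi(\mathbf{W}^{(t)}) + \langle \mathbf{W} - \mathbf{W}^{(t)}, \nabla \phi(\mathbf{W}^{(t)}) \rangle + \frac{l}{2} \|\mathbf{W} - \mathbf{W}^{(t)}\|_F^2 + \varphi(\mathbf{W})$, and $l$ is a variable step size. In matrix calculus, the gradient of a scalar-valued function $\phi(\mathbf{W})$ with respect to $\mathbf{W}$ can be written as a matrix whose columns are the gradients of $\phi$ with respect to the corresponding columns of $\mathbf{W}$. Therefore, we obtain $\nabla \phi(\mathbf{W}^{(t)}) = [\nabla \phi(\mathbf{w}_1^{(t)}), \nabla \phi(\mathbf{w}_2^{(t)}), \cdots, \nabla \phi(\mathbf{w}_M^{(t)})]$, and $\nabla \phi(\mathbf{w}_i^{(t)})$ for $1 \le i \le M$ can be easily calculated as
$$\nabla \phi(\mathbf{w}_i^{(t)}) = \mathbf{X}_i^T (\mathbf{X}_i \mathbf{w}_i^{(t)} - \mathbf{y}_i) - \mathbf{s}_i^{(k)}, \tag{19}$$
where $\mathbf{w}_i^{(t)}$ and $\mathbf{s}_i^{(k)}$ represent the $i$-th columns of $\mathbf{W}^{(t)}$ and $\mathbf{S}^{(k)}$, respectively. By completing the square, we can equivalently rewrite $\Lambda_l(\mathbf{W}, \mathbf{W}^{(t)})$ as $\Lambda_l(\mathbf{W}, \mathbf{W}^{(t)}) = \phi(\mathbf{W}^{(t)}) - \frac{1}{2l} \|\nabla \phi(\mathbf{W}^{(t)})\|_F^2 + \frac{l}{2} \|\mathbf{W} - \mathbf{W}^{(t)} + \frac{1}{l} \nabla \phi(\mathbf{W}^{(t)})\|_F^2 + \varphi(\mathbf{W})$. Then, after ignoring the terms independent of $\mathbf{W}$ in (18), the update procedure becomes
$$\mathbf{W}^{(t+1)} = \arg\min_{\mathbf{W}} \frac{1}{2} \|\mathbf{W} - \mathbf{V}^{(t)}\|_F^2 + \frac{1}{l} \varphi(\mathbf{W}), \tag{20}$$
where $\mathbf{V}^{(t)} = \mathbf{W}^{(t)} - \frac{1}{l} \nabla \phi(\mathbf{W}^{(t)})$. Clearly, (20) is in fact
$$\mathbf{W}^{(t+1)} = \mathrm{prox}_{\frac{1}{l} \varphi}(\mathbf{V}^{(t)}), \tag{21}$$
where $\mathrm{prox}_{\frac{1}{l} \varphi}$ stands for the proximal operator [48] of the scaled function $\frac{1}{l} \varphi$.

Algorithm 2: APG for solving the subproblem in (14)
Input: Datasets $\{\mathbf{X}_i, \mathbf{y}_i\}_{i=1}^{M}$; hyperparameters $\alpha, \beta > 0$.
Initialize $t$, $\theta^{(0)}$, $l_0$, $\sigma$, and $\mathbf{W}^{(0)} = \mathbf{W}^{(1)}$;
repeat
    calculate $\mathbf{Q}^{(t)}$ by (25);
    $l = l_{t-1}$;
    while $\phi(\mathbf{W}^{(t+1)}) + \varphi(\mathbf{W}^{(t+1)}) > \Lambda_l(\mathbf{W}^{(t+1)}, \mathbf{Q}^{(t)})$, where $\mathbf{W}^{(t+1)}$ is calculated by (20), do
        $l = \sigma l$;
    end while
    $l_t = l$;
    $t := t + 1$;
until convergence.
Output: The optimal solution $\mathbf{W}^\star$.

Owing to the separability of $\mathbf{W}$ on its rows in (20), we can decouple (20) into the following optimization problem for each row independently, i.e., for $1 \le i \le d$,
$$\mathbf{w}^{(t+1),i} = \arg\min_{\mathbf{w}^i} \frac{1}{2} \|\mathbf{w}^i - \mathbf{v}^{(t),i}\|_2^2 + \frac{\alpha}{l} \|\mathbf{w}^i\|_2 + \frac{\beta}{l} \|\mathbf{w}^i\|_1 = \mathrm{prox}_{\frac{1}{l} \tau}(\mathbf{v}^{(t),i}), \tag{22}$$
where $\mathbf{w}^{(t+1),i}$, $\mathbf{w}^i$, and $\mathbf{v}^{(t),i}$ represent the $i$-th rows of $\mathbf{W}^{(t+1)}$, $\mathbf{W}$, and $\mathbf{V}^{(t)}$, respectively, and $\tau(\mathbf{w}^i) = \alpha \|\mathbf{w}^i\|_2 + \beta \|\mathbf{w}^i\|_1$ is a function of the vector $\mathbf{w}^i$. Letting $\tau_1(\mathbf{w}^i) = \beta \|\mathbf{w}^i\|_1$ and $\tau_2(\mathbf{w}^i) = \alpha \|\mathbf{w}^i\|_2$, we have, from [37], $\mathrm{prox}_{\frac{1}{l} \tau}(\mathbf{v}^{(t),i}) = \mathrm{prox}_{\frac{1}{l} \tau_2}(\mathrm{prox}_{\frac{1}{l} \tau_1}(\mathbf{v}^{(t),i}))$. It is well known that both $\mathrm{prox}_{\frac{1}{l} \tau_1}$ and $\mathrm{prox}_{\frac{1}{l} \tau_2}$ have closed-form solutions [48], i.e., $\mathbf{r} = \mathrm{prox}_{\frac{1}{l} \tau_1}(\mathbf{u})$ with
$$r_i = \begin{cases} \left( 1 - \dfrac{\beta}{l |u_i|} \right) u_i, & \text{if } |u_i| \ge \dfrac{\beta}{l}, \\ 0, & \text{otherwise}, \end{cases} \tag{23}$$
where $r_i$ and $u_i$ represent the $i$-th elements of vectors $\mathbf{r}$ and $\mathbf{u}$, respectively, and
$$\mathrm{prox}_{\frac{1}{l} \tau_2}(\mathbf{u}) = \begin{cases} \left( 1 - \dfrac{\alpha}{l \|\mathbf{u}\|_2} \right) \mathbf{u}, & \text{if } \|\mathbf{u}\|_2 \ge \dfrac{\alpha}{l}, \\ \mathbf{0}, & \text{otherwise}. \end{cases} \tag{24}$$
Therefore, based on (22)–(24), we can obtain the closed-form solution of $\mathbf{W}^{(t+1)}$ in (20). To accelerate the proximal gradient method, we introduce an auxiliary variable as
$$\mathbf{Q}^{(t)} = \mathbf{W}^{(t)} + \frac{\theta^{(t-1)} - 1}{\theta^{(t)}} (\mathbf{W}^{(t)} - \mathbf{W}^{(t-1)}), \tag{25}$$
and perform the gradient descent procedure with respect to $\mathbf{Q}^{(t)}$ instead of $\mathbf{W}^{(t)}$, where the coefficient $\theta^{(t)}$ is updated by
$$\theta^{(t)} = \frac{1 + \sqrt{1 + 4 (\theta^{(t-1)})^2}}{2}. \tag{26}$$
The pseudo-code of the APG algorithm for solving (14) is shown in Algorithm 2.

D. Testing the proposed NC-MTL on synthetic data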
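Before turning to the synthetic experiment, the solver of Section II-C can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the squared loss carries a factor 1/2, and the initial step parameter `l=1.0`, growth factor `sigma=2.0`, zero initialization, iteration caps, tolerances, and the warm start of each inner APG run from the previous outer iterate are illustrative choices.

```python
import numpy as np


def prox_composite(V, alpha, beta, l):
    """Row-wise prox of (alpha*||.||_{2,1} + beta*||.||_1)/l, cf. (22)-(24)."""
    R = np.sign(V) * np.maximum(np.abs(V) - beta / l, 0.0)   # element-wise soft-threshold
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - alpha / (l * np.maximum(norms, 1e-12)), 0.0)
    return scale * R                                          # row-wise group shrinkage


def smooth_part(Xs, ys, W, S):
    """Value and gradient of sum_i 0.5*||y_i - X_i w_i||^2 - <W, S>, cf. (16), (19)."""
    val, G = -float(np.sum(W * S)), -S.copy()
    for i, (X, y) in enumerate(zip(Xs, ys)):
        r = X @ W[:, i] - y
        val += 0.5 * float(r @ r)
        G[:, i] += X.T @ r
    return val, G


def nonsmooth_part(W, alpha, beta):
    """alpha*||W||_{2,1} + beta*||W||_1, cf. (17)."""
    return alpha * np.linalg.norm(W, axis=1).sum() + beta * np.abs(W).sum()


def apg(Xs, ys, S, alpha, beta, W_init, l=1.0, sigma=2.0, n_iter=200, tol=1e-8):
    """Accelerated proximal gradient with backtracking for subproblem (14)."""
    W_prev, W, theta_prev = W_init.copy(), W_init.copy(), 1.0
    for _ in range(n_iter):
        theta = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta_prev ** 2))   # cf. (26)
        Q = W + (theta_prev - 1.0) / theta * (W - W_prev)            # cf. (25)
        fQ, G = smooth_part(Xs, ys, Q, S)
        for _ in range(60):                                          # backtracking on l
            W_new = prox_composite(Q - G / l, alpha, beta, l)
            D = W_new - Q
            upper = fQ + np.sum(D * G) + 0.5 * l * np.sum(D * D)
            if smooth_part(Xs, ys, W_new, S)[0] <= upper + 1e-10 * max(1.0, abs(fQ)):
                break
            l *= sigma
        W_prev, W, theta_prev = W, W_new, theta
        if np.linalg.norm(W - W_prev) <= tol * max(1.0, np.linalg.norm(W_prev)):
            break
    return W


def cccp(Xs, ys, alpha, beta, n_outer=20, tol=1e-6):
    """Outer CCCP loop: linearize (alpha+beta)*||W||_F, then solve (14) by APG."""
    W = np.zeros((Xs[0].shape[1], len(Xs)))
    for _ in range(n_outer):
        fro = np.linalg.norm(W)
        S = (alpha + beta) * W / fro if fro > 0 else np.zeros_like(W)  # cf. (15)
        W_new = apg(Xs, ys, S, alpha, beta, W)
        done = np.linalg.norm(W_new - W) <= tol * max(1.0, np.linalg.norm(W))
        W = W_new
        if done:
            break
    return W
```

The backtracking test compares only the smooth part with its quadratic upper bound, because the non-smooth term appears on both sides of the condition in Algorithm 2 and cancels.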

We first demonstrate the effectiveness of the proposed NC-MTL model in (8) on synthetic data through a comparison with other competing MTL models. We simulated a dataset with $M = 10$ tasks and $d = 100$ features, and each task has 40 samples. We randomly selected 6 features as common features shared by all 10 tasks and 4 features as task-specific features for each task. The weights of the selected features were generated from the uniform distribution $U(1, 3)$ and the weights of the remaining features were zero (see Fig. 4(a)). The elements of the inputs $\mathbf{X}_i \in \mathbb{R}^{40 \times 100}$ for $1 \le i \le 10$ were generated from a zero-mean Gaussian distribution, and the responses $\mathbf{y}_i \in \mathbb{R}^{40}$ were calculated as $\mathbf{y}_i = \mathbf{X}_i \mathbf{w}_i + \boldsymbol{\epsilon}_i$, in which the elements of the noise vectors $\boldsymbol{\epsilon}_i \in \mathbb{R}^{40}$ were generated from a zero-mean Gaussian distribution. The competing MTL models are as follows.

1) MTL I: The model uses the $\ell_1$ regularizer to enforce feature sparsity in MTL, i.e., $\Omega(\mathbf{W}) = \|\mathbf{W}\|_1$ in (4), which is Lasso in MTL with all tasks sharing the same sparsity parameter.
2) MTL II [40]: In the model, the $\ell_{2,1}$ regularizer is used to induce group sparsity in MTL, i.e., $\Omega(\mathbf{W}) = \|\mathbf{W}\|_{2,1}$ in (4), for selecting common features shared by all tasks.
3) MTL III [29]: The model applies the $\ell_{2,1-2}$ regularizer in MTL, i.e., $\Omega(\mathbf{W}) = \|\mathbf{W}\|_{2,1-2}$ in (4), which is an improved alternative to the $\ell_{2,1}$ regularizer for feature selection.
4) MTL IV [33]: In the model, the $\ell_{2,1}$ and $\ell_1$ regularizers are adopted in MTL, i.e., $\Omega(\mathbf{W}) = \|\mathbf{W}\|_{2,1} + \frac{\beta}{\alpha} \|\mathbf{W}\|_1$ in (4), to select common and task-specific features, respectively.

In Fig. 3, we present the average prediction performance of the five MTL models, which was quantified using root mean square error (rmse) for all the test samples of the 10 tasks over 10 repetitions of 5-fold nested cross-validation (CV). The regularization parameters in the MTL models were tuned over a common set of candidate values. In Fig. 4(b)-(f), the average of the learned weight matrices over all runs of CV is shown for each MTL model. We can observe from Figs. 3 and 4 that the proposed NC-MTL model extracted the most accurate features and achieved the best performance.

Fig. 3: Comparison of the rmse performance of all five MTL models (MTL I, MTL II, MTL III, MTL IV, NC-MTL), where box plots show the rmse results with the error bars representing the 25-th and 75-th percentiles, respectively, and the mean values are indicated by •.

III. Experimental Results

A. Data acquisition and preprocessing

In this study, data were taken from the Philadelphia Neurodevelopmental Cohort (PNC) [21], which is a collaborative study of child development between the Brain Behavior Laboratory at the University of Pennsylvania and the Center for Applied Genomics at the Children's Hospital of Philadelphia. The PNC contained nearly 900 participants (8–22 years old) with multi-modal neuroimaging and genetics datasets. Our analyses were limited to 715 subjects who underwent rs-fMRI scans and had minimal head movement, with a mean frame-wise displacement of less than 0.25 mm. The demographic characteristics of the subjects are shown in Table II. During the resting-state scan, subjects were instructed to stay awake, keep their eyes open, fixate on the displayed crosshair, and remain still.

Fig. 4: (a) The ground-truth weight matrix $\mathbf{W} \in \mathbb{R}^{100 \times 10}$. (b)-(f) The average of the learned weight matrices over all runs of CV for each of the five MTL models (i.e., MTL I, MTL II, MTL III, MTL IV, NC-MTL), respectively.

TABLE II: Demographic characteristics of the subjects in this study; std denotes the standard deviation.

                            Male    Female
Number of subjects          319     396
Age (range; mean ± std)     …       …

All rs-fMRI datasets were acquired on the same 3T Siemens TIM Trio whole-body scanner using a single-shot, interleaved multi-slice, gradient-echo, EPI sequence (TR/TE = …, flip angle = …°, FOV = … × 192 mm², matrix = …). Standard preprocessing steps were applied using SPM (…/spm/), which include motion correction, co-registration, spatial normalization to standard MNI space, and spatial smoothing with a 3 mm FWHM Gaussian kernel. The influences of head motion were regressed out, and functional time courses were further band-pass filtered. The brain was parcellated into 264 ROIs according to the Power atlas (Fig. 5). Most of these ROIs (227 out of 264) were assigned to 10 pre-defined functional modules, i.e., sensory-motor network (SMT), default mode network (DMN), visual network (VIS), cingulo-opercular network (COP), fronto-parietal network (FPT), dorsal attention network (DAT), ventral attention network (VAT), auditory network (AUD), salience network (SAL), and subcortical network (SBC), which were utilized for localization analyses and visualized with BrainNet Viewer [50] in Fig. 5. A functional connectivity matrix (264 × 264) was computed for each subject.

Fig. 5: The Power atlas with an a priori assignment of ROIs to different functional modules. ROIs of the same color belong to the same module and ROIs' colors indicate module memberships, where ROIs assigned to the 10 key functional modules are visualized and the others (assigned to cerebellar and unsorted) are not.

B. Comparison between univariate and multivariate functional connectivity for age prediction
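As a concrete reference for the two connectivity estimators compared in this subsection, the sketch below computes dCor following steps 1)–4) of Section II-A and pCor from ROI-averaged time courses. It is a minimal NumPy illustration, not the authors' code: the function names are hypothetical, each ROI is assumed to be stored as an array of shape (n time points, number of voxels), and n > 3 is required by the unbiased estimator.

```python
import numpy as np


def pairwise_dist(Z):
    """Euclidean distances between the n rows (time points) of Z, shape (n, p)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def u_center(D):
    """U-centered version of a distance matrix, cf. (1); the diagonal is set to 0."""
    n = D.shape[0]
    row = D.sum(axis=1, keepdims=True) / (n - 2)
    col = D.sum(axis=0, keepdims=True) / (n - 2)
    total = D.sum() / ((n - 1) * (n - 2))
    U = D - row - col + total
    np.fill_diagonal(U, 0.0)
    return U


def dcov(A_hat, B_hat):
    """Unbiased distance covariance from two U-centered matrices, cf. (2)."""
    n = A_hat.shape[0]
    return float((A_hat * B_hat).sum() / (n * (n - 3)))


def dcor(a, b):
    """Distance correlation between two ROIs given as (n, p) and (n, q) arrays, cf. (3)."""
    A_hat = u_center(pairwise_dist(a))
    B_hat = u_center(pairwise_dist(b))
    num = dcov(A_hat, B_hat)
    den = np.sqrt(dcov(A_hat, A_hat) * dcov(B_hat, B_hat))
    if num <= 0.0 or den <= 0.0:
        return 0.0
    return float(np.sqrt(num / den))


def pcor(a, b):
    """Pearson's correlation between the voxel-averaged time courses of two ROIs."""
    return float(np.corrcoef(a.mean(axis=1), b.mean(axis=1))[0, 1])
```

For a purely quadratic relationship on a symmetric grid, for example `x = np.linspace(-1, 1, 101)[:, None]` paired with `x**2`, `pcor` is numerically zero while `dcor` is clearly positive, which illustrates the nonlinear sensitivity that motivates the multivariate measure.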

In this subsection, we utilized whole-brain functional con-nectivity (i.e., a total of 34716 functional connectivity for eachsubject) to predict subjects’ ages based on a linear support vec-tor regression (SVR). For comparison, two di ﬀ erent methodsintroduced in Section II-A were adopted to construct functionalconnectivity, i.e., dCor and pCor based functional connectivity,respectively. The SVRs (implemented in LIBSVM with defaultparameters [51]) were trained and tested using 5-fold CV, andthe 5-fold CV procedure was repeated 10 times to reduce thee ﬀ ects of CV sampling bias and provide reliable performance.We reported the average prediction performance (mean ± std),which was quantiﬁed by both correlation coe ﬃ cient (cc) andrmse between the predicted and observed ages of the subjectsin the test sets over all runs of CV.Fig. 6 illustrates the average dCor and pCor based functionalconnectivity patterns across subjects for each gender group. InFig. 6, the average dCor based functional connectivity shown inthe upper triangle of a matrix heatmap is clearly stronger thanthe average pCor based functional connectivity shown in thelower triangle. The age prediction performance for each gendergroup is presented in Fig. 7. Speciﬁcally, for the female group,cc and rmse results using dCor based functional connectivitywere 0 . ± . . ± . . ± . . ± . TABLE III: The comparison of regression performance of the male group and the female group by di ﬀ erent predictive models. Model Males Femalescc (mean ± std) rmse (mean ± std) cc (mean ± std) rmse (mean ± std)SVR 0 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . Average functional connectivity patterns: dCor vs. pCor

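The evaluation protocol of this subsection (5-fold CV repeated 10 times, scored by cc and rmse on the test folds) can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' pipeline: a closed-form ridge regressor is swapped in for the LIBSVM linear SVR so that the example depends only on NumPy, the data are synthetic, and all function names are hypothetical. The mean ± std is taken here over all test folds of all repeats, which is one plausible reading of "over all runs of CV".

```python
import numpy as np

def n_connections(n_rois):
    """Number of unique ROI pairs (upper triangle without the diagonal)."""
    return n_rois * (n_rois - 1) // 2

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Stand-in regressor (ridge, dual form); NOT the paper's LIBSVM linear SVR."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    K = Xc @ Xc.T  # dual form is cheaper when features outnumber subjects
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - y_mean)
    return (X_test - x_mean) @ (Xc.T @ alpha) + y_mean

def repeated_kfold_scores(X, y, n_splits=5, n_repeats=10, seed=0):
    """Return per-test-fold cc and rmse over all repeats of k-fold CV."""
    rng = np.random.default_rng(seed)
    ccs, rmses = [], []
    for _ in range(n_repeats):
        folds = np.array_split(rng.permutation(len(y)), n_splits)
        for k in range(n_splits):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            pred = ridge_fit_predict(X[train], y[train], X[test])
            ccs.append(np.corrcoef(pred, y[test])[0, 1])
            rmses.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return np.array(ccs), np.array(rmses)

# Synthetic stand-in data: 100 "subjects", 50 "connectivity features".
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(100) + 15.0
cc, rmse = repeated_kfold_scores(X, y)
print(f"cc = {cc.mean():.3f} ± {cc.std():.3f}, rmse = {rmse.mean():.3f} ± {rmse.std():.3f}")
```

Repeating the random fold assignment is what reduces the dependence of the reported scores on any single partition of the subjects.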
Fig. 6 panels: (a) Females; (b) Males.

Fig. 6: The average functional connectivity patterns estimated by dCor (upper triangle of a matrix heatmap) and pCor (lower triangle) across subjects for each gender group.

based functional connectivity were also better than those using pCor based functional connectivity, i.e., 0 . ± . . ± . . ± . . ± .

Fig. 7: The prediction performance in terms of cc and rmse for each gender group: (a) age prediction for females; (b) age prediction for males. Blue box plots exhibit cc results for the left y-axis, and magenta box plots exhibit rmse results for the right y-axis, where • and ∗ indicate the corresponding mean values.

on dCor based functional connectivity to jointly analyze age prediction tasks for both genders.

C. Results of the proposed NC-MTL for age prediction

In this subsection, with the use of dCor based functional connectivity, we compared the age prediction performance of our NC-MTL model with five other predictive models, i.e., SVR for each gender group separately, and four MTL models (MTL I, MTL II, MTL III, MTL IV) as mentioned before. We used 5-fold nested CV repeated 10 times to tune the hyperparameters as well as to obtain the best average performance in all experiments. All regularization parameters (also called hyperparameters) in the five MTL models were chosen by a grid search within their respective ranges; that is, α, β ∈ { − , − , − , − , , }. Prior to training the predictive models, simple feature filtering was conducted. More specifically, we discarded the dCor based functional connectivity features for which the p-values of the correlation with ages of males and females in the training set were both greater than or equal to 0.01. For each gender group, the remaining features of training subjects were normalized to have zero mean and unit norm, and the mean and norm values of training subjects were used to normalize the corresponding features of testing subjects. We performed mean-centering on the ages of training subjects and then used the mean age value of training subjects to normalize the ages of testing subjects.

Fig. 8: The two scatter plots illustrate the relationships between the predicted and observed ages of (a) males and (b) females, respectively, where the predicted ages were obtained by the proposed NC-MTL model. Each green dot represents one subject. Each red solid line represents the best-fit line of the green dots, and its 95% confidence interval is indicated by two dashed lines.

The detailed age prediction results are summarized in Table III. The proposed NC-MTL model consistently achieved better prediction performance than the other predictive models. This suggests that the composite regularizer introduced in our NC-MTL model, which combines the ℓ_{2,1−2} and ℓ_{1−2} regularization terms, was more effective in identifying discriminative features associated with ages through selecting both common and gender-specific features. Moreover, as shown in Table III, the five MTL models all achieved better prediction performance than the STL model (i.e., SVR), which demonstrates that joint analysis of multiple tasks, while exploiting commonalities and/or differences across tasks, can result in improved prediction accuracy compared to learning these tasks independently. For the proposed NC-MTL model, we present the relationships between the predicted and observed ages of males and females in Fig. 8.

In the objective function (7) of our NC-MTL model, there are two regularization parameters (i.e., α and β). They balance the relative contributions of the common and task-specific feature selection, respectively. We then studied the effect of these regularization parameters on the age prediction performance. Fig. 9 shows the age prediction performance of the proposed NC-MTL model for different combinations of α and β; the performance fluctuates as the parameter values change.

Fig. 9: The cc results of both genders based on the proposed NC-MTL model with different values of α and β.

D. Discriminative functional connectivity and gender differences detected by the proposed NC-MTL

In this subsection, based on the proposed NC-MTL model, we investigated the most discriminative functional connections (functional connectivity features) with potential biological significance relevant to gender differences in brain development. Specifically, the proposed NC-MTL model in (7) generated two weight vectors (i.e., w and w , one for each gender group) of functional connectivity features. With respect to each gender group, we averaged the absolute values of the weights of each feature over all runs of CV as the weight of the corresponding functional connectivity. The larger the weight of a functional connectivity feature is, the more discriminative that feature is.

For ease of visualization, we identified the top 150 most discriminant age-related functional connections for each gender group, and Fig. 10 only shows the most discriminant within- and between-module functional connections for the 10 pre-defined functional modules. As shown in Fig. 10, SMT, DMN, VIS, and FPT are important functional modules detected for both genders. The numbers of identified functional connections between SMT and DMN, between FPT and DMN, and within FPT are larger for males. The numbers of identified functional connections between SMT and AUD, within VIS, and between SMT and VIS are larger for females. Functional brain activity spanning the frontoparietal regions was involved in comparing heading direction [52], and functional connections between the right FPT and DMN were increased in better navigators [53].

Fig. 10 matrix plots (b) and (d): rows and columns are ordered SMT, COP, AUD, DMN, VIS, FPT, SAL, SBC, VAT, DAT.

Fig. 10: The visualization of the most discriminative (among 150) age-related functional connections between and within the 10 functional modules for each gender group, i.e., (a)-(b) males and (c)-(d) females. The left are brain plots showing sagittal views of the functional graph in anatomical space, where node colors indicate module membership. The right are matrix plots showing the total numbers of within- and between-module connections.

For females, higher connectivity existed between sensory and attention systems, while for males, higher connectivity between sensory, motor, and default mode systems was observed [54]. Recent evidence indicates that functional connectivity patterns of the auditory system and many other (e.g., visual and motor) brain systems were related to language-related activation [55]. Therefore, the findings in this paper are consistent with previous results showing that males have better spatial orientation and motor coordination skills, and females have better visual language and verbal working memory skills.
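The ranking and counting steps behind Fig. 10 can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the connectivity features are ordered as the row-major upper triangle of the ROI-by-ROI matrix, that each ROI carries an integer module label, and that the per-run weight vectors are stacked in an array; the names `top_connections`, `module_counts`, and `module_of_roi` are hypothetical.

```python
import numpy as np

def top_connections(weight_runs, n_rois, top_k=150):
    """Rank connections by mean absolute weight over CV runs.

    weight_runs: (n_runs, n_features) weights of one gender group, with
    features assumed to follow np.triu_indices(n_rois, k=1) ordering.
    """
    importance = np.abs(weight_runs).mean(axis=0)
    top = np.argsort(importance)[::-1][:top_k]
    rows, cols = np.triu_indices(n_rois, k=1)
    pairs = [(int(rows[i]), int(cols[i])) for i in top]
    return pairs, importance

def module_counts(pairs, module_of_roi, n_modules):
    """Count within- and between-module connections (upper-triangular matrix)."""
    counts = np.zeros((n_modules, n_modules), dtype=int)
    for i, j in pairs:
        a, b = sorted((module_of_roi[i], module_of_roi[j]))
        counts[a, b] += 1
    return counts

# Tiny synthetic example: 4 ROIs (6 connections), 2 CV runs, 2 modules.
weight_runs = np.array([[0.1, -0.9, 0.0, 0.0, 0.5, 0.0],
                        [-0.1, 0.7, 0.0, 0.0, -0.5, 0.0]])
module_of_roi = [0, 0, 1, 1]
pairs, importance = top_connections(weight_runs, n_rois=4, top_k=2)
counts = module_counts(pairs, module_of_roi, n_modules=2)
print(pairs)
print(counts)
```

Taking absolute values before averaging means a connection whose weight changes sign across runs still counts as important, which matches the averaging rule described in this subsection.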

E. Limitations and future work

In this paper, we estimated functional connectivity between ROIs using distance correlation rather than Pearson's correlation. Distance correlation is a multivariate statistical method, which is able to measure both linear and nonlinear dependence between ROIs, and hence captures more complex information. However, like Pearson's correlation, distance correlation cannot exclude the effects of other controlling or confounding ROIs when computing pairwise correlations. Therefore, in our follow-up study, it would be interesting to measure functional connectivity by partial distance correlation [56], [57], which is an extension of distance correlation that can calculate conditional dependence between ROIs. Furthermore, although the proposed NC-MTL model achieved satisfactory prediction performance, it can be further improved in future work. For example, in our NC-MTL model, we can impose additional constraints that effectively utilize different pieces of information inherent in the data, including feature-feature relation, label-label relation, and subject-subject relation [58]. As deep neural networks have recently received growing attention and shown outstanding performance in various applications, it would also be interesting to extend the composite regularizer in our NC-MTL model into a multi-task deep learning framework. In addition, it will be important to apply our NC-MTL model to evaluate differences in brain functional connectivity patterns across different populations, e.g., disease conditions, or developmental stages in behavior and cognition.

IV. Conclusion

In this paper, we first demonstrated that multivariate functional connectivity estimates can provide more powerful information between ROIs than univariate functional connectivity estimates. The experimental results on the PNC data showed that dCor based functional connectivity better predicted individuals' ages than pCor based functional connectivity.
Next, we proposed a novel NC-MTL model by introducing a composite regularizer that combines the ℓ_{2,1−2} and ℓ_{1−2} terms, which are improved alternatives to the classical ℓ_{2,1} and ℓ_1, respectively; as a result, it promises improved extraction of common and task-specific features. Results showed improved performance of the proposed NC-MTL model over several competing ones for predicting ages from functional connectivity patterns using rs-fMRI of the PNC, where age prediction for each gender group was treated as one task. In addition, we detected both common and gender-specific age-related functional connectivity patterns to characterize the effects of gender and age on brain development.

References

[1] G. H. Glover, “Overview of functional magnetic resonance imaging,” Neurosurg. Clin. N. Am., vol. 22, pp. 133-139, 2011.
[2] J. Xu et al., “Large-scale functional network overlap is a general property of brain functional organization: Reconciling inconsistent fMRI findings from general-linear-model-based analyses,” Neurosci. Biobehav. Rev., vol. 71, pp. 83-100, 2016.
[3] B. B. Biswal et al., “Toward discovery science of human brain function,” Proc. Natl. Acad. Sci., vol. 107, no. 10, pp. 4734-4739, 2010.
[4] V. D. Calhoun, T. Eichele, and G. Pearlson, “Functional brain networks in schizophrenia: A review,” Front. Hum. Neurosci., vol. 3, pp. 1-12, 2009.
[5] X. Shen et al., “Using connectome-based predictive modeling to predict individual behavior from brain connectivity,” Nat. Protoc., vol. 12, no. 3, pp. 506-518, 2017.
[6] S. Gao, A. S. Greene, R. T. Constable, and D. Scheinost, “Combining multiple connectomes improves predictive modeling of phenotypic measures,” NeuroImage, vol. 201, pp. 116038, 2019.
[7] B. Jie, D. Zhang, W. Gao, Q. Wang, C.-Y. Wee, and D. Shen, “Integration of network topological and connectivity properties for neuroimaging classification,” IEEE Trans. Biomed. Eng., vol. 61, no. 2, pp. 576-589, 2014.
[8] E. S. Finn et al., “Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity,” Nat. Neurosci., vol. 18, no. 11, pp. 1664, 2015.
[9] Z. Cui et al., “Individual variation in functional topography of association networks in youth,” Neuron, vol. 106, no. 2, pp. 340-353, 2020.
[10] B. Cai et al., “Refined measure of functional connectomes for improved identifiability and prediction,” Hum. Brain Mapp., vol. 40, pp. 4843-4858, 2019.
[11] D. A. Fair et al., “Functional brain networks develop from a ‘local to distributed’ organization,” PLoS Comput. Biol., vol. 5, no. 5, pp. e1000381, 2009.
[12] L. Wang, L. Su, H. Shen, and D. Hu, “Decoding lifespan changes of the human brain using resting-state functional connectivity MRI,” PLoS ONE, vol. 7, no. 8, pp. e44530, 2012.
[13] A. Qiu, A. Lee, M. Tan, and M. K. Chung, “Manifold learning on brain functional networks in aging,” Med. Image Anal., vol. 20, no. 1, pp. 52-60, 2015.
[14] N. U. F. Dosenbach et al., “Prediction of individual brain maturity using fMRI,” Science, vol. 329, no. 5997, pp. 1358-1361, 2010.
[15] T. B. Meier et al., “Support vector machine classification and characterization of age-related reorganization of functional brain networks,” NeuroImage, vol. 60, no. 1, pp. 601-613, 2012.
[16] A. N. Nielsen, D. J. Greene, C. Gratton, N. U. F. Dosenbach, S. E. Petersen, and B. L. Schlaggar, “Evaluating the prediction of brain maturity from functional connectivity after motion artifact denoising,” Cereb. Cortex, vol. 29, no. 6, pp. 2455-2469, 2019.
[17] G. J. Székely, M. L. Rizzo, and N. K. Bakirov, “Measuring and testing dependence by correlation of distances,” Ann. Statist., vol. 35, no. 6, pp. 2769-2794, 2007.
[18] G. J. Székely and M. L. Rizzo, “The distance correlation t-test of independence in high dimension,” J. Multivariate Anal., vol. 117, pp. 193-213, 2013.
[19] L. Geerligs, Cam-CAN, and R. N. Henson, “Functional connectivity and structural covariance between regions of interest can be measured more accurately using multivariate distance correlation,” NeuroImage, vol. 135, pp. 16-31, 2016.
[20] K. Yoo, M. D. Rosenberg, S. Noble, D. Scheinost, R. T. Constable, and M. M. Chun, “Multivariate approaches improve the reliability and validity of functional connectivity and prediction of individual behaviors,” NeuroImage, vol. 197, pp. 212-223, 2019.
[21] T. D. Satterthwaite et al., “Neuroimaging of the Philadelphia neurodevelopmental cohort,” NeuroImage, vol. 86, pp. 544-553, 2014.
[22] A. Etchell et al., “A systematic literature review of sex differences in childhood language and brain development,” Neuropsychologia, vol. 114, pp. 19-31, 2018.
[23] V. J. Schmithorst and S. K. Holland, “Sex differences in the development of neuroanatomical functional connectivity underlying intelligence found using Bayesian connectivity analysis,” NeuroImage, vol. 35, no. 1, pp. 406-419, 2007.
[24] X.-N. Zuo et al., “Growing together and growing apart: Regional and sex differences in the lifespan developmental trajectories of functional homotopy,” J. Neurosci., vol. 30, no. 45, pp. 15034-15043, 2010.
[25] G. Alarcón, A. Cservenka, M. D. Rudolph, D. A. Fair, and B. J. Nagel, “Developmental sex differences in resting state functional connectivity of amygdala sub-regions,” NeuroImage, vol. 115, pp. 235-244, 2015.
[26] T. D. Satterthwaite et al., “Linked sex differences in cognition and functional connectivity in youth,” Cereb. Cortex, vol. 25, no. 9, pp. 2383-2394, 2015.
[27] X. Zhu, H. Li, and Y. Fan, “Parameter-free centralized multi-task learning for characterizing developmental sex differences in resting state functional connectivity,” in Proc. AAAI Conf. Artif. Intell., pp. 2660-2667, 2018.
[28] R. C. Gur et al., “Age group and sex differences in performance on a computerized neurocognitive battery in children age 8-21,” Neuropsychology, vol. 26, no. 2, pp. 251-265, 2012.
[29] Y. Shi, J. Miao, Z. Wang, P. Zhang, and L. Niu, “Feature selection with ℓ_{2,1−2} regularization,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 10, pp. 4967-4982, 2018.
[30] E. Esser, Y. Lou, and J. Xin, “A method for finding structured sparse solutions to nonnegative least squares problems with applications,” SIAM J. Imag. Sci., vol. 6, no. 4, pp. 2010-2046, 2013.
[31] P. Yin, Y. Lou, Q. He, and J. Xin, “Minimization of ℓ_{1−2} for compressed sensing,” SIAM J. Sci. Comput., vol. 37, no. 1, pp. A536-A563, 2015.
[32] Y. Lou, S. Osher, and J. Xin, “Computational aspects of constrained L_1 − L_2 minimization for compressed sensing,” in Modelling, Computation and Optimization in Information Systems and Management Sciences. Cham, Switzerland: Springer, pp. 169-180, 2015.
[33] H. Wang et al., “Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance,” in Proc. IEEE Int. Conf. Comput. Vis., pp. 557-562, 2011.
[34] S. Tabarestani et al., “A distributed multitask multimodal approach for the prediction of Alzheimer’s disease in a longitudinal study,” NeuroImage, vol. 206, pp. 116317, 2020.
[35] L. Brand, K. Nichols, H. Wang, L. Shen, and H. Huang, “Joint multi-modal longitudinal regression and classification for Alzheimer’s disease prediction,” IEEE Trans. Med. Imag., vol. 39, no. 6, pp. 1845-1855, 2020.
[36] L. Xiao, J. M. Stephen, T. W. Wilson, V. D. Calhoun, and Y.-P. Wang, “A manifold regularized multi-task learning model for IQ prediction from two fMRI paradigms,” IEEE Trans. Biomed. Eng., vol. 67, no. 3, pp. 796-806, 2020.
[37] J. Zhou, J. Liu, V. A. Narayan, and J. Ye, “Modeling disease progression via fused sparse group lasso,” in Proc. ACM SIGKDD Conf. Knowl. Discovery Data Mining, pp. 1095-1103, 2012.
[38] J. Wang, Q. Wang, H. Zhang, J. Chen, S. Wang, and D. Shen, “Sparse multiview task-centralized ensemble learning for ASD diagnosis based on age- and sex-related functional connectivity patterns,” IEEE Trans. Cybern., vol. 49, no. 8, pp. 3141-3154, 2019.
[39] X. Hao et al., “Multi-modal neuroimaging feature selection with consistent metric constant for diagnosis of Alzheimer’s disease,” Med. Image Anal., vol. 60, pp. 101625, 2020.
[40] A. Argyriou and T. Evgeniou, “Multi-task feature learning,” in Proc. Adv. Neural Inf. Process. Syst., pp. 41-48, 2007.
[41] F. Nie, H. Huang, X. Cai, and C. Ding, “Efficient and robust feature selection via joint ℓ_{2,1}-norms minimization,” in Proc. Adv. Neural Inf. Process. Syst., pp. 1813-1821, 2010.
[42] C. Zu, B. Jie, M. Liu, S. Chen, D. Shen, and D. Zhang, “Label-aligned multi-task feature learning for multimodal classification of Alzheimer’s disease and mild cognitive impairment,” Brain Imaging Behav., vol. 10, pp. 1148-1159, 2016.
[43] M. Zhang, C. Ding, Y. Zhang, and F. Nie, “Feature selection at the discrete limit,” in Proc. AAAI Conf. Artif. Intell., pp. 1355-1361, 2014.
[44] H. Peng and Y. Fan, “A general framework for sparsity regularized feature selection via iteratively reweighted least square minimization,” in Proc. AAAI Conf. Artif. Intell., pp. 2471-2477, 2017.
[45] X. Du, Y. Yan, P. Pan, G. Long, and L. Zhao, “Multiple graph unsupervised feature selection,” Signal Process., vol. 120, pp. 754-760, 2016.
[46] A. L. Yuille and A. Rangarajan, “The concave-convex procedure,” Neural Comput., vol. 15, no. 4, pp. 915-936, 2003.
[47] Y. Nesterov, “A method of solving a convex programming problem with convergence rate O(1/k^2),” Sov. Math. Doklady, vol. 27, no. 2, pp. 372-376, 1983.
[48] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim., vol. 1, no. 3, pp. 123-231, 2014.
[49] J. D. Power et al., “Functional network organization of the human brain,” Neuron, vol. 72, no. 4, pp. 665-678, 2011.
[50] M. Xia, J. Wang, and Y. He, “BrainNet Viewer: A network visualization tool for human brain connectomics,” PLoS ONE, vol. 8, no. 7, pp. e68910, 2013.
[51] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 27, pp. 1-27, 2011.
[52] H. Burte, B. O. Turner, M. B. Miller, and M. Hegarty, “The neural basis of individual differences in directional sense,” Front. Hum. Neurosci., vol. 12, pp. 410, 2018.
[53] S. C. Izen, E. R. Chrastil, and C. E. Stern, “Resting state connectivity between medial temporal lobe regions and intrinsic cortical networks predicts performance in a path integration task,” Front. Hum. Neurosci., vol. 12, pp. 415, 2018.
[54] G. Kohls et al., “The nucleus accumbens is involved in both the pursuit of social reward and the avoidance of social punishment,” Neuropsychologia, vol. 51, no. 11, pp. 2062-2069, 2013.
[55] J. R. Binder, J. A. Frost, T. A. Hammeke, R. W. Cox, S. M. Rao, and T. Prieto, “Human brain language areas identified by functional magnetic resonance imaging,” Science, vol. 342, no. 6158, pp. 585-589, 2013.
[56] G. Székely and M. L. Rizzo, “Partial distance correlation with methods for dissimilarities,” Ann. Statist., vol. 42, no. 6, pp. 2382-2412, 2014.
[57] J. Fang et al., “Fast and accurate detection of complex imaging genetics associations based on greedy projected distance correlation,” IEEE Trans. Med. Imag., vol. 37, no. 4, pp. 860-870, 2018.
[58] X. Zhu, H.-I. Suk, L. Wang, S. Lee, and D. Shen, “A novel relational regularization feature selection method for joint regression and classification in AD diagnosis,”