Functional random effects modeling of brain shape and connectivity
Eardi Lila∗ and John A. D. Aston†
∗Department of Biostatistics, University of Washington
†Statistical Laboratory, DPMMS, University of Cambridge
September 15, 2020
Abstract
We present a statistical framework that jointly models brain shape and functional connectivity, which are two complex aspects of the brain that have been classically studied independently. We adopt a Riemannian modeling approach to account for the non-Euclidean geometry of the space of shapes and the space of connectivity that constrains trajectories of co-variation to be valid statistical estimates. In order to disentangle genetic sources of variability from those driven by unique environmental factors, we embed a functional random effects model in the Riemannian framework. We apply the proposed model to the Human Connectome Project dataset to explore spontaneous co-variation between brain shape and connectivity in young healthy individuals.
Human brains differ in their structural and functional organization (Gilmore et al., 2018). While there is a long history of trying to relate either structural or functional brain features to human aspects, such as behavioral and cognitive variables (for recent examples, see e.g. Xia et al., 2018; Zhang et al., 2019), more recently, increasing attention has been drawn to the problem of understanding how brain structure and function are related to each other (Bullmore and Sporns, 2009).

In this work, we introduce a statistical framework that allows us to estimate patterns of co-variation between brain structure and function, while disentangling co-variation due to genetic and environmental factors. We describe the brain structural organization of an individual with a surface encoding the brain shape, that is, the geometry of the highly convoluted outermost layer of the brain, called the cerebral cortex. We describe the brain functional organization of an individual with a network that has spatial nodes located on the cerebral cortex. The strength of the network edges is estimated by a measure of pairwise statistical dependence (e.g. correlation) between the neuronal activity associated with the network nodes. The estimated network is a representation of the subject's brain functional connectivity. Figure 1 provides an illustration of this setting.

∗[email protected] †[email protected]

Figure 1: In the central panel, we show the subject-specific surfaces encoding the geometry of the cerebral cortex, as reconstructed from the MRI scans. Moreover, we can see the fMRI time-series, describing the neuronal activity of a dense set of 64K points on the cerebral cortex. The color map on the brain surfaces describes a parcellation atlas, which defines 68 regions of the brain in correspondence across subjects. Within each region, an average time-series is computed. These are then used to compute the $68 \times 68$ covariance matrices on the right panel, describing the functional connectivity of each subject. On the left panel, a representation of only the shape of the brain surfaces. The shapes on the left panel and the covariances on the right panel are the object-data of our statistical analysis.

We apply the proposed methodology to 1003 young adults in the Human Connectome Project (HCP) dataset (Glasser et al., 2013) with the aim of exploring spontaneous modes of genetically-driven and environmentally-driven co-variation in the brain structure and function of healthy individuals.

From a methodological perspective, our work can be contextualized within the Object Data Analysis framework (Marron and Alonso, 2014), as both shapes and connectivity networks are complex data objects, living on functional non-Euclidean spaces, where classical Functional Data Analysis approaches (Ramsay and Silverman, 2005) fail to preserve the geometry of these spaces. In order to enforce physiologically valid shape trajectory estimates, we represent brain shapes through diffeomorphic deformation functions of the ambient space. We represent brain connectivity by means of covariance functions, which must be non-negative definite. Our approach tackles diffeomorphic constraints and non-negative definiteness constraints in a Riemannian framework, i.e. by tangent space mapping through Riemannian logarithmic maps.

Within the proposed Riemannian modeling framework, we define a multi-variable trait variance component model that exploits the relatedness structure among individuals to disentangle co-variation in shape and connectivity that is due to genetic sources and environmental sources.
The proposed model can be regarded as an extension of the classical single-trait and bivariate-trait variance component models in Amos (1994); Almasy et al. (1997) and is formulated as a multivariate mixed effects model on the Karhunen–Loève basis coefficients of the tangent space coordinates. If, instead, the aim of the analysis was to estimate joint modes of variation in shape and connectivity, regardless of their sources, a Canonical Correlation Analysis (CCA) (Hotelling, 1936; Johnson and Wichern, 2002) or an Angle-based Joint and Individual Variation Explained (AJIVE) analysis could have been performed on the tangent space coordinates (Lock et al., 2013; Feng et al., 2018; Carmichael et al., 2019).

Mixed effects models have been successfully extended to the setting of functional data in a linear space, to account for non-parametric fixed and random effects (see, e.g., Shi et al., 1996; Guo, 2002; Wu and Zhang, 2002; Qin, 2005; Morris and Carroll, 2006; Chen and Wang, 2008; Zhou et al., 2008; Liu et al., 2017; Scheipl et al., 2015). Here, we employ a mixed effects model to account for a genetic dependence structure across subjects. Functional models that incorporate genetic information, without explicitly relying on a mixed effects model, have also been formulated. For instance, Luo et al. (2019) propose a model that is able to dissect genetic and environmental effects of functional data in a twin study design. The proposed model applies to linear functional data and is formulated as a functional structural equation model. An extension to functional data over two-dimensional domains and living in a linear function space, such as cortical thickness data mapped onto a spherical domain, has been proposed in Risk and Zhu (2019). In the non-Euclidean framework of our analysis, the variance component model approach allows for more flexible relatedness structures among individuals, possibly estimated from Single Nucleotide Polymorphism (SNP) data (see e.g.
Dahl et al., 2016).

The HCP dataset, which motivates this work, includes Magnetic Resonance Imaging (MRI) and resting-state functional MRI (fMRI) scans. The MRI scans are used to reconstruct surface models of the cerebral cortex geometry. The time-variant fMRI signals are used to estimate a spatial covariance structure on the cerebral cortex, describing how the different parts of the cerebral cortex co-activate in time, namely an estimate of the functional connectivity. An illustration of the MRI and fMRI components of the data is provided in Figure 1. Moreover, the pedigree of the HCP cohort is available and includes monozygotic twins, dizygotic twins, full siblings, half-siblings, and unrelated individuals. This family structure is what allows the variance component model to disentangle genetic and environmental co-variation between brain shape and connectivity.
Statistical analysis of shapes and covariances
In the applied literature, brain shape is usually modeled by using a few descriptors of shape, such as cortical volume or the area of pre-defined sets of brain regions (Im et al., 2008; Hazlett et al., 2017). However, more comprehensive approaches to brain shape analysis require the definition of a non-parametric shape representation model. A non-exhaustive list of shape analysis methodologies, based on discrete representations, includes landmark-based shape representations (see e.g. Dryden and Mardia, 2016), skeletal shape representations (Pizer et al., 2013), dihedral angle representations (Eltzner et al., 2018) and projective shape spaces (Mardia and Patrangenaru, 2005). In the continuous setting of curves and surfaces, global parametrizing functions have been adopted to represent these objects.

Representing curves and surfaces with their parametrizing functions, equipped with an $L^2$ norm, leads to unnatural trajectories in the space of shapes. Instead, a successful approach consists of equipping the space of parametrizing functions with an elastic Riemannian metric and defining parametrization-invariant representations. The resulting space and associated metric lead to natural-looking geodesic trajectories in the shape spaces of curves (Kurtek et al., 2012; Su et al., 2014) and surfaces (Kurtek et al., 2011; Jermyn et al., 2012, 2017).

Of particular importance to this work is a class of representation models for surfaces that do not require the computation of parametrizing functions. These represent surfaces with diffeomorphic deformation functions of the ambient space $\mathbb{R}^3$ (see e.g. Vaillant et al., 2004; Charon and Trouvé, 2014; Arguillère et al., 2016; Younes, 2010). Such an approach is well suited to the neuroimaging setting because constraining the deformation functions to be diffeomorphic results in a shape space that contains anatomically plausible shapes and excludes, for instance, self-intersecting surfaces.
Statistical analysis can then be performed on the non-linear manifold of diffeomorphic functions by exploiting tangent space expansion to find a linear representation of the data. Modeling shapes through deformations of the ambient space requires that features that are not related to shape, such as translation and rigid rotations, be removed from the data. This is achieved by classical Procrustes superimposition (Dryden and Mardia, 2016).

In a similar spirit to shape analysis, the statistical analysis of samples that are covariances also involves a non-Euclidean type of analysis. The neuroimaging community has often approached the problem by performing multivariate analysis on vectorizations of the covariances (or the vectorization of their upper or lower triangular part) (Smith et al., 2015; Xia et al., 2018). However, such an approach fails to guarantee that linear extrapolations of the data belong to the space of covariances, i.e. that they are positive semi-definite objects. In other words, there may not exist a signal with the estimated extrapolated 'covariance'. Such an issue can be overcome by defining an appropriate Riemannian metric on the space of covariances.

To this purpose, different metrics have been proposed. For instance, Pennec et al. (2006) introduce an affine invariant Riemannian metric, while Arsigny et al. (2006) introduce a log-Euclidean metric based on the matrix exponential and matrix logarithm functions. Dryden et al. (2009) introduce a metric that can deal with rank-deficient covariance matrices, and its extension to covariance operators has been proposed in Pigoli et al. (2014). As shown in Dryden et al. (2009), different metrics lead to different geodesic trajectories in the space of covariances. While these are easy to visualize for lower-dimensional covariances, in our high-dimensional setting, these differences are more difficult to appreciate.
Therefore, our choice of the metric is mostly driven by computational efficiency arguments, and in particular by closed-form solutions of the geodesic mean. Statistical analysis can then be performed on tangent space projections of the covariances, which are computed through the Riemannian logarithmic map and offer a convenient parametrization with respect to a linear space. We then use the tangent coordinates in the space of shapes and the space of covariances to find maximally associated modes of variation in shape and connectivity while decomposing the genetic and unique environmental variance contributions.

The rest of the paper is organized as follows. In Section 2, we introduce the Riemannian modeling framework and the variance component model. In Section 3, we present the implementation details of the proposed model. We then apply the proposed model to the HCP dataset and present the results in Section 4. We finally give some concluding remarks in Section 5 and perform simulations validating the variance component model in the appendix.

Mathematical description of the model
Consider a sample of $n$ pairs of observations $\{(S_i, C_i) : i = 1, \ldots, n\}$. Here, $(S_i \subset \mathbb{R}^3)$ are two-dimensional surfaces, embedded in $\mathbb{R}^3$, representing the cerebral cortex geometries. The functions $(C_i : S_i \times S_i \to \mathbb{R})$ are covariance functions representing the associated functional connectivity on the subject's cerebral cortex. Moreover, we assume statistical relatedness between the $n$ samples; in our application this is due to family-based genetic associations.

The aim of this section is the introduction of our statistical framework for the analysis of $\{(S_i, C_i) : i = 1, \ldots, n\}$. As previously mentioned, both the space of brain geometries and that of brain connectivity are non-Euclidean spaces, introducing additional challenges in the definition and estimation of the co-variation structure. In Section 2.1, we first give a brief conceptual description of the Riemannian approach to modeling the shape and connectivity space, and then follow by introducing the variance component model. We detail our choices of the representation models and metrics, for the shape and connectivity spaces, in Section 2.2 and Section 2.3.

Figure 2: This is an illustration of the proposed statistical analysis framework. In our analysis, we represent shapes with diffeomorphic deformations of the ambient space. The pictorial representation of the space of diffeomorphic functions and that of covariances highlights their non-Euclidean structure. We rely on tangent space expansion to derive linear parametrizations of these non-Euclidean spaces. The linear tangent coordinates of shape $(v^S_i)$ and connectivity $(v^C_i)$ are then jointly used to define a variance component model that exploits the kinship structure among the samples to estimate pairs of tangent coordinates that are highly correlated due to genetic factors $(v^S_G, v^C_G)$ or environmental factors $(v^S_E, v^C_E)$.
The shape and covariance non-linear trajectories associated with the estimated pairs of tangent coordinates can finally be computed to display the results in terms of elements of the shape and connectivity spaces.

Functional random effects modeling of shape and connectivity

Let $S \subset \mathbb{R}^3$ be a template surface. We assume there exists a one-to-one correspondence between each of the points on $S$ and those on $(S_i)$. The role of the template is two-fold here. The template is a surface representing an average geometric shape of the population that allows us to model shapes as functions that are $\mathbb{R}^3$ deformations of the reference template. Moreover, the template plays the role of a common reference domain onto which the subject-specific covariance functions can be mapped.

For a fixed template, we represent each surface $S_i$ with an associated deformation function $\gamma_i : \mathbb{R}^3 \to \mathbb{R}^3$ such that $\gamma_i(S) = S_i$. These deformations are diffeomorphic functions, i.e. they are smooth one-to-one functions with smooth inverse. The space of deformations is formally equipped with a Riemannian metric and the diffeomorphic functions are projected onto the tangent space centered at the identity function. A set of tangent space coordinates $v^S_1, \ldots, v^S_n$ is then used to represent the surfaces $S_1, \ldots, S_n$. We assume that each function $v^S_i$ can be expressed in terms of a common basis expansion
$$v^S_i = \sum_{j=1}^{\infty} A^S_{i,j} \psi^S_j,$$
where $\psi^S_j$ is the $j$th basis element and $A^S_{i,j}$ is the coefficient of the $i$th sample associated with the $j$th basis.

The covariance functions $(C_i)$ also belong to a non-Euclidean space, which is the cone of positive semi-definite covariance functions. Moreover, they are defined on sample-specific spatial domains $(S_i)$. The previously defined deformation functions $(\gamma_i)$ can be used to map a covariance $C_i$ onto the template $S$, defining $\tilde{C}_i(x, y) := C_i(\gamma_i^{-1}(x), \gamma_i^{-1}(y))$, with $x, y \in S$.
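For surfaces discretized by vertices in one-to-one correspondence, the pull-back just defined reduces to re-indexing the rows and columns of the subject's covariance matrix. The following is a minimal sketch under that assumption (the index-map reduction and all dimensions are illustrative, not the paper's implementation):

```python
import numpy as np

def pullback_covariance(C_subject, inv_idx):
    # C_tilde[x, y] = C_subject[gamma^{-1}(x), gamma^{-1}(y)] on vertex indices
    return C_subject[np.ix_(inv_idx, inv_idx)]

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
C = B @ B.T                         # a valid covariance on the subject surface
inv_idx = rng.permutation(5)        # stand-in for the vertex map gamma^{-1}
C_tilde = pullback_covariance(C, inv_idx)

# the pull-back is again a symmetric positive semi-definite matrix
assert np.allclose(C_tilde, C_tilde.T)
assert np.all(np.linalg.eigvalsh(C_tilde) > -1e-10)
```

Because the pull-back only relabels the domain, positive semi-definiteness is preserved, which is what makes the spatially normalized covariances valid objects on the template.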
This leads to a set of 'spatially normalized' covariance functions $\tilde{C}_i : S \times S \to \mathbb{R}$. At this point, the cone of positive semi-definite covariance functions can be equipped with a Riemannian metric. Note that, as detailed in Section 2.3, this step is in practice preceded by a dimension reduction step that we omit here to keep the notation simple. The data $(\tilde{C}_i)$ are then projected onto the tangent space centered at the geodesic mean and the associated tangent space coordinates $v^C_1, \ldots, v^C_n$ are used to represent $\tilde{C}_1, \ldots, \tilde{C}_n$. As for the shape tangent coordinates, we assume $(v^C_i)$ can be expressed in terms of a common basis expansion
$$v^C_i = \sum_{j=1}^{\infty} A^C_{i,j} \psi^C_j,$$
with $\psi^C_j$ the $j$th basis element and $A^C_{i,j}$ the coefficient of the $i$th sample associated with the $j$th basis.

A simple approach to controlling for a set of known confounding variables $(z_i \in \mathbb{R}^l)$ consists of modeling directly $E[v^S_i \mid z_i]$ and $E[v^C_i \mid z_i]$ through a functional regression analysis, or by modeling $E[A^S_{i,j} \mid z_i]$ and $E[A^C_{i,j} \mid z_i]$ by means of a multivariate regression analysis. The expected values $E[A^S_{i,j} \mid z_i]$ and $E[A^C_{i,j} \mid z_i]$ are related to $E[v^S_i \mid z_i]$ and $E[v^C_i \mid z_i]$ by the following equations:
$$E[v^S_i \mid z_i] = \sum_{j=1}^{\infty} E[A^S_{i,j} \mid z_i]\, \psi^S_j, \qquad E[v^C_i \mid z_i] = \sum_{j=1}^{\infty} E[A^C_{i,j} \mid z_i]\, \psi^C_j.$$
The estimated effects of the confounders can then be removed from the tangent space coordinates.

In practice, we choose appropriate truncation levels $p_S$ and $p_C$, and rely on the finite-dimensional approximations
$$v^S_i \approx \sum_{j=1}^{p_S} A^S_{i,j} \psi^S_j, \qquad v^C_i \approx \sum_{j=1}^{p_C} A^C_{i,j} \psi^C_j.$$
While any basis in the space of the tangent coordinates is a valid choice, in our application we rely on the functional principal component basis, due to its well-known best linear approximation property.

Before introducing our variance component model, we briefly recall the matrix normal distribution MVN, which generalizes the multivariate normal distribution to matrix-valued random variables.
An $n \times p$ random matrix $X$ has a matrix normal distribution $\mathrm{MVN}(M, U, V)$ if and only if $\mathrm{vec}(X)$ has a multivariate normal distribution $N(\mathrm{vec}(M), V \otimes U)$, where $\mathrm{vec}$ is the column-wise vectorization operator and $\otimes$ denotes the Kronecker product. The matrix normal distribution is characterized by three parameters: an $n \times p$ mean matrix $M$, an $n \times n$ matrix $U$ (modeling covariance between the rows of the random matrix) and a $p \times p$ matrix $V$ (modeling covariance between the columns of the random matrix).

Consider now the $n \times p_S$ matrix $(A^S)_{i,j} = A^S_{i,j}$ and the $n \times p_C$ matrix $(A^C)_{i,j} = A^C_{i,j}$. We propose a joint model for shape and connectivity in terms of an $n \times (p_S + p_C)$ matrix $A$ that contains both the features described in $A^S$ and $A^C$, i.e. $A := [A^S, A^C]$.

If at this stage we were interested in understanding co-variation between brain shape and connectivity, we could perform CCA on the score matrices $A^S$ and $A^C$, or alternatively, an AJIVE analysis on the tangent vectors (Feng et al., 2018; Carmichael et al., 2019). Nevertheless, the estimated joint variation components would be an average of the co-variation that is due to genetic and environmental factors. Therefore, our next step is defining a model that separates genetic and environmental variability.

Let $K_n$ be an $n \times n$ matrix of relatedness coefficients, that is, a correlation structure between the $n$ subjects encoding genetic relatedness. We assume this is known, and that in practice it can be computed from a pedigree or from genetic data. Let $I_n$ be the $n \times n$ identity matrix, which, as opposed to $K_n$, encodes 'unrelatedness' between subjects. Let the $(p_S + p_C) \times (p_S + p_C)$ matrices $\Sigma_G$ and $\Sigma_E$ denote respectively the unknown genetic and environmental covariance structure across the columns of $A$.
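Before stating the model, the vec/Kronecker characterization of the matrix normal distribution recalled above can be checked empirically. The sketch below (dimensions and covariance values are illustrative, not taken from the paper) samples $X = M + U^{1/2} Z (V^{1/2})^T$ and compares the empirical covariance of $\mathrm{vec}(X)$ with $V \otimes U$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 3, 2
M = np.ones((n, p))                      # mean matrix
U = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])          # row covariance
V = np.array([[1.0, 0.3],
              [0.3, 0.8]])               # column covariance
Lu, Lv = np.linalg.cholesky(U), np.linalg.cholesky(V)

draws = np.empty((100_000, n * p))
for t in range(draws.shape[0]):
    X = M + Lu @ rng.standard_normal((n, p)) @ Lv.T   # X ~ MVN(M, U, V)
    draws[t] = X.flatten(order="F")                   # column-wise vec

emp_cov = np.cov(draws, rowvar=False)
# cov(vec(X)) = V ⊗ U, since vec(Lu Z Lvᵀ) = (Lv ⊗ Lu) vec(Z)
assert np.allclose(emp_cov, np.kron(V, U), atol=0.05)
```

The identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ is what connects the two characterizations.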
We model $A$ as
$$A \mid G, E = XB + G + E, \qquad G \sim \mathrm{MVN}\big(0_{n \times (p_S + p_C)}, K_n, \Sigma_G\big), \qquad E \sim \mathrm{MVN}\big(0_{n \times (p_S + p_C)}, I_n, \Sigma_E\big), \tag{1}$$
with $X$ denoting an $n \times s$ design matrix and $B$ an $s \times (p_S + p_C)$ matrix of fixed effects. The proposed model exploits a known covariance structure $K_n$, across the samples, to additively decompose the covariance structure across the traits into two components, $\Sigma_G$ and $\Sigma_E$. In our application, $K_n$ is chosen to reflect the pairwise family relatedness across samples. Therefore, we refer to $\Sigma_G$ as the covariance component of the traits that is due to additive genetic contributions, while we refer to $\Sigma_E$ as the covariance structure that is due to unique environmental contributions. In practice, $\Sigma_G$ and $\Sigma_E$ are estimated with a Restricted Maximum Likelihood (REML) approach, as detailed in Section 3.

The proposed model is a multi-variable trait extension of the polygenic quantitative trait variance component models (Amos, 1994; Almasy et al., 1997). It is also related to multivariate linear mixed models applied in genome-wide association studies (Zhou and Stephens, 2014; Dahl et al., 2016). In our model, the multivariate traits are tangent space descriptors of brain shape and connectivity.

The matrices $\Sigma_G$ and $\Sigma_E$ can be written as
$$\Sigma_G = \begin{bmatrix} \Sigma^{S,S}_G & \Sigma^{S,C}_G \\ \Sigma^{C,S}_G & \Sigma^{C,C}_G \end{bmatrix}, \qquad \Sigma_E = \begin{bmatrix} \Sigma^{S,S}_E & \Sigma^{S,C}_E \\ \Sigma^{C,S}_E & \Sigma^{C,C}_E \end{bmatrix},$$
with $\Sigma^{S,S}_G$ a $p_S \times p_S$ matrix, $\Sigma^{C,C}_G$ a $p_C \times p_C$ matrix, $\Sigma^{S,C}_G$ a $p_S \times p_C$ matrix, and $\Sigma^{C,S}_G$ a $p_C \times p_S$ matrix. The environmental components are defined similarly. These matrices represent the covariance between and within low-dimensional representations of the tangent space coordinates of shape and connectivity. In this form, they are not themselves very informative.
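As a numerical illustration of model (1), the marginal covariance of $\mathrm{vec}(A - XB)$ it implies, $\Sigma_G \otimes K_n + \Sigma_E \otimes I_n$, can be checked by simulation. All dimensions, the kinship matrix (one monozygotic twin pair, one sibling pair), and the covariance values below are toy choices for this sketch only:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 2
K = np.array([[1.0, 1.0, 0.0, 0.0],     # hypothetical kinship: twins,
              [1.0, 1.0, 0.0, 0.0],     # then a sibling pair
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
Sigma_G = np.array([[1.0, 0.6], [0.6, 1.0]])   # genetic trait covariance
Sigma_E = np.array([[0.5, 0.1], [0.1, 0.5]])   # environmental trait covariance

Lk = np.linalg.cholesky(K + 1e-9 * np.eye(n))  # K is rank deficient (twins)
Lg, Le = np.linalg.cholesky(Sigma_G), np.linalg.cholesky(Sigma_E)

draws = np.empty((100_000, n * p))
for t in range(draws.shape[0]):
    G = Lk @ rng.standard_normal((n, p)) @ Lg.T    # MVN(0, K, Sigma_G)
    E = rng.standard_normal((n, p)) @ Le.T         # MVN(0, I, Sigma_E)
    draws[t] = (G + E).flatten(order="F")

emp = np.cov(draws, rowvar=False)
target = np.kron(Sigma_G, K) + np.kron(Sigma_E, np.eye(n))
assert np.allclose(emp, target, atol=0.06)
```

The twin block of $K_n$ is exactly rank one, which is what lets related samples carry information separating $\Sigma_G$ from $\Sigma_E$.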
Therefore, we propose to look at maximally correlated genetic and environmental modes of co-variation between the geometry of the cerebral cortex and the associated connectivity by computing
$$(\theta^S_G, \theta^C_G) = \Big\{ \arg\max_{\theta, \eta}\ \theta^T \Sigma^{S,C}_G \eta \ :\ \theta^T \Sigma^{S,S}_G \theta = 1,\ \eta^T \Sigma^{C,C}_G \eta = 1 \Big\}$$
and
$$(\theta^S_E, \theta^C_E) = \Big\{ \arg\max_{\theta, \eta}\ \theta^T \Sigma^{S,C}_E \eta \ :\ \theta^T \Sigma^{S,S}_E \theta = 1,\ \eta^T \Sigma^{C,C}_E \eta = 1 \Big\},$$
where $\theta^S_G, \theta^S_E$ are $p_S$-dimensional vectors, while $\theta^C_G, \theta^C_E$ are $p_C$-dimensional vectors. Constraints of the type $\theta^T \theta = 1$, $\eta^T \eta = 1$ could alternatively be imposed. The pairs $(\theta^S_G, \theta^C_G)$ and $(\theta^S_E, \theta^C_E)$ represent the maximally correlated modes of co-variation in shape and connectivity that are due to genetic and non-genetic factors. Subsequent modes of co-variation can also be estimated by maximizing the same objective functions while imposing orthogonality constraints with respect to the previously computed components.

We first compute the tangent space coordinates associated with these modes of co-variation, i.e.
$$\big(v^S_G, v^C_G\big) := \Big( \sum_{j=1}^{p_S} \theta^S_{j,G} \psi^S_j,\ \sum_{j=1}^{p_C} \theta^C_{j,G} \psi^C_j \Big), \qquad \big(v^S_E, v^C_E\big) := \Big( \sum_{j=1}^{p_S} \theta^S_{j,E} \psi^S_j,\ \sum_{j=1}^{p_C} \theta^C_{j,E} \psi^C_j \Big).$$
Finally, the co-variation structure between brain shape and connectivity due to genetic factors can be visualized by computing the surfaces and covariance functions, in their respective curved manifolds, that are associated with the pairs of elements $(-c\,v^S_G, -c\,v^C_G)$ and $(c\,v^S_G, c\,v^C_G)$, for an appropriate choice of a positive constant $c$. Analogously, we can visualize the environmental co-variation structure between brain shape and connectivity by computing the surfaces and covariance functions that are associated with the pairs of elements $(-c\,v^S_E, -c\,v^C_E)$ and $(c\,v^S_E, c\,v^C_E)$. A pictorial representation of the described analysis is given in Figure 2.

Typically, statistical shape analysis is performed on suitable parametric shape descriptors of the data. Here, we briefly review two popular nonparametric approaches.
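Returning to the constrained maximization defining $(\theta^S_G, \theta^C_G)$: it is a CCA-type problem on the blocks of $\Sigma_G$, and under the standard whitening argument it is solved by one singular value decomposition. A minimal sketch on a toy covariance (block sizes and values are illustrative only):

```python
import numpy as np

def first_cca_pair(S_SS, S_SC, S_CC):
    """First maximally correlated pair from covariance blocks via whitening + SVD."""
    def inv_sqrt(M):
        w, V = np.linalg.eigh(M)
        return V @ np.diag(w ** -0.5) @ V.T
    Ws, Wc = inv_sqrt(S_SS), inv_sqrt(S_CC)
    U, d, Vt = np.linalg.svd(Ws @ S_SC @ Wc)
    theta, eta = Ws @ U[:, 0], Wc @ Vt[0, :]
    return theta, eta, d[0]              # d[0] is the attained correlation

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
Sigma = B @ B.T                          # toy joint covariance, p_S = 3, p_C = 2
S_SS, S_SC, S_CC = Sigma[:3, :3], Sigma[:3, 3:], Sigma[3:, 3:]
theta, eta, rho = first_cca_pair(S_SS, S_SC, S_CC)

assert abs(theta @ S_SS @ theta - 1) < 1e-8   # constraint θᵀ Σ^{S,S} θ = 1
assert abs(eta @ S_CC @ eta - 1) < 1e-8       # constraint ηᵀ Σ^{C,C} η = 1
assert abs(theta @ S_SC @ eta - rho) < 1e-8   # objective equals top singular value
```

Subsequent modes correspond to the remaining singular pairs, which automatically satisfy the orthogonality constraints in the whitened coordinates.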
The first one consists of representing a surface with a global parametrization function. This is followed by the introduction of a Riemannian metric on the space of parametrizing functions. A Karcher mean can then be computed, where the effect of shape-preserving transformations is removed from the data. Statistical analysis is performed by computing linear representations of the shapes on the tangent space centered at the Karcher mean (see e.g. Jermyn et al., 2017, and references therein). Jermyn et al. (2012) introduce the square-root normal field representation of shapes. One can prove that such a representation, when equipped with the $L^2$ metric, is associated with a specific instance of the 'elastic metric' on the space of shapes, which measures differences in shapes as a weighted sum of bending and stretching terms.

An alternative approach consists of modeling shapes as objects resulting from deformations of $\mathbb{R}^3$ applied to a template shape. A Riemannian metric can then be introduced on the space of deformations. In our application setting, such an approach allows us to incorporate topological constraints in the statistical framework and constrain our statistical estimates to be valid shapes.

In detail, we introduce a diffeomorphic operator $\varphi$ mapping a sufficiently smooth Hilbert space $(V, \|\cdot\|_V)$ onto a group $G$ of diffeomorphic functions. Formally, the space $V$ is the tangent space of $G$ at the identity map, and $\varphi$ is the associated exponential map. We define $G$ as follows. Let $\{v_t \in V : t \in [0, 1]\}$ be a time-dependent $V$-valued process such that $\int_0^1 \|v_t\|_V\, dt < \infty$. Then, the solution $\phi^v : [0, 1] \times \mathbb{R}^3 \to \mathbb{R}^3$, at time $t =$
1, of the Ordinary Differential Equation (ODE)
$$\frac{\partial \phi^v}{\partial t}(t, x) = v_t \circ \phi^v(t, x), \qquad t \in [0, 1],\ x \in \mathbb{R}^3, \tag{2}$$
with initial condition $\phi^v(0, x) = x$, is a smooth diffeomorphic deformation of $\mathbb{R}^3$ (see, e.g., Younes, 2010). The group $G$ consists of all the solutions of equation (2).

Equation (2) allows us to parameterize a diffeomorphic deformation $\phi^v(1, \cdot)$ (and therefore a shape $\phi^v(1, S)$ that preserves the topology of $S$) with a time-variant vector field $\{v_t \in V : t \in [0, 1]\}$. We then model the space of the time-variant vector fields by defining $\{v_t : t \in [0, 1]\}$ to be the time-variant vector field which minimizes the quantity $\int_0^1 \|v_t\|^2_V\, dt$, for a given initial vector field $v_0$ (Miller et al., 2006). Finally, the diffeomorphic operator is defined to be $\varphi_v(x) := \phi^v(1, x)$, where $v \in V$ is the initial vector field generating $\{v_t : t \in [0, 1]\}$, and $\phi^v$ is the solution of the ODE (2) for the computed $\{v_t : t \in [0, 1]\}$.

With the notation of the previous section, we define $\gamma_i := \varphi_{v^S_i}$ with $i = 1, \ldots, n$. The initial vector fields $(v^S_i)$ are estimated by minimizing a penalized mismatching functional, the details of which are left to Section 3. What is important from a statistical perspective is that a surface $S_i$, which belongs to a curved space, can now be represented by an element $v^S_i$ of the linear function space $V$, where $v^S_i$ is such that $S_i = \varphi_{v^S_i}(S)$.

As mentioned earlier, the covariance functions $(C_i)$ belong to the cone of positive semi-definite covariance functions and they are defined on sample-specific spatial domains $(S_i)$. The maps $(\varphi_{v^S_i})$, introduced in the previous section, define a canonical way to transport $C_i$ onto the template $S$, defining $\tilde{C}_i(x, y) := C_i(\varphi^{-1}_{v^S_i}(x), \varphi^{-1}_{v^S_i}(y))$, with $x, y \in S$. Therefore, in this section, we will focus on modeling $\tilde{C}_i$.

The manifold of positive semi-definite covariance functions can be equipped with a Riemannian metric.
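Going back to the flow equation (2): for a simple stationary velocity field it can be integrated with forward Euler steps, which illustrates how a vector field generates a deformation. The 2-D points and the rotation field below are assumptions of this sketch, not the paper's setting:

```python
import numpy as np

def flow(points, v, n_steps=1000):
    """Forward-Euler integration of dphi/dt = v(phi) from t=0 to t=1,
    with initial condition phi(0, x) = x."""
    phi = points.astype(float).copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        phi += dt * v(phi)
    return phi

# stationary velocity field v(x) = (-x2, x1): an infinitesimal rotation
v = lambda x: np.stack([-x[:, 1], x[:, 0]], axis=1)
x0 = np.array([[1.0, 0.0]])
x1 = flow(x0, v)

# the exact time-1 flow rotates by one radian
assert np.allclose(x1, [[np.cos(1.0), np.sin(1.0)]], atol=1e-3)
```

Because the map is obtained by integrating a smooth field, points cannot cross, which is the discrete analogue of the diffeomorphic (topology-preserving) property used for shapes.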
Note that equipping such a space with the $L^2$ distance of the embedding space results in variations around the mean that do not belong to the space of valid estimates. This has motivated the use of different metrics in the statistical literature.

Covariance functions associated with densely observed functional data on two-dimensional domains have the additional issue of being voluminous. Therefore, the first step in their analysis is generally the expansion onto a finite functional basis $\{b_j\} \subset L^2(S)$, i.e.
$$\tilde{C}_i(x, y) \approx \sum_{j=1}^{K} \sum_{l=1}^{K} (C^K_i)_{jl}\, b_j(x)\, b_l(y),$$
where $(C^K_i)_{jl}$ is the $K \times K$ covariance matrix associated with the covariance function $\tilde{C}_i$. The functions $(b_j)$ are estimated from the data. A popular choice in neuroimaging is a set of piecewise constant functions that are constant within connected regions of the cortical surface. This set of connected regions, also known as a brain parcellation, defines functional sub-units on the cortical surface. Alternatively, such a basis is estimated from an Independent Component Analysis of the fMRI data, defining either spatially or temporally independent 'weighted' parcellations of the brain. Such a step allows us to achieve a first dimension reduction of the data.

A Riemannian metric can then be defined on the space of $K \times K$ symmetric positive-definite matrices by defining a smoothly varying scalar product $\langle A, B \rangle_C$ on the tangent space centered at $C$, which is the linear space of $K \times K$ symmetric matrices. Such a Riemannian metric defines a geodesic distance on the space of covariance matrices that is given by the length of the shortest curve connecting any two covariances. Pennec et al. (2006) introduce an affine invariant Riemannian metric for positive-definite matrices that induces the distance $d_{\mathrm{Riem}}(C_1, C_2) = \| \log(C_1^{-1/2}\, C_2\, C_1^{-1/2}) \|_F$, where $C^{-1/2} = V D^{-1/2} V^T$, with $C = V D V^T$ denoting its spectral decomposition.
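The distance $d_{\mathrm{Riem}}$ just defined can be computed directly from spectral decompositions. The sketch below (toy $4 \times 4$ matrices, chosen only for illustration) also checks the affine invariance property that gives the metric its name:

```python
import numpy as np

def sym_powm(M, power):
    # matrix power of a symmetric positive-definite matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** power) @ V.T

def sym_logm(M):
    # matrix logarithm of a symmetric positive-definite matrix
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.log(w)) @ V.T

def d_riem(C1, C2):
    S = sym_powm(C1, -0.5)                        # C1^{-1/2}
    return np.linalg.norm(sym_logm(S @ C2 @ S), "fro")

rng = np.random.default_rng(4)
def make_spd():
    B = rng.standard_normal((4, 4))
    return B @ B.T + np.eye(4)

C1, C2 = make_spd(), make_spd()
A = rng.standard_normal((4, 4))                   # almost surely invertible

# affine invariance: the congruence C -> A C Aᵀ leaves the distance unchanged
assert np.isclose(d_riem(A @ C1 @ A.T, A @ C2 @ A.T), d_riem(C1, C2))
```

Equivalently, $d_{\mathrm{Riem}}$ depends only on the eigenvalues of $C_1^{-1} C_2$, which are preserved under congruence by any invertible matrix.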
A further option is the Cholesky distance $d_{\mathrm{chol}}(C_1, C_2) = \| \mathrm{chol}(C_1) - \mathrm{chol}(C_2) \|_F$, where $L = \mathrm{chol}(C)$ denotes the Cholesky decomposition of a positive-definite matrix $C = L L^T$. In Arsigny et al. (2006), the log-Euclidean distance of two positive-definite matrices $C_1$ and $C_2$ is defined as $d_{\log}(C_1, C_2) = \| \log(C_1) - \log(C_2) \|_F$, where $\| \cdot \|_F$ is the Frobenius norm and $\log(\cdot)$ the matrix logarithm, i.e. $\log(C) = V \log(D) V^T$, with $\log(D)$ denoting the diagonal matrix whose entries are the logarithms of the entries of $D$.

We model covariance matrices by equipping the space of covariances with the log-Euclidean metric defined in Arsigny et al. (2006). As already mentioned, for such a choice of the Riemannian metric, the geodesic distance is given by $d(C_1, C_2) = \| \log(C_1) - \log(C_2) \|_F$. The Fréchet mean of $(C^K_i)$ is then defined as $F = \arg\inf_M \sum_{i=1}^n d^2(C^K_i, M)$. Given $C$, a symmetric positive-definite matrix, and $L$, a symmetric matrix, the Riemannian exponential and logarithmic maps have the form
$$\mathrm{Exp}_F(L) = \exp\big(\log(F) + \partial_L \log(F)\big), \qquad \mathrm{Log}_F(C) = \partial_{\log(C) - \log(F)} \exp\big(\log(F)\big),$$
where $\partial_L \log(F)$ and $\partial_V \exp(L)$ are respectively the differential of the matrix logarithm and the differential of the matrix exponential (Arsigny et al., 2006). The tangent representation $v^C_i$ of $C^K_i$ is then defined on the (linear) tangent space centered at $F$ from the symmetric matrix $\mathrm{Log}_F(C^K_i)$.

Estimation
Shape representation
The surfaces $(S_i)$ first need to be aligned; i.e. one-to-one correspondence needs to be established between subject-specific dense sets of landmarks $(x^{(i)}_l) \subset S_i$. In practice, the aligned landmarks are the vertices of the triangulated surfaces approximating the idealized surface $S_i$. In order to avoid burdening the notation, we do not distinguish between idealized surfaces and associated triangulated surfaces, as this will be clear from the context.

The problem of image registration is common to any population analysis of images; however, the choice of the registration model generally depends on the specific application (Zitová and Flusser, 2003). In our application setting, this step is generally performed by first inflating the surfaces to a spherical domain, and then aligning the maps projected onto the spherical domain in order to increase a measure of structural/functional 'coherence' across subjects, while minimizing the amount of distortion introduced by the registration (Fischl et al., 1999; Yeo et al., 2010; Robinson et al., 2014, 2018). A registration model that instead is able to deal with maps on general manifold domains has been proposed in Lila and Aston (2019).

Given the $n$ sets of aligned landmarks, a template $S$, represented by the vertices $\{x^{(0)}_l\} \subset S$, is estimated by means of a Procrustes analysis. Such an analysis allows us to estimate the template while removing translation, size, and rigid rotations from the surfaces $(S_i)$. Then, the shape tangent space coordinates $(v^S_i)$, associated with the surfaces $(S_i)$, are computed by solving a minimization problem of the form
$$v^S_i = \arg\min_{v_i \in V} \sum_l \big\| \varphi_{v_i}\big(x^{(0)}_l\big) - x^{(i)}_l \big\|^2_{\mathbb{R}^3} + \lambda \| v_i \|^2_V, \qquad i = 1, \ldots, n, \tag{3}$$
where the empirical term measures the similarity of the deformed template $\varphi_{v_i}(S)$ with $S_i$. The term $\| v_i \|^2_V$ can be intuitively understood as a regularizing term that measures the 'energy' associated with the deformation.
The constant $\lambda$ controls the trade-off between the empirical and regularizing terms.

To obtain an unbiased estimate of the template $S_0$, with respect to the defined metric on the group of diffeomorphic functions $G$, we could update the template with $S_0 \leftarrow \varphi_{\bar v}(S_0)$, where $\bar v := n^{-1} \sum_{i=1}^n v^S_i$. Subsequently, we could recompute $\{v^S_i\}$ by solving (3) for the newly estimated template. These steps can then be iterated until convergence. Such a procedure is however prohibitive for computational reasons. Therefore, we fix the template to be the one resulting from the Procrustes analysis.

The space $V$ is modeled as a Reproducing Kernel Hilbert Space (RKHS) with a kernel that is a finite sum of Gaussian kernels of the type $K_\sigma(x, y) = \exp(-\|x - y\|^2_{\mathbb{R}^3} / \sigma^2)\, I_3$. The minimization problem is approached with a BFGS optimization scheme. We adopt the implementation in the MATLAB package fshapetk (Charlier et al., 2015, 2017).
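To illustrate the structure of this estimation step, the sketch below minimizes a discretized analogue of (3) with a BFGS scheme, using a single Gaussian kernel and a simplified one-step kernel deformation in place of the full diffeomorphic flow computed by fshapetk (all sizes, landmarks, and hyper-parameter values are toy assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, Y, sigma):
    # K_sigma(x, y) = exp(-||x - y||^2 / sigma^2), acting identically on each coordinate
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def energy(alpha_flat, template, target, sigma, lam):
    # Simplified one-step deformation phi_v(x) = x + sum_j K_sigma(x, c_j) alpha_j,
    # with control points c_j at the template vertices. The RKHS norm of v is
    # ||v||_V^2 = sum_{j,k} alpha_j^T K_sigma(c_j, c_k) alpha_k.
    alpha = alpha_flat.reshape(template.shape)
    K = gaussian_kernel(template, template, sigma)
    deformed = template + K @ alpha
    data_term = ((deformed - target) ** 2).sum()   # empirical (landmark) term
    rkhs_norm = np.sum(alpha * (K @ alpha))        # regularizing 'energy' term
    return data_term + lam * rkhs_norm

rng = np.random.default_rng(0)
template = rng.normal(size=(20, 3))                  # toy template landmarks
target = template + 0.1 * rng.normal(size=(20, 3))   # toy subject landmarks

res = minimize(energy, np.zeros(template.size),
               args=(template, target, 1.0, 1e-3), method="BFGS")
```

The minimizing coefficients play the role of the tangent-space representation of the deformation; the actual pipeline additionally integrates a time-dependent flow to guarantee diffeomorphic deformations.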
Functional connectivity

Given the $K \times K$ subject-specific covariance matrices $(C^K_i)$ (see Figure 1 for an illustration), we first compute the least-squares Fréchet mean estimate. Given our choice of the Riemannian metric, this has the closed-form solution (Arsigny et al., 2006)
$$F = \exp\Big\{ \frac{1}{n} \sum_{i=1}^n \log C^K_i \Big\}.$$
The tangent space coordinates, in matrix form, are given by $V^C_i = \mathrm{Log}_F(C^K_i)$. In practice, to circumvent stability issues related to the numerical computation of the differential of the matrix exponential and logarithm, we perform the tangent expansion around the identity, i.e. work with the coordinates $V^C_i = \log(C_i) - \log(F)$. Finally, the tangent space coordinates $v^C_i$, defined in Section 2.3, are given by $v^C_i = \mathrm{vec}_{\mathrm{Sym}}(V^C_i)$, where $\mathrm{vec}_{\mathrm{Sym}}(L) = (l_{1,1}, \dots, l_{K,K}, \sqrt{2}\, l_{1,2}, \dots, \sqrt{2}\, l_{K-1,K})^T$ denotes a convenient vectorization operation for the space of symmetric matrices equipped with the Frobenius norm.
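The steps above can be sketched in a few lines (toy covariance matrices, with the matrix exponential and logarithm computed spectrally):

```python
import numpy as np

def spd_log(C):
    # log(C) = V log(D) V^T from the spectral decomposition C = V D V^T
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def spd_exp(L):
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

def frechet_mean(Cs):
    # Closed form under the log-Euclidean metric: F = exp{ n^{-1} sum_i log C_i }
    return spd_exp(np.mean([spd_log(C) for C in Cs], axis=0))

def vec_sym(L):
    # (l_11, ..., l_KK, sqrt(2) l_12, ..., sqrt(2) l_{K-1,K}): an isometry from
    # (symmetric matrices, Frobenius norm) to Euclidean space
    iu = np.triu_indices(L.shape[0], k=1)
    return np.concatenate([np.diag(L), np.sqrt(2) * L[iu]])

rng = np.random.default_rng(0)
Cs = []
for _ in range(5):
    A = rng.normal(size=(3, 3))
    Cs.append(A @ A.T + 3 * np.eye(3))

F = frechet_mean(Cs)
# tangent expansion around the identity: V_i = log(C_i) - log(F)
coords = [vec_sym(spd_log(C) - spd_log(F)) for C in Cs]
```

By construction, the tangent coordinates have zero mean (since $\log F$ is the average of the $\log C_i$), and the $\sqrt{2}$ weights make the Euclidean norm of $\mathrm{vec}_{\mathrm{Sym}}(L)$ equal to the Frobenius norm of $L$.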
Mixed Effects Model

We reformulate model (1) by defining two $n \times (p_S + p_C)$ independent random matrices
$$U \sim \mathrm{MVN}_{n \times (p_S + p_C)}(0, I_n, I_{p_S + p_C}), \qquad V \sim \mathrm{MVN}_{n \times (p_S + p_C)}(0, I_n, I_{p_S + p_C}),$$
with $K_n^{1/2}$, $\Sigma_G^{1/2}$, $\Sigma_E^{1/2}$ such that $K_n = K_n^{1/2} (K_n^{1/2})^T$, $\Sigma_G = \Sigma_G^{1/2} (\Sigma_G^{1/2})^T$, and $\Sigma_E = \Sigma_E^{1/2} (\Sigma_E^{1/2})^T$. Then, we can rewrite model (1) as
$$A \mid U, V = XB + K_n^{1/2}\, U\, (\Sigma_G^{1/2})^T + V\, (\Sigma_E^{1/2})^T,$$
thanks to the fact that
$$K_n^{1/2}\, U\, (\Sigma_G^{1/2})^T \sim \mathrm{MVN}_{n \times (p_S + p_C)}(0, K_n, \Sigma_G), \qquad V\, (\Sigma_E^{1/2})^T \sim \mathrm{MVN}_{n \times (p_S + p_C)}(0, I_n, \Sigma_E).$$
In practice, the kinship matrix $K_n$ is rank deficient, which is in fact a desirable property, as it means that there are highly correlated samples that, intuitively, make it possible to disentangle the genetic and environmental covariance. When two samples are maximally correlated, as for instance in the case of monozygotic twins, the number of rows of $U$ can be reduced by one, and the two samples can be modeled with the same random effect. The matrices $U$ and $V$ are then vectorized to a multivariate normal vector with independent samples, and the matrices $K_n^{1/2}$, $(\Sigma_G^{1/2})^T$, and $(\Sigma_E^{1/2})^T$ are re-structured accordingly. The unknown covariances $\Sigma_G$ and $\Sigma_E$ are finally estimated by optimizing the REML criterion with respect to the parametrizing matrices $\Sigma_G^{1/2}$ and $\Sigma_E^{1/2}$. The proposed model has been implemented in R as a wrapper of the function lmer in the package lme4 (Bates et al., 2015). In our application, $\Sigma_G$ and $\Sigma_E$ are unstructured covariance matrices. Nevertheless, the proposed model can be easily extended to handle correlation structures with a known pattern.
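The vectorization step relies on the standard matrix identity $\mathrm{vec}(AUB^T) = (B \otimes A)\,\mathrm{vec}(U)$, which is what gives the vectorized genetic term a covariance with Kronecker structure $\Sigma_G \otimes K_n$. A quick numerical check, with generic matrices standing in for $K_n^{1/2}$ and $\Sigma_G^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3
Kh = rng.normal(size=(n, n))   # stands in for K_n^{1/2}
Sh = rng.normal(size=(p, p))   # stands in for Sigma_G^{1/2}
U = rng.normal(size=(n, p))

# Column-stacking vectorization: vec(Kh U Sh^T) = (Sh kron Kh) vec(U), so the
# vectorized genetic term is multivariate normal with covariance
# (Sh kron Kh)(Sh kron Kh)^T = (Sh Sh^T) kron (Kh Kh^T) = Sigma_G kron K_n.
left = (Kh @ U @ Sh.T).flatten(order="F")
right = np.kron(Sh, Kh) @ U.flatten(order="F")
```

The same identity applied to $V (\Sigma_E^{1/2})^T$ gives the environmental covariance $\Sigma_E \otimes I_n$, which is the multivariate normal form optimized by the REML criterion.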
Statistical analysis & Results

Data & Preprocessing
This study focuses on the analysis of all 1003 healthy adult subjects, from the S1200 HCP data release, that have complete resting-state fMRI scans. The structural MRI images have been acquired at a resolution of 0.7mm isotropic, and the resting-state fMRI images have been acquired at a spatial resolution of 2.0mm isotropic and a temporal resolution of 0.7s. Resting-state fMRI data were acquired in four runs of 15 minutes, over two days. During the fMRI scans, the subjects were not performing any explicit tasks. Further details on the acquisition process can be found in Glasser et al. (2013); Smith et al. (2013). An extensive set of subject traits, such as behavioral and demographic covariates, is also provided. The HCP dataset includes multiple members of the same families, resulting in a familial relatedness structure across samples.

The MRI and fMRI data have been pre-processed with the minimal pre-processing HCP pipeline (Glasser et al., 2013). In particular, white, pial, and midthickness surfaces of the cerebral cortex are reconstructed. We use the midthickness surfaces, which are surfaces that interpolate the midpoints between the white and pial surfaces, to describe the anatomy of the cerebral cortex, and we refer to them as cortical surfaces. The four resting-state fMRI runs have been pre-processed to remove artifactual components in the data (Smith et al., 2013). The fMRI signal associated with neuronal activation on the cerebral cortex has been extracted and mapped onto the cortical surfaces, resulting in the data illustrated in Figure 1.
Spatial normalization
The cortical surfaces are given in the form of two closed triangulated surfaces of 32K vertices, describing respectively the geometry of the left and right hemispheres. The 64K vertices have been brought into correspondence across subjects thanks to the application of a multi-modal surface alignment algorithm (Robinson et al., 2014, 2018), which enables surface alignment based on both anatomical and functional features. Earlier approaches, instead, have been driven exclusively by geometric features (see e.g. Fischl et al., 1999).

In the setting of joint shape and functional modeling, the importance of using the functional component of the data in the alignment procedure has been shown, for instance, in Lila and Aston (2019). In the cited work, the authors propose a functional manifold surface alignment model embedded in the statistical analysis to improve the surface alignment. Here, we rely on the multi-modal spherical alignment of Robinson et al. (2018), where extensive hyper-parameter tuning and validation have already been performed for the HCP dataset in question.

In the next section, statistical shape analysis is performed on the cortical surfaces by treating the surface vertices as anatomical landmarks, as these are in geometric and functional correspondence across subjects. Note that the defined surfaces cannot yet be regarded as shapes (Dryden and Mardia, 2016), as non-physiological features, such as translations and rotations, are still present in the data.
Shape Analysis

In order to project the surfaces in the shape-space (that is, to remove Euclidean similarity transformations from the data), we perform Generalized Procrustes Analysis (Dryden and Mardia, 2016) on the anatomical landmarks. This has the effect of removing translation, rigid rotation, and scale from the data, while iteratively estimating an average shape in the shape-space. Translation and rotation are discarded, as these components do not have a physiological meaning. Scale, instead, is a positive scalar that does have a physiological meaning, and its log-transformation $l_i$ will be incorporated in the final analysis. We denote with $S_i$ the brain surfaces projected onto the shape-space and with $S_0$ the estimated template average shape.

While features of the data that are non-descriptive of shape have now been removed, the landmark description of shapes does not guarantee that linear statistical models of the data, such as PCA, generate topologically valid shape trajectories. In our application, a valid shape trajectory is one that, for instance, does not lead to self-intersecting surfaces (see e.g. Vaillant et al., 2004, for an illustration of the issue).

Therefore, we represent each surface $S_i$ with a vector field $v^S_i$ belonging to the linear space $V$. The vector field $v^S_i$ is computed by solving the minimization problem in (3), in practice leading to a function $v^S_i$ such that $S_i \approx \varphi_{v^S_i}(S_0)$, where $\varphi$ is the diffeomorphic deformation operator defined in Section 2.2. In practice, two separate vector fields, one for each hemisphere, are estimated independently. We do not denote them separately, as such a choice has computational advantages, but no other practical implications.

The RKHS space $V$ is defined by a kernel that is the sum of six isotropic Gaussian kernels in $\mathbb{R}^3$, with six standard deviations at decreasing spatial scales. The penalty coefficient $\lambda$ in (3) is fixed to a small value. These hyper-parameters have been selected by experimenting on a small subset of the full cohort.
Computations have been performed on a cluster where each node is equipped with an Intel Xeon E5-2650 2.2GHz 12-core processor with 96GB RAM and four Nvidia P100 GPUs. The minimization algorithm takes approximately 40 minutes for each subject and uses only one GPU. The representing 1003 vector fields are thus computed in parallel on several nodes, greatly reducing the computation time needed.

We identify a set of demographic confounding variables (e.g. height, weight, age, ...), which are demeaned, and their squares are included in the analysis (when the confounder is a continuous variable). We then regress the confounders out of the RKHS coefficients representing $(v^S_i)$ and the log-transformed size coefficients $(l_i)$ from the Procrustes Analysis.

We perform functional PCA on $\{v^S_i\}$, leading to the representation $v^S_i \approx \bar v^S + \sum_{j=1}^{p_S} A^S_{i,j} \psi^S_j$. In contrast to the idealized basis expansion in Section 2.2, we have a non-zero mean term $\bar v^S = n^{-1} \sum_{i=1}^n v^S_i$. In fact, due to the prohibitive computational costs incurred in computing $v^S_i$, we do not iteratively re-estimate the template until the mean term $\bar v^S$ becomes negligible, as noted in Section 3. We select the truncation level $p_S$; the shape of the $i$th subject is then given by the $p_S$ scalar coefficients $A^S_{i,1}, \dots, A^S_{i,p_S}$, and the log-transformed size coefficient $l_i$ from the Procrustes analysis.

Connectivity Analysis

For each vertex of a surface $S_i$, we have four time series (one for each run) describing the resting-state neuronal activation of that location. Thanks to the anatomical and functional correspondence of the surface vertices across subjects, we can equivalently perform our analysis on the common template surface $S_0$.
We adopt an atlas of the template $S_0$ that assigns a label to each vertex of the template, defining a parcellation. For each run, and within each region of the parcellation, we compute a robust spatially averaged time series that represents the brain activity of the entire region for a specific subject.

Many approaches have been proposed to define parcellation atlases (Fischl et al., 2004; Desikan et al., 2006; Power et al., 2011; Yeo et al., 2011; Van Essen et al., 2012; Wig et al., 2014; Gordon et al., 2016; Glasser et al., 2016); see Arslan et al. (2018) for a recent systematic review. Over the years, they have tremendously improved in their granularity and ability to incorporate multi-modal imaging to define parcellations of the cortical surface. In this work, we rely on the popular Desikan-Killiany parcellation (Desikan et al., 2006), which defines 68 cortical surface regions. More recent parcellations, such as the one proposed in Glasser et al. (2016), define up to 360 regions. In this work, we opted for a coarser parcellation to mitigate the effect of misregistration, that is, the effect of small errors in the surface alignment step. For each subject, the 68 ×
68 covariance matrix is computed for each run, and the average covariance across the four runs, $C^K_i$, is used to represent the $i$th subject's functional connectivity. These covariance matrices are sometimes referred to as networks, or connectomes, as they quantify the functional connectivity between network nodes that are regions of the cortical surface.

Figure 3: On the left panel, we plot the empirical covariance of the normalized $n \times (1 + p_S + p_C)$ data matrix of size, shape, and connectivity descriptors. On the middle and right panels, we can find the latent covariance components that are due to genetic and environmental factors, as recovered by the proposed variance component model.

We perform connectivity analysis in the log-Euclidean framework. As detailed in Section 2.3, we compute the matrix logarithms of $C^K_i$ and compute the associated coefficients with respect to a Frobenius-orthogonal basis on the space of symmetric matrices. This leads to the computation of a set of representing tangent space coordinates $\{v^C_i\}$ that are vectors of dimension 2346, which is the number of entries of the upper triangular part of the 68 ×
68 covariance matrices.

An extensive set of confounding variables is identified, including all the confounding variables used in the shape analysis step. In addition, we introduce variables that are more directly related to the acquired blood-oxygen-level-dependent signal, such as average subject head motion and systolic/diastolic blood pressure (see Smith et al., 2015, for a complete list). The confounding variables are regressed out of the connectivity tangent coordinates $\{v^C_i\}$ (we do not rename the deconfounded tangent space coordinates).

We perform PCA on the deconfounded tangent space coordinates $\{v^C_i\}$, leading to the basis expansion $v^C_i \approx \bar v^C + \sum_{j=1}^{p_C} A^C_{i,j} \psi^C_j$. We truncate the basis expansion at $p_C = 10$; for the $i$th subject, connectivity is finally represented by the 10 coefficients $A^C_{i,1}, \dots, A^C_{i,p_C}$.
Family relatedness

The statistical model proposed in Section 2.1 relies on the presence of genetic relatedness between the subjects to disentangle genetic and environmental contributions to the co-variation patterns between brain shape and connectivity. Of the 1003 subjects, 1001 had family relatedness information. The dataset consists of unrelated individuals, full siblings, half-siblings, dizygotic twins, and monozygotic twins. Specifically, there are 429 families, with a number of members that ranges from 1 to 6.

The relatedness matrix $K_n$, in the multivariable trait model (1), represents the pairwise expected covariance between subjects that is due to familial relatedness. We estimate the matrix as $K_n = 2\Phi_n$, with $\Phi_n$ the matrix of kinship coefficients (Almasy et al., 1997; Lange, 2002). We use SOLAR to compute the kinship matrix from the HCP family structure data. In larger population studies, an empirical genetic relatedness matrix could be computed from SNP data (see e.g. Kang et al., 2010).
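For illustration, the expected-relatedness matrix of a small hypothetical pedigree can be assembled directly, using the standard values of twice the kinship coefficient: 1 for monozygotic twins and 0.5 for dizygotic twins or full siblings (in the actual analysis, $K_n$ is computed by SOLAR from the HCP pedigree):

```python
import numpy as np

# Hypothetical 5-subject cohort: subjects 0-1 are MZ twins, subjects 2-3 are
# full siblings, subject 4 is unrelated to everyone.
n = 5
relatedness = {(0, 1): 1.0, (2, 3): 0.5}

K = np.eye(n)
for (i, j), r in relatedness.items():
    K[i, j] = K[j, i] = r
```

Note that the MZ pair makes rows 0 and 1 of $K$ identical, so $K$ is rank deficient, mirroring the property of $K_n$ discussed in the mixed effects model above.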
Joint random-effects modeling

In the previous steps of the analysis, for the $i$th subject, we have derived a vector of scalar variables
$$l_i,\, A^S_{i,1}, \dots, A^S_{i,p_S},\, A^C_{i,1}, \dots, A^C_{i,p_C},$$
with the first variable being a descriptor of brain size, the subsequent $p_S$ variables being descriptors of shape, and the final $p_C$ variables being descriptors of connectivity. The empirical covariance matrix $\Sigma$, computed from these variables, describes the pairwise first-order dependencies between such descriptors, hence the co-variation structure between shape and connectivity that is due to both genetic and environmental contributions. We instead want to leverage the familial relatedness matrix $K_n$ and apply the joint mixed model in Section 2.1 to disentangle the additive covariance components $\Sigma_G$ and $\Sigma_E$, which are respectively due to genetic and environmental contributions.

The covariances $\Sigma_G$ and $\Sigma_E$ are estimated by optimizing the REML criterion associated with model (1), without a fixed effects term. Given our choice of the truncation levels $p_S$ and $p_C$, we avoid larger values, as the descriptors could otherwise start capturing smaller scale variations in shape and connectivity, whose modes of co-variation could be driven by small and unavoidable errors in the surface alignment steps (for an illustration of the issue, see Section S3 in the supplementary material for Lila and Aston, 2019).
Results

In Figure 3, we show respectively the estimates of $\Sigma$ (the empirical covariance structure), $\Sigma_G$ (the covariance structure due to genetic contributions), and $\Sigma_E$ (the covariance structure due to unique environmental contributions) of the $1 + p_S + p_C$ descriptors of size, shape, and connectivity. We can see that the portion of variability explained by the additive genetic contributions is larger for size and shape than for connectivity.

We measure the heritability, i.e. the portion of shape and connectivity variability explained by additive genetic contributions, as
$$h^2 = \frac{\mathrm{trace}(\Sigma_G)}{\mathrm{trace}(\Sigma_G + \Sigma_E)}.$$
Our heritability estimate is
$h^2 = 0.61$, which means we estimate that 61% of the source of variability in the data (post-PCA) is due to genetic contributions.

In Figure 4, we show the results of a linear regression model, between brain size and connectivity, formulated using the estimated genetic and environmental covariance between the respective descriptors. After the linear model has been fitted, we display the variation in connectivity associated with a $\pm\sigma$ variation in size, where $\sigma$ is the standard deviation of the log-transformed size variable. The results do not display large associated variations in connectivity.

Figure 5: Illustration of the shape and connectivity main modes of co-variation that are due to genetic and environmental contributions. Specifically, we display changes in shape and connectivity that are most correlated with respect to the estimated covariance structures $\Sigma_G$ (on the left) and $\Sigma_E$ (on the right). The main modes of co-variation display an association between a global change in shape and a global change in connectivity levels, with a contrasting behavior in the genetic and environmental components. Larger variations in connectivity levels are displayed between communities rather than within communities.

Figure 6: On the left, the shape configuration identified by the tangent vector $-\sigma v^S_G$. On the right, the shape configuration identified by the tangent vector $+\sigma v^S_G$. The tangent vector $v^S_G$ represents shape changes, due to genetic contributions, in the main mode of co-variation between shape and connectivity. The figure highlights the ability of the proposed framework to capture non-trivial variations in the brain shape, such as the formation of a sulcus.
However, when running the analysis without removing the confounders from the size and connectivity descriptors, we have observed that a general increase in connectivity is associated with an increase in brain size, both in the genetic and environmental components.

In Figure 5, we display the results of the CCA between shape descriptors and connectivity descriptors. Specifically, we compute the tangent coordinates $(-c v^S_G, -c v^C_G)$ and $(c v^S_G, c v^C_G)$, for the genetic contribution, and $(-c v^S_E, -c v^C_E)$ and $(c v^S_E, c v^C_E)$, for the environmental contribution, as detailed in Section 2.1. We then plot the shape and connectivity configurations identified by these tangent coordinates. In practice, we choose $c = \sigma$, with $\sigma$ the standard deviation of the scores associated with the modes of co-variation. Our main modes of genetic and environmental co-variation highlight an association between a global variation in brain shape and a global change in brain functional connectivity levels that seem to show a contrasting behavior in the genetic and environmental modes of co-variation.

The proposed approach to modeling shape and connectivity allows us to capture both global and local variations. A more meticulous exploration of the results demonstrates a clear advantage of the adopted approach to shape modeling. In particular, in Figure 6, we show local shape changes that are associated with the mode of variation $\pm\sigma v^S_G$. We can see non-trivial shape variations involving the formation of a sulcus. Capturing such fine-grained variations would in general not be possible with the simple shape descriptors (e.g. surface area) classically adopted in the neuroimaging literature.

In the supplementary material, we include more informative video plots that show the modes of co-variation as continuous shape and connectivity trajectories, associated with the tangent space directions $(t v^S_G, t v^C_G)$ and $(t v^S_E, t v^C_E)$, for all $t \in [-\sigma, \sigma]$.
Moreover, we include the .vtk files of these trajectories, which allow for a more careful exploration of the results by using a data visualization application, e.g. Paraview.

Discussion

In this work, we propose a statistical Riemannian approach for the analysis of samples that are brain shapes and brain connectivity. In the proposed framework, we embed a variance component model that exploits the family relatedness structure among samples to separate brain shape and connectivity co-variation that is due to genetic and environmental factors.

The proposed Riemannian modeling approach allows us to estimate trajectories in the spaces of shapes and connectivity that are constrained to belong to their respective non-Euclidean spaces of anatomically/physiologically meaningful estimates. Specifically, we are able to exclude shapes that are not topologically equivalent to the shapes in the sample, e.g. self-intersecting shapes, and we are able to exclude functional connectivity estimates that are not symmetric positive-definite objects. Moreover, the shape modeling approach proposed in this paper can also be easily extended to incorporate heterogeneous types of imaging data, such as volumetric representations of subcortical brain structures and bundles of axons estimated from Diffusion Tensor Imaging (Feydy et al., 2017).

When it comes to shapes and covariances, different representation models and metrics have been proposed in the literature. The choices we have made in this work are mainly driven by the aforementioned reasons. However, the exploration of different shape metrics, such as the Elastic Metric (Kurtek et al., 2011; Jermyn et al., 2012, 2017), is also a promising direction for future work.
A particularly attractive property of the latter is the ability to integrate the registration step with the computation of the representation. Nonetheless, it is in principle more complicated to enforce topological constraints.

The proposed variance component model is based on a reduced dimension representation of shape and connectivity. It is of interest to extend the current approach to work directly on the bivariate functional tangent space representations of shape and connectivity. Nevertheless, due to the high dimensionality of the data, this is currently prohibitive, not only from a computational perspective but also due to the need to incorporate regularization penalties to control for the nearly co-linear modes of co-variation that arise when estimating canonical correlation components from high-dimensional data.
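For reference, the canonical correlation computation on a given reduced-dimension covariance matrix can be sketched via the SVD of the whitened cross-covariance block (Hotelling, 1936); the $2 \times 2$ matrix below is a toy illustration, not the estimated $\Sigma_G$:

```python
import numpy as np

def leading_cca_mode(Sigma, pS):
    # Split Sigma into a shape block (first pS coordinates) and a connectivity
    # block, whiten each block, and take the top singular triplet of the
    # whitened cross-covariance: the singular value is the first canonical
    # correlation, and the back-transformed singular vectors are the
    # canonical directions.
    def inv_sqrt(M):
        w, V = np.linalg.eigh(M)
        return (V / np.sqrt(w)) @ V.T
    Wss = inv_sqrt(Sigma[:pS, :pS])
    Wcc = inv_sqrt(Sigma[pS:, pS:])
    U, rho, Vt = np.linalg.svd(Wss @ Sigma[:pS, pS:] @ Wcc)
    return Wss @ U[:, 0], Wcc @ Vt[0], rho[0]

# Toy covariance of one shape and one connectivity descriptor, correlation 0.8
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
vS, vC, rho = leading_cca_mode(Sigma, pS=1)
```

The returned directions are normalized so that each has unit variance under its own block, which is the normalization under which the mode-of-co-variation scores have the standard deviation used for plotting.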
Appendices
A Simulations
In this section, we perform simulations to assess the finite sample estimation properties of the variance decomposition model (1).

We generate two $p \times p$ correlation matrices $\Sigma_G$ and $\Sigma_E$ as independent samples of a random correlation matrix with a uniform distribution over the space of positive-definite correlation matrices (Joe, 2006). In order to obtain a simulation setting that is most similar to that of our application, we use the $n \times n$ kinship matrix $K_n$ from the HCP dataset, with $n = 1003$, and construct the $nd \times nd$ kinship matrix $I_d \otimes K_n$. Such a kinship matrix describes a situation where $nd$ samples are instead available, with $d$ unrelated groups of $n$ samples that have a correlation structure represented by $K_n$.

We then generate our data following model (1), i.e.
$$A \mid G, E = G + E, \qquad G \sim \mathrm{MVN}_{nd \times p}(0, I_d \otimes K_n, \Sigma_G), \qquad E \sim \mathrm{MVN}_{nd \times p}(0, I_{nd}, \Sigma_E),$$
where no fixed effect is included, to reflect the setting of our final application. The $nd \times p$ matrix $A$ represents a set of simulated tangent space coordinates.

For every choice of $d = 1, \dots, 5$, and for each of the two values of $p$ considered, we generate 100 datasets and then apply the mixed effects model in Section 3 to compute the estimates $\hat\Sigma_G$ and $\hat\Sigma_E$ of $\Sigma_G$ and $\Sigma_E$. We measure the accuracy of the estimate of the genetic component as $\|\hat\Sigma_G - \Sigma_G\|_F$ and that of the environmental component as $\|\hat\Sigma_E - \Sigma_E\|_F$, where $\|\cdot\|_F$ is the Frobenius norm.

We show the results of the simulations in Figure 7, where we can see that, as expected, an increase in sample size results in a lower estimation error. Moreover, as in the data generation model, we generate 1000 correlation matrices representing correlation matrices to be estimated. For each of the 1000 correlation matrices, we generate 1000 associated naive estimates, which are correlation matrices from the same distribution, and compute their Frobenius distance from the true correlation. The empirical distribution of the computed Frobenius norms describes the performance of the naive random estimator. We then select the 0.5 and the 0.01 percentiles of the Frobenius norms, for the two choices of $p$ respectively. These are the horizontal lines in Figure 7. We can see that most of the estimates from the mixed-effects model are well below the selected thresholds.

Figure 7: Boxplots describing the performance of the variance component model, in estimating $\Sigma_G$ and $\Sigma_E$, as a function of the variable $d = 1, \dots, 5$, which is a multiplicative factor in the sample size $nd$. The estimation error is measured with the Frobenius norm of the difference between the covariance to be estimated and its estimate. We run 100 simulations for each choice of $d$ and $p$. The horizontal lines denote the 0.5 and 0.01 percentiles, respectively for the two choices of $p$, of the smallest estimation errors of a random guess estimator.
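The data-generation step above can be sketched as follows. A toy within-group kinship matrix replaces the HCP $K_n$, random symmetric positive-definite matrices stand in for the uniformly sampled correlation matrices of Joe (2006), and Cholesky factors with a small jitter stand in for the matrix square roots:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 6, 3, 50            # toy group size, trait dimension, number of groups

# Toy within-group kinship: subjects 0-1 are MZ twins, the rest unrelated
K = np.eye(n)
K[0, 1] = K[1, 0] = 1.0
Kh = np.linalg.cholesky(K + 1e-9 * np.eye(n))   # K is rank deficient: add jitter

def random_spd(p):
    A = rng.normal(size=(p, p))
    return A @ A.T + p * np.eye(p)

Sg, Se = random_spd(p), random_spd(p)           # play the roles of Sigma_G, Sigma_E
Sgh, Seh = np.linalg.cholesky(Sg), np.linalg.cholesky(Se)

# A = G + E with G ~ MVN(0, I_d kron K, Sigma_G) and E ~ MVN(0, I_nd, Sigma_E),
# built group by group as K^{1/2} U Sg^{T/2} + V Se^{T/2}
A = np.vstack([Kh @ rng.normal(size=(n, p)) @ Sgh.T + rng.normal(size=(n, p)) @ Seh.T
               for _ in range(d)])
```

Each block of $n$ rows carries the within-group kinship correlation, while different blocks are independent, matching the $I_d \otimes K_n$ design.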
References

L. Almasy, T. D. Dyer, and J. Blangero. Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genetic Epidemiology, 14:953–958, 1997.

C. I. Amos. Robust variance-components approach for assessing genetic linkage in pedigrees. American Journal of Human Genetics, 54(3):535–543, 1994.

S. Arguillère, M. I. Miller, and L. Younes. Diffeomorphic surface registration with atrophy constraints. SIAM Journal on Imaging Sciences, 9(3):975–1003, 2016.

V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 29(1):328–347, 2006.

S. Arslan, S. I. Ktena, A. Makropoulos, E. C. Robinson, D. Rueckert, and S. Parisot. Human brain mapping: a systematic comparison of parcellation methods for the human cerebral cortex. NeuroImage, 170:5–30, 2018.

D. Bates, M. Mächler, B. M. Bolker, and S. C. Walker. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48, 2015.

E. Bullmore and O. Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3):186–198, 2009.

I. Carmichael, B. C. Calhoun, K. A. Hoadley, M. A. Troester, J. Geradts, H. D. Couture, L. Olsson, C. M. Perou, M. Niethammer, J. Hannig, and J. S. Marron. Joint and individual analysis of breast cancer histologic images and genomic covariates. 2019.

B. Charlier, G. Nardi, and A. Trouvé. The matching problem between functional shapes via a BV-penalty term: a Γ-convergence result. pages 1–31, 2015.

B. Charlier, N. Charon, and A. Trouvé. The fshape framework for the variability analysis of functional shapes. Foundations of Computational Mathematics, 17(2):287–357, 2017.

N. Charon and A. Trouvé. Functional currents: a new mathematical tool to model and analyse functional shapes. Journal of Mathematical Imaging and Vision, 48(3):413–431, 2014.

H. Chen and Y. Wang. A penalized spline approach to functional mixed effects model analysis. Biometrics, 64(3):751–761, 2008.

A. Dahl, V. Iotchkova, A. Baud, Å. Johansson, U. Gyllensten, N. Soranzo, R. Mott, A. Kranis, and J. Marchini. A multiple-phenotype imputation method for genetic studies. Nature Genetics, 48(4):466–472, 2016.

R. S. Desikan, F. Ségonne, B. Fischl, B. T. Quinn, B. C. Dickerson, D. Blacker, R. L. Buckner, A. M. Dale, R. P. Maguire, B. T. Hyman, M. S. Albert, and R. J. Killiany. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968–980, 2006.

I. L. Dryden and K. V. Mardia. Statistical Shape Analysis, with Applications in R. Wiley Series in Probability and Statistics. John Wiley & Sons, Chichester, UK, 2016.

I. L. Dryden, A. Koloydenko, and D. Zhou. Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. The Annals of Applied Statistics, 3(3):1102–1123, 2009.

B. Eltzner, S. Huckemann, and K. V. Mardia. Torus principal component analysis with applications to RNA structure. The Annals of Applied Statistics, 12(2):1332–1359, 2018.

Q. Feng, M. Jiang, J. Hannig, and J. S. Marron. Angle-based joint and individual variation explained. Journal of Multivariate Analysis, 166:241–265, 2018.

J. Feydy, B. Charlier, F. X. Vialard, and G. Peyré. Optimal transport for diffeomorphic registration. In M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. Collins, and S. Duchesne, editors, Medical Image Computing and Computer Assisted Intervention (MICCAI 2017), Lecture Notes in Computer Science, volume 10433, pages 291–299. Springer, Cham, 2017.

B. Fischl, M. I. Sereno, and A. M. Dale. Cortical surface-based analysis II: inflation, flattening, and a surface-based coordinate system. NeuroImage, 9(2):195–207, 1999.

B. Fischl, A. Van Der Kouwe, C. Destrieux, E. Halgren, F. Ségonne, D. H. Salat, E. Busa, L. J. Seidman, J. Goldstein, D. Kennedy, V. Caviness, N. Makris, B. Rosen, and A. M. Dale. Automatically parcellating the human cerebral cortex. Cerebral Cortex, 14(1):11–22, 2004.

J. H. Gilmore, R. C. Knickmeyer, and W. Gao. Imaging structural and functional brain development in early childhood. Nature Reviews Neuroscience, 19(3):123–137, 2018.

M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, D. C. Van Essen, and M. Jenkinson. The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80:105–124, 2013.

M. F. Glasser, T. S. Coalson, E. C. Robinson, C. D. Hacker, J. Harwell, E. Yacoub, K. Ugurbil, J. Andersson, C. F. Beckmann, M. Jenkinson, S. M. Smith, and D. C. Van Essen. A multi-modal parcellation of human cerebral cortex. Nature, 536(7615):171–178, 2016.

E. M. Gordon, T. O. Laumann, B. Adeyemo, J. F. Huckins, W. M. Kelley, and S. E. Petersen. Generation and evaluation of a cortical area parcellation from resting-state correlations. Cerebral Cortex, 26(1):288–303, 2016.

W. Guo. Functional mixed effects models. Biometrics, 58(1):121–128, 2002.

H. C. Hazlett, H. Gu, B. C. Munsell, S. H. Kim, M. Styner, J. J. Wolff, J. T. Elison, M. R. Swanson, H. Zhu, K. N. Botteron, D. L. Collins, J. N. Constantino, S. R. Dager, A. M. Estes, A. C. Evans, V. S. Fonov, G. Gerig, P. Kostopoulos, R. C. McKinstry, J. Pandey, S. Paterson, J. R. Pruett, R. T. Schultz, D. W. Shaw, L. Zwaigenbaum, and J. Piven. Early brain development in infants at high risk for autism spectrum disorder. Nature, 542(7641):348–351, 2017.

H. Hotelling. Relations between two sets of variates. Biometrika, 28(3-4):321–377, 1936.

K. Im, J. M. Lee, O. Lyttelton, S. H. Kim, A. C. Evans, and S. I. Kim. Brain size and cortical structure in the adult human brain. Cerebral Cortex, 18(9):2181–2191, 2008.

I. H. Jermyn, S. Kurtek, E. Klassen, and A. Srivastava. Elastic shape matching of parameterized surfaces using square root normal fields. Lecture Notes in Computer Science, 7576(part 5):804–817, 2012.

I. H. Jermyn, S. Kurtek, H. Laga, and A. Srivastava. Elastic shape analysis of three-dimensional objects. Synthesis Lectures on Computer Vision, 7(3):1–185, 2017.

H. Joe. Generating random correlation matrices based on partial correlations. Journal of Multivariate Analysis, 97(10):2177–2189, 2006.

R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall, Upper Saddle River, NJ, 5th edition, 2002.

H. M. Kang, J. H. Sul, S. K. Service, N. A. Zaitlen, S. Y. Kong, N. B. Freimer, C. Sabatti, and E. Eskin. Variance component model to account for sample structure in genome-wide association studies. Nature Genetics, 42(4):348–354, 2010.

S. Kurtek, E. Klassen, Z. Ding, S. W. Jacobson, J. L. Jacobson, M. J. Avison, and A. Srivastava. Parameterization-invariant shape comparisons of anatomical surfaces. IEEE Transactions on Medical Imaging, 30(3):849–858, 2011.

S. Kurtek, A. Srivastava, E. Klassen, and Z. Ding. Statistical modeling of curves using shapes and related features. Journal of the American Statistical Association, 107(499):1152–1165, 2012.

K. Lange. Mathematical and Statistical Methods for Genetic Analysis. Statistics for Biology and Health. Springer, New York, NY, 2002.

E. Lila and J. A. D. Aston. Statistical analysis of functions on surfaces, with an application to medical imaging. Journal of the American Statistical Association, pages 1–15, 2019.

B. Liu, L. Wang, and J. Cao. Estimating functional linear mixed-effects regression models. Computational Statistics & Data Analysis, 106:153–164, 2017.

E. F. Lock, K. A. Hoadley, J. S. Marron, and A. B. Nobel. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. The Annals of Applied Statistics, 7(1):523–542, 2013.

S. Luo, R. Song, M. Styner, J. H. Gilmore, and H. Zhu. FSEM: functional structural equation models for twin functional data. Journal of the American Statistical Association, 114(525):344–357, 2019.

K. V. Mardia and V. Patrangenaru. Directions and projective shapes. The Annals of Statistics, 33(4):1666–1699, 2005.

J. S. Marron and A. M. Alonso. Overview of object oriented data analysis. Biometrical Journal, 56(5):732–753, 2014.

M. I. Miller, A. Trouvé, and L. Younes. Geodesic shooting for computational anatomy.
Journal of Mathematical Imaging and Vision , 24(2):209–228, mar 2006.J. S. Morris and R. J. Carroll. Wavelet-based functional mixed models.
Journal of theRoyal Statistical Society. Series B: Statistical Methodology , 68(2):179–199, 2006.X. Pennec, P. Fillard, and N. Ayache. A riemannian framework for tensor computing.
International Journal of Computer Vision , 66(1):41–66, 2006.D. Pigoli, J. A. Aston, I. L. Dryden, and P. Secchi. Distances and inference for covarianceoperators.
Biometrika , 101(2):409–422, 2014.S. M. Pizer, S. Jung, D. Goswami, J. Vicory, X. Zhao, R. Chaudhuri, J. N. Damon,S. Huckemann, and J. S. Marron. Nested Sphere Statistics of Skeletal Models. InM. Breuß, A. Bruckstein, and P. Maragos, editors,
Innovations for Shape Analysis.Mathematics and Visualization. , pages 93–115. Springer, Berlin, Heidelberg, 2013.J. D. Power, A. L. Cohen, S. M. Nelson, G. S. Wig, K. A. Barnes, J. A. Church, A. C.Vogel, T. O. Laumann, F. M. Miezin, B. L. Schlaggar, and S. E. Petersen. FunctionalNetwork Organization of the Human Brain.
Neuron , 72(4):665–678, nov 2011.L. Qin. Functional mixed-effects model for periodic data.
Biostatistics , 7(2):225–234, aug2005.J. Ramsay and W. B. Silverman.
Functional Data Analysis . Springer Series in Statistics.Springer-Verlag, New York, 2005. ISBN 0-387-40080-X.B. B. Risk and H. Zhu. ACE of space: estimating genetic components of high-dimensionalimaging data.
Biostatistics , pages 1–45, jun 2019.E. C. Robinson, S. Jbabdi, M. F. Glasser, J. Andersson, G. C. Burgess, M. P. Harms,S. M. Smith, D. C. Van Essen, and M. Jenkinson. MSM: A new flexible framework forMultimodal Surface Matching.
NeuroImage , 100:414–426, oct 2014.E. C. Robinson, K. Garcia, M. F. Glasser, Z. Chen, T. S. Coalson, A. Makropoulos,J. Bozek, R. Wright, A. Schuh, M. Webster, J. Hutter, A. Price, L. Cordero Grande,E. Hughes, N. Tusor, P. V. Bayly, D. C. Van Essen, S. M. Smith, A. D. Edwards,J. Hajnal, M. Jenkinson, B. Glocker, and D. Rueckert. Multimodal surface matchingwith higher-order smoothness constraints.
NeuroImage , 167(September 2017):453–465,feb 2018.F. Scheipl, A. M. Staicu, and S. Greven. Functional Additive Mixed Models.
Journal ofComputational and Graphical Statistics , 24(2):477–501, 2015.25. Shi, R. E. Weiss, and J. M. G. Taylor. An Analysis of Paediatric CD4 Counts forAcquired Immune Deficiency Syndrome Using Flexible Random Curves.
Journal of theRoyal Statistical Society. Series C (Applied Statistics) , 45(2):151, 1996.S. M. Smith, C. F. Beckmann, J. Andersson, E. J. Auerbach, J. Bijsterbosch, G. Douaud,E. Duff, D. A. Feinberg, L. Griffanti, M. P. Harms, M. Kelly, T. Laumann, K. L.Miller, S. Moeller, S. Petersen, J. Power, G. Salimi-Khorshidi, A. Z. Snyder, A. T. Vu,M. W. Woolrich, J. Xu, E. Yacoub, K. U˘gurbil, D. C. Van Essen, and M. F. Glasser.Resting-state fMRI in the Human Connectome Project.
NeuroImage , 80:144–168, oct2013.S. M. Smith, T. E. Nichols, D. Vidaurre, A. M. Winkler, T. E. Behrens, M. F. Glasser,K. Ugurbil, D. M. Barch, D. C. Van Essen, and K. L. Miller. A positive-negative modeof population covariation links brain connectivity, demographics and behavior.
NatureNeuroscience , 18(11):1565–1567, 2015.J. Su, S. Kurtek, E. Klassen, and A. Srivastava. Statistical analysis of trajectories onRiemannian manifolds: Bird migration, hurricane tracking and video surveillance.
TheAnnals of Applied Statistics , 8(1):530–552, mar 2014.M. Vaillant, M. Miller, L. Younes, and A. Trouv´e. Statistics on diffeomorphisms viatangent space representations.
NeuroImage , 23(SUPPL. 1):S161–S169, jan 2004.D. C. Van Essen, M. F. Glasser, D. L. Dierker, J. Harwell, and T. Coalson. Parcellationsand Hemispheric Asymmetries of Human Cerebral Cortex Analyzed on Surface-BasedAtlases.
Cerebral Cortex , 22(10):2241–2262, oct 2012.G. S. Wig, T. O. Laumann, and S. E. Petersen. An approach for parcellating humancortical areas using resting-state correlations.
NeuroImage , 93:276–291, 2014.H. Wu and J.-T. Zhang. Local Polynomial Mixed-Effects Models for Longitudinal Data.
Journal of the American Statistical Association , 97(459):883–897, sep 2002.C. H. Xia, Z. Ma, R. Ciric, S. Gu, R. F. Betzel, A. N. Kaczkurkin, M. E. Calkins,P. A. Cook, A. Garc´ıa de la Garza, S. N. Vandekar, Z. Cui, T. M. Moore, D. R.Roalf, K. Ruparel, D. H. Wolf, C. Davatzikos, R. C. Gur, R. E. Gur, R. T. Shinohara,D. S. Bassett, and T. D. Satterthwaite. Linked dimensions of psychopathology andconnectivity in functional brain networks.
Nature Communications , 9(1), 2018.B. Yeo, M. Sabuncu, T. Vercauteren, N. Ayache, B. Fischl, and P. Golland. SphericalDemons: Fast Diffeomorphic Landmark-Free Surface Registration.
IEEE Transactionson Medical Imaging , 29(3):650–668, mar 2010.B. T. T. Yeo, F. M. Krienen, J. Sepulcre, M. R. Sabuncu, D. Lashkari, M. Hollinshead,J. L. Roffman, J. W. Smoller, L. Z¨ollei, J. R. Polimeni, B. Fischl, H. Liu, and R. L.Buckner. The organization of the human cerebral cortex estimated by intrinsic func-tional connectivity.
Journal of Neurophysiology , 106(3):1125–1165, sep 2011.L. Younes.
Shapes and Diffeomorphisms . Applied Mathematical Sciences. Springer, Berlin,Heidelberg, 2010. ISBN 978-3-642-12054-1.26. Zhang, G. I. Allen, H. Zhu, and D. Dunson. Tensor network factorizations: Rela-tionships between brain structural connectomes and traits.
NeuroImage , 197(April):330–343, aug 2019.L. Zhou, J. Z. Huang, and R. J. Carroll. Joint modelling of paired sparse functional datausing principal components.
Biometrika , 95(3):601–619, sep 2008.X. Zhou and M. Stephens. Efficient multivariate linear mixed model algorithms forgenome-wide association studies.
Nature Methods , 11(4):407–409, 2014.B. Zitov´a and J. Flusser. Image registration methods: a survey.