Nonparametric Bayes Models of Fiber Curves Connecting Brain Regions
NNonparametric Bayes Models of Fiber CurvesConnecting Brain Regions
Zhengwu Zhang , Maxime Descoteaux , and David B. Dunson Department of Statistical Science, Duke University, United States Computer Science Department, Faculty of Science, University of Sherbrooke, Canada
Abstract
In studying structural inter-connections in the human brain, it is common to first estimatefiber bundles connecting different regions of the brain relying on diffusion MRI. These fiberbundles act as highways for neural activity and communication, snaking through the brainand connecting different regions. Current statistical methods for analyzing these fibers reducethe rich information into an adjacency matrix, with the elements containing a count of thenumber of fibers or a mean diffusion feature (such as fractional anisotropy) along the fibers.The goal of this article is to avoid discarding the rich functional data on the shape, size andorientation of fibers, developing flexible models for characterizing the population distributionof fibers between brain regions of interest within and across different individuals. We startby decomposing each fiber in each individual’s brain into a corresponding rotation matrix,shape and translation from a global reference curve. These components can be viewed asdata lying on a product space composed of different Euclidean spaces and manifolds. To non-parametrically model the distribution within and across individuals, we rely on a hierarchicalmixture of product kernels specific to the component spaces. Taking a Bayesian approach toinference, we develop an efficient method for posterior sampling. The approach automaticallyproduces clusters of fibers within and across individuals, and yields interesting new insightsinto variation in fiber curves, while providing a useful starting point for more elaborate modelsrelating fibers to covariates and neuropsychiatric traits.
Keywords:
Brain connectomics; Fiber tracking; Functional data analysis; Mixture model;Neural imaging; Shape analysis. a r X i v : . [ s t a t . A P ] D ec ub 1 Sub 2 Sub 3 Sub 4 R putamen R putamen R parieto occipital sulcus R superior occipital
Figure 1: Examples of fiber curves connecting two different pairs of regions in four subjects.
There has been dramatically increasing interest in recent years in connectomics , which is the studyof functional and structural interconnection networks in the human brain (Jbabdi et al., 2015;Glasser et al., 2016; Park and Friston, 2013; Fornito et al., 2013). This interest has been spurredby the development of new imaging technologies, which allow researchers to non-invasively peerinto the human brain and obtain data on connections. The focus of this article is on structuralconnections , corresponding to fiber bundles that are estimated from diffusion magnetic resonanceimaging (dMRI). dMRI measures the diffusion of water molecules across tissues in the brain; thisdiffusion tends to be directional along white matter tracts acting as highways for neural activity,while being weaker and non-directional in gray matter. By combining data from diffusion MRIand structural MRI, the brain can be segmented into different functional regions, with the fiberbundles connecting the different regions estimated. Focusing on two regions of interest (ROIs) andapplying a recent fiber tracking algorithm (Smith et al., 2012; Girard et al., 2014), Figure 1 showsthe fiber connections for four different individuals. There are large numbers of fibers connectingthese two ROIs, and there are interesting similarities and differences among the subjects in thefiber locations and shapes.Fiber connections in each individual’s brain can be viewed as a type of object data. There aremany exciting possibilities in terms of relating these objects to traits of the individual. For exam-ple, Figure 1 shows evidence of functional data clusters in the fiber connections, with the locationsand numbers of fibers occupying these clusters varying across individuals; perhaps features of theclusters relate to traits of the individual, such as their intelligence or whether they have frequent2igraines or episodes of depression. Our over-arching goal is to develop the statistical and com-putational tools necessary to make such inferences. However, the current literature on statisticalanalysis of fiber tracts reduces the rich functional data to simple summary statistics prior to anal-ysis. In particular, current connectome pipelines output an adjacency matrix consisting of a countof the number of fibers in each pair of brain regions (de Reus and van den Heuvel, 2013; Fornitoet al., 2013), reducing the rich data shown in Figure 1 to a single count for each panel, discardinginformation on fiber shapes, sizes and locations. These adjacency matrices are typically reducedfurther to a binary form (Durante et al., 2014; Durante and Dunson, 2014) or to topological fea-tures of the network (Cheng et al., 2012; Fornito et al., 2013) in order to simplify analyses of brainstructure and its relationship with other factors.Clearly, the data represented in Figure 1 are functional data , and hence it is natural to think ofapplying functional data analysis (FDA) methodology. However, most FDA methods are developedfor much simpler cases in which there is a single function y i : T → (cid:60) for each individual, with
T ⊂ (cid:60) . For example, y i may represent a growth curve with age for individual i . There is also a richliterature on more elaborate FDA models for curve data, allowing multivariate, hierarchical, spatialand temporal dependence structures (Wang et al., 2015). Even in more complex cases, the majorityof the focus has been on one-dimensional curves y i : T → (cid:60) , using a rich variety of representationsranging from spline expansions to functional principal components analysis (FPCA) to Gaussianprocess-based models. After defining a representation for y i , it is often relatively straightforwardto further include structured dependence in the curves (e.g., hierarchical, spatial, temporal, etc).The fiber tracts illustrated in Figure 1 are quite complex in corresponding to many three-dimensional curves snaking through (cid:60) having different intersection points with two non-regularlyshaped brain regions of interest. In addition, there is clear clustering evidence and heterogeneityamong individuals. It is not obvious how to define a model for these data, which is sufficientlyflexible and captures the important characteristics, such as the clusters, without discarding toomuch information or becoming computationally intractable given the number of fibers. There isa rich literature on nonparametric Bayesian models for functional data, which induce clustering(Rodrguez et al., 2009) and can even allow joint modeling of functional predictors with a response(Bigelow and Dunson, 2009), but these methods focus on the case in which a single function y i isobserved for each individual, and hence are not directly relevant.In this article, we propose a novel approach, which relies on characterizing each fiber curvewithin each individual in terms of its rotation, shape and translation from a global reference curve.This allows us to define a nonparametric model for the fiber curve data through a dual representa-tion of the data on a product space. On this product space we define a mixture of product kernelsmotivated by the framework of Bhattacharya and Dunson (2012, 2010b), who showed conditionsunder which Dirichlet process mixtures of product kernels having support on different manifolds3ead to consistent estimation of an unknown joint distribution of data on a product manifold. How-ever, their framework is abstract and they did not consider data consisting of rotation matrices orallow nested dependence, as we obtain due to nesting of the fibers within each individual’s brain.Section 2 describes the basic data structure and representation of fiber curves. Section 3 pro-poses a product mixture model for fiber connections in an individual’s brain, and outlines a Markovchain Monte Carlo (MCMC) algorithm for posterior inference. Section 4 proposes a nested Dirich-let process model for modeling fiber curves for a population of individuals. Section 5 summarizesanalyses of human brain connectomics data, and Section 6 discusses the results and outlines inter-esting next directions. We use a state-of-the-art tractography algorithm (Smith et al., 2012; Girard et al., 2014) to generatethe fiber tracts relying on two steps. First, high angular resolution diffusion imaging (HARDI)techniques are used to estimate the fiber orientation distribution function (ODF) at each location(Descoteaux et al., 2009) (implemented in dipy (Garyfallidis et al., 2014)). Next, streamlinesfollowing the principle directions of the fiber ODF are constructed by probabilistic tractographyalgorithms under local continuity constraints. Anatomical structure information is used to guideselection of where to start and stop the streamlines (Smith et al., 2012; Girard et al., 2014). Thefinal constructed 3D curves are assumed to represent the most likely pathways through the diffusionprofile delineated by the fiber ODF. We refer to these curves as fibers, though they may not exactlycorrespond to anatomical fibers in the brain.Let T j denote the j th subject’s tractography dataset. In general, T j contains millions of fibercurves indicating how different regions of the brain are connected. Let y ji represent a single fibercurve in T j ; the data on y ji output by the tractography algorithm consist of hundreds of pointsalong a curve, but we view y ji : [0 , → (cid:60) as a parameterized curve that can be accuratelyapproximated by spline interpolation of these data points. Figure 2 (a) shows one example of thetractography dataset we generated for an individual’s brain.Directly analyzing all fibers in T j is not realistic for several reasons. The data are huge (mil-lions of fibers in each subject) and current statistical methods are ineffective in handling such bigdata for a sample of subjects. Secondly, the streamline datasets are usually in subject-specificspaces with different coordinate systems, and it is hard to directly compare any two tractographydatasets. Alignment between different subjects is necessary to define a realistic probability modelfor multiple subjects, but there are currently no effective tractography alignment methods.4 a) (b) (c) Figure 2: (a) shows one example of the whole tractography dataset for a brain; (b) shows theDesikan-Killiany parcellation of the cortical region and (c) shows fiber curves connecting a pair ofregions in the Desikan-Killiany parcellation.In this paper, we group each fiber in T j based on the different anatomical regions it connects,and focus our analysis on fibers connecting two specific regions of interest. To achieve this, eachindividual’s brain is first parcellated into different meaningful anatomical regions based on anexisting template, such as the Desikan-Killiany atlas (Desikan et al., 2006). Figure 2 (b) shows theparcellation of the brain using the Desikan-Killiany atlas. Then fiber curves connecting each pairof regions are extracted, as illustrated in Figure 2 (c).Our goal is to build a flexible but parsimonious Bayesian model to characterize the distributionof fiber curves connecting two regions r a and r b within each individual and across a populationof individuals. The extracted fibers { y ji , i = 1 , ..., n j } connecting r a and r b in subject j havesome special properties, e.g. y ji ’s always start from one region and end at another one and they aresmooth and follow similar white matter pathways. These properties make the underlying functionalspace Ω ( r a ,r b ) much smaller comparing with Ω , where Ω is the entire functional space L ([0 , , (cid:60) ) and Ω ( r a ,r b ) is the functional space for fiber curves connecting r a and r b for all subjects in ourdataset. If one fits statistical models with a support of Ω , these models will tend to be inefficientat capturing the data. In order to build an efficient model on the correct space parsimoniously, weconsider a variance decomposition for fibers in Ω ( r a ,r b ) . When we treat a fiber y as a 3D curve, there are five factors contributing to the variance: (1) trans-lation, (2) rotation, (3) scaling, (4) re-parameterization and (5) shape. Translation, rotation, scalingand re-parameterization are shape-preserving transformations (Srivastava et al., 2011). The shape5a) (b) (c) (d)Figure 3: The remaining shape components after removing different shape-preserving transforma-tions. (a) shows the simulated raw 3D curves. (b) shows the shape part after removing translationsand scalings. (c) shows the shape part after removing translations, rotations and scalings. (d) showsshape part after removing translations, rotations, scalings and re-parameterizations.of a fiber represents appearance after removing these shape-preserving transformations. Letting y ∈ L ([0 , , (cid:60) ) be a fiber, a translation of y is represented as y + a , where a ∈ R . The rota-tion of y is represented as O ∗ y , where O ∈ SO (3) is a rotation matrix. Scaling represents thelength of the fiber. Re-parameterization of f is represented as y ( γ ( s )) , s ∈ [0 , , where γ is awarping function in Γ , the set of all orientation-preserving diffeomorphisms of [0 , . Note thatre-parameterization of f does not change the shape, it only changes the point-wise correspondencebetween fibers. In other words, if we let g ( s ) = y ( γ ( s )) , g passes through the same path as y ,but g ( s ) is different from y ( s ) if γ ( s ) (cid:54) = s . The re-parameterization component performs the roleof aligning different fibers (Tucker et al., 2013; Kurtek et al., 2012), and this alignment does notchange the path of each fiber, but reduces the shape component of variability.Figure 3 illustrates the shape components after removing different shape-preserving transfor-mations for simulated fiber curves. As additional shape-preserving components are separatedout, the remaining shape part has decreasing cross-sectional variance at each point s ∈ [0 , . Sincein our particular case, fibers connecting two regions of interest usually have similar length, we donot remove scaling. In addition, the re-parameterization component does not contribute to thegeometric appearance of fibers, i.e. the path of the fibers, and thus we treat it as a confoundingvariable, which is removed in an alignment phase prior to statistical analysis.The main motivation for us to perform this decomposition is that the variance in each com-ponent after decomposition becomes much smaller, allowing us to more effectively model eachcomponent separately while inducing a flexible joint model. This is especially true for the shapepart. After separating the shape-preserving transformations, the remaining shape part of the fibers6re aligned together. This alignment can be done within and between subjects, which means that wealign all fibers from different subjects together. This procedure naturally solves the misalignmentissue for analyzing fibers in a set of subjects. In addition, since we only consider fibers connectingthe same anatomical regions, we find that these fibers have similar shapes, as illustrated in Figure1. Therefore, the variation decomposition process enables us to learn a low dimensional structurefor the shape part of the fibers in each connection for all subjects. The other parts, such as the trans-lation and rotation, can be efficiently represent with very few parameters. This process produces alow dimensional representation for fiber curves in a connection. As a preliminary step before defining a Bayesian model, we extract each component in the variancedecomposition by using the elastic shape analysis framework of Srivastava et al. (2011). Given aset of fiber curves { y , ..., y n } in one connection, to separate the translation, we center each fiber by y ci ( · ) = y i ( · ) − c (1) i , where c i = (cid:82) y i ( s ) | ˙ y i ( s ) | ds/L y i , in which ˙ y i ( s ) = dy i ( s ) /ds and L y is thelength of fiber y . Without special notation, all fibers have been centered from now on. To separatethe rotation and re-parameterization, we represent each fiber y as its square-root velocity function(SRVF) q ( s ) , defined as q ( s ) = ˙ y ( s ) / (cid:112) (cid:107) ˙ y ( s ) (cid:107) . A rotation of y by O ∈ SO (3) is denoted as O ∗ y and its SRVF becomes O ∗ q . A re-parameterization of y by γ ∈ Γ is denoted as y ( γ ( s )) , and itsSRVF is denoted as ( q, γ ) = ( q ◦ γ ) √ ˙ γ , where ◦ denotes the composition of two functions. Themotivation for using SRVF is that it allows us to use a well-known elastic metric, the Fisher-RaoRiemannian metric, to perform elastic shape analysis, i.e. separating the re-parameterization fromthe shape part of the fibers.To align all fibers by separating the rotation and re-parameterization, one needs to estimate atemplate fiber first, denoted as y µ , and then align all fibers to the template. We formulate the cal-culation of y µ and individual alignment as an iterative procedure: first initialize the mean function y µ and its SRVF q µ and then iteratively solve for ( O i , γ i ) = argmin O ∈ SO (3) ,γ ∈ Γ (cid:107) q µ − O ∗ ( q i , γ ) (cid:107) , and q µ = n − n (cid:88) i =1 O i ∗ ( q i , γ i ) (1)for i = 1 , . . . , n until convergence. We optimize O i through Procrustes analysis and γ i throughdynamic programming (Srivastava et al., 2011). As the output of this iterative algorithm, for eachfiber y i , we obtain the best rotation O i , re-parameterization γ i , to the template y µ , and the shapepart g ( s ) = O i ∗ y i ( γ i ( s )) .To efficiently represent the shape part of the fibers in the connection of ( r a , r b ) , we use FPCA tolearn basis functions representing the aligned fiber curves. We learn an FPCA basis { φ l : [0 , → , l = 1 , ..., T } using training data from healthy subjects. FPCA can characterize the variationswithin the given training data set and extract the principal modes of deformations of the fibersrelative to the mean fiber. For each pair of ROIs, these basis functions only need to be learnt onceand can be saved for further use. For the connection ( r a , r b ) , we obtain a low-dimensional structureconsisting of L ( r a ,r b ) = { y µ , { φ l , l = 1 , ..., T }} . Letting g be the shape part of fiber y ∈ Ω ( r a ,r b ) , we can represent g as g ( s ) ≈ y µ ( s ) + T (cid:88) l =1 x l φ l ( s ) , where x l represents the coefficient corresponding to φ l . For notational convenience, we let c (2) =[ x , x , ..., x T ] (cid:48) ∈ (cid:60) T .We decompose fiber curve y ∈ Ω ( r a ,r b ) as y ( s ) := { c (1) , c (2) , O , γ } , where c (1) , c (2) , O and γ are the translation, shape, rotation and reparameterization components, respectively. The originalfiber path can be recovered from these components using ˆ y ( s ) ≈ O T ∗ (cid:0) y µ + T (cid:88) l =1 c (2) ( l ) φ l (cid:1) + c (1) . The difference between the recovered path ˆ y and the original path y depends on the representationprecision of the shape part: with more basis functions, one can more precisely recover the shapepart and thus the original fiber path. We did not include γ in the recovery formula because γ doesnot change the geometric appearance of y but only changes its parameterization. In this paper, wefocus on modeling the geometry of the fiber curves, and therefore, γ is excluded. In this section, we model fiber curves from a single subject. Let { y i } for i = 1 , ..., n be the fibercurves connecting a pair of regions of interest in an individual. After the decomposition, each fiber y i is represented as y i := { c (1) i , c (2) i , O i } . For notation convenience, we denote c (3) i = O i , andwe have y i := { c (1) i , c (2) i , c (3) i } . Each of the c ( m ) i (for m = 1 , .., M ) has a different Euclidean or8anifold support. Letting c ( m ) i ∈ Y m , we have y i ∈ Y = (cid:79) m ∈ I Y m , I = { , ..., M } . Our goal is to specify a joint model in which y i ∼ f , with f a probability measure characterizingthe joint distribution. Let B ( Y ) denote an appropriate σ -algebra of Y , with f assigning probability f ( B ) to each B ∈ B ( Y ) .Initially, we focus on modeling one component of y i , the m th component c ( m ) i . A straightfor-ward strategy is to use a mixture model with f m ( c ) = (cid:90) Θ m K m ( c ; θ ( m ) ) dP ( θ ( m ) ) , c ∈ B ( Y m ) , (2)where K m ( · ; θ ( m ) ) is a parametric probability measure on {Y m , B ( Y m ) } , and P is a probabilitymeasure over { Θ m , B (Θ m ) } . A nonparametric Bayesian approach is realized by choosing P as arandom probability measure and assigning an appropriate prior through P = K (cid:88) h =1 π h δ θ h , θ h ∼ P m , (3)where P m is a base measure on { Θ m , B (Θ m ) } and δ θ h denotes a degenerate distribution with all itsmass at θ h . Equation (3) contains a broad class of species sampling priors, including, for example,the Dirichlet process and Poisson-Dirichlet process. In the Dirichlet process case, K = ∞ and π h is generated through a stick-breaking process (Sethuraman, 1994), with π h = V h (cid:81) l The translation component c (1) is a vector in (cid:60) . We simply use amultivariate normal distribution for c (1) , K ( c (1) ; µ, Σ ) = 1 (cid:112) (2 π ) | Σ | exp (cid:110) − 12 ( c (1) − µ ) T Σ − ( c (1) − µ ) (cid:111) . FPCA Coefficient Component: Let c (2) ∈ (cid:60) T denote the shape component corresponding tothe coefficients of the FPCA basis functions. Similar to the translation component, we assign amultivariate normal distribution for c (2) , K ( c (2) ; µ, Σ ) = 1 (cid:112) (2 π ) T | Σ | exp (cid:110) − 12 ( c (2) − µ ) T Σ − ( c (2) − µ ) (cid:111) . The Rotation Component : The rotation matrix c (3) is an element of the special orthogonalgroup SO (3) = { X ∈ O (3) | det( X ) = 1 } . The most common parametric distribution on SO (3) is the matrix Fisher distribution, also known as the Langevin distribution (Downs, 1972; Khatriand Mardia, 1977; Jupp and Mardia, 1979). Bingham et al. (2009) and Qiu et al. (2014) proposeda more flexible class of Uniform Axis Random Spin (UARS) distributions, which improves uponthe flexibility of the Langevin. We carefully considered both choices, but faced computational andstability problems in conducting inferences, particularly as the number of fibers increases.To address these problems and take advantage of the similarity of the decomposed rotationmatrices to the identity, we define a simple Gaussian like parametric distribution based on anembedding in the Lie algebra of SO (3) . Let I denote the identity element of SO (3) . The tangent10pace at I , T I ( SO (3)) forms a Lie algebra, which is usually denoted as so (3) . The exponentialmap, exp : so (3) → SO (3) , provides a mapping from the tangent space T I ( SO (3)) to SO (3) .The inverse of the exponential map is called the log map. so (3) is a set of × skew-symmetricmatrices. We use the following notation to denote any matrix A v ∈ so (3) : A v = − v v v − v − v v where v = [ v , v , v ] ∈ (cid:60) . The exponential map for so (3) is given by Rodrigues’ formula, exp( A v ) = I , α = 0 I + sin( α ) α A v + − cos ( α ) α A v , α ∈ (0 , π ) , where α = (cid:113) tr ( A vT A v ) = (cid:107) v (cid:107) ∈ [0 , π ) . The log map for a matrix X ∈ SO (3) is a matrix in so (3) , given by log( X ) = , α = 0 α α ) ( X − X T , ) | α | ∈ (0 , π ) , where α satisfies tr ( X ) = 2 cos( α ) + 1 .Define a mapping Φ to embed an element in so (3) to (cid:60) , Φ : so (3) → (cid:60) , φ ( A v ) = v . Let φ ( X ) = Φ(log( X )) ∈ (cid:60) be the embedded vector for the element X ∈ SO (3) in R . We define atrivariate normal distribution on this embedding space: K ( X ; µ, Σ ) = 1 (cid:112) (2 π ) | Σ | exp (cid:26) − 12 ( φ ( X ) − µ ) Σ − ( φ ( X ) − µ ) T (cid:27) , where v is an element in the embedding space (cid:60) . This embedded Gaussian kernel has substantialpractical advantages over alternative intrinsic parametric kernels we attempted to implement. To complete a Bayesian specification of the model, we choose a prior for the cluster probabilities: π = ( π , . . . , π K ) (cid:48) ∼ Dir (cid:18) αK , . . . , αK (cid:19) , K is an upper bound on the number of clusters. In the limit as K → ∞ , this choice leadsto a Dirichlet process mixture model. In addition, Rousseau and Mengersen (2011) motivated asimilar choice of prior as being effective as favoring deletion of redundant mixture components notneeded to characterize the data. If K is chosen to be too small, then none of the clusters will beunoccupied, and the analysis should be repeated for larger K .Posterior sampling proceeds via the following steps:1. Update the cluster allocation of S i for each fiber curve from the conditional posterior withPr ( S i = h |− ) = π h (cid:81) Mm =1 K m ( c ( m ) i ; θ ( m ) h ) (cid:80) Kl =1 π l (cid:81) Mm =1 K m ( c ( m ) i ; θ ( m ) l ) . 2. Update the weights on each component from the conjugate conditional posterior ( π |− ) ∼ Dir (cid:18) αK + n , ..., αK + n K (cid:19) , where n h is the number of observations with cluster h .3. Update the parameter θ ( m ) h for m = 1 , ..., M and h = 1 , ..., k from ( θ ( m ) h |− ) ∝ P m ( θ ( m ) h ) (cid:89) i : S i = h K m ( c ( m ) i ; θ ( m ) h ) , where P m is a conjugate prior to K m ( c ( m ) i ; θ ( m ) h ) for each component m ; in particular, we useGaussian-Inverse Wishart priors. Section 3 proposes a flexible mixture model for the distribution of fibers connecting a pair ofregions of interest in a single individual’s brain; in this section, we generalize the model to ac-commodate multiple individuals. This generalization is challenging because (1) fibers in eachindividual have their own coordinate system inherited from the diffusion MRI scan; (2) there aredifferent numbers and appearances of fiber curves for different individuals. Although (1) can po-tentially be addressed via image alignment before or during tractography, such alignment is notstraightforward. Our variation decomposition bypasses this issue by building a common coordi-nate system for the fiber shapes. Issue (2) can be solved by using a hierarchical Bayesian model toallow differences between individuals while encouraging borrowing of information.12et { y ji } for j = 1 , .., J and i = 1 , ..., n j be a collection of fiber curves for the same pair ofbrain regions in n subjects, where n j represents the number of fiber curves in the j th subject. Wehave y ji = { c (1) ji , ..., c ( M ) ji } , so that the fibers are represented by their different geometric compo-nents. In addition, let w j denote a scalar summary of the strength of connection between the brainregions for individual j . In the literature, w j is usually set as the number of fibers, n j . The model in Section 3 allows the distribution f of fiber curves within an individual to be un-known. Generalizing to multiple individuals, we have distributions f j , for j = 1 , . . . , J , andrequire a model for an unknown distribution of distributions, f j ∼ Q , with Q unknown. One nat-ural possibility is a hierarchical Dirichlet process (HDP) mixture (Teh et al., 2006), which wouldinduce clusters of fibers, with these clusters having different weights for each individual j . Thistype of model would effectively assume that white matter pathways (each pathway represents acluster) connecting two regions of interest are shared by all individuals, but the proportions offiber curves in each pathway are different. However, we found that this type of model has poorperformance, as our data (illustrated in Figure 1) show that many subjects have completely dif-ferent white matter fiber bundles. This motivates us to instead use the nested Dirichlet process(NDP) (Rodr´ıguez et al., 2008), which clusters subjects based on their fiber curve distribution,with subjects in a cluster having similar clusters of fibers.Our NDP model has the following form: f j ( y ji ) = (cid:90) M (cid:89) m =1 K m ( c ( m ) ji ; θ ( m ) ) dG j ( θ ) , θ = { θ (1) , ..., θ ( M ) } ,G j ( · ) ∼ ∞ (cid:88) h =1 π ∗ h δ G ∗ h ( · ) , G ∗ h ( · ) = ∞ (cid:88) l =1 ω ∗ lh δ θ ∗ lh , (7)where θ ∗ lh = { θ (1) ∗ lh , ..., θ ( M ) ∗ lh } and θ ∗ lh ∼ (cid:81) Mm =1 P m , ω ∗ lh = u ∗ lh (cid:81) l − s =1 (1 − u ∗ sh ) , π ∗ h = v ∗ h (cid:81) h − s =1 (1 − v ∗ s ) , v ∗ h ∼ beta (1 , α ) , and u ∗ lh ∼ beta (1 , β ) . The collection of individual-specific mixing measures { G j } are drawn from an NDP, { G j } ∼ NDP ( α, β, P ) , where P = (cid:81) Mm =1 P m is the base measure.Under this structure, the prior probability that two individuals are assigned to the same brainstructure cluster is / (1 + α ) , while the prior probability of clustering two fibers together within abrain is / (1 + β ) . The model can be used for any combination of the components of variability inthe fiber curves; for example, one can use only the shape component or a combination of differentcomponents to estimate f j . In applying these models to brain connectomics data, we will assesshow clustering performance depends on which components are included.13 .2 Posterior inference Following Rodr´ıguez et al. (2008), we propose a blocked Gibbs sampling algorithm. An approxi-mation of the stick-breaking process is used, with the infinite sums in (7) replaced by finite sumsof K (for G j ) and L (for G ∗ h ) elements. Let ζ j , for j = 1 , ..., J , be the membership indicator ofindividuals and let ξ ji , for i = 1 , ..., n j , be the membership indicator of fiber curves for the j thsubject. Sampling proceeds via the following steps:1. Sample the membership indicator for the j th individual ( j = 1 , ..., J ) from a multinomial: P ( ζ j = h |− ) ∝ π ∗ h n j (cid:89) i =1 L (cid:88) l =1 w ∗ lh M (cid:89) m =1 K m ( c ( m ) ji | θ ( m ) ∗ lh ) . 2. Sample the membership indicator ξ ji , for j = 1 , ..., J and i = 1 , ..., n j , with: P ( ξ ji = l |− ) ∝ w ∗ lζ j M (cid:89) m =1 K ( c ( m ) ji | θ ( m ) ∗ lζ j ) . 3. Sample π ∗ h by first sampling ( u ∗ h |− ) ∼ beta (1 + m h , α + (cid:80) Ks = h +1 m s ) , h = 1 , ..., K − ,and u ∗ K = 1 , where m h is the number of subjects assigned to cluster h , and then let π ∗ h = u ∗ h (cid:81) h − s =1 (1 − u ∗ s ) . 4. Sample w ∗ lh by first sampling ( v ∗ lh |− ) ∼ beta (1 + n lh , β + (cid:80) Ls = l +1 n sh ) , l = 1 , . . . , L − , h = 1 , . . . , K and v ∗ LK = 1 , where n lh is the number of observations assigned to atom l ofdistribution h , and then w ∗ lh = v ∗ lh (cid:81) l − s =1 (1 − v ∗ sh ) . 5. Sample the parameters θ ( m ) ∗ lh for l = 1 , ..., L , h = 1 , ..., K and m = 1 , ..., M from P ( θ ( m ) ∗ lh |− ) ∝ P m ( θ ( m ) ∗ lh ) (cid:89) { i,j | ζ j = h,ξ ij = l } K m ( c ( m ) ji | θ ( m ) ∗ lh ) , where P m ( · ) is the conjugate prior for parameters in K m ( ·| θ ( m ) ) . If no observation is as-signed to the lh th cluster, we draw θ ( m ) ∗ lh from the prior P m .6. Sample the concentration parameters α and β : We choose conjugate priors: α ∼ gamma ( a α , b α ) and β ∼ gamma ( a β , b β ) . The posterior samples for α and β are constructed as P ( α |− ) ∼ gamma (cid:18) a α + ( K − , b α − K − (cid:88) h =1 log(1 − µ ∗ h ) (cid:19) , ( β − ) ∼ gamma (cid:18) a β + K ( L − , b β − L − (cid:88) l =1 K (cid:88) h =1 log(1 − v ∗ lh ) (cid:19) . We will evaluate the performance of this Gibbs sampler through application to human brain con-nectome data. Model (7) does not incorporate information on the strength of connection w j between the twoROIs within individual j . However, it is straightforward to generalize the model to include thisadditional information by letting G j ∼ ∞ (cid:88) h =1 π ∗ h δ { G ∗ h ( · ) ,ψ h } , where now the h th component of G j includes not only the mixing measure G ∗ h characterizing thedistribution of fiber curves in that component but also parameters ψ h within a kernel K w ( · ; ψ h ) forthe measure of connection strength. The resulting joint model characterizes flexible dependencein the connection strength and fiber curves through shared dependence on the individual’s clusterallocation. For continuous measures of connection strength w j , we can simply use a Gaussiankernel. However, we will focus on w j equal to the number of connections between the regions ofinterest, so that K w ( w ; ψ ) is a parametric distribution with support on the non-negative integers.To induce this kernel, we apply the approach of Canale and Dunson (2011) and simply ‘round’ aGaussian kernel with unknown mean and variance, with negative values mapped to 0, values in(0,1) mapped to 1, values in (1,2) mapped to 2 and so on. Posterior sampling can proceed via aslight modification of the sampler of Section 4.2, with details provided in a Supplement. We consider two data sets: a Test-Retest Dataset and the Human Connectome Project Dataset. Test-Retest Dataset : Contains 3 scans for each subject taken at one month intervals. A total of15 acquisitions, from 5 healthy participants, were utilized for our analysis. In each scan, a dMRIimage and an anatomical T1-weighted image were acquired on a 1.5 Tesla SIEMENS Magnetom.The dMRI image has a 2 mm isotropic resolution and was acquired along uniformly distributeddirections. The T1 image has a 1 mm isotropic resolution. Diffusion data was up-sampled (using atrilinear interpolation) to the same resolution as T1 image before performing tractography. The T1image was parcellated using Freesurfer (with Desikan-Killiany atlas) and registered to the diffusion15omain. Quality control by manual inspection was used to verify the parcellation and registration. Human Connectome Project (HCP) Dataset : The 2016 HCP data contain about subjects,and we focus on the subjects having both diffusion data and an anatomical T1-weighted image.The dMRI images in HCP have isotropic voxel size of . mm , and diffusion weighted scans.HCP has processed the diffusion image and T1 image such that they have the same resolution andlie in the same space (aligned). Desikan-Killiany parcellation for each T1 image is also provided.See et. al (2012) for more details about the HPC data.The tractography dataset for each subject is generated using the probabilistic method of Girardet al. (2014) with the recommended optimal parameters. Each streamline in the constructed trac-tography has a step size of 0.2 mm . About 1 million fiber curves for each subject are generated.Under the Desikan-Killiany atlas, the brain cortical bands are segmented into anatomical re-gions (34 regions per hemisphere). The fiber curves connecting any pair of regions are extracted.Before applying our method to each connection, outlying fiber curves that do not follow the majorwhite matter pathways (false positives caused by the fiber tracking algorithm) are removed using asimilar method proposed by Cˆot`u et al. (2015). For any two regions ( r a , r b ) , to learn a low-dimensional structure L ( r a ,r b ) representing the shapecomponent, we randomly selected subjects from HCP as the training data and learnt a set ofbasis functions using FPCA. We focus on two connections: (1) between right paracentral lobule ( r pl ) and left postcentral gyrus ( l pg ); (2) between right paracentral lobule ( r pl ) and left posteriorcingulate cortex ( l pcc ). Figure 4 illustrates these connections in a subject of the test-retest dataset.Using the method introduced in Section 2.3, we estimated three components c (1) , c (2) and c (3) for all fibers. For c (2) ∈ (cid:60) T , we set T = 3 , so we use three coefficients (on three major FPCAbasis functions) to represent a fiber. Using a larger T will increase representation precision, butwe found T = 3 is sufficient. In Figure 5, we plot the estimated components for fiber curves inthe two connections (shown in Figure 4). For the connection ( r pl , l pg ), the fiber curves startfrom the right paracentral lobule, group into a bundle, traverse the corpus callosum, and then splitinto two bundles to connect the left postcentral gyrus region. The split makes the fiber curve havetwo distinct shapes, and therefore, we expect that the shape component should be able to tell thissplit. For the connection ( r pl , l pcc ), there are a few distinct pathways, differing in both shapeand location. Therefore, the shape and translation components should contain the most geometricinformation about this connection. The rotation components in both cases center around the originand it is unclear how much information they have. In Figure 5 (d), we plot the recovered fibercurves using c (1) , c (2) and c (3) . The color along the curves indicates the discrepancy (with a unit of16 R L R R L L R L R L_postcentral R_paracentral R_paracentral L_post_cingulate (a) (b) Figure 4: Example of two connections we used in this paper: (a) between r pl and l pg and (b)between r pl and l pcc . mm ) between the original fiber and the recovered fiber. We can see that with only parameters, wecan accurately recover most fibers. The biggest discrepancy generally focuses on the starting andending points. The main reason is that the starting and ending points are either in the gray matteror in the interface of gray matter and white matter. Diffusion is close to isotropy (Descoteaux et al.,2009) in these regions, which makes accurate fiber reconstruction intrinsically difficult. We applied the model in Section 3 for the fiber curves in each connection. We are interested inanswering two questions: (a) among the components (shape, translation and rotation), which onecontains the most geometric information about a connection; (b) can the proposed model efficientlycapture the geometric information?We first used the nonparametric mixture model defined in (2) to explore the geometric in-formation inside each component separately. The multivariate data were centered to the ori-gin and rescaled such that each coordinate has unit variation. The prior specification and pos-terior sampling procedure are described in Section 3.3. We assigned a normal-inverse-WishartNIW ( µ ( m )0 , λ ( m )0 , Φ ( m )0 , v ( m )0 ) for P m ( θ ( m ) ) ( θ ( m ) = { µ ( m ) , Σ ( m ) } ), where µ ( m )0 = [0 , , T , λ ( m )0 =1 , Φ ( m )0 = I , and v ( m )0 = 5 for m = 1 , , , implying that E ( µ ( m ) | Σ ( m ) ) = [0 , , T and E ( Σ ( m ) ) = I . The inference is based on , samples from the MCMC sampler after a burnin of , samples. It takes about 4 minutes to draw , samples using our MATLAB imple-mentation with a 2.5 GHz Intel Core i7 CPU. The results are robust to small to moderate changesin prior specification, and there is no evidence of lack of convergence.17 -1-0.500.51-0.5000.20.4-0.20.60.5 -20020-40-20020-4040200-2040 (a) (b) (c) (d)Figure 5: Illustration of decomposed components and the recovered fiber curves. (a), (b) and (c)show the shape, translation and rotation components, respectively. (d) shows the recovered fibercurves using these components. The color indicates the difference (in a unit of mm ) between therecovered and original fiber curve.Each component contains different geometric information about the connection and such dif-ferences are reflected in the clustering result inferred based on the posterior samples. Figure 6summarizes the clustering results for connection ( r pl , l pg ). The first row shows the result for theshape component, the second row shows the result for the translation component and the last rowshows the result for the rotation component. Column (a) shows posterior samples of number ofclusters and (b) shows the pairwise probability heat map according to the posterior samples. Tomake sense of the heat map, we reordered the fiber curves such that fibers with similar shapes areclose to each other.There are several approaches to obtain a final clustering configuration from the pairwise prob-ability matrix (Rodr´ıguez et al., 2008; Zhang et al., 2015). Following Zhang et al. (2015), we usethe mode of the posterior distribution on number of clusters as the final cluster number k . The finalcluster configuration is estimated by mapping the pairwise probability matrix into a membershipmatrix minimizing the discrepancy to the pairwise probability matrix and having k clusters. Figure6 (c) and (d) show the clustering results on { c ( m ) i } and the original fibers using this method. It isclear that shape plays the most important role. With only the shape component, we can distinguishthe two fiber bundles inside the connection. The translation and rotation components contain somegeometric information (based on the final clustering results), but much less than shape.In another experiment, we applied the mixture product kernel model in (4) to fuse all threecomponents together. Figure 7 shows the results based on , samples with a burn-in of , .From the heat map of the pairwise probability matrix, we can see that the joint mixture model18 20 40 60 80102030405060708090 20 40 60 80102030405060708090 20 40 60 80102030405060708090 (a) Posterior dist. of k (b) Adjacency matrix (c) Clustering of { c ( m ) i } (d) Clustering of fibersFigure 6: Posterior summary for the connection ( r pl , l pg ). The three rows are results for theshape, translation and rotation component, respectively.prefers two clusters, and the final clustering result is similar to only using the shape component.Similar procedures were applied to the other connection ( r pl , l pcc ) and the result is shown inthe supplemental material. To quantitatively evaluate the modeling results, we manually clusteredthe fiber curves in each connection to assign “ground truth” labels. Fibers in ( r pl , l pg ) wereclustered into two classes and fibers in ( r pl , l pcc ) were clustered into five classes (see the sup-plemental material for the clustering criteria and final results). The Rand index (Rand, 1971) andadjusted Rand index (Hubert and Arabie, 1985) are used to measure the accuracy of clustering.For any two partitions C and C of { , ..., n } , the Rand index calculates the ratio of agreementbetween C and C of { , ..., n } . Three quantities denoted as a, b and c are calculated: a represents 20 40 60 80102030405060708090 (a) Posterior dist. of k (b) Adjacency matrix (c) Clustering on raw fibersFigure 7: Joint model result for the connection ( r pl , l pg ).19able 1: Quantitative evaluation of clustering result for connections ( r pl , l pg ) and ( r pl , l pcc )( r pl , l pg ) ( r pl , l pcc )Shape Trans. Rot. All Shape Trans. Rot. AllRI 0.8961 0.6090 0.7254 0.9789 0.8626 0.8284 0.6767 0.8762ARI 0.7923 0.2153 0.4565 0.9579 0.7088 0.6119 0.3788 0.7384the number of pairs of objects that are placed in the same cluster in C and the same cluster in C , b is the pairs that are in different clusters in both partitions, and c is the total number of pairs (cid:0) n (cid:1) . The Rand index (RI) is RI = ( a + b ) /c . The adjusted Rand index is corrected for chance.The Rand index can take values in [0 , and higher values indicate better agreement. The adjustedRand index (ARI) also has a maximum value of , but can yield negative values if the index issmaller than the expected index. Table 1 shows the quantitative result. Again, one can confirmthat the shape component contains most of the information. However, combining all componentstogether gives us better clustering results. Next, we study the connections in a set of individuals using the test-retest dataset. Figure 8 showsthe fiber curves of the connection ( r pl , l pg ). These fibers come from three subjects in three differ-ent scans. The number in the bottom left bracket is the number of fibers in each connection. In theroutine brain network analysis literature, each connection is reduced to either a binary number “0”or “1” (to indicate whether two regions are connected) or a scalar number, e.g. the count of fibers(to indicate the strength of this connection). For ( r pl , l pg ), if we reduce each connection intoa binary number, there is no heterogeneity among different subjects. The rich information aboutthe connection is totally discarded. Although one can use the count to incorporate more infor-mation about the connection, it is well known that the count of fibers can be easily contaminatedby many confounding variables in the tractography algorithm. From Figure 8, we can observethat for the same subject and the same connection, different scans give us different counts. Thevariation within subject for different scans is not smaller than variation between subjects. How-ever, the structural connectome in healthy human brains is not expected to change rapidly across ashort period. The variation of the count measure is mainly caused by the noise introduced by thetractography processing pipeline.An important question is whether the count can be replaced by shape information to obtain amore robust and reproducible summary of each connection; this would have significant practicalramifications in the routine analysis of brain connectome data. By substituting in shape features,we can potentially improve the ability to detect differences in brain connection structure across20 ubject 1 Subject 2 Subject 3 Scan 1: Scan 2: Scan 3: (48) (82) (166) (43) (108) (92) (92) (153) (87) Figure 8: Fiber curves connecting r pl and l pg in 9 scans of 3 subjects in the test-retest dataset.individuals, possibly related to traits of the individual. To assess this, we apply our NDP modelto cluster individual brain connectome scans in an unsupervised manner, which does not includesubject ids in the analysis. The results in the previous section suggest that the rotation componentdoes not contain much information, and hence we merge it into the shape part and decompose eachfiber curve into two components: translation c (1) ∈ (cid:60) and shape c (2) ∈ (cid:60) .In our first experiment, we used the fiber curves shown in Figure 8 to demonstrate our al-gorithm. All connections were demeaned to coarsely align them between different subjects andscans. In addition, each component was demeaned globally and rescaled to have unit variation.These pre-processing steps simplify the prior specification. Similar to the case of modeling thefiber curves in an individual, we set P = (cid:81) Mm =1 P m , where P m ∼ NIW ([0 , , T , , I , , and α, β ∼ gamma (3 , , a priori. The prior on α and β implies that E ( α ) = 1 and E ( β ) = 1 , whichis a common choice in the literature. The results that follow are based on MCMC sampleswith a burn-in of . We set K = 9 and L = 15 , where K and L are upper bounds on the numberof clusters of subjects and curves within subject clusters, respectively.Pairwise probability heat maps in different scenarios are shown in Figure 9, showing clusteringresults based on (a) only shape, (b) only translation, and (c) both shape and translation. The scans were ordered by concatenating columns of Figure 8 (scans of the same subject are next toeach other). From (a) we observe that, if we only use the shape part, the posterior clustering resultfavors five clusters: 3 scans of subject 1 are clustered together; scan 2 and scan 3 of subject 2 areclustered together and scan 1 is a separate cluster; scan 1 and scan 2 of subject 3 are clusteredtogether and are different from scan 3. This result can be easily verified visually in Figure 8, 3scans of subject 1 are different from scans of subject 2 and 3 (in terms of orientation, note that21 (a) Shape (b) Trans. (c) Shape & Trans.Figure 9: Pairwise probabilities of clustering for scans of subjects in Figure 8.Table 2: Clustering of subjects using fiber curves connecting ( r pl , l pg ).Shape Trans. Shape & Trans. CountRI 0.8889 0.7222 0.7222 0.6389ARI 0.6522 0.3130 0.3130 -0.1818these fibers are viewed from the same angle); scan 2 and 3 of subject 2 are different from scan1; scan 1 and 2 of subject 3 are more similar comparing with scan 3. To obtain a final clusteringconfiguration, similar to previous experiments, we used the method in Zhang et al. (2015) to mapthe pairwise probability matrix to a membership matrix. We compared the final NDP clusteringresult with the ground truth subject ids. Table 2 shows the Rand index and adjusted Rand index. Wecan see that the shape part has the best clustering performance. Combining shape and translationdoes not improve clustering.As a comparison, we clustered subjects according to their fiber counts by the rounded kernelmixture model of Canale and Dunson (2011), using their recommended priors, collecting , posterior draws, and discarding the first , . Figure 10 shows the result, with (a) the estimatedfiber count pmf for the scans and (b) pairwise probabilities that two elements are clustered to-gether. The estimated pmf illustrates the enormous heterogeneity in the counts, with five peaks inthe distribution. From the pairwise probability matrix, scans for the same subject are not reliablyclustered together. The Rand index and adjusted Rand index of the final clustering configurationare reported in Table 2. These results illustrate that fiber counts have very high variability andcannot reliably distinguish between subjects.We conducted a more comprehensive analysis using all subjects and their scans from thetest-retest dataset. Connections between the left and right hemisphere having more than fibercurves were filtered out, leading to connections. We compared clustering results using geomet-ric information or only fiber count. Table 3 shows results for connections, with the remainingresults in the supplement. The ROIs are indexed by numbers and their names are provided in thesupplement. These results provide additional evidence that shape provides the most useful sum-22 stimated pmf Empirical pmf (a) (b)Figure 10: (a) shows the posterior estimation of the pmf for fiber counts; (b) shows the heat mapof the adjacency matrix.Table 3: Comparison of clustering results of using geometric information and count.RI/ARI (2,61) (3,61) (7,43) (7,58) (7,62) (9,58) (9,62) (11,47)Shape 0.91/0.72 1.0/1.0 0.87/0.51 0.90/0.50 0.90/0.47 0.91/0.72 0.91/0.72 0.84/0.51Trans. 0.90/0.64 0.74/0.31 0.65/0.18 0.71/0.18 0.81/0.30 0.66/0.30 0.60/0.16 0.52/0.15Shape &Trans 0.70/0.3 0.74/0.4 0.74/0.23 0.54/0.12 0.82/0.28 0.66/0.30 0.62/0.13 0.70/0.30Count 0.64/0.23 0.58/0.19 0.49/0.16 0.63/0.21 0.49/0.16 0.14/0 0.45/0.01 0.58/0.19RI/ARI (13,55) (13,47) (16,50) (16,55) (16,56) (16,57) (16,61) (22,50)Shape 0.82/0.35 1.0/1.0 0.91/0.72 0.86/0.51 0.83/0.53 0.91/0.72 0.90/0.64 0.74/0.4Trans. 0.61/0.14 0.83/0.53 0.49/0.16 0.61/-0.04 0.75/0.37 0.49/0.16 0.78/0.25 0.58/0.19Shape &Trans 0.70/0.21 0.74/0.40 0.74/0.40 0.47/0.11 0.52/0.15 0.74/0.40 0.58/0.19 0.66/0.30Count 0.62/0.21 0.58/0.19 0.50/0.04 0.49/0.16 0.14/0 0.58/0.19 0.60/0.18 0.58/0.19mary of a connection: (1) shape can be reproduced robustly, (2) it is much more informative thanother features (e.g., the widely used count); (3) using the whole fiber curves (shape & translation)is not a good idea due to registration issues and the relatively limited information in the translationcomponent. These results suggest that future analyses of brain connectomes should ideally replacebinary or count measures of connection strength with geometric features. We have presented a novel framework to non-parametrically model the geometric information offiber curves connecting any two brain regions. Geometry is decomposed into three components:shape, rotation, and translation. Our decomposition not only encourages a low dimensional repre-sentation of the shape component but also naturally solves the misalignment issue across multiplebrain scans. Relying on a flexible hierarchical mixture model, we cluster fibers within and across23ndividuals according to different geometric information. These clustering results provide newinsights about how to better utilize the tractography dataset for brain connectome analysis. Theshape component is the most discriminative feature to distinguish different subjects and can bereliably reproduced in repeated scans.As a first step toward incorporating geometric information in brain structural connectome anal-ysis, our results suggest many interesting future directions. One thread is to more intensively in-vestigate the reproducibility of the tractography dataset from a geometric object perspective. Mostprevious analyses focus on analyzing arbitrarily thresholded binary networks or count weightednetworks. As we have illustrated, these features discard shape information and are highly sensitiveto errors in tractography processing pipelines. Fiber shapes appear to be significantly more robustand informative. A comprehensive study of the reproducibility of all brain connections using theirgeometric information can let us know which fiber bundles can be reliably reproduced. We can as-sign reliability scores to every connection according to their reproducibility and give more weightsto the connections with high reproducibility scores in future network analysis. This step will befundamental in improving the reproducibility of findings in structural brain network analysis.Another important future direction motivated by our results is to assess the extent to whichfiber tract shapes between ROIs relate to covariates and traits of the individual. For example, neu-rodegenerative diseases may alter some white matter pathways and thus change the distributionsof certain connections, or shapes of connections may vary systematically in relationship to cogni-tive abilities. By calculating a low-dimensional set of shape features for each connection, one canobtains a ROI × ROI × feature tensor for each individual at each time point; it remains to developappropriate statistical methods for analyzing a population distribution of such tensor-structuredrandom variables in relation to predictors and other factors. The authors acknowledge financial support from the Statistical and Applied Mathematical SciencesInstitute and the Army Research Institute. We thank Kevin Whittingstall, Michael Bernier, MaximeChamberland, Gabriel Girard and Jean-Christophe Houde for acquiring the test-retest database(supported by the CHU Sherbrooke and the NeuroInformatics Research Chair jointly funded bythe Medical and Science faculties).We also thank the Human Connectome Project, WU-Minn Con-sortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) fundedby the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research;and by the McDonnell Center for Systems Neuroscience at Washington University.24 eferences Banerjee, A., Murray, J., and Dunson, D. B. (2013). Bayesian learning of joint distributions ofobjects. In AISTATS , volume 31 of JMLR Workshop and Conference Proceedings , pages 1–9.Bhattacharya, A. and Dunson, D. (2012). Nonparametric Bayes classification and hypothesis test-ing on Manifolds. Journal of Multivariate Analysis , 111:1–19.Bhattacharya, A. and Dunson, D. B. (2010a). Nonparametric Bayes regression and classificationthrough mixtures of product kernels. Biometrika , 97(4):851–865.Bhattacharya, A. and Dunson, D. B. (2010b). Nonparametric Bayesian density estimation onmanifolds with applications to planar shapes. Biometrika , 97(4):851–865.Bigelow, J. L. and Dunson, D. B. (2009). Bayesian semiparametric joint models for functionalpredictors. Journal of the American Statistical Association , 104(485):26–36.Bingham, M. A., Nordman, D. J., and Vardeman, S. B. (2009). Modeling and inference for mea-sured crystal orientations and a tractable class of symmetric distributions for rotations in threedimensions. Journal of the American Statistical Association , 104(488):1385–1397.Canale, A. and Dunson, D. B. (2011). Bayesian Kernel Mixtures for Counts. Journal of theAmerican Statistical Association , 106(496):1528–1539.Cheng, H., Wang, Y., Sheng, J., Kronenberger, W. G., Mathews, V. P., Hummer, T. A., and Saykin,A. J. (2012). Characteristics and variability of structural networks derived from diffusion tensorimaging. Neuroimage , 61(4):1153–1164.Cˆot`u, M.-A., Garyfallidis, E., Larochelle, H., and Descoteaux, M. (2015). Cleaning up the mess:tractography outlier removal using hierarchical quickbundles clustering. In ISMRM , Interna-tional Society of Magnetic Resonance in Medicine.de Reus, M. A. and van den Heuvel, M. P. (2013). The parcellation-based connectome: limitationsand extensions. Neuroimage , 80:397–404.Descoteaux, M., Deriche, R., Knosche, T. R., and Anwander, A. (2009). Deterministic and proba-bilistic tractography based on complex fibre orientation distributions. IEEE Trans Med Imaging ,28(2):269–286.Desikan, R. S., Sgonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., Buckner,R. L., Dale, A. M., Maguire, R. P., Hyman, B. T., Albert, M. S., and Killiany, R. J. (2006). An25utomated labeling system for subdividing the human cerebral cortex on MRI scans into gyralbased regions of interest. NeuroImage , 31(3):968 – 980.Downs, T. D. (1972). Orientation statistics. Biometrika , 59(3):665–676.Durante, D. and Dunson, D. B. (2014). Bayesian inference and testing of group differences inbrain networks. ArXiv e-prints .Durante, D., Dunson, D. B., and Vogelstein, J. T. (2014). Nonparametric bayes modeling ofpopulations of networks. ArXiv e-prints .et. al, D. V. E. (2012). The human connectome project: A data acquisition perspective. NeuroIm-age , 62(4):2222 – 2231.Fornito, A., Zalesky, A., and Breakspear, M. (2013). Graph analysis of the human connectome:promise, progress, and pitfalls. Neuroimage , 80:426–444.Garyfallidis, E., Brett, M., Amirbekian, B., Rokem, A., van der Walt, S., Descoteaux, M., andNimmo-Smith, I. (2014). Dipy, a library for the analysis of diffusion MRI data. Front Neuroin-form , 8:8.Girard, G., Whittingstall, K., Deriche, R., and Descoteaux, M. (2014). Towards quantitative con-nectivity analysis: reducing tractography biases. NeuroImage , 98:266 – 278.Glasser, M. F., Coalson, T. S., Robinson, E. C., Hacker, C. D., Harwell, J., Yacoub, E., Ugurbil,K., Andersson, J., Beckmann, C. F., Jenkinson, M., Smith, S. M., and Van Essen, D. C. (2016).A multi-modal parcellation of human cerebral cortex. Nature , 536(7615):171–178.Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification , 2(1):193–218.Jbabdi, S., Sotiropoulos, S. N., Haber, S. N., Van Essen, D. C., and Behrens, T. E. (2015). Mea-suring macroscopic brain connections in vivo. Nature Neuroscience , 18(11):1546–1555.Jupp, P. E. and Mardia, K. V. (1979). Maximum likelihood estimators for the matrix Von Mises-Fisher and Bingham distributions. The Annals of Statistics , 7(3):599–606.Khatri, C. G. and Mardia, K. V. (1977). The Von Mises-Fisher matrix distribution in orientationstatistics. Journal of the Royal Statistical Society. Series B , 39(1):95–106.Kurtek, S., Srivastava, A., Klassen, E., and Ding, Z. (2012). Statistical modeling of curves usingshapes and related features. J. Am. Statist. Ass. , 107(499):1152–1165.26ark, H. J. and Friston, K. (2013). Structural and functional brain networks: from connections tocognition. Science , 342(6158):1238411.Qiu, Y., Nordman, D. J., and Vardeman, S. B. (2014). A wrapped trivariate normal distribution andBayes inference for 3D rotations. Statistica Sinica , 24(2):897–917.Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of theAmerican Statistical Association , 66(336):846–850.Rodr´ıguez, A., Dunson, D. B., and Gelfand, A. E. (2008). The nested Dirichlet process. Journalof the American Statistical Association , 103(483):1131–1154.Rodrguez, A., Dunson, D. B., and Gelfand, A. E. (2009). Bayesian nonparametric functional dataanalysis through density estimation. Biometrika , 96(1):149–162.Rousseau, J. and Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution inoverfitted mixture models. Journal of the Royal Statistical Society: Series B , 73(5):689–710.Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Stat. Sinica , 4(2):639–650.Smith, R. E., Tournier, J. D., Calamante, F., and Connelly, A. (2012). Anatomically-constrainedtractography: improved diffusion MRI streamlines tractography through effective use ofanatomical information. Neuroimage , 62(3):1924–1938.Srivastava, A., Klassen, E., Joshi, S., and Jermyn, I. (2011). Shape analysis of elastic curves inEuclidean spaces. IEEE Trans. Pattern Anal. Mach. Intell. , 33(7):1415–1428.Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association , 101(476):1566–1581.Tucker, J. D., Wu, W., and Srivastava, A. (2013). Generative models for functional data usingphase and amplitude separation. Comput. Stat. Data Anal. , 61:50–66.Wang, J.-L., Chiou, J.-M., and Mueller, H.-G. (2015). Review of functional data analysis. ArXive-prints .Zhang, Z., Pati, D., and Srivastava, A. (2015). Bayesian clustering of shapes of curves.