Optimized Diffusion Imaging for Brain Structural Connectome Analysis
By William Consagra*, Arun Venkataraman* and Zhengwu Zhang†
University of Rochester*, University of North Carolina at Chapel Hill†

Abstract: High angular resolution diffusion imaging (HARDI), a type of diffusion magnetic resonance imaging (dMRI) that measures diffusion signals on a sphere in q-space, is widely used in data acquisition for human brain structural connectome analysis. Accurate estimation of the local diffusion, and thus the structural connectome, typically requires dense sampling in HARDI, resulting in long acquisition times and logistical challenges. We propose a method to select an optimal set of q-space directions for recovery of the local diffusion under a sparsity constraint on the sampling budget. Relevant historical dMRI data are leveraged to estimate a prior distribution of the local diffusion in a template space using reduced rank Gaussian process models. For a new subject to be scanned, the priors are mapped into the subject-specific coordinates and used to guide an optimized q-space sampling scheme which minimizes the expected integrated squared error of a diffusion function estimator from sparse samples. The optimized sampling locations are inferred by an efficient greedy algorithm with theoretical bounds approximating the global optimum. Simulation studies and a real data application using Human Connectome Project data demonstrate that our proposed method provides substantial advantages over its competitors.
1. INTRODUCTION.
Diffusion Magnetic Resonance Imaging (dMRI) is an advanced MRI methodology that measures the probabilistic motion of water molecules in order to probe tissue microstructure (Stejskal and Tanner, 1965). The earliest applications of dMRI focused on studying the anisotropic diffusion of water in the white matter (WM) using the diffusion tensor imaging (DTI) model (Basser et al., 1994). However, DTI has significant limitations, since it can only describe one major WM fiber direction within each voxel (Tuch et al., 2002), presenting a significant obstacle for efforts to trace WM pathways in crossing fiber regions (Maier-Hein et al., 2017). To overcome this limitation, advanced acquisition methods, such as diffusion spectrum imaging (DSI) (Tuch et al., 2003) and high angular resolution diffusion imaging (HARDI) (Alexander et al., 2002), have been proposed to better characterize the underlying WM structure by estimating an object known as the diffusion orientation distribution function (ODF), denoted here as f. The ODF, or its sharper version, the fiber ODF (fODF) obtained by deconvolving the ODF (Descoteaux et al., 2009), is the primary object of interest for structural connectome mapping. To estimate the structural connectome, tractography algorithms (Tournier et al., 2012; Girard et al., 2014) are applied to the estimated fODF field to compute streamlines connecting different brain regions. Therefore, accurate estimation of the ODF is paramount for performing quality structural connectome analysis.

To estimate the ODF, DSI employs the Fourier relationship between the diffusion ensemble average propagator (EAP), a probability density function characterizing the displacement of water molecule diffusion, and the measured diffusion signal. The ODF can then be calculated from the EAP through radial projection.

Keywords and phrases: diffusion imaging, Gaussian process, experimental design, functional data, structural connectome.
In HARDI, instead of relying on first computing the EAP, the ODF can be directly estimated from the diffusion signal based on the Funk-Radon transform (FRT) (Tuch, 2004). This fact, coupled with several other weaknesses of DSI, including the requirement of time-consuming gradient sampling on a three-dimensional Cartesian lattice and large pulsed field gradients to satisfy the Nyquist condition, has contributed to HARDI becoming the more popular dMRI acquisition framework for structural connectome analysis.

Commonly, HARDI data are acquired using uniformly distributed diffusion gradient vectors densely sampled on spheres, known as shells, which inhabit q-space. The distribution of these gradient vectors has been a subject of study, and frequently an electrostatic repulsion model is used to generate their locations (Jones et al., 1999). As shown in this paper, such a gradient setting can be sub-optimal with respect to recovering the local diffusion signal. Moreover, in practice, scans are often resource limited and, consequently, parameters are altered for fast data acquisition, e.g., only a small number of directions are sampled in some studies (Chen et al., 2013; Chang et al., 2015). Therefore, there are two important questions to consider in HARDI: if we are restricted to a limited number of gradient directions to sample, where in q-space should we sample, and given the sparse samples, how do we recover the ODF for connectome analysis? In this paper we address these questions by (a) optimizing the diffusion sampling scheme, and (b) estimating the diffusion signal from sparse samples.

The methods proposed in this paper leverage the assumption that high-quality dMRI data are available for the population of interest, e.g., healthy subjects in a certain age range. For simplicity, we only consider the diffusion signal on a single shell.
The diffusion signal at the v-th voxel can be considered as a realization of a random function X_v ∈ L²(S²), i.e., a measurable function with realizations in the space of square integrable functions over the 2-sphere. In particular, we assume the distribution of X_v can be modeled as a Gaussian process. Note that Gaussian process models have previously been used to model diffusion signals for applications in image acceleration and motion correction (Andersson and Sotiropoulos, 2015). Employing an empirical Bayesian approach, the historical dMRI data are used to estimate the mean and covariance functions defining X_v, characterizing the prior distribution of the diffusion signal across the brain. This prior is used to derive an estimator for a new subject's diffusion signal that is robust even in the sparsely sampled case. Additionally, we propose a greedy diffusion direction selection algorithm that sequentially minimizes the voxel-specific mean integrated squared error to direct dMRI data acquisition. In both simulated and real data analyses, the proposed method shows significant advantages over current dMRI acquisition and signal estimation procedures. In particular, we demonstrate that our method can produce a much better structural connectome reconstruction with sparse diffusion samples.

The incorporation of prior information for the design of q-space sampling sequences has been investigated in the literature. Caruyer and Deriche (2009) propose an optimal sampling scheme in the context of DTI based on minimization of the expected squared error of the reconstructed diffusion tensor. Historical data are used to estimate the probability density function of the diffusion tensor field required to calculate the desired expectation. Poot et al. (2010) use diffusion kurtosis imaging (DKI) to form a parametric approximation of the diffusion signal and incorporate a Rician likelihood in order to derive the variance of a parameter of interest as a function of sampling gradient directions.
Prior information is incorporated via the DKI parameters estimated from historical subjects, which are then used to approximate the expectation of this quantity by Monte Carlo integration.

Both of the aforementioned methods for q-space optimal design are derived in the context of finite dimensional parametric models for diffusion signal estimation, DTI and DKI, respectively, which can be problematic for connectome analysis owing to their assumptions on the underlying diffusion process. The proposed modeling framework considers the diffusion signal as a random function with realizations in L²(S²) and is therefore capable of representing a far richer class of local diffusion patterns than DTI or DKI. Accordingly, the optimal q-space acquisition problem for the reconstruction of the diffusion signal at a particular voxel is best formulated as a problem of optimal experimental design for sampling functional data. The application of optimal experimental design techniques to functional data in the existing literature is mostly concerned with functions on ℝ (Ji and Müller, 2017; Wu et al., 2018), though Gao et al. (2018) consider the related problem of landmark selection on arbitrary manifolds. Our approach makes repeated use of the special properties of L²(S²), especially to incorporate relevant prior information, thus distinguishing our problem from these existing ones. We summarize the main contributions of this work as follows:

1. The diffusion signal is modeled as a random function, which allows for more flexibility in the types of signals that can be accommodated than do parametric models such as DTI and DKI. This is particularly important for modeling complex diffusion signals in fiber crossing regions, and for faster transformation of the diffusion signal to the ODF.
To our knowledge, this is the first work on optimal design in dMRI which both incorporates historical data for constructing priors and does not impose a fixed finite dimensional parametric model for the diffusion signal.

2. In our methodology, we propose to use historical data (e.g., data from large brain imaging studies such as the Human Connectome Project (Glasser et al., 2016)) to construct a spatially varying prior distribution of diffusion signals over all voxels in a template space. These template space priors can easily be ported to the coordinate space of a new subject of interest to facilitate real-time optimal sampling design and sparse sample signal reconstruction, amongst other dMRI tasks of interest.

3. We propose a computationally efficient approximate optimal design algorithm that avoids costly high-dimensional optimization or stochastic techniques such as simulated annealing. Further, we are able to derive computable bounds on the deviation between our approximate solution and the true optimal design.

The rest of the paper is organized as follows. In Section 2, we establish relevant background information on Q-ball imaging, the theory of random functions, and Gaussian processes. In Section 3, we present the proposed methodology for addressing optimal sample location selection and diffusion signal reconstruction from sparse samples. Section 4 presents simulation results that compare the performance of our method with one commonly used in the dMRI literature. In Section 5, we show real data examples for several tasks related to dMRI processing and structural connectome analyses. Discussion and future directions are presented in Section 6.
2. BACKGROUND.
This section introduces several topics that will be helpful for understanding our methodology development in Section 3, including Q-ball imaging, the theory of random functions, and Gaussian processes.
2.1. Q-ball Imaging for ODF Estimation.
Q-ball imaging (Tuch et al., 2003; Tuch, 2004) is a method to reconstruct the ODF describing the underlying WM fiber distribution. Tuch (2004) shows that the ODF can be computed from the diffusion signal on a single sphere in q-space by the Funk-Radon transform (FRT):

f(p) = 𝒢(X)(p) = ∫_{r ∈ S²} δ(rᵀp) X(r) dr,

where p ∈ S², f is the ODF, 𝒢 represents the FRT, X represents the true diffusion function on S², and δ is the Dirac delta function. Q-ball imaging provides an advancement over previous models for structural connectome mapping due to the fact that: 1) diffusion signals need only be measured on a single sphere in q-space, and thus far fewer acquisitions are required compared with DSI; 2) it is model-independent and does not require the Gaussian assumption present in many models, which is often violated in practice (Tuch et al., 2002); 3) the ODF gives a much higher resolution representation of the fiber profile than the diffusion tensor (DTI); 4) a convenient spherical deconvolution can be applied to the ODF to get the sharper fODF that better reflects the fiber configuration in a voxel (Descoteaux et al., 2009); and 5) there are analytical forms to translate the diffusion signal X to the ODF f and further to the fODF, under certain basis systems (Descoteaux et al., 2007).

Fig 1: Illustration of the process of constructing the fODF from dMRI measures. Discrete dMRI measures along different b-vectors are used to estimate a smoothed diffusion signal X, which is then transformed into the ODF and then the fODF for structural connectome analysis.
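The analytic form in point 5) follows from the Funk-Hecke theorem: the FRT acts diagonally on spherical harmonics, scaling a degree-l harmonic by 2πP_l(0). A minimal sketch of this coefficient-wise form, assuming the coefficient vector `c` and the per-coefficient degrees `degrees` come from a spherical-harmonic fit of the diffusion signal (the function names here are illustrative, not from the paper):

```python
import numpy as np

def legendre_at_zero(l):
    """P_l(0) via the recurrence P_l(0) = -(l-1)/l * P_{l-2}(0); zero for odd l."""
    if l % 2 == 1:
        return 0.0
    p = 1.0
    for k in range(2, l + 1, 2):
        p *= -(k - 1) / k
    return p

def frt_sh(c, degrees):
    """Funk-Radon transform applied to spherical-harmonic coefficients.

    By the Funk-Hecke theorem the FRT is diagonal in an SH basis:
    a degree-l coefficient is scaled by 2*pi*P_l(0).
    """
    return np.array([2.0 * np.pi * legendre_at_zero(l) * cj
                     for cj, l in zip(c, degrees)])
```

For example, a constant (degree-0) component of the signal is simply scaled by 2π, while degree-2 structure is attenuated and sign-flipped since P₂(0) = −1/2.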
Figure 1 illustrates the process of constructing a smoothed diffusion signal X, the ODF, and the fODF from discrete dMRI measures.

There are two popular methods to estimate a continuous signal on S² from discrete dMRI observations: 1) the kernel smoothing method presented in Tuch (2004), and 2) the penalized regression method presented in Descoteaux et al. (2007). However, both of these techniques use data from only one voxel in one subject and do not allow for the incorporation of valuable prior information gleaned from readily available high-resolution data. If the diffusion resolution (number of b-vectors) is low, such estimation will be problematic, which motivates our development of alternative procedures.

2.2. Random Functions and Gaussian Processes.
Let D be a compact subset of ℝ^d; in this application, D = S². Consider the function space L²(D) = {X : D → ℝ : ∫_D X²(p) dλ(p) < ∞}, the collection of square integrable functions with respect to a measure λ over D. This is a Hilbert space with inner product defined by ⟨X, Y⟩_{L²(D)} = ∫_D X(p)Y(p) dλ(p) for X, Y ∈ L²(D). With slight abuse of notation, let X be a random function defined on a probability space (Ω, ℱ, P) with realizations in L²(D), i.e., X is a measurable function from Ω → L²(D). Define E_P[X] = µ and the covariance function C(p, t) = E_P[(X(p) − µ(p))(X(t) − µ(t))], and assume that ∫_D ∫_D C²(p, t) dλ(p) dλ(t) < ∞. By Mercer's theorem, the covariance function is guaranteed the eigen-decomposition C(p, t) = Σ_{k=1}^∞ ρ_k ψ_k(p) ψ_k(t), where {ψ_k}_{k=1}^∞ forms an orthonormal sequence of eigenfunctions in L²(D) and {ρ_k}_{k=1}^∞ is a non-increasing sequence of real, non-negative eigenvalues. The spectral data of C is defined by the infinite dimensional integral equation

(1)   ∫_D C(p, t) ψ_k(t) dλ(t) = ρ_k ψ_k(p),   for k = 1, 2, ..., ∞,

with orthogonality constraint ∫_D ψ_k(p) ψ_j(p) dλ(p) = δ_kj, where δ_kj is the Kronecker delta function (Hsing and Eubank, 2013). The eigenfunctions can also be used to form a useful decomposition of the random function X. By the Karhunen-Loève theorem (Karhunen, 1946), with probability one we have

(2)   X(p) = µ(p) + Σ_{k=1}^∞ ξ_k ψ_k(p),

where ξ_k = ⟨X − µ, ψ_k⟩_{L²(D)} are mean zero random variables with E_P[ξ_k ξ_j] = ρ_k δ_kj. The convergence of the sum in Equation (2) holds uniformly, since sup_{p ∈ D} E_P[(X(p) − µ(p) − Σ_{k=1}^K ξ_k ψ_k(p))²] → 0 as K → ∞.
The expansion in Equation (2) also facilitates parsimonious representation, as the first K eigenfunctions form the best rank-K basis system for representing X in terms of minimum expected mean integrated squared error (MISE). That is, if {e_k}_{k=1}^∞ is a complete orthonormal basis system for L²(D), then E_P[‖X − µ − Σ_{k=1}^K ⟨X − µ, e_k⟩_{L²(D)} e_k‖²_{L²(D)}] is minimized by taking e_k = ψ_k for k = 1, 2, ..., K.

Functional data typically come in the form of a sample {X_1, ..., X_N} of independent realizations of X, where the i-th realization is observed at a discrete set of points p_{i1}, ..., p_{iM_i} ∈ D. In this work, each X_i(p_{im}) is assumed to be contaminated with independent additive error ε_{im} with mean 0 and variance σ²_i, i.e., the observed signal is modeled as

(3)   S_i(p_{im}) = X_i(p_{im}) + ε_{im},

for i = 1, ..., N and m = 1, ..., M_i. The technique of functional principal component analysis (FPCA) refers to methods for estimating the eigenfunctions and eigenvalues from the observed data {S_i(p_{im})}. FPCA is often done in an exploratory context, where the first few eigenfunctions are used to analyze the dominant modes of variation of a function sample. In this work, we use FPCA to construct an optimal parsimonious basis for subsequent estimation of functional samples when only discrete and sparse data points are observed.

An important and widely used class of random functions is the so-called Gaussian process, abbreviated GP from here on. A GP on D is defined as a stochastic process such that for any finite collection of points p_1, p_2, ..., p_M ∈ D, the joint distribution of the process at these locations is multivariate Gaussian with mean (µ(p_1), µ(p_2), ..., µ(p_M))ᵀ ∈ ℝ^M and covariance matrix C_M ∈ ℝ^{M×M} such that [C_M]_{ij} = C(p_i, p_j); such a process is denoted GP(µ, C).
It is easy to show that a random function X with the decomposition (2) is a GP if and only if the ξ_k are normally distributed, i.e., ξ_k ~ N(0, ρ_k). We will see there are situations where it is convenient or necessary to use GP models that are degenerate, i.e., whose covariance functions have only finitely many non-zero eigenvalues. In this case, a valid GP may still be constructed by assuming a Gaussian distribution on ξ_k for k = 1, 2, ..., K. Such a formulation has been termed a reduced rank GP, and Quiñonero-Candela and Rasmussen (2005) offer an extensive treatment of its properties.
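A reduced rank GP is fully determined by (µ, {ψ_k}, {ρ_k}), so draws can be simulated directly through the truncated Karhunen-Loève expansion (2). The sketch below uses a hypothetical rank-3 model on the interval [0, 1] rather than S² (purely to keep the example short), and checks that the empirical covariance of the draws approaches the rank-K Mercer expansion Σ_k ρ_k ψ_k(p)ψ_k(t):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-3 model on [0, 1] (a stand-in for the sphere S^2):
# orthonormal eigenfunctions psi_k and decaying eigenvalues rho_k.
grid = np.linspace(0.0, 1.0, 200)
K = 3
psi = np.stack([np.sqrt(2.0) * np.sin((k + 1) * np.pi * grid) for k in range(K)])
rho = np.array([1.0, 0.5, 0.25])
mu = np.zeros_like(grid)

# Karhunen-Loeve draws: X = mu + sum_k xi_k psi_k with xi_k ~ N(0, rho_k),
# which is exactly a reduced rank GP(mu, C) with C of rank K.
n = 20000
xi = rng.normal(size=(n, K)) * np.sqrt(rho)
X = mu + xi @ psi                        # (n, 200) sample paths

# The empirical covariance should approach C(p, t) = sum_k rho_k psi_k(p) psi_k(t).
Xc = X - X.mean(axis=0)
C_emp = Xc.T @ Xc / (n - 1)
C_true = psi.T @ (rho[:, None] * psi)
max_err = np.abs(C_emp - C_true).max()   # shrinks like 1/sqrt(n)
```

Because the covariance has rank K = 3, every sample path lies in the 3-dimensional span of the ψ_k, which is the degeneracy referred to in the text.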
3. METHODS.
The observed diffusion signal, normalized with respect to the mean non-diffusion-weighted signal (the diffusion signal at b-value equal to 0), at voxel v ∈ V_i for subject i is denoted {S_vi(p_{i1}), ..., S_vi(p_{iM_i})}, where p_{im} ∈ S² represents one of the M_i diffusion gradient directions (i.e., b-vectors in dMRI), and V_i represents the set of voxels in a region of interest (e.g., voxels in the white matter of subject i). For notational simplicity, from here on we will denote S_vi(p_{im}) as s_vim. We assume that all of the M_i directions correspond to the same shell, e.g., with b-value equal to 1000. We also assume the statistical model (3) at each voxel: s_vim = X_vi(p_{im}) + ε_vim, where X_vi ∈ L²(S²) is the true diffusion signal for subject i at voxel v and is a realization of a voxel specific Gaussian process, and ε_vim ~ N(0, σ²_i) is independent of the diffusion direction p_{im} and brain region v.

Fig 2: Flowchart of the proposed methodology: (a) estimation of historical diffusion signals, (b) construction of the template space prior, (c) mapping the template space prior to subject space for diffusion direction selection, and (d) downstream dMRI analysis tasks, e.g., connectome analysis.

We further assume that we have access to a relevant collection of high-resolution historical dMRI data, denoted H = {(p_{im}, s_vim) : m = 1, ..., M_i; i = 1, ..., N; v = 1, ..., |V_i|}, where N represents the number of subjects. Such data are ubiquitous and easy to access due to recent efforts from the HCP and the UK Biobank. We would like to leverage H to build priors over the diffusion signals and then use them to guide both dMRI acquisition and signal reconstruction for new subjects. This proves challenging in that each subject's data in H is gathered in a different coordinate system, referred to here as the subject space, and moreover, the subjects have different brain morphology. We circumvent this issue by performing a registration of each of the historical subjects to a common space, referred to here as the template space. A GP model for the distribution of the diffusion signal at each voxel in the template space is estimated from the aligned data. The warping function used to perform registration is constrained to be a diffeomorphism. As a result, for any subject of interest that has been registered to the template space, the inverse warping can be used to map the estimated priors from the template space to the subject space smoothly to guide our analysis. This pipeline is illustrated in Figure 2 and described in detail over the remainder of the section.

3.1. Prior Summarization From Historical Data.
3.1.1. Diffusion Signal Estimation in Subject Space.
Given that the data in H are acquired at high resolution, we can confidently estimate smoothed diffusion signal functions at each voxel independently. Following the procedure in Descoteaux et al. (2007), we model the diffusion signal as a linear combination of elements from an orthonormal basis system for symmetric functions in L²(S²) constructed from the spherical harmonics, denoted {φ_j}_{j=1}^J. The coefficients of the expansion are estimated by regularized least squares regression with a second order roughness penalty. That is, for voxel v in subject i, the diffusion signal can be estimated by solving the following regularized least squares problem

ĉ_vi = argmin_{c ∈ ℝ^J} Σ_{m=1}^{M_i} (s_vim − cᵀφ(p_{im}))² + λ_i ∫_{S²} (Δ_{S²}(cᵀφ(p)))² dp,

where φ(p) = (φ_1(p), ..., φ_J(p))ᵀ, Δ_{S²} denotes the Laplace-Beltrami operator on S², and the penalty λ_i > 0 is selected by minimizing the generalized cross validation (GCV) criterion. The resulting smoothed estimate of the diffusion signal is given by X̂_vi(p) = ĉᵀ_vi φ(p). More details on this estimation procedure can be found in the supplementary materials.
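Because the spherical harmonics are eigenfunctions of the Laplace-Beltrami operator (Δ_{S²}φ_j = −l_j(l_j+1)φ_j for a harmonic of degree l_j), the roughness penalty reduces to a diagonal quadratic form in the coefficients and the fit becomes ridge-like. A sketch under that assumption, with the basis evaluation matrix `Phi`, per-column degrees, and λ supplied by the caller (GCV selection of λ is omitted here):

```python
import numpy as np

def fit_penalized(Phi, s, degrees, lam):
    """Regularized least squares with a Laplace-Beltrami roughness penalty.

    For an orthonormal spherical-harmonic basis, Delta_{S^2} phi_j =
    -l_j(l_j+1) phi_j, so the penalty integral reduces to
    sum_j (l_j(l_j+1))^2 c_j^2 and the solution is ridge-like:
        c_hat = (Phi^T Phi + lam * R)^{-1} Phi^T s,
    where Phi is the M x J basis evaluation matrix at the sampled directions.
    """
    l = np.asarray(degrees, dtype=float)
    R = np.diag((l * (l + 1.0)) ** 2)
    return np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ np.asarray(s))
```

With λ = 0 this reduces to ordinary least squares; larger λ shrinks high-degree (rougher) harmonics more aggressively while leaving the degree-0 term unpenalized.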
3.1.2. Mean and Covariance in Template Space.

Let γ_i be the diffeomorphic warping function from historical subject i to the template space, based on the subject's diffusion data (e.g., the fractional anisotropy or FA image) (Avants et al., 2008). The warping function deforms the regular voxel grid in subject space to align the subject's data to the template, and a re-gridding procedure is needed in the template space to reconstruct a regular voxel grid for the subject. We can obtain an estimate of the subject's diffusion function at each of the voxels in the template space through interpolation of X̂_vi. That is, for any voxel ṽ in the template space, we have X̂_ṽi(p) = ĉᵀ_ṽi φ(p), where ĉ_ṽi is computed by a linear interpolation scheme applied to {ĉ_vi} for the v's in the subject space mapped near ṽ by γ_i. We use the popular Advanced Normalization Tools (ANTs) to perform this alignment and refer the reader to Avants et al. (2009) for more details on the interpolation procedure.

Once the aligned diffusion signals X̂_ṽi(p) have been computed for all subjects in H, the mean and covariance functions at ṽ can be estimated by the following empirical estimators

(4)   µ̂_ṽ(p) = c̄̂ᵀ_ṽ φ(p),

(5)   Ĉ_ṽ(p, t) = φ(p)ᵀ Σ̂_ṽ φ(t),

where c̄̂_ṽ and Σ̂_ṽ are the sample mean and covariance matrix of {ĉ_ṽ1, ..., ĉ_ṽN}, respectively. We assume a large historical sample, N ≫ J, and therefore expect these estimators to be stable. Then for each ṽ, the distribution of the diffusion signal on S² is modeled by the reduced rank GP(c̄̂ᵀ_ṽ φ, φᵀ Σ̂_ṽ φ).

3.1.3. Mean and Covariance in Subject Space.
Let γ be the template space warping function for a subject of interest, who may or may not be in H. We want to map the GP priors from the template to this subject's space for future diffusion data analysis, such as constructing an optimal diffusion sampling scheme and subsequent diffusion signal function estimation. To do this, we map the template space mean and covariance functions to the subject space and use them to define a GP adapted to the subject space coordinate system. The mean functions are identified with vectors in ℝ^J, and therefore the same linear interpolation scheme used to warp the estimated historical diffusion functions in Section 3.1.2 can be used for estimating µ̂_v in the subject space. On the other hand, the covariance functions are identified with symmetric positive definite (SPD) matrices Σ̂_ṽ, and thus the estimate of the covariance function Ĉ_v for subject space voxel v requires an interpolation of SPD matrices. Considering the non-Euclidean structure of the SPD matrix manifold, we propose the following interpolation scheme.
Denote by S⁺_J the manifold of J × J SPD matrices and let M_1, ..., M_n be a sample of SPD matrices. The weighted Karcher mean of the sample is defined as

(6)   M_Kmean = argmin_{M ∈ S⁺_J} Σ_{i=1}^n w_i d²(M_i, M),   with Σ_{i=1}^n w_i = 1,

where d(·, ·) is induced by a selected Riemannian metric. If we let the weights w_i be proportional to the Euclidean distance between γ⁻¹(ṽ_i) and the voxel of interest v in the subject space, the weighted Karcher mean can be used as an interpolation scheme for the covariance functions.

There are several valid metrics that can be chosen to compute the distance between elements of S⁺_J (Arsigny et al., 2007). It is easy to show that ∫_{S²×S²} (Ĉ_ṽ₁(p, t) − Ĉ_ṽ₂(p, t))² dp dt = ‖Σ̂_ṽ₁ − Σ̂_ṽ₂‖²_F, so using the standard Euclidean (Frobenius) norm as the metric for the interpolation appears to be a reasonable choice. However, interpolation of SPD matrices under the Euclidean norm can result in a "swelling" effect, in which the determinant of the interpolated value is larger than the determinant of any of the original SPD matrices (Dryden et al., 2009). This would result in an undesirable amplification of the estimated variability. To overcome this limitation, we use the log-Euclidean metric proposed in Arsigny et al. (2007) to perform the interpolation (6), defined as

‖M_1 − M_2‖_Log = ‖log(M_1) − log(M_2)‖_F,

where log(·) represents the matrix logarithm. For any M ∈ S⁺_J, the matrix logarithm always exists and is defined by log(M) = V log(D) Vᵀ, where V is the matrix whose columns are the eigenvectors of M and D is the diagonal matrix of eigenvalues.
Since D is diagonal, the matrix logarithm has an especially simple form: log(D) = Diag(log(D_11), log(D_22), ..., log(D_JJ)). The subject space covariance function for voxel v is defined by the solution to the interpolation (6), applied to the set of Σ̂_ṽ for the ṽ mapped near v by γ⁻¹, with d(M_1, M_2) = ‖M_1 − M_2‖_Log.
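Under the log-Euclidean metric, the weighted Karcher mean (6) has the closed form exp(Σ_i w_i log(M_i)), so the interpolation requires no iterative optimization on the manifold. A minimal sketch:

```python
import numpy as np

def logm_spd(M):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def expm_sym(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(Ms, weights):
    """Weighted Karcher mean under the log-Euclidean metric.

    Under d(M1, M2) = ||log(M1) - log(M2)||_F, the minimizer of
    sum_i w_i d^2(M_i, M) has the closed form exp(sum_i w_i log(M_i)),
    which avoids the swelling effect of the Euclidean average.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    S = sum(wi * logm_spd(Mi) for wi, Mi in zip(w, Ms))
    return expm_sym(S)
```

For example, the equal-weight Euclidean average of diag(2, 0.5) and diag(0.5, 2) is diag(1.25, 1.25) with determinant 1.5625, exceeding the determinant (1) of both inputs; the log-Euclidean mean of the same pair is the identity, with determinant 1.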
3.2. Diffusion Signal Estimation from Sparse Samples.
Our main goal is to develop an optimized gradient table to sample the diffusion signal when only a few diffusion directions can be reliably measured. The optimization will depend on the choice of estimator for the diffusion signal function, which must be adapted to the setting of sparsely sampled data. We propose our estimator in this section and develop the optimized sampling scheme in the section that follows. Note that, since estimation is performed at the voxel level, in what follows we suppress the subscript v for notational simplicity.

The true diffusion function at a voxel is assumed to be a realization of the voxel specific random function X ~ GP(µ, C). We consider the estimation of the diffusion signal function from sparsely sampled dMRI, i.e., {(p_m, s_m)}_{m=1}^M with a small M. To facilitate future connectome extraction, it is important to obtain a continuous representation of the estimated diffusion function with respect to the basis functions {φ_1, ..., φ_J}, since with such a representation there is a closed form transformation from the diffusion signal function to the ODF for fiber tracking via the FRT (Descoteaux et al., 2007). However, direct estimation of the coefficients of the φ_j with sparse and irregular observations can be problematic, as illustrated in James et al. (2000), where a reduced rank solution was promoted. As outlined in Section 2, the optimal rank-K basis system for representing X in terms of minimum expected integrated squared error is given by the first K eigenfunctions of the covariance function C, denoted {ψ_1, ..., ψ_K}. Due to this optimality property, we adopt {ψ_1, ..., ψ_K} for sparse sample modeling. If we impose the restriction that each ψ_k lies in the linear span of {φ_1, ..., φ_J}, then there exists a simple linear transformation between the ψ_k's and φ_j's, and thus we can still leverage the analytic FRT for sparse sample ODF estimation.
Furthermore, this restriction translates the infinite dimensional integral Equation (1) into the following finite dimensional eigenvalue problem:

Σ̂ b̂_k = ρ̂_k b̂_k,   for k = 1, 2, ..., J,

with orthonormality restriction b̂ᵀ_i b̂_j = 1 if i = j and 0 if i ≠ j, i.e., FPCA applied to the subject space covariance function C. The estimate of the k-th eigenfunction is given by ψ̂_k(p) = b̂ᵀ_k φ(p), with corresponding eigenvalue ρ̂_k. The rank of the model, K (K ≤ J), can be chosen based on a threshold on the proportion of variance explained (PVE) or by using an information criterion.

Given the sparse sample {(p_m, s_m)} and historical data H, we would like to obtain an estimate of X of the form X(p) = µ(p) + Σ_{k=1}^K ξ_k ψ_k(p). An estimate of ξ_k can be constructed by computing the conditional distribution P(ξ_K | s_M, P_M, H), where ξ_K = (ξ_1, ..., ξ_K)ᵀ, s_M = (s_1, ..., s_M)ᵀ and P_M = (p_1, ..., p_M)ᵀ. We suppress (P_M, H) in the following conditioning notation, since all distributions are conditional on (P_M, H). Given the model (3) and Gaussian assumptions, we can easily derive the following joint distribution of (ξ_K, s_M):

(ξ_K, s_M)ᵀ ~ N( (0_K, µ_M)ᵀ, [ Λ_K, Λ_K Ψᵀ_{M,K} ; Ψ_{M,K} Λ_K, Ψ_{M,K} Λ_K Ψᵀ_{M,K} + σ² I_M ] ),

where µ_M = (µ(p_1), ..., µ(p_M))ᵀ, Λ_K = Diag(ρ_1, ..., ρ_K), and Ψ_{M,K} ∈ ℝ^{M×K} with [Ψ_{M,K}]_{mk} = ψ_k(p_m) for m = 1, ..., M and k = 1, ..., K. Therefore, P(ξ_K | s_M) is a normal distribution (Mardia et al. (1979), Theorem 3.2.4) with mean

(7)   E[ξ_K | s_M] = Λ_K Ψᵀ_{M,K} [Ψ_{M,K} Λ_K Ψᵀ_{M,K} + σ² I_M]⁻¹ (s_M − µ_M).
Based on E[ξ_K | s_M], we propose the following estimator of X:

(8)   X̃(p) = E[X(p) | s_M] = µ(p) + ψ_K(p)ᵀ E[ξ_K | s_M]
            = µ(p) + ψ_K(p)ᵀ Λ_K Ψᵀ_{M,K} [Ψ_{M,K} Λ_K Ψᵀ_{M,K} + σ² I_M]⁻¹ (s_M − µ_M),

where ψ_K(p) = (ψ_1(p), ..., ψ_K(p))ᵀ and I_M is the M × M identity matrix.

Yao et al. (2005) use the estimator (8) for the case of sparsely observed longitudinal data on ℝ. It is notable that (7) is both the best estimate of the coefficients under the Gaussian assumption and the best linear predictor even when the Gaussian assumption is violated. In theory, this provides robustness to some deviations from the Gaussian assumption, a property that will be verified in simulations in Section 4. Alternatively, the derivation above can be motivated through a fully Bayesian framework under the prior X ~ GP(µ, C), or equivalently ξ_K ~ N(0, Λ_K), where it can be shown that the mean of the posterior predictive distribution for X(p) is exactly (8) (see the supplemental material for more details).

To obtain X̃, the parameters µ, ψ_K, Λ_K, Ψ_{M,K} and K are estimated from H. Moreover, σ², the measurement error variance for the current subject, also plays an important role in estimating X. It is important to estimate σ² from the new subject's data instead of from H, since the scanner used for collecting the subject's data may differ from the scanner used to collect the historical data. We propose a simple estimation procedure based on the subject's b = 0 (non-diffusion-weighted) data, with the assumption that we have access to n > 1 such b = 0 images for the subject of interest, e.g., collected at some pre-screening. See the supplemental material for further elaboration and evaluation of this estimator.
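Given the prior quantities estimated from H, the estimator (8) is a few lines of linear algebra. A sketch, with illustrative array names (`Psi_MK` holds the evaluations ψ_k(p_m), `rho_K` the prior eigenvalues forming the diagonal of Λ_K):

```python
import numpy as np

def sparse_estimate(s, mu_M, Psi_MK, rho_K, sigma2, mu_eval, Psi_eval):
    """Conditional-mean reconstruction of X from sparse samples, Eqs. (7)-(8).

    xi_hat  = Lam Psi^T [Psi Lam Psi^T + sigma^2 I]^{-1} (s - mu_M)
    X_tilde = mu(p) + psi_K(p)^T xi_hat, evaluated at the rows of Psi_eval.
    """
    M = len(s)
    Lam = np.diag(rho_K)
    Gamma = Psi_MK @ Lam @ Psi_MK.T + sigma2 * np.eye(M)
    xi_hat = Lam @ Psi_MK.T @ np.linalg.solve(Gamma, np.asarray(s) - mu_M)
    return mu_eval + Psi_eval @ xi_hat
```

As a sanity check, with K = M = 1, ψ ≡ 1 and µ ≡ 0, the estimate at the sampled point is the shrunken value ρ/(ρ + σ²)·s, interpolating between the prior mean (σ² large) and the raw observation (σ² small).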
3.3. Optimal q-Space Sampling.
Given the estimator (8), we formulate diffusion sampling, the process of obtaining $s_M$ in q-space, as an optimization problem: for a fixed total budget of $M$ samples, our goal is to select a set of directions $P_M = (p_1, ..., p_M)^{\top}$ such that the expected integrated squared error of the estimated diffusion signal, conditional on $P_M$, is minimized. Denote the optimal locations as $P^*_M$, which will be referred to as the optimal q-space design. We first consider this problem for a single voxel and then extend it to a set of voxels in a region of interest.

3.3.1. Optimal Diffusion Sampling for One Voxel.
At a particular voxel, we identify the optimal q-space design as the solution to the following optimization problem:
$$\text{(9)}\quad P^*_M = \operatorname*{argmin}_{P_M \in \times_{m=1}^{M}\mathbb{S}^2} E\left(\int_{\mathbb{S}^2}\left(X(p) - \tilde{X}(p)\right)^2 dp\right),$$
where $\tilde{X}$ is the estimator from Equation (8), which relies on $P_M$, and the expectation is with respect to the distribution of $X$ at this voxel. Expanding the square and swapping the integral and expectation operators, it is straightforward to show that (9) can be equivalently formulated as the following maximization problem:
$$\text{(10)}\quad P^*_M = \operatorname*{argmax}_{P_M \in \times_{m=1}^{M}\mathbb{S}^2} \int_{\mathbb{S}^2} c_M(p)^{\top}\Gamma_M^{-1}c_M(p)\,dp,$$
where, under the rank-$K$ assumption on $C$, we have $c_M(p) = \Psi_{M,K}\Lambda_K\psi_K(p)$ and $\Gamma_M = \Psi_{M,K}\Lambda_K\Psi_{M,K}^{\top} + \sigma^2 I_M$. Substituting these definitions into (10) and integrating, we arrive at the following finite-dimensional optimization problem:
$$\text{(11)}\quad P^*_M = \operatorname*{argmax}_{P_M \in \times_{m=1}^{M}\mathbb{S}^2} \operatorname{trace}\left(\Lambda_K\Psi_{M,K}^{\top}\Gamma_M^{-1}\Psi_{M,K}\Lambda_K\right).$$
In practice, it is often sufficient to constrain the candidate design points to lie in a large but finite set
$\mathcal{A} \subset \mathbb{S}^2$, e.g., a dense equispaced grid on $\mathbb{S}^2$ at which diffusion-weighted signals can be measured. In this scenario, the problem (11) can be written as
$$\text{(12)}\quad P^*_M = \operatorname*{argmax}_{P_M \in \mathcal{P}_M(\mathcal{A})} g(P_M),$$
where $\mathcal{P}_M(\mathcal{A}) = \{(p_1, ..., p_M)^{\top} : p_i \in \mathcal{A}\}$ and $g(P_M) = \operatorname{trace}(\Lambda_K\Psi_{M,K}^{\top}\Gamma_M^{-1}\Psi_{M,K}\Lambda_K)$.

The global problem (12) is an integer program and is known to be NP-hard. To circumvent this computational bottleneck, we propose a sequential search strategy that produces a greedy approximation $\hat{P}^*_M = (\hat{p}^*_1, ..., \hat{p}^*_M)^{\top}$. Assuming that we have selected $\hat{p}^*_1, ..., \hat{p}^*_{m-1}$ in the previous rounds, we solve the conditional optimization problem to obtain $\hat{p}^*_m$:
$$\text{(13)}\quad \hat{p}^*_m = \operatorname*{argmax}_{p_m \in \mathcal{A}} \operatorname{trace}\left(\Lambda_K\Psi_{m,K}^{\top}\Gamma_m^{-1}\Psi_{m,K}\Lambda_K\right),$$
where $\Psi_{m,K} \in \mathbb{R}^{m \times K}$ is the eigenfunction evaluation matrix over $(\hat{p}^*_1, ..., \hat{p}^*_{m-1}, p_m)$ and $\Gamma_m = \Psi_{m,K}\Lambda_K\Psi_{m,K}^{\top} + \sigma^2 I_m$.

Algorithm 1 presents an efficient procedure for sequentially solving (13). It uses a rank-one update scheme, based on the block matrix inversion formula, to compute $\Gamma_m^{-1}$ and thereby avoids a series of costly matrix inversions, which is especially important when $m$ gets large. Specifically, it is easy to see that $\Gamma_m$ can be partitioned as
$$\Gamma_m = \begin{pmatrix} \Gamma_{m-1} & h \\ h^{\top} & q \end{pmatrix},$$
where $h = \Psi_{m-1,K}\Lambda_K\psi_K(p_m) \in \mathbb{R}^{m-1}$ and $q = \psi_K(p_m)^{\top}\Lambda_K\psi_K(p_m) + \sigma^2 \in \mathbb{R}$, with $\psi_K(p_m) = (\psi_1(p_m), ..., \psi_K(p_m))^{\top}$ and $p_m$ a candidate point in $\mathcal{A}$ for $\hat{p}^*_m$. As a result, the inverse of $\Gamma_m$ can be computed efficiently through
$$\text{(14)}\quad \Gamma_m^{-1} = \begin{pmatrix} \Gamma_{m-1}^{-1}\left(I_{m-1} + a\,hh^{\top}\Gamma_{m-1}^{-1}\right) & -a\,\Gamma_{m-1}^{-1}h \\ -a\,h^{\top}\Gamma_{m-1}^{-1} & a \end{pmatrix},$$
where $a = (q - h^{\top}\Gamma_{m-1}^{-1}h)^{-1} \in \mathbb{R}$, and $\Gamma_{m-1}^{-1}$ is precomputed in the iteration that selected $\hat{p}^*_{m-1}$.

A question that naturally arises is: how good is the approximation $\hat{P}^*_M$?
In Theorem 1, we provide a bound on the ratio between the performance of the greedy approximation from Algorithm 1 and that of the global optimum from (12).

THEOREM 1. Let $\hat{P}^*_m$ be the approximate design obtained after $m$ iterations of Algorithm 1. Define $\lambda^*_{\psi} = \max_{p \in \mathcal{A}} \lambda_{\max}(\psi_K(p)\psi_K(p)^{\top})$, where $\lambda_{\max}$ is the function which returns the maximum eigenvalue of its argument. Then
$$\text{(15)}\quad g(\hat{P}^*_m) \ge \left[1 - \exp\left(-\frac{1/\rho_1}{1/\rho_K + m\sigma^{-2}\lambda^*_{\psi}}\,\frac{m}{M}\right)\right]g(P^*_M),$$
where $P^*_M$ is the solution to (12).

For a proof of Theorem 1 and an interpretation of (15) in terms of the distribution of the random function $X$, see the supplemental material.

Algorithm 1
Greedy approximation to (9)
1: Input: diagonal matrix of eigenvalues Λ_K, eigenfunctions {ψ_1, ..., ψ_K}, possible design points A, measurement error variance σ², and total budget M ∈ N.
2: Output: vector of indices i defining the design.
3: Initialize: i ← zeros(M); list of candidate indices a ← (1, 2, ..., |A|).
4: for m = 1 : M do
5:   v_optim ← 0
6:   for n in a do
7:     ψ_n ← (ψ_1(p_n), ..., ψ_K(p_n))^⊤ for p_n ∈ A
8:     if m = 1 then                       ▷ handle boundary case
9:       Ψ_n ← ψ_n^⊤; Γ_n^{-1} ← [Ψ_n Λ_K Ψ_n^⊤ + σ²]^{-1}
10:    else
11:      compute Γ_n^{-1} using formula (14) and let Ψ_n ← (Ψ_{m-1,K}^⊤, ψ_n)^⊤
12:    v_n ← trace(Λ_K Ψ_n^⊤ Γ_n^{-1} Ψ_n Λ_K)
13:    if v_n > v_optim then
14:      i_optim ← n; v_optim ← v_n; Ψ_optim ← Ψ_n; Γ_optim^{-1} ← Γ_n^{-1}
15:  i(m) ← i_optim and remove i(m) from a
16:  Ψ_{m,K} ← Ψ_optim; Γ_m^{-1} ← Γ_optim^{-1}
3.3.2. Extension to Multiple Voxels.
The previous formulation considers the optimal design for recovery of the diffusion signal function at a single voxel. We now discuss how to extend our construction to the case where a single q-space design is needed for a collection of voxels in a region of interest $\mathcal{V}_s$. When $\mathcal{V}_s = \mathcal{V}$, we can create an acquisition scheme that jointly optimizes over the whole WM, and when $\mathcal{V}_s$ corresponds to voxels covering certain fiber bundles, we can construct an optimal sampling scheme to accurately recover these fiber bundles. In either case, extending the objective function in (9) to perform joint optimization over all $v \in \mathcal{V}_s$ may be accommodated by minimizing the weighted sum over the region
$$\sum_{v \in \mathcal{V}_s} w_v\, E\left(\int_{\mathbb{S}^2}\left(X_v(p) - \tilde{X}_v(p)\right)^2 dp\right), \quad \text{with } \sum_{v \in \mathcal{V}_s} w_v = 1.$$
If no a priori weighting is assumed, we can let $w_v = |\mathcal{V}_s|^{-1}$, and by derivations similar to those in Section 3.3.1, the sequential optimization problem (13) becomes
$$\text{(16)}\quad \hat{p}^*_m = \operatorname*{argmax}_{p_m \in \mathcal{A}} \sum_{v \in \mathcal{V}_s} \operatorname{trace}\left(\Lambda_K^{(v)}\Psi_{m,K}^{(v)\top}\left[\Psi_{m,K}^{(v)}\Lambda_K^{(v)}\Psi_{m,K}^{(v)\top} + \sigma^2 I_m\right]^{-1}\Psi_{m,K}^{(v)}\Lambda_K^{(v)}\right),$$
where the superscript $v$ indicates voxel-specific quantities. A simple augmentation of Algorithm 1 accommodates this situation.
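The multi-voxel criterion (16) only changes how a candidate is scored: the single-voxel trace objective is summed (here with uniform weights) over voxel-specific eigen-quantities. A minimal sketch of one greedy step, with random stand-ins for the voxel-specific matrices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative stand-ins for |V_s| voxels' eigenfunction evaluations and eigenvalues.
N_cand, K, n_vox = 40, 4, 3
Psi_A = [rng.standard_normal((N_cand, K)) for _ in range(n_vox)]
Lambdas = [np.diag(np.exp(-0.5 * np.arange(1, K + 1))) for _ in range(n_vox)]
sigma2 = 0.1

def region_score(idx):
    """Uniformly weighted sum of single-voxel objectives over the region V_s."""
    total = 0.0
    for Psi_v, Lam_v in zip(Psi_A, Lambdas):
        Psi = Psi_v[idx]                                   # m x K for this voxel
        Gamma = Psi @ Lam_v @ Psi.T + sigma2 * np.eye(len(idx))
        total += np.trace(Lam_v @ Psi.T @ np.linalg.solve(Gamma, Psi) @ Lam_v)
    return total / n_vox                                   # weights w_v = 1/|V_s|

# One greedy step: pick the first design point maximizing the regional score.
first = max(range(N_cand), key=lambda n: region_score([n]))
```

Subsequent steps proceed exactly as in Algorithm 1, with `region_score` replacing the single-voxel objective.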
4. SIMULATION STUDY.
We compared the proposed diffusion signal sampling and fitting scheme with a commonly used sampling approach, the electrostatic repulsion model (ESR) (Jones et al., 1999), paired with signal fitting by regularized spherical harmonic basis expansion (Descoteaux et al., 2007). The ESR algorithm constructs a sampling design by modeling the acquisition directions as points on $\mathbb{S}^2$ and attempting to find a configuration which minimizes the electrostatic energy between point pairs. It has been widely used for constructing acquisition schemes over $\mathbb{S}^2$, e.g., in the HCP. In this work, we used the ESR implementation from the dipy library (Garyfallidis et al., 2014). Without loss of generality, we considered the diffusion signal in one voxel and assumed the signal has zero mean.

Let $\{\phi_j\}_{j=1}^{J}$ be the modified spherical harmonic basis. Within the space spanned by $\{\phi_j\}_{j=1}^{J}$, we generated eigenfunctions and eigenvalues as follows: eigenvalues were generated by $\rho_k = \exp(-0.5k)$ to mimic the decay phenomenon of the spectrum; the corresponding eigenfunctions were constructed from a random $J \times J$ orthogonal matrix $B$ according to $\psi_k(x) = \sum_{j=1}^{J} B_{jk}\phi_j(x)$, for $k = 1, ..., J$. We set $J = 45$, corresponding to a modified spherical harmonic order of eight, in accordance with the literature on dMRI signal approximation (Hess et al., 2006). From these basis systems, we simulated two sets of random functions: the first admits the GP assumption and the second does not.

Under the GP assumption, for each simulated diffusion signal $X_i$, we sampled the coefficients corresponding to $\{\psi_k\}$ according to $\xi_{ik} \sim N(0, \rho_k)$, $k = 1, 2, ..., K$. Under the non-GP assumption, we sampled $\xi_{ik}$ from the equal-weight mixture distribution $N(-\sqrt{\rho_k/2}, \rho_k/2) + N(\sqrt{\rho_k/2}, \rho_k/2)$, which has mean zero and variance $\rho_k$. This distribution has been used in several simulation studies in the FDA literature (Yao et al., 2005). To mimic the HCP acquisition protocol, we assumed the signals $X_i$ can be observed at 90 locations on $\mathbb{S}^2$; these locations were randomly sampled according to $(\theta, \phi) \sim \mathrm{Unif}(0, \pi) \times \mathrm{Unif}(0, 2\pi)$ and then dispersed on the sphere using the ESR algorithm. The simulated $X_i$ was then evaluated at each of these locations and recorded with additive measurement error $\epsilon_{ij} \sim N(0, \sigma^2)$. We considered a small and a large value of $\sigma^2$ to simulate high and low signal-to-noise ratios (SNR), respectively. This scheme was used to construct an independent training set, i.e., the simulated $\mathcal{H}$, and test set of sizes 100 and 50, respectively, for both the GP and non-GP cases.
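The two coefficient-sampling regimes can be sketched directly; note the mixture is constructed so its mean (zero) and variance ($\rho_k$) match the Gaussian case, isolating the effect of non-Gaussianity. Sample sizes and the eigenvalue decay rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

K, n_sim = 45, 2000
rho = np.exp(-0.5 * np.arange(1, K + 1))      # decaying eigenvalue sequence

# GP regime: xi_ik ~ N(0, rho_k).
xi_gauss = rng.standard_normal((n_sim, K)) * np.sqrt(rho)

# Non-GP regime: equal-weight mixture of N(-sqrt(rho_k/2), rho_k/2)
# and N(+sqrt(rho_k/2), rho_k/2); mean 0 and variance rho_k by construction.
signs = rng.choice([-1.0, 1.0], size=(n_sim, K))
xi_mix = signs * np.sqrt(rho / 2) + rng.standard_normal((n_sim, K)) * np.sqrt(rho / 2)
```

Because the first two moments agree across regimes, any performance gap between them reflects sensitivity to the Gaussian assumption rather than a difference in signal scale.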
[Figure 3: four panels of MISE versus budget M (x-axis ticks 20-80) for Gaussian High SNR, Non-Gaussian High SNR, Gaussian Low SNR, and Non-Gaussian Low SNR; legend: SHLS-ESR, SHLS-GDS, CU-ESR, CU-GDS.]
Fig 3: Mean integrated squared error and standard error bars estimated with the test set for each of the four combinations of observation location selection and coefficient estimation.

We are interested in comparing the diffusion signal estimates under different approaches. The proposed greedy approach (Algorithm 1) is denoted as GDS (greedy design selector), and the proposed signal fitting in Equation (8) is denoted as CU (conditional expectation update). The penalized regression approach using the modified spherical harmonics from Section 3.1.1 is referred to as SHLS (spherical harmonic least squares). For both CU and GDS, the training data was used for prior summarization, as described in Section 3.1. For the SHLS fit, the regularization parameter was chosen via generalized cross validation. We evaluated the signal fit for each of the combined sampling and fitting methods (SHLS-ESR, SHLS-GDS, CU-ESR, and CU-GDS) over a sequence of increasing budgets using the integrated squared error (ISE) across the test set. For all methods, a modified spherical harmonic basis of order eight was used for fitting.

Figure 3 displays the test set performance of the four methods for each combination of Gaussian vs. non-Gaussian and high SNR vs. low SNR. Incorporating prior information into the coefficient estimation results in substantially better estimates, as both the CU-ESR and CU-GDS fits yield lower mean ISE (MISE) than the SHLS-ESR and SHLS-GDS methods for all budgets and signal distributions considered. Moreover, the proposed method CU-GDS displays uniformly better performance than CU-ESR, indicating an additional increase in the quality of fits when the proposed diffusion signal function estimator is used in tandem with the proposed optimal design algorithm. The boost in performance is most significant in the sparse sample regime, i.e., for small budgets M.
By comparing the results of SHLS-GDS and SHLS-ESR, we notice that the GDS design can be slightly detrimental to fit quality if not coupled with the proposed CU estimator. This is expected, since GDS is derived under CU signal fitting; in practice it is therefore important to use them in conjunction. Additionally, we found the computational performance of CU-GDS fitting to be comparable with that of SHLS-ESR; see the supplemental material for more information.
5. REAL DATA EXPERIMENTS.
We evaluated our method using diffusion data from the HCP. The full imaging acquisition and minimal diffusion MRI preprocessing steps for the HCP data are documented in Glasser et al. (2013). Briefly, all imaging was conducted on the 3T Siemens Connectom scanner (Erlangen, Germany). High-resolution T1w anatomical images were acquired with the 3D MPRAGE (magnetization prepared rapid gradient echo) sequence with a slice acceleration factor of 2 at 0.7 mm isotropic resolution. Diffusion imaging was performed using a 2D spin-echo EPI (echo planar imaging) sequence with approximately 90 diffusion directions at each of three non-zero b-values (1,000, 2,000, and 3,000 s/mm²) and 6 b = 0 reference scans at 1.25 mm isotropic resolution. A full diffusion MRI session includes 6 runs of about 9 minutes 50 seconds each, representing 3 acquisitions of different b-vectors, each acquired once with right-to-left (RL) and once with left-to-right (LR) phase encoding polarity; these are used to correct for susceptibility-induced distortion.

For our study, we randomly selected 240 healthy HCP subjects (age range: 20-35 years) to be used as training data, i.e., high-resolution historical data H. An additional 10 subjects from the same demographic were used as independent test data. A template of the diffusion data was created from the FA images of 20 randomly selected training subjects. Each subject's FA image was then warped to the template using the nonlinear registration algorithm in ANTs (Avants et al., 2009). A white matter mask was also applied in the template space. The b = 1000 shell of the training data was used to estimate the voxel-specific diffusion functions for each subject. The estimates of the warping and diffusion functions for each training subject were used to construct mean and covariance functions at each voxel in the template space according to the procedure outlined in Section 3.1.
The inverse warping function for each subject in the test data was used to map the template space mean and covariance functions into their subject space. For a sequence of increasing total budgets, we applied the GDS algorithm to select diffusion directions from the set of 90 b = 1000 b-vectors for each test subject independently. We determined an optimal set of directions to sample 1) for each voxel, and 2) for an ROI. For the second case, we randomly selected a set of voxels within the white matter mask to define the ROI V_s. In analysis not reported here, we observed that the final results were insensitive to resamplings of V_s, so long as the sample was sufficiently large. For diffusion signal function estimation, the GDS design was paired with the CU estimator and compared with the ESR design combined with the SHLS estimator.

Since SHLS-ESR reconstruction generally enjoys good performance for dense samples, we first evaluated the diffusion signal estimation for both techniques at various sparse budgets by comparing them to SHLS-ESR applied to the full 90 directions, denoted as SHLS-ESR90. Figure 4 displays boxplots of the ISE between the sparse sample reconstructions and SHLS-ESR90 over all of the white matter voxels, where CU-GDS-PV represents sampling scenario 1), and CU-GDS-COM represents sampling scenario 2). The CU-GDS framework produced reconstructions which better approximate the signal estimates of SHLS-ESR90 than does SHLS-ESR for each of the sparse budgets considered. The performance boost becomes more pronounced in the sparser cases, e.g., when we only have the budget to sample fewer than 20 directions (b-vectors).
The per voxel design (CU-GDS-PV) resulted in lower average ISE than the per subject design (CU-GDS-COM), as expected since the latter is equivalent to the former with an additional constraint, but the difference is minimal relative to the performance of SHLS-ESR.

For each total budget considered, fODFs were constructed from the diffusion signal estimates by first applying the FRT and subsequently the spherical deconvolution algorithm from Descoteaux et al. (2009). Figure 5 displays the fODFs from a coronal slice of a randomly selected test subject. Visual inspection affirms that the fODFs generated from CU-GDS-COM more accurately model the directional information estimated from the full data. Using only 5 directions, CU-GDS-COM estimation already results in fODFs that very closely approximate those based on the full set of 90 directions.

Fig 4: Boxplots of the ISE between the sparse sample reconstructions (SHLS-ESR, CU-GDS-COM, and CU-GDS-PV) and SHLS-ESR90 at budgets of 5, 10, 20, and 30, where CU-GDS-COM represents the CU fit based on the per subject design, and CU-GDS-PV represents the CU fit based on the per voxel design.

For instance, the single fiber region in the corpus callosum (bottom left pink area) as well as the crossing fibers in the centrum semiovale (center blue area) are nearly identical between the budget = 5 and budget = 90 plots. This is not the case for SHLS-ESR, which struggles in both areas for sparse budgets. Boxplots of the per-voxel ISE between the sparse budget and full data reconstructions for both methods are displayed in the right column. For each sparse budget considered, CU-GDS-COM produces estimates which more faithfully represent the full data reconstructions than does SHLS-ESR.

The motivating application for this paper is to develop a sparse diffusion-weighted signal sampling framework for brain structural connectome analysis. Therefore, we investigated the quality of the tractogram and structural connectivity resulting from the reconstructed fODFs at various sampling budgets. The tractogram was computed from the fODFs using the particle filtering tractography algorithm (Girard et al., 2014) implemented in dipy (Garyfallidis et al., 2014). Connectome matrices were reconstructed using the Desikan atlas (Desikan et al., 2006) created by FreeSurfer (Fischl, 2012). Figure 6 displays a visual comparison of the tractography results from SHLS-ESR and CU-GDS-COM for a set of increasing budgets, for a randomly selected test subject. We can see that with as few as 5 directions, the proposed methodology is able to produce meaningful streamlines, and the tractography results visually converge quickly to the tractography using the full set of 90 directions. The SHLS-ESR framework does not produce meaningful streamlines of the whole brain until a budget of 30 b-vectors or more.
To quantify the structural connectivity matrices, we extracted and compared several graph topological metrics, including network density, global efficiency, transitivity, and characteristic path length. Detailed definitions of these metrics, as well as their interpretation within the context of connectome analysis, can be found in Rubinov and Sporns (2010).

Figure 7 displays the averages and error bars for each of these metrics computed over the 10 subjects in the test set at different sampling budgets. We can see that the results of the CU-GDS-COM method converge to the full data results faster than those of the SHLS-ESR method. The SHLS-ESR generally needs a much larger number of directions to obtain a connectivity matrix that is comparable with the one obtained from 90 directions, while the CU-GDS-COM needs far fewer.

Finally, we studied the ability of the proposed reconstruction method to capture meaningful variability across subjects. Due to the incorporation of a prior in the sampling direction selection and signal fitting, it is a valid concern that the CU-GDS could be "shrinking" the signal estimates too much towards the "average brain," especially in the sparse sample case. To evaluate this, we leveraged the test and retest data in the HCP, i.e., those HCP subjects that have multiple scans collected at different times. A set of non-training subjects with both test and retest diffusion scans were involved in our analyses. Figure 8 shows
Fig 5: fODFs from a coronal slice of a randomly selected test subject reconstructed by SHLS-ESR (left column) and CU-GDS-COM (center column). The last column displays boxplots of the per voxel ISE between the sparse budget and full data reconstructions for both methods. Each row corresponds to a different sampling budget.
Fig 6: WM fiber streamlines and connectivity matrices from a randomly selected test subject. Tractography was run on fODFs constructed from diffusion signal function estimates from SHLS-ESR (left column) and CU-GDS-COM (right column) for a sequence of increasing total budgets.
[Figure 7 panels: Density, Global Efficiency, Transitivity, and Characteristic Path Length versus budget (5, 10, 20, 30, 60, 90) for CU-GDS-COM and SHLS-ESR.]
Fig 7: Graph topological metrics of brain networks computed based on the SHLS-ESR and CU-GDS-COM methods over different budgets. Means and error bars are displayed.

Fig 8: Each panel, for (a) M = 5, (b) M = 20, (c) M = 30, and (d) M = 90, shows a pairwise distance matrix whose elements are distances between connectivity matrices restored from the test and retest data of HCP subjects using M diffusion directions. The diffusion signal is fitted based on CU-GDS-COM.

the pairwise distance matrices between the connectivity matrices restored under different budgets. Note that we put data from the same subject next to each other to form a block pattern along the diagonal of the pairwise distance matrix. We see that even in sparse sample cases (small M), there are larger inter-subject (between-subject) differences than intra-subject (within-subject) differences. Furthermore, the pairwise distance patterns for the sparse cases are very similar to those of the dense case, indicating that the proposed method preserves inter-subject differences under sparse designs.
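The test-retest check above can be sketched in a few lines: compute pairwise distances between scan-level connectivity matrices and verify the low-distance diagonal blocks for repeated scans of the same subject. The Frobenius norm is used here as an illustrative distance (the paper does not commit to a specific matrix distance in this excerpt), and the matrices are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic connectivity matrices: one "true" matrix per subject, plus two
# noisy scans (test and retest) per subject.
n_sub, n_roi = 4, 10
subject_conn = [rng.random((n_roi, n_roi)) for _ in range(n_sub)]
scans = []
for C in subject_conn:
    for _ in range(2):                         # test + retest scan
        scans.append(C + 0.01 * rng.standard_normal((n_roi, n_roi)))

# Pairwise Frobenius distance matrix; same-subject scans are adjacent, so
# intra-subject blocks sit along the diagonal.
n = len(scans)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        D[i, j] = np.linalg.norm(scans[i] - scans[j])
```

When scan noise is small relative to between-subject differences, `D[2i, 2i+1]` (intra-subject) is much smaller than any inter-subject entry, reproducing the block pattern described for Figure 8.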
6. CONCLUSION AND DISCUSSION.
This paper introduces a novel methodology which incorporates high-resolution historical dMRI data to optimize diffusion-weighted directions in dMRI acquisition and sparse sample diffusion signal estimation. Historical data is used to construct a prior distribution of water diffusion at each voxel in a template space through a low-rank GP model. Given a new subject of interest for dMRI acquisition, the GP priors in the template space are mapped to the subject space and then used to define the subject-specific acquisition directions that minimize the expected integrated squared error of the diffusion signal estimator. A computationally efficient greedy algorithm is proposed to select the directions and is shown to approximate the performance of the true optimal set of directions within some computable bounds. The mapped GP prior also serves an important role in defining the diffusion signal estimator from sparse samples. We show that the combination of the proposed diffusion sampling and signal estimation generally outperforms traditional techniques. The proposed framework is significantly better in the sparse sample case, i.e., when we only have a limited budget for the number of diffusion directions to acquire. In applying the proposed method to analyze HCP data, we show that the boost in performance of our method in diffusion signal estimation carries through to fODF reconstruction and ultimately to quality tractography estimation. The sparse sample tractography results are shown both to produce network characteristics similar to the densely sampled case and to capture meaningful variation between subjects.

Given the public availability of big brain imaging data sets with high-resolution dMRI, e.g., the different types of HCP data sets, the Adolescent Brain Cognitive Development (ABCD) study, and the UK Biobank, the requirement of finding a subset of subjects' data to serve in constructing the prior is easy to accomplish.
In order to use our methodology for data acquisition of a new subject, a small number of b = 0 scans must first be collected to provide the subject's coordinates for prior injection and to estimate the measurement error variance σ². In this work, the registration between template and subject space was conducted using the FA image, but it is possible to instead use the b = 0 image for this purpose in a real application. For estimating the measurement error variance, although the historical sample could be used to estimate a global value for σ², it is important to do so for each new subject, since it captures the noise introduced by the scanner, which can vary substantially between machines and scanning sites. Typically, only a few initial b = 0 acquisitions are required for a sufficiently good estimate of σ², and the proposed method is robust to some degree of bias in the estimate (see the supplemental material for further discussion).

We offer a couple of potential applications of the proposed framework. First, in clinical scans, long scanning times can be infeasible for a variety of reasons. The proposed method can be used to design optimal gradients to quickly acquire sparse q-space data for structural connectome study for important clinical purposes such as surgical planning. Second, there are studies trying to increase the spatial resolution of diffusion MRI to more accurately detect white matter fiber tracts at a submillimeter isotropic resolution (Chang et al., 2015). Such a spatial resolution increase comes at the cost of reduced q-space resolution (e.g., only around 12 diffusion directions were acquired in Chang et al. (2015)) and increased scanning time. Given our method's outstanding performance in the sparse sampling case, this is an ideal application scenario: with 12 or fewer samples, our method can produce tractography results and connectomes similar to those from data with 60 or more samples.

This work can be extended in several interesting directions.
At present, we only consider data on a single shell, so one extension is to augment the statistical model to accommodate signal attenuation in the multi-shell situation. This would allow the derivation of a joint optimization scheme for both gradient direction and magnitude in the dMRI protocol. Another extension involves multi-voxel analysis and information borrowing from neighboring regions. White matter tracts often group in bundles that span many voxels, inducing correlations between the signals in neighboring voxels. Accounting for such correlation can increase the efficiency of the estimation procedure.
7. SUPPLEMENTARY MATERIALS.
The supplementary materials include further elaborations and proofs of the results presented.
REFERENCES
Alexander, D., G. Barker, and S. Arridge (2002). Detection and modeling of non-Gaussian apparent diffusion coefficient profiles in human brain data. Magnetic Resonance in Medicine 48(2), 331-340.
Andersson, J. L. and S. N. Sotiropoulos (2015). Non-parametric representation and prediction of single- and multi-shell diffusion-weighted MRI data using Gaussian processes. NeuroImage 122, 166-176.
Arsigny, V., P. Fillard, X. Pennec, and N. Ayache (2007). Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications 29(1), 328-347.
Avants, B. B., C. L. Epstein, M. Grossman, and J. C. Gee (2008). Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12(1), 26-41.
Avants, B. B., N. Tustison, and G. Song (2009). Advanced normalization tools (ANTs). Insight Journal 2(365), 1-35.
Basser, P., J. Mattiello, and D. LeBihan (1994). Estimation of the effective self-diffusion tensor from the NMR spin echo. Journal of Magnetic Resonance, Series B 103, 247-254.
Caruyer, E. and R. Deriche (2009). Adaptive design of sampling directions in diffusion tensor MRI and validation on human brain images. In MICCAI Workshop on Diffusion Modelling and the Fiber Cup, London, United Kingdom.
Chamon, L. and A. Ribeiro (2017). Approximate supermodularity bounds for experimental design. In Advances in Neural Information Processing Systems, Volume 30, pp. 5403-5412.
Chang, H.-C., M. Sundman, L. Petit, S. Guhaniyogi, M.-L. Chu, C. Petty, A. W. Song, and N.-K. Chen (2015). Human brain diffusion tensor imaging at submillimeter isotropic resolution on a 3 Tesla clinical MRI scanner. NeuroImage 118, 667-675.
Chen, N.-K., A. Guidon, H.-C. Chang, and A. W. Song (2013). A robust multi-shot scan strategy for high-resolution diffusion weighted MRI enabled by multiplexed sensitivity-encoding (MUSE). NeuroImage 72, 41-47.
Descoteaux, M., E. Angelino, S. Fitzgibbons, and R. Deriche (2007). Regularized, fast, and robust analytical Q-ball imaging. Magnetic Resonance in Medicine 58, 497-510.
Descoteaux, M., R. Deriche, T. R. Knösche, and A. Anwander (2009). Deterministic and probabilistic tractography based on complex fibre orientation distributions. IEEE Transactions on Medical Imaging 28, 269-286.
Desikan, R. S., F. Ségonne, B. Fischl, B. T. Quinn, B. C. Dickerson, D. Blacker, R. L. Buckner, A. M. Dale, R. P. Maguire, B. T. Hyman, M. S. Albert, and R. J. Killiany (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31, 968-980.
Dryden, I. L., A. Koloydenko, and D. Zhou (2009). Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Annals of Applied Statistics 3(3), 1102-1123.
Fischl, B. (2012). FreeSurfer. NeuroImage 62, 774-781.
Gao, T., S. Kovalsky, D. Boyer, and I. Daubechies (2018). Gaussian process landmarking on manifolds. SIAM Journal on Mathematics of Data Science 1.
Garyfallidis, E., M. Brett, B. Amirbekian, A. Rokem, S. Van Der Walt, M. Descoteaux, and I. Nimmo-Smith (2014). Dipy, a library for the analysis of diffusion MRI data. Frontiers in Neuroinformatics 8, 8.
Girard, G., K. Whittingstall, R. Deriche, and M. Descoteaux (2014). Towards quantitative connectivity analysis: Reducing tractography biases. NeuroImage 98, 266-278.
Glasser, M. F., S. M. Smith, D. S. Marcus, J. L. Andersson, E. J. Auerbach, T. E. Behrens, T. S. Coalson, M. P. Harms, M. Jenkinson, S. Moeller, E. C. Robinson, S. N. Sotiropoulos, J. Xu, E. Yacoub, K. Ugurbil, and D. C. Van Essen (2016). The Human Connectome Project's neuroimaging approach. Nature Neuroscience 19(9), 1175-1187.
Glasser, M. F., S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, D. C. Van Essen, M. Jenkinson, and the WU-Minn HCP Consortium (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80, 105-124.
Golub, G. H., M. Heath, and G. Wahba (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21(2), 215-223.
Hess, C. P., P. Mukherjee, E. T. Han, D. Xu, and D. B. Vigneron (2006). Q-ball reconstruction of multimodal fiber orientations using the spherical harmonic basis. Magnetic Resonance in Medicine 56, 104-117.
Hsing, T. and R. Eubank (2013). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley.
James, G. M., T. J. Hastie, and C. A. Sugar (2000). Principal component models for sparse functional data. Biometrika 87(3), 587-602.
Ji, H. and H.-G. Müller (2017). Optimal designs for longitudinal and functional data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(3), 859-876.
Jones, D., M. Horsfield, and A. Simmons (1999). Optimal strategies for measuring diffusion in anisotropic systems by magnetic resonance imaging. Magnetic Resonance in Medicine 42(3), 515-525.
Karhunen, K. (1946). Zur Spektraltheorie stochastischer Prozesse. Ann. Acad. Sci. Fennicae, AI 34.
Maier-Hein, K. H., P. F. Neher, J.-C. Houde, M.-A. Côté, E. Garyfallidis, J. Zhong, M. Chamberland, F.-C. Yeh, Y.-C. Lin, Q. Ji, et al. (2017). The challenge of mapping the human connectome based on diffusion tractography. Nature Communications 8(1), 1-13.
Mardia, K., J. Kent, and J. Bibby (1979). Multivariate Analysis. Probability and Mathematical Statistics. London: Academic Press.
Poot, D. H. J., A. J. den Dekker, E. Achten, M. Verhoye, and J. Sijbers (2010). Optimal experimental design for diffusion kurtosis imaging. IEEE Transactions on Medical Imaging 29(3), 819-829.
Quiñonero-Candela, J. and C. E. Rasmussen (2005). Analysis of some methods for reduced rank Gaussian process regression. pp. 98-127. Berlin, Heidelberg: Springer.
Rubinov, M. and O. Sporns (2010). Complex network measures of brain connectivity: Uses and interpretations. NeuroImage 52, 1059-1069.
Stejskal, E. O. and J. E. Tanner (1965). Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient. The Journal of Chemical Physics 42, 288-292.
Tournier, J.-D., F. Calamante, and A. Connelly (2012). MRtrix: Diffusion tractography in crossing fiber regions. International Journal of Imaging Systems and Technology 22(1), 53-66.
Tuch, D. S. (2004). Q-ball imaging. Magnetic Resonance in Medicine 52, 1358-1372.
Tuch, D. S., T. G. Reese, M. R. Wiegell, N. Makris, J. W. Belliveau, and V. J. Wedeen (2002). High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity. Magnetic Resonance in Medicine 48(4), 577-582.
Tuch, D. S., T. G. Reese, M. R. Wiegell, and V. J. Wedeen (2003). Diffusion MRI of complex neural architecture. Neuron 40, 885-895.
Wu, M., A. Diez-Roux, T. E. Raghunathan, and B. N. Sánchez (2018). FPCA-based method to select optimal sampling schedules that capture between-subject variability in longitudinal studies. Biometrics 74(1), 229-238.
Yao, F., H.-G. Müller, and J.-L. Wang (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association 100(470), 577-590.
Supplemental Materials for “Optimized Diffusion Imaging for Brain Structural Connectome Analysis”
APPENDIX S1: MODIFIED SPHERICAL HARMONIC BASIS FUNCTIONS

This section provides additional exposition on the diffusion signal estimation procedure outlined in Section 3.1.1 of the main text.

The spherical harmonics are defined as
$$Y_l^m(\theta, \varphi) = \sqrt{\frac{(2l+1)(l-m)!}{4\pi(l+m)!}}\, P_l^m(\cos\theta)\, e^{im\varphi},$$
where the polar angle $\theta \in [0, \pi]$, the azimuthal angle $\varphi \in [0, 2\pi)$, the $P_l^m$ are the associated Legendre polynomials with order indices $l = 0, 1, 2, \ldots$, and phase factors $m = -l, \ldots, 0, \ldots, l$. The spherical harmonics form a complete orthogonal basis system for $L^2(\mathbb{S}^2)$. A real-valued and symmetric basis that is suitable for modeling diffusion signals can be constructed from the spherical harmonics according to
$$\phi_j = \begin{cases} \sqrt{2}\,\mathrm{Re}(Y_k^m) & -k \le m < 0 \\ Y_k^0 & m = 0 \\ \sqrt{2}\,\mathrm{Im}(Y_k^m) & 0 < m \le k \end{cases} \qquad \text{(S.1)}$$
for even $k = 0, 2, 4, \ldots, l$, $m = -k, \ldots, 0, \ldots, k$, and $j = (k^2 + k + 2)/2 + m$, where the total number of basis functions is $J = (l+1)(l+2)/2$.

To protect against over-fitting, it is important to promote smoother solutions. For function approximation on a Euclidean space, this is typically accomplished by penalizing the $L^2$ norm of the Laplacian applied to the candidate fit. The Laplace-Beltrami operator generalizes the Laplacian to submanifolds of Euclidean space. It has an especially simple form for the spherical harmonics, satisfying the relationship $\Delta_{\mathbb{S}^2} Y_l^m = -l(l+1)\, Y_l^m$, where $\Delta_{\mathbb{S}^2}$ denotes the Laplace-Beltrami operator on $\mathbb{S}^2$. Coupling this with the orthonormality of (S.1), it follows that for any $g \in \mathrm{span}\{\phi_j\}_{j=1}^J$, $\|\Delta_{\mathbb{S}^2}\, g\|_{L^2}^2 = \mathbf{c}_g^\top \mathbf{R}\, \mathbf{c}_g$, where $\mathbf{R} = \mathrm{Diag}\left(l_1^2(l_1+1)^2, \ldots, l_J^2(l_J+1)^2\right)$ with $l_j$ the order associated with basis function $\phi_j$, and $\mathbf{c}_g \in \mathbb{R}^J$ the coordinates of $g$ with respect to $\boldsymbol{\phi}(p) = (\phi_1(p), \ldots, \phi_J(p))^\top$.

For voxel $v$ in subject $i$, the diffusion signal can be estimated by solving the regularized least squares problem
$$\hat{\mathbf{c}}_{vi} = \operatorname*{argmin}_{\mathbf{c} \in \mathbb{R}^J}\ \sum_{m=1}^{M_i} \left(s_{vim} - \mathbf{c}^\top \boldsymbol{\phi}(p_{im})\right)^2 + \lambda_i \int_{\mathbb{S}^2} \left(\Delta_{\mathbb{S}^2}\left(\mathbf{c}^\top \boldsymbol{\phi}(p)\right)\right)^2 dp,$$
for penalty $\lambda_i > 0$. Denote by $\boldsymbol{\Phi}_i$ the $M_i \times J$ matrix of basis function evaluations $[\boldsymbol{\Phi}_i]_{mj} = \phi_j(p_{im})$ and by $\mathbf{s}_{vi} = (s_{vi1}, \ldots, s_{viM_i})^\top$ the vector of observed signals. This optimization problem has the closed form solution
$$\hat{\mathbf{c}}_{vi} = \left[\boldsymbol{\Phi}_i^\top \boldsymbol{\Phi}_i + \lambda_i \mathbf{R}\right]^{-1} \boldsymbol{\Phi}_i^\top \mathbf{s}_{vi},$$
and the resulting smoothed estimate of the diffusion signal is $\hat{X}_{vi}(p) = \hat{\mathbf{c}}_{vi}^\top \boldsymbol{\phi}(p)$ for $p \in \mathbb{S}^2$.

The roughness penalty parameter $\lambda_i$ controls the trade-off between the candidate solution’s fit to the observed data and its “wiggliness.” In the literature, the penalty parameter is typically chosen either through cross validation or by minimizing the generalized cross validation (GCV) criterion, which can be interpreted as an estimator of the expected predicted residual error sum of squares from leave-one-out cross validation (Golub et al., 1979). In this work, we opted for GCV primarily due to its computational efficiency, a property that is paramount given the massive data size encountered in a whole-brain dMRI study.
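The penalized least-squares fit with GCV-selected penalty described in this appendix can be sketched in a few lines. This is an illustrative sketch only: the basis-evaluation matrix `Phi`, the signal vector `s`, and the candidate penalty grid are hypothetical inputs (the modified spherical harmonic evaluations are assumed precomputed, with even orders up to 8, i.e. J ≤ 45).

```python
import numpy as np

def fit_signal(Phi, s, lams):
    """Penalized least-squares fit of a diffusion signal with GCV penalty selection.

    Phi  : (M, J) matrix of basis evaluations, Phi[m, j] = phi_j(p_m)
    s    : (M,) observed signal at the M gradient directions
    lams : candidate roughness penalties; the GCV-minimizing one is used
    """
    M, J = Phi.shape
    # Penalty matrix R = Diag(l_j^2 (l_j + 1)^2), where l_j is the (even)
    # spherical harmonic order of phi_j; assumes J <= 45 (orders 0, 2, ..., 8).
    orders = np.concatenate([[l] * (2 * l + 1) for l in range(0, 9, 2)])[:J]
    R = np.diag((orders * (orders + 1)) ** 2).astype(float)

    best = None
    for lam in lams:
        # Smoother (hat) matrix S_lam maps observed signal to fitted values.
        A = np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T)   # (J, M)
        S_lam = Phi @ A
        resid = s - S_lam @ s
        # GCV(lam) = M * ||resid||^2 / (M - tr(S_lam))^2  (Golub et al., 1979)
        gcv = M * (resid @ resid) / (M - np.trace(S_lam)) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, A @ s)
    return best[1]  # coefficient vector c_hat
```

The smoothed signal estimate at any direction `p` is then the inner product of `c_hat` with the basis evaluations at `p`.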
APPENDIX S2: MEASUREMENT ERROR VARIANCE ESTIMATION

Our statistical model for the observed diffusion signal, normalized with respect to the mean non-diffusion weighted signal (i.e., the mean $b=0$ signal), assumes that the measurement error is independent of both b-vector and b-value. Let $\mu_v$ be the true non-diffusion weighted signal in voxel $v$ and let $s_v^l$ be the $l$-th ($l = 1, \ldots, n$) $b=0$ signal measurement at voxel $v$. The natural extension of our statistical model to the $b=0$ data is
$$s_v^l/\mu_v = 1 + \epsilon, \qquad \epsilon \sim N(0, \sigma^2),$$
where $\epsilon$ is the measurement error. Hence, the $b=0$ signal is distributed as $s_v^l \sim N(\mu_v, [\mu_v]^2\sigma^2)$, and we choose to estimate $\sigma^2$ with
$$\hat{\sigma}^2 = \frac{1}{|\mathcal{V}|}\sum_{v \in \mathcal{V}} \widehat{\mathrm{Var}}(s_v)/(\bar{s}_v)^2,$$
where $\widehat{\mathrm{Var}}(s_v)$ is the empirical variance of the measurements $s_v^1, \ldots, s_v^n$ and $\bar{s}_v$ is their empirical mean.

We investigate the properties of $\hat{\sigma}^2$ under this extension of the statistical model. Using a multivariate Taylor expansion truncated after the first order terms, it is easy to show that
$$\mathbb{E}\left[\frac{\widehat{\mathrm{Var}}(s_v)}{(\bar{s}_v)^2}\right] \approx \frac{\mathbb{E}[\widehat{\mathrm{Var}}(s_v)]}{\mathbb{E}[(\bar{s}_v)^2]}. \qquad \text{(S.2)}$$
Since $\bar{s}_v \sim N(\mu_v, [\mu_v]^2\sigma^2/n)$, we have $\mathbb{E}[(\bar{s}_v)^2] = \mathrm{Var}(\bar{s}_v) + \mathbb{E}[\bar{s}_v]^2 = (\mu_v)^2(\sigma^2/n + 1)$, and $\widehat{\mathrm{Var}}(s_v)$ is an unbiased estimator of $[\mu_v]^2\sigma^2$. Plugging these back into (S.2), we obtain the approximation
$$\mathbb{E}\left[\frac{\widehat{\mathrm{Var}}(s_v)}{(\bar{s}_v)^2}\right] \approx \frac{\sigma^2}{\sigma^2/n + 1}.$$
Again using the first order Taylor approximation (and the independence of the sample mean and sample variance under normality), we obtain the following approximation for the variance of the per-voxel ratio:
$$\mathrm{Var}\left[\frac{\widehat{\mathrm{Var}}(s_v)}{(\bar{s}_v)^2}\right] \approx \left(\frac{\sigma^2}{\sigma^2/n + 1}\right)^2\left[\frac{2}{n-1} + \left(\frac{4\sigma^2}{n} + \frac{2\sigma^4}{n^2}\right)\left(\frac{\sigma^2}{n} + 1\right)^{-2}\right].$$
Therefore, we obtain the following approximations for the mean and variance of $\hat{\sigma}^2$:
$$\mathbb{E}[\hat{\sigma}^2] \approx \frac{\sigma^2}{\sigma^2/n + 1} \quad \text{and} \quad \mathrm{Var}(\hat{\sigma}^2) \approx \frac{1}{|\mathcal{V}|}\left(\frac{\sigma^2}{\sigma^2/n + 1}\right)^2\left[\frac{2}{n-1} + \left(\frac{4\sigma^2}{n} + \frac{2\sigma^4}{n^2}\right)\left(\frac{\sigma^2}{n} + 1\right)^{-2}\right].$$
The approximate bias of the estimator is
$$\mathbb{E}[\hat{\sigma}^2] - \sigma^2 \approx \sigma^2\left[\frac{1}{\sigma^2/n + 1} - 1\right] < 0,$$
and therefore, while the estimator is asymptotically unbiased, we expect some finite sample downward bias. The variance decays like $|\mathcal{V}|^{-1}$, which is typically very small (i.e., we average over many voxels), and hence the estimator is stable.

Fig S1: (left) Estimates of the measurement error variance for 50 randomly selected training subjects over a sequence of increasing numbers of $b=0$ images. (right) MISE and error bars of CU-GDS reconstructions computed over the Gaussian low SNR simulation data using the exact $\sigma^2$ (red), and over- and under-estimates (blue and green).

In practical applications we assume that $n$, the number of $b=0$ images in the pre-scanning period, is small, and therefore we expect to incur some bias as a result. To assess this bias in practice, we studied the estimator’s performance on each of the subjects in the high resolution historical training set. Each subject has a total of $n = 18$ $b=0$ images, and for each $j = 4, 5, \ldots, n$, we selected $j$ of the subject’s $b=0$ images and applied the estimator $\hat{\sigma}^2$. The left panel of figure S1 displays the results for a sample of 50 randomly selected subjects. As predicted, the small sample estimates appear to have a slight downward bias. That said, comparing the estimates using only 4 $b=0$ images to those using the full set of $n = 18$, 33 of the 50 subjects are within a small margin of the final estimated value, 47 are within a somewhat larger margin, and all are within a modest one.

To assess the proposed method’s sensitivity to bias in the estimation of $\sigma^2$, we used the Gaussian low SNR simulation set-up from Section 4 and compared the reconstruction performance of CU-GDS when $\sigma^2$ is both under- and over-estimated.
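To make the estimator $\hat{\sigma}^2$ and its first-order mean approximation concrete, here is a minimal simulation sketch under the Gaussian model of this appendix. All quantities are synthetic stand-ins (the voxel means, voxel count, and noise level are illustrative choices, not the paper's data).

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_vox = 4, 5000          # few b=0 images per voxel, many voxels
sigma2 = 0.05               # true measurement error variance

# Simulate b=0 signals: s_vl = mu_v * (1 + eps), eps ~ N(0, sigma2)
mu = rng.uniform(0.5, 2.0, size=n_vox)                       # synthetic mu_v
eps = np.sqrt(sigma2) * rng.standard_normal((n_vox, n))
s = mu[:, None] * (1.0 + eps)

# sigma2_hat = (1/|V|) * sum_v Var_hat(s_v) / (mean(s_v))^2
sigma2_hat = np.mean(s.var(axis=1, ddof=1) / s.mean(axis=1) ** 2)

# First-order approximation of E[sigma2_hat]: sigma^2 / (sigma^2/n + 1)
approx_mean = sigma2 / (sigma2 / n + 1)
```

Averaging over many voxels keeps the estimator stable even with only $n = 4$ images per voxel, in line with the $|\mathcal{V}|^{-1}$ variance decay derived above.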
The right panel of figure S1 shows that there is not much degradation in performance, as measured by the mean ISE across the test set: the biased reconstructions still outperform all other reconstruction methods in the sparse sample case (compare to the bottom left panel of figure 2).

APPENDIX S3: BAYESIAN PERSPECTIVE OF DIFFUSION SIGNAL ESTIMATION FROM SPARSE DATA

The derivation of equation (7) can be motivated from a fully Bayesian perspective. By Bayes' rule, the posterior distribution of the coefficients is
$$P(\boldsymbol{\xi}_K \mid \mathbf{s}_M, \mathbf{P}_M, \mathcal{H}) \propto P(\mathbf{s}_M \mid \mathbf{P}_M, \boldsymbol{\xi}_K)\, P(\boldsymbol{\xi}_K \mid \mathcal{H}),$$
where the prior $P(\boldsymbol{\xi}_K \mid \mathcal{H}) \sim N(\mathbf{0}, \boldsymbol{\Lambda}_K)$ is empirically estimated from $\mathcal{H}$, and the likelihood $\mathbf{s}_M \mid \boldsymbol{\xi}_K$ follows a multivariate Gaussian distribution $N(\boldsymbol{\mu}_M + \boldsymbol{\Psi}_{M,K}\boldsymbol{\xi}_K, \sigma^2\mathbf{I}_M)$, which can be derived from equation (3). It is easy to show that the mean of this posterior distribution is exactly the conditional expectation in (7). Equivalently, we can consider the joint distribution of the observations and the process at a new point, i.e., the distribution of the random vector $(\mathbf{s}_M, X(p))^\top$. Through conditioning arguments on the resulting joint Gaussian distribution, the posterior predictive distribution for $X(p)$ can be shown to have mean
$$\mu(p) + \mathbf{k}(p)^\top\left[\begin{pmatrix} C(p_1, p_1) & \cdots & C(p_1, p_M) \\ \vdots & \ddots & \vdots \\ C(p_M, p_1) & \cdots & C(p_M, p_M) \end{pmatrix} + \sigma^2\mathbf{I}_M\right]^{-1}(\mathbf{s}_M - \boldsymbol{\mu}_M),$$
where $\mathbf{k}(p) = (C(p_1, p), \ldots, C(p_M, p))^\top$. Assuming the reduced rank approximation of the covariance function $C$ in terms of the first $K$ eigenvalue-eigenfunction pairs, it is easy to show that the above is equivalent to equation (8).

APPENDIX S4: OPTIMAL Q-SPACE SAMPLING

S4.1. Proof of Theorem 1.

LEMMA 1. Define the auxiliary function
$$f(\mathbf{P}_M) = \mathrm{trace}\left(\left(\boldsymbol{\Lambda}_K^{-1} + \frac{1}{\sigma^2}\boldsymbol{\Psi}_{M,K}^\top\boldsymbol{\Psi}_{M,K}\right)^{-1}\right) - \mathrm{trace}(\boldsymbol{\Lambda}_K).$$
The maximization problem (12) is equivalent to the minimization problem
$$\mathbf{P}_M^* = \operatorname*{argmin}_{\mathbf{P}_M \in \mathcal{P}_M(\mathcal{A})} f(\mathbf{P}_M). \qquad \text{(S.3)}$$

PROOF.
Using the Woodbury identity, we have
$$\boldsymbol{\Lambda}_K - \boldsymbol{\Lambda}_K\boldsymbol{\Psi}_{M,K}^\top\left[\boldsymbol{\Psi}_{M,K}\boldsymbol{\Lambda}_K\boldsymbol{\Psi}_{M,K}^\top + \sigma^2\mathbf{I}_M\right]^{-1}\boldsymbol{\Psi}_{M,K}\boldsymbol{\Lambda}_K = \left(\boldsymbol{\Lambda}_K^{-1} + \frac{1}{\sigma^2}\boldsymbol{\Psi}_{M,K}^\top\boldsymbol{\Psi}_{M,K}\right)^{-1}.$$
Taking the trace of both sides shows $g(\mathbf{P}_M) = -f(\mathbf{P}_M)$, and the result follows.

LEMMA 2. Let $f$ be defined as in Lemma 1. Then
$$f(\hat{\mathbf{P}}_m^*) \le \left[1 - \exp\left(-\frac{\rho_1^{-1}}{\rho_K^{-1} + m\sigma^{-2}\lambda_\psi^*}\cdot\frac{m}{M}\right)\right] f(\mathbf{P}_M^*).$$

PROOF OF LEMMA 2. The matrix $(\boldsymbol{\Lambda}_K^{-1} + \sigma^{-2}\boldsymbol{\Psi}_{M,K}^\top\boldsymbol{\Psi}_{M,K})^{-1}$ is the covariance of the posterior distribution of $\boldsymbol{\xi}_K$ given the prior $P(\boldsymbol{\xi}_K \mid \mathcal{H}) \equiv N(\mathbf{0}, \boldsymbol{\Lambda}_K)$ and $P(\boldsymbol{\epsilon} \mid \sigma^2) \equiv N(\mathbf{0}, \sigma^2\mathbf{I}_M)$ under the reduced rank GP model. Therefore, (S.3) defines the A-optimal design for (7), the MAP estimator of $\boldsymbol{\xi}_K$. Using Theorems 1 and 3 from Chamon and Ribeiro (2017) together with Lemma 1 above, it is easy to see that
$$f(\hat{\mathbf{P}}_m^*) \le \left[1 - \exp\left(-\bar{\alpha}\frac{m}{M}\right)\right] f(\mathbf{P}_M^*), \quad \text{where} \quad \bar{\alpha} = \min_{l \le m}\frac{\lambda_{\min}(\boldsymbol{\Lambda}_K^{-1})}{\lambda_{\max}(\boldsymbol{\Lambda}_K^{-1}) + l\sigma^{-2}\lambda_\psi^*} = \frac{\rho_1^{-1}}{\rho_K^{-1} + m\sigma^{-2}\lambda_\psi^*},$$
giving the desired result.

PROOF OF THEOREM 1. The result follows immediately from Lemma 2 using the identity $f(\mathbf{P}_M) = -g(\mathbf{P}_M)$.

S4.2. Interpretation of the Performance Bound.
To facilitate interpretation, assume that we have run Algorithm 1 for $M$ iterations. Using the bound from Theorem 1, we have the ratio
$$\frac{g(\hat{\mathbf{P}}_M^*)}{g(\mathbf{P}_M^*)} \ge \left[1 - \exp\left(-\left(\frac{\rho_K}{\rho_1}\right)\left(1 + M\lambda_\psi^*\left(\frac{\rho_K}{\sigma^2}\right)\right)^{-1}\right)\right].$$
We now explore how this bound behaves in different scenarios. The ranges of the terms in the exponential component are $\rho_K/\rho_1 \in (0, 1]$ and $(1 + M\lambda_\psi^*(\rho_K/\sigma^2))^{-1} \in (0, 1]$. The bound reaches its tightest, $1 - \exp(-1)$, as both terms approach 1. Alternatively, the bound is loosest, approaching the trivial bound of 0, as either term approaches 0. We have the following interpretations. In terms of the random function $X$, the bound is tighter when the eigenvalues of the reduced rank covariance function are clustered near one another. In other words, the functional variation is nearly uniformly distributed among the $\{\psi_1, \ldots, \psi_K\}$, i.e., there is large uncertainty in the underlying local diffusion direction. Additionally, due to the connection between the trace of a matrix and its eigenvalues, we have that $\lambda_\psi^* = \max_{p \in \mathcal{A}}\sum_{k=1}^K \psi_k^2(p)$, which is clearly bounded by $\sum_{k=1}^K \|\psi_k\|_\infty^2$. Since the sup-norm increases with higher order spherical harmonics, very large values of $\|\psi_k\|_\infty$ indicate a relatively large coefficient on a high order $\phi_j$. Said another way, the smaller the contribution of high order components to the $\psi_k$'s, the tighter the bound on the ratio between the approximate and globally optimal designs. Finally, notice that when $\rho_K/\sigma^2$ is large, the second term in the exponential goes to 0 and the bound loosens. Therefore, the bound is tighter when the signal-to-noise ratio is low.
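The greedy selection analyzed in this appendix can be sketched as follows. This is a simplified stand-in for Algorithm 1 (GDS), not its full implementation: `Psi_cand` (candidate basis evaluations over a hypothetical direction grid) and `Lambda_K` (prior coefficient covariance) are generic placeholders for the subject-specific quantities, and a Sherman-Morrison rank-one update is used to evaluate the A-optimality objective $f$ of Lemma 1 cheaply at each candidate.

```python
import numpy as np

def greedy_design(Psi_cand, Lambda_K, sigma2, m):
    """Greedily minimize f(P) = tr((Lambda_K^{-1} + sigma^{-2} Psi' Psi)^{-1}).

    Psi_cand : (N, K) rows psi(p)' over a grid of N candidate q-space directions
    Lambda_K : (K, K) prior covariance of the coefficients xi_K
    sigma2   : measurement error variance
    m        : sampling budget (number of directions to select)
    """
    N, K = Psi_cand.shape
    chosen = []
    info = np.linalg.inv(Lambda_K)   # running Lambda_K^{-1} + sigma^{-2} Psi'Psi
    for _ in range(m):
        info_inv = np.linalg.inv(info)
        best_i, best_val = -1, np.inf
        for i in range(N):
            if i in chosen:
                continue
            psi = Psi_cand[i]
            # Trace after adding direction i, via the Sherman-Morrison identity:
            # tr((info + psi psi'/sigma2)^{-1})
            #   = tr(info_inv) - ||info_inv psi||^2 / (sigma2 + psi' info_inv psi)
            v = info_inv @ psi
            val = np.trace(info_inv) - (v @ v) / (sigma2 + psi @ v)
            if val < best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
        info += np.outer(Psi_cand[best_i], Psi_cand[best_i]) / sigma2
    return chosen
```

Each greedy step costs $O(NK^2)$ rather than the $O(NK^3)$ of naive refitting, which matters when the selection must be rerun per subject before scanning.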
APPENDIX S5: COMPUTATION SPEED COMPARISON

The computational speed of the dipy implementation of the ESR algorithm and our implementation of Algorithm 1 (GDS) were compared over a sequence of increasing total budgets. The comparisons were run 10 times for each budget on a machine with a 1.80 GHz CPU and 16 GB of RAM. Table 1 displays the mean compute time for both methods at each budget considered, along with the standard deviation in parentheses. The results show that GDS is reasonably fast, which is critically important in practice since we need to calculate the optimal gradient table for each subject of interest in real time before the diffusion imaging. The computational performance of GDS and ESR is comparable, though the latter need not be computed per-subject.

TABLE 1
Computational time in seconds: mean (standard deviation) over 10 runs.

Budget    ESR                GDS
10        0.4299 (0.0201)    0.7584 (0.0159)
20        0.7405 (0.0272)    1.1068 (0.0487)
30        1.1755 (0.0194)    1.3890 (0.0061)
50        2.7060 (0.0820)    2.2080 (0.0677)
70        4.9674 (0.1823)    3.6329 (0.1754)
90        7.6678 (0.0789)    6.1353 (0.1289)