Towards a quantitative assessment of neurodegeneration in Alzheimer's disease

Abstract

Alzheimer's disease (AD) is an irreversible neurodegenerative disorder that progressively destroys memory and other cognitive domains of the brain. While effective therapeutic management of AD is still in development, it seems reasonable to expect their prospective outcomes to depend on the severity of baseline pathology. For this reason, substantial research efforts have been invested in the development of effective means of non-invasive diagnosis of AD at its earliest possible stages. In pursuit of the same objective, the present paper addresses the problem of the quantitative diagnosis of AD by means of Diffusion Magnetic Resonance Imaging (dMRI). In particular, the paper introduces the notion of a pathology specific imaging contrast (PSIC), which, in addition to supplying a valuable diagnostic score, can serve as a means of visual representation of the spatial extent of neurodegeneration. The values of PSIC are computed by a dedicated deep neural network (DNN), which has been specially adapted to the processing of dMRI signals. Once available, such values can be used for several important purposes, including stratification of study subjects. In particular, experiments confirm the DNN-based classification can outperform a wide range of alternative approaches in application to the basic problem of stratification of cognitively normal (CN) and AD subjects. Notwithstanding its preliminary nature, this result suggests a strong rationale for further extension and improvement of the explorative methodology described in this paper.


OLEG MICHAILOVICH AND RINAT MUKHOMETZIANOV

The Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3GL, Canada

Submitted to

IEEE Access


Keywords: diffusion MRI, deep learning, convolutional neural networks, early diagnosis, Alzheimer's disease.

1. Introduction

The world population is steadily ageing, and with advanced age comes a higher risk of dementia. At the present time, the dementia of Alzheimer's type, or Alzheimer's disease (AD), accounts for almost two-thirds of all prevalent cases of dementia in the elderly. AD is an irreversible, progressive disease that slowly destroys memory and other cognitive domains, eventually leaving the patient bedridden. The course of AD pathology is likely to span around two to three decades [1]. Unfortunately, by the time the first symptoms emerge, it is usually too late to save the brain. For this reason, over the last two decades, considerable efforts have been directed towards finding effective means of the earliest possible diagnosis of AD [2].

The current arsenal of methods for quantitative diagnosis of AD is impressively broad, ranging from advanced proteomics to state-of-the-art neuroimaging. In the latter case, particularly promising results have been demonstrated by both nuclear and Magnetic Resonance Imaging (MRI) [2–4].

Among various methods of MRI, diffusion MRI (dMRI) is exceptional for its unique ability to generate imaging contrast based on the microscopic (rather than macroscopic) properties of neurological tissue, which makes it singularly fit for the task of detecting the earliest signs of neurodegeneration [5, 6]. This ability of dMRI has been investigated in a number of studies [7–11], which predominantly focused on the problem of classification (aka stratification) of three groups of subjects, viz. cognitively normal (CN) subjects, AD subjects, and the subjects diagnosed with mild cognitive impairment (MCI). Note that the latter is broadly recognized as a prodromal condition that frequently heralds the onset of "full-blown" AD [12, 13].

In virtually all earlier studies on dMRI-based stratification of AD, MCI, and CN subjects, the protocol of choice has been Diffusion Tensor Imaging (DTI) [5, 6, 14]. The latter is known to provide an adequate characterization of diffusion dynamics in the white matter associated with non-crossing bundles of neural fibre tracts. Unfortunately, its dependence on Gaussian modelling curbs the ability of DTI to delineate more complex diffusion processes, e.g., within crossing fibres [15, 16]. It is thus no wonder that, even though many DTI metrics have demonstrated considerable sensitivity across multiple brain regions, the most consistent findings have been confined to the corpus callosum [7–9, 11, 17–21]. At the same time, few DTI studies have been able to stratify CN, AD, and MCI subjects based on DTI analysis of the medial-temporal white matter, which is known to be abundant in both crossing and "kissing" fibre tracts. The problem here has obviously been in the intrinsic modelling limitations of DTI, which is a rather discouraging outturn in view of the known involvement of the above region in the early stages of neuropathological AD [22].

The limitations of DTI have prompted the development of more advanced methods of dMRI, among which Neurite Orientation Dispersion and Density Imaging (NODDI) is considered to be one of the most comprehensive approaches to quantitative characterization of cerebral diffusion [23]. Naturally, several studies have investigated the applicability of NODDI to early diagnosis of AD. However, when trying to correlate the spatial distribution of NODDI metrics with histopathological evidence of AD, it was observed in [24] that NODDI offered somewhat marginal advantages over DTI. A similar conclusion was reached in [25], in which NODDI was applied to the diagnosis of young-onset AD.

Needless to say, DTI and NODDI are only two specific examples among a wide range of methods available under the umbrella of dMRI. However, regardless of their specific modelling assumptions, all these methods share a tendency to produce more accurate results at the expense of higher complexity of parametrization. At the same time, the use of parametric spaces of progressively higher dimensionality requires a proportional increase in the number of data points, which might not always be possible due to practical constraints. Thus, given the melange of available protocols and models, the question of which of the existing dMRI methods is "the best" for early diagnosis of AD appears to be rather non-trivial.

Before going any further, it is important to note that not all methods of dMRI are equally feasible from the viewpoint of clinical implementation. In particular, for practical reasons, the typical duration of a clinical dMRI examination rarely exceeds 15–20 min. This constraint puts a strict upper bound on the amount of acquirable data, and, consequently, on the maximal order of numerically stable parametrization. For this reason, most of the studies on the dMRI-based diagnosis of AD have predominantly relied on DTI. Despite its numerous limitations, DTI remains "the method of choice" in many ongoing studies thanks to the minimality of its technical requirements and its time efficiency.

It is also worth noting that, in quotidian exchanges, the term "DTI" is usually used in two different connections. In particular, it could refer to the Gaussian (i.e., 2nd-order tensor) diffusion model which lies at the foundation of DTI analysis. This model is described by a total of seven parameters, which could be estimated based on a minimum of seven independent measurements.
Although the practical number of diffusion measurements normally exceeds this lower bound, their acquisition rarely requires more than 15 min of scanning time, which makes such imaging protocols clinically feasible. It is probably due to the association between the low dimensionality of DTI modelling and its dependence on a relatively small number of measurements that the term "DTI" has also been used to refer to diffusion data comprised of a comparatively small number of diffusion encodings. Hence, to avoid misconception, the terms DTI data and DTI modelling need to be distinguished.

Footnote (on the seven parameters of the DTI model): The DTI model is parametrized by a symmetric 3 × […]

For the reasons explained above, virtually all clinical dMRI data could be characterized as "DTI". From the practical point of view, therefore, it seems reasonable to restrict the scope of available dMRI models to only those which could be reliably fitted based on clinical DTI data. Unfortunately, this would have mainly left us with low-parametric models of a descriptive power similar to that of DTI. This outturn reveals a critical methodological predicament, where the use of more advanced models is fraught with estimation artefacts, whereas suppressing such artefacts by including additional measurements is precluded by practical constraints. In such conditions, the use of low-parametric dMRI models becomes the only option, which, unfortunately, comes at the cost of reduced accuracy.

A particularly promising way to improve on the performance of low-parametric dMRI modelling is offered by data-driven inference and, in particular, by its recent realization in the theory of Deep Learning (DL) [26, 27]. The modern methods of DL make it possible to discern subtle and complex dependencies in experimental data, which would have been impossible to describe in mechanistic terms. In this case, the actual (unknown) model is replaced by its phenomenological representation in terms of a Deep Neural Network (DNN) that is capable of "learning" to predict future outcomes based on past observations.

Although the idea of using DL in the imaging-based diagnosis of AD is not original, much of the work along this direction has mainly focused on structural MRI. The latter has proven instrumental in assessing the cerebral atrophy due to AD in both cortical and subcortical regions of the brain [28–32]. In longitudinal studies, the substantial potential for prediction of ensuing cognitive decline in MCI and AD subjects has been demonstrated through the use of recurrent DNNs [33–38], both with and without augmenting the structural MRI data with other sources of diagnostic information [34, 39–41]. However, the application of DL to a dMRI-based diagnosis of AD remains a barely tapped area of research, notwithstanding the abundant evidence of its successful use in other applications [42, 43]. Accordingly, the primary goal of this work has been to leverage the combined power of dMRI and DL towards the early diagnosis of AD.

The key idea of the proposed methodology is built around the notion of pathology specific imaging contrast (PSIC). Similarly to other imaging-based markers, PSIC is a scalar score that indicates the presence of a suspected pathology (such as, e.g., AD). However, instead of characterising an entire dataset, the values of PSIC are computed at each spatial coordinate within a specified region-of-interest (ROI). In this way, PSIC can serve as a local indicator of the degree to which AD pathology affects various anatomical sites.

Once available, the values of PSIC can be converted into regional statistics, which can, in turn, be used for the purpose of subject classification. In the case of multiple ROIs, the performance of such classification could be further improved through exploring the interdependencies between different regional statistics, which are known to undergo sizeable changes in AD [44]. Furthermore, the local PSIC values can be displayed in superposition with anatomical images in the form of a contrast.
In this way, PSIC also offers a means for visual analysis of the extent and severity of suspected pathology.

In this work, the values of PSIC have been generated by a dedicated DNN. Although the overall architecture of the DNN is based on a standard feed-forward configuration, its principal operations (such as, e.g., convolution, pooling, etc.) have been properly adjusted to the physical and analytical properties of diffusion signals. The adjustment made it possible to minimize the number of network parameters and, consequently, to reduce the amount of training data substantially. In particular, the results of this paper have been obtained based on only 40 dMRI datasets.

It is important to emphasize that the results reported here should be regarded as neither exhaustive nor final, but rather as describing a novel concept which admits many possible extensions and improvements. For this reason, no attempts have been made to "push the limits" of the proposed method by testing its performance in deliberately difficult scenarios. Instead, the present experimental study has been deliberately limited to the basic task of stratification of CN and AD subjects, while focusing on a comparative performance between the proposed method and a range of existing alternatives.

The remainder of this paper is organized as follows. The principal idea of the proposed method is described in Section 2, while Section 3 introduces some necessary technical preliminaries, followed by a description of the proposed network design in Section 4. Subsequently, Section 5 provides details on the experimental setup, study data, and performance metrics used in this work, while experimental results are summarized in Section 6. Finally, Section 7 concludes the paper with a discussion of its main findings along with an outline of possible directions for future research.

2. Principal idea and data structure

The experimental study of this work has been based on the dMRI data available through the continuing efforts of the Alzheimer's Disease Neuroimaging Initiative (ADNI). Over the last few years, the ADNI database has been extended to include dMRI data acquired by means of relatively advanced protocols. For the purposes of this paper, however, we used the dMRI data collected during an earlier phase of the ADNI study, known as ADNI-II. At that time, the data acquisition relied on more standard protocols which are more common in present-day clinical DTI. Thus, working with the earlier data should provide a more objective demonstration of the practical value of the proposed methodology.

In what follows, we consider a typical setting in which a DTI dataset consists of $K$ diffusion-encoded MRI volumes which encode the values of apparent diffusivity along different spatial orientations. Such data are usually acquired at a fixed level of diffusion sensitization controlled by the $b$-value [14]. In particular, in the case of ADNI-II data, the number of diffusion encodings was set to $K = 41$, with $b = 1000$ s/mm$^2$.

Footnote (on ADNI): For more details on various ADNI programs, visit adni.loni.usc.edu.

Footnote (on the acquisition scheme): For relatively large $K$, this acquisition scheme is also known as High Angular Resolution Diffusion Imaging (HARDI) [45].


Formally, DTI signals can be considered to be functions of both spatial and spherical coordinates. In practice, the spatial coordinate is sampled over a regular Cartesian grid $\Omega := \{\mathbf{n} = (n_1, n_2, n_3) \mid 0 \le n_i < N_i,\ i = 1, 2, 3\}$ which represents the (anatomical) image domain. On the other hand, the process of diffusion encoding restricts the signal values to $K$ points $\{\mathbf{u}_k \mid \|\mathbf{u}_k\|_2 = 1\}_{k=1}^{K}$ over the unit sphere $\mathbb{S}^2$ in the diffusion $q$-space. Thus, from a practical point of view, a DTI dataset can be viewed as a 4-D numerical array of size $N_1 \times N_2 \times N_3 \times K$, with three "anatomical" dimensions and one "diffusion" dimension.

For some $\mathbf{r} \in \Omega$, let $\mathcal{N}_{\mathbf{r}}$ be a symmetric neighbourhood of $\mathbf{r}$ consisting of all $\mathbf{n}$ such that $\|\mathbf{n} - \mathbf{r}\|_\infty = \max_{1 \le i \le 3} |n_i - r_i| \le L$ for some (small) radius $L > 0$. The term "diffusion cube (DC) at $\mathbf{r}$" will be used below to refer to a segment of DTI data spatially restricted to $\mathcal{N}_{\mathbf{r}}$. Formally, the DC at $\mathbf{r}$ is defined as

$$\mathbf{s}_{\mathbf{r}} = \mathbf{s}_{\mathbf{r}}(\mathbf{n}, \mathbf{u}_k) := \{ s(\mathbf{n}, \mathbf{u}_k) \mid \mathbf{n} \in \mathcal{N}_{\mathbf{r}},\ 1 \le k \le K \},$$

where $s(\mathbf{n}, \mathbf{u}_k)$ denotes the signal value at position $\mathbf{n} \in \mathcal{N}_{\mathbf{r}}$ and diffusion-encoding orientation $\mathbf{u}_k$. Thus, similarly to the entire dataset, $\mathbf{s}_{\mathbf{r}}$ can be viewed as a 4-D array of size $M \times M \times M \times K$, with $M = 2L + 1$.

Footnote (on the neighbourhood): For example, with $L = 1$, $\mathcal{N}_{\mathbf{r}}$ represents a standard 27-connected neighbourhood of $\mathbf{r}$.

As the next step, we refer to Fig. 1 that illustrates various stages of computing the PSIC values in application to the stratification of CN and AD subjects. Suppose, for a given study subject (Subplot A), we are interested in making an inference based on DTI data confined to some prescribed ROI $\mathcal{R} \subset \Omega$ (Subplot B). Then, for each $\mathbf{r} \in \mathcal{R}$ such that $\mathcal{N}_{\mathbf{r}} \subset \mathcal{R}$ (Subplot C), its associated DC $\mathbf{s}_{\mathbf{r}}$ (Subplot D) is passed to a dedicated DNN $f(\mathbf{s}_{\mathbf{r}} \mid \theta)$ that yields a positive PSIC score $0 \le \gamma_{\mathcal{R}}(\mathbf{r}) \le 1$. For the sake of argument, at the moment, the network parameters $\theta$ are assumed to be set to their optimal values $\theta^*_{\mathcal{R}}$. In this case, by virtue of network design and training, $\gamma_{\mathcal{R}}(\mathbf{r})$ is set up to scale proportionally to the likelihood of the neural tissue at $\mathbf{r} \in \mathcal{R}$ to be affected by AD. Thus, in particular, the CN cases would be associated with relatively low values of PSIC (e.g., $\gamma_{\mathcal{R}}(\mathbf{r}) = 0.$[…] $\gamma_{\mathcal{R}}(\mathbf{r}) = 0.$[…] $\gamma_{\mathcal{R}}(\mathbf{r})$ computed at all $\mathbf{r} \in \mathcal{R}$ can also be used as an imaging contrast, which can be superimposed over a structural display of underlying anatomy. Such contrast is expected to be comparatively weak and scarce in CN (Subfigure F) while being intense and densely concentrated in AD (Subfigure G).

Figure 1. Computation of PSIC: (A) subject under examination; (B) DTI dataset with a specified ROI $\mathcal{R} \subset \Omega$; (C) "inner" point $\mathbf{r}$ and its neighbourhood $\mathcal{N}_{\mathbf{r}}$; (D) corresponding DC $\mathbf{s}_{\mathbf{r}}$; (F) and (G) regional PSIC values in the case of CN and AD, respectively.

In practice, the optimal network parameters are estimated from training data. To this end, we hypothesize that, for a proper choice of $\mathcal{R}$, the effects of neurodegeneration are manifested in some hidden diffusion characteristics which persist throughout the entire ROI. Moreover, such hidden characteristics are likely to be shared between different subjects within the same diagnostic group.

Under the above hypothesis, the training data can be formed by stockpiling the DCs from all voxels $\mathbf{r}$ within a specified ROI across all subjects within the same diagnostic group. Specifically, in application to the binary problem of stratification of CN and AD subjects, such training data would consist of a set of DCs obtained from all available CN subjects, on the one hand, and a similar set of DCs coming from all available AD subjects, on the other hand.

More formally, let $N_{\mathrm{CN}}$ and $N_{\mathrm{AD}}$ be the number of subjects in the CN and AD groups, respectively. Also, let $\{\mathbf{s}^{\mathrm{CN},i}\}_{i=1}^{N_{\mathrm{CN}}}$ (resp., $\{\mathbf{s}^{\mathrm{AD},i}\}_{i=1}^{N_{\mathrm{AD}}}$) denote the diffusion datasets collected from the CN (resp., AD) subjects. In this case, the training data would consist of two subsets of samples defined as

$$\mathcal{X}^{\mathcal{R}}_{\mathrm{CN}} := \Big\{ \{\mathbf{s}^{\mathrm{CN},i}_{\mathbf{r}}\}_{\mathbf{r} \in \mathcal{R}} \Big\}_{i=1}^{N_{\mathrm{CN}}}, \qquad \mathcal{X}^{\mathcal{R}}_{\mathrm{AD}} := \Big\{ \{\mathbf{s}^{\mathrm{AD},i}_{\mathbf{r}}\}_{\mathbf{r} \in \mathcal{R}} \Big\}_{i=1}^{N_{\mathrm{AD}}},$$

along with their corresponding (target) labels $\gamma = 0$ and $\gamma = 1$, respectively. A conceptual illustration of the process of formation of the training data is shown in Fig. 2.

In general, the above procedure can be applied to $M$ different ROIs $\{\mathcal{R}_j\}_{j=1}^{M}$, in which case one would have $M$ different training sets, $\{\mathcal{X}^{\mathcal{R}_j}_{\mathrm{CN}}\}_{j=1}^{M}$ and $\{\mathcal{X}^{\mathcal{R}_j}_{\mathrm{AD}}\}_{j=1}^{M}$. Each such set could then be used to estimate the optimal parameters $\theta^*_{\mathcal{R}_j}$ for its related $\mathcal{R}_j$ independently.
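The extraction of diffusion cubes and the pooling of training samples described in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `extract_diffusion_cubes` and `build_training_set`, the boolean ROI mask, and the representation of each subject as a `(data, roi_mask)` pair are assumptions introduced here.

```python
import numpy as np

def extract_diffusion_cubes(data, roi_mask, L=1):
    """Collect the cubes s_r for all r whose neighbourhood N_r lies inside the ROI.

    data     : 4-D array (N1, N2, N3, K) of diffusion signals
    roi_mask : 3-D boolean array (N1, N2, N3), True inside the ROI (assumed format)
    returns  : array of shape (num_cubes, M, M, M, K), with M = 2L + 1
    """
    M = 2 * L + 1
    K = data.shape[-1]
    grid = np.array(roi_mask.shape)
    cubes = []
    for r in np.argwhere(roi_mask):
        lo, hi = r - L, r + L + 1
        # Skip voxels whose neighbourhood would leave the image grid.
        if np.any(lo < 0) or np.any(hi > grid):
            continue
        window = tuple(slice(a, b) for a, b in zip(lo, hi))
        # Keep r only if the whole neighbourhood N_r is contained in the ROI.
        if roi_mask[window].all():
            cubes.append(data[window])
    if not cubes:
        return np.empty((0, M, M, M, K), dtype=data.dtype)
    return np.stack(cubes)

def build_training_set(cn_subjects, ad_subjects, L=1):
    """Pool cubes over subjects; CN cubes get label 0 and AD cubes get label 1.

    Each subject is a (data, roi_mask) pair (hypothetical convention).
    """
    x_cn = np.concatenate([extract_diffusion_cubes(d, m, L) for d, m in cn_subjects])
    x_ad = np.concatenate([extract_diffusion_cubes(d, m, L) for d, m in ad_subjects])
    x = np.concatenate([x_cn, x_ad])
    y = np.concatenate([np.zeros(len(x_cn)), np.ones(len(x_ad))])
    return x, y
```

With several ROIs, the same two functions would simply be called once per ROI mask, giving one independent training set per region.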

Figure 2. Conceptual illustration of the process of formation of the training data for a specified ROI.

3. Analytical tools and principal operations

3.1. Composite Convolution.

At a conceptual level, the DNN proposed in this paper is based on a standard feed-forward configuration, consisting of a typical succession of convolutional operations interleaved with nonlinearities and resizing. However, when defining such operations, it would be amiss to disregard the physical properties of diffusion signals which offer a number of important advantages.

As discussed in Section 2, at its input, the network receives a single DC, which can be viewed as a 4-D numeric array of size $M \times M \times M \times K$. Alternatively, the DC could be considered to be an array of the discrete values of a diffusion signal $S(\mathbf{x}, \mathbf{u})$ measured over $\Omega_L = \{\mathbf{n} \mid \|\mathbf{n}\|_\infty \le L\}$ and spherical orientations $\{\mathbf{u}_k\}_{k=1}^{K}$. Formally, $S(\mathbf{x}, \mathbf{u})$ can be defined over the combined domain $\bar{\Omega} := \mathrm{Conv}(\Omega_L) \times \mathbb{S}^2$, where $\mathrm{Conv}(\Omega_L)$ stands for the convex hull of $\Omega_L$. Thus, in order to incorporate convolution into the network design, one needs a proper definition of such operation for the signals defined over $\bar{\Omega}$.

Generally speaking, the convolution of $\bar{\Omega}$-domain signals is defined over the special Euclidean group $SE(3)$, working with which might be excessively complicated from a computational point of view. In what follows, we introduce a particular definition of this operation, which offers a number of important practical advantages. In this way, the present results expand on the simpler case of spherical signals over $\mathbb{S}^2$, which have been successfully addressed in a number of recent studies [46, 47].

The proposed method relies on a simplified interpretation of $SE(3)$ convolution, which is derived under the assumptions of separability and zonality. In particular, the former requires the convolution kernel to be a separable product of a spatially-dependent and a spherically-dependent component. The assumption of zonality, on the other hand, requires the spherical component to be a zonal function, which implies its invariance to azimuthal rotations. Such functions admit representation in terms of Legendre polynomials $\{p_n\}_{n=0}^{\infty}$ as given by

$$\xi(t) = \sum_{n=0}^{\infty} \frac{2n+1}{4\pi}\, \xi_n\, p_n(t), \quad t \in [-1, +1], \tag{1}$$

with $\xi_n = 2\pi \int_{-1}^{1} \xi(t)\, p_n(t)\, dt$ known as the Legendre coefficient of degree $n$.

The use of zonal kernels considerably simplifies the definition of spherical convolution. To see that, we first assume that, at each $\mathbf{x} \in \Omega_L$, $S(\mathbf{x}, \mathbf{u})$ can be closely approximated by its truncated Spherical Harmonic (SH) expansion of the form

$$S(\mathbf{x}, \mathbf{u}) \simeq \sum_{n=0,2,\ldots}^{n_{\max}} \sum_{l=-n}^{n} c_{n,l}(\mathbf{x})\, Y_{n,l}(\mathbf{u}), \tag{2}$$

with $Y_{n,l}(\mathbf{u})$ and $c_{n,l}(\mathbf{x})$ being the $l$-th order SH of degree $n$ and its corresponding expansion coefficient, respectively. Note that, due to the spherical symmetry of DTI signals, the summation in (2) is restricted to the even values of $n$, in which case the total number of expansion coefficients is equal to $P = 0.5\,(n_{\max} + 1)(n_{\max} + 2)$, for a predefined maximal degree $n_{\max} > 0$. At each $\mathbf{x} \in \Omega_L$, the availability of the SH coefficients $c_{n,l}(\mathbf{x})$ and the knowledge of the Legendre coefficients $\xi_n$ of the zonal kernel $\xi$ can be used to define the spherical convolution of $S(\mathbf{x}, \cdot)$ and $\xi$ according to

$$\big(S(\mathbf{x}, \cdot) \ast_{\mathbf{u}} \xi\big)(\mathbf{u}) = \sum_{n=0,2,\ldots}^{n_{\max}} \sum_{l=-n}^{n} \tilde{c}_{n,l}(\mathbf{x})\, Y_{n,l}(\mathbf{u}), \tag{3}$$

where $\tilde{c}_{n,l}(\mathbf{x}) = \xi_n\, c_{n,l}(\mathbf{x})$, for all $n = 0, 2, \ldots, n_{\max}$ and $|l| \le n$.

Next, we note that at each $\mathbf{x} \in \Omega_L$, the coefficients $c_{n,l}(\mathbf{x})$ form a real vector of length $P$. When computed over the entire $\Omega_L$, all such vectors can be conveniently assembled into a 4-D array $\mathbf{c}$ of size $M \times M \times M \times P$. Alternatively, $\mathbf{c}$ can be viewed as a collection of $P$ volumetric "coefficient images" $\mathbf{c}_{n,l} \in \mathbb{R}^{M \times M \times M}$, where each $\mathbf{c}_{n,l}$ comprises the spatially-dependent values of the SH coefficient of degree $n$ and order $l$.

It is also convenient to split $\mathbf{c}$ into $(n_{\max} + 2)/2$ subsets $\mathbf{c}_k$ of 3-D arrays according to the value of their degree $n$, i.e., $\mathbf{c}_k = \{\mathbf{c}_{k,l}\}_{l=-k}^{k}$, with $k = 0, 2, \ldots, n_{\max}$. In this case, the spherical convolution in (3) amounts to scaling each "coefficient image" in $\mathbf{c}_n$ by the same scalar $\xi_n$. While computationally efficient, however, this operation has the disadvantage of ignoring the spatial behaviour of input signals.

The coordinate-wise spherical convolution in (3) can be generalized into a composite spatial-spherical convolution as follows. For $n \le n_{\max}$, let $\mathbf{w}_n = \{\mathbf{w}^{k}_{n,l}\}_{|l| \le n,\ |k| \le n}$ be a set of $P_n$ spatial-domain filters of size $J \times J \times J$. Then, based on the assumption of separability, the operation of $n$th-band spatio-spherical filtering can be defined as

$$\hat{\mathbf{c}}_{n,l} = \sum_{k=-n}^{n} \mathbf{c}_{n,k} \ast_{\mathbf{r}} \mathbf{w}^{k}_{n,l}, \quad \text{for all } |l| \le n, \tag{4}$$

where $\ast_{\mathbf{r}}$ stands for discrete convolution in the spatial domain. The definition in (4) suggests that, for each spherical order $l$, the output $\hat{\mathbf{c}}_{n,l}$ is computed as a linear convolutional combination of the "input images" in $\mathbf{c}_n$ with filters $\mathbf{w}^{-n}_{n,l}, \ldots, \mathbf{w}^{0}_{n,l}, \ldots, \mathbf{w}^{n}_{n,l}$. In this way, the action of (4) extends across the entire $\Omega_L$, as opposed to the case of (3).

Extending the $n$th-band spatio-spherical filtering in (4) to all $n = 0, 2, \ldots, n_{\max}$ gives rise to a convolution-type operator that takes effect across both the spherical and spatial domains. In what follows, this operation will be referred to as composite convolution.
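To make the band-wise filtering in (4) concrete, the following NumPy sketch applies it to the coefficient images of a single degree $n$. It is a minimal illustration under stated assumptions rather than the implementation used in the paper: the helper names `conv3d_same` and `band_filter`, the array layout (order index first), the odd filter size, and the zero-padded "same" boundary handling are all choices made here for illustration.

```python
import numpy as np

def conv3d_same(image, kernel):
    """Zero-padded 'same' 3-D discrete convolution for a cubic kernel of odd size J."""
    J = kernel.shape[0]
    h = J // 2
    padded = np.pad(image, h)  # zero padding on every side (assumed boundary rule)
    n1, n2, n3 = image.shape
    out = np.zeros(image.shape, dtype=float)
    for i in range(J):
        for j in range(J):
            for k in range(J):
                # Convolution uses the flipped kernel (unlike correlation).
                weight = kernel[J - 1 - i, J - 1 - j, J - 1 - k]
                out += weight * padded[i:i + n1, j:j + n2, k:k + n3]
    return out

def band_filter(c_n, w_n):
    """n-th band filtering, Eq. (4): c_hat[l] = sum_k conv(c_n[k], w_n[l, k]).

    c_n : (2n+1, M, M, M) coefficient images of degree n
    w_n : (2n+1, 2n+1, J, J, J) filters, indexed as w_n[l, k] (assumed layout)
    """
    num_orders = c_n.shape[0]
    c_hat = np.zeros(c_n.shape, dtype=float)
    for l in range(num_orders):
        for k in range(num_orders):
            c_hat[l] += conv3d_same(c_n[k], w_n[l, k])
    return c_hat
```

In this layout, the purely spherical convolution (3) is recovered as a special case: if `w_n[l, k]` is zero for `l != k` and equals a centred unit impulse scaled by $\xi_n$ for `l == k`, then `band_filter` simply multiplies every coefficient image of degree $n$ by $\xi_n$.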
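Formally, for a 4-D array of SH coefficients $\mathbf{c}$ and a bank of filters $\mathbf{w} = \{\mathbf{w}_0, \mathbf{w}_2, \ldots, \mathbf{w}_{n_{\max}}\}$, their composite convolution

$$\tilde{\mathbf{c}} = \mathbf{c} \ast_{\mathbf{u},\mathbf{r}} \mathbf{w} \tag{5}$$

is defined according to (4) for each $n = 0, 2, \ldots, n_{\max}$ and $|l| \le n$.

3.2. SH coefficients.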

The composite convolution in (5) implies the availability of SH coefficients $\mathbf{c}$, which can be estimated directly from DTI data, as described, e.g., in [48]. The estimation is carried out independently on each of the $M^3$ signals comprising a given DC, resulting in an $M \times M \times M \times P$ array of associated SH coefficients $\mathbf{c}$. The proposed DNN has been designed to work directly with such $\mathbf{c}$, which are assumed to be precomputed prior to network training. For the sake of notational simplicity, such input arrays will also be referred to below as "DCs".

Finally, the question of setting $n_{\max}$ should not be overlooked. Usually, $n_{\max}$ is defined in accordance with a required $b$-value. In particular, for $b = 1000$ s/mm$^2$ (as used in ADNI-II), setting $n_{\max} = 6$ seems to be a conventional choice [49–51]. Note that, in this case, the total number of SHs is equal to $P = 28$.

4. Proposed network architecture
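The layers defined in this section act on arrays with $P$ SH channels that are grouped by their (even) degree $n$. As a small bookkeeping sketch, the snippet below evaluates $P$ from $n_{\max}$ and lists which channels belong to each degree. The function names and, in particular, the degree-by-degree ordering of the channels are assumptions made here for illustration; the text does not specify a storage order.

```python
def num_sh_coeffs(n_max):
    """P = (n_max + 1)(n_max + 2) / 2 when only even degrees 0, 2, ..., n_max are used."""
    return (n_max + 1) * (n_max + 2) // 2

def band_slices(n_max):
    """Map each even degree n to the slice holding its 2n + 1 channels.

    Assumes channels are stored degree by degree (hypothetical ordering).
    """
    slices, start = {}, 0
    for n in range(0, n_max + 1, 2):
        slices[n] = slice(start, start + 2 * n + 1)
        start += 2 * n + 1
    return slices
```

For $n_{\max} = 6$ the four bands hold 1, 5, 9 and 13 channels, which add up to the $P = 28$ quoted above.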

4.1. Convolutional layers.

The proposed DNN consists of several convolutional layers, each of which is parameterized by (convolutional) weights $\mathbf{w} = \{\mathbf{w}_n\}_{n=0,2,\ldots}$ and a vector of $P$ (scalar) biases $\mathbf{b} = \{b_{n,l}\}$, with $n = 0, 2, \ldots, n_{\max}$ and $|l| \le n$. Given an input array $\mathbf{c}^{\mathrm{in}}$, each such layer computes its output $\mathbf{c}^{\mathrm{out}}$ according to

$$\mathbf{c}^{\mathrm{out}} = \big(\mathbf{c}^{\mathrm{in}} \ast_{\mathbf{u},\mathbf{r}} \mathbf{w}\big) + \mathbf{b}, \tag{6}$$

where the plus sign is assumed to broadcast the values of $\mathbf{b}$ so that every "coefficient image" in $(\mathbf{c}^{\mathrm{in}} \ast_{\mathbf{u},\mathbf{r}} \mathbf{w})$ is summed with a different $b_{n,l}$. Note that the affine operation in (6) is not exclusive to volumetric data, since it can be reduced to its 2-D and 1-D versions by merely replacing all $\mathbf{w}^{k}_{n,l}$ in $\mathbf{w}$ by their 2-D and 1-D counterparts, respectively.

Footnote (on the convolutional weights): For each $n$, the computation of $\tilde{\mathbf{c}}_n$ can be identified with the action of multi-channel convolution that is a standard computational routine included in many existing DL frameworks, such as, e.g., TensorFlow® (which has been used in this study).

4.2. Activation.

The scope of activation functions currently used in DL is broad. In this work, all activation functions have had the form of a basic rectified linear unit (ReLU) [26], with its input-output relation defined as

$$\mathbf{c}^{\mathrm{out}} = \mathrm{ReLU}(\mathbf{c}^{\mathrm{in}}) = \max(0, \mathbf{c}^{\mathrm{in}}), \tag{7}$$

where the maximization is carried out independently for all values in the input array. Note that, over recent years, (7) has been modified in several ways (resulting in, e.g., leaky ReLU, noisy ReLU, and exponential ReLU). In our experiments, however, the basic definition in (7) was observed to work more than adequately.

4.3. Pooling.

Pooling is a standard method of data aggregation which can be used to suppress redundancies in input data, reduce the number of network parameters, and minimize the risk of overfitting [27]. The special structure of DC samples $\mathbf{c}$, however, requires a proper adaptation of this operation. Specifically, in this work, the operation of pooling consisted of max pooling applied along a single spatial dimension. Depending on the direction of maximization, the pooling can be defined in three possible ways as given by

$$c^{\mathrm{out}}_{n,l}(y, z) = \mathcal{P}^{x}_{3}\big(c^{\mathrm{in}}_{n,l}(x, y, z)\big) := \max_{x}\, c^{\mathrm{in}}_{n,l}(x, y, z), \tag{8a}$$

$$c^{\mathrm{out}}_{n,l}(x, z) = \mathcal{P}^{y}_{3}\big(c^{\mathrm{in}}_{n,l}(x, y, z)\big) := \max_{y}\, c^{\mathrm{in}}_{n,l}(x, y, z), \tag{8b}$$

$$c^{\mathrm{out}}_{n,l}(x, y) = \mathcal{P}^{z}_{3}\big(c^{\mathrm{in}}_{n,l}(x, y, z)\big) := \max_{z}\, c^{\mathrm{in}}_{n,l}(x, y, z), \tag{8c}$$

for each $(n, l)$ and $(x, y, z) \in \Omega_L$. Above, $\mathcal{P}$ denotes the pooling operator, with its sub- and superscripts indicating the spatial dimensionality of $c^{\mathrm{in}}_{n,l}$ and the direction of maximization, respectively.

It is important to emphasize that, while $c^{\mathrm{in}}_{n,l}$ in (8) depends on three spatial variables, the number of spatial dimensions of $c^{\mathrm{out}}_{n,l}$ is reduced to two, and, consequently, each of the $P$ "coefficient images" composing $\mathbf{c}^{\mathrm{out}}$ is now a 2-D array of size $M \times M$ (so that $\mathbf{c}^{\mathrm{out}}$ is of size $M \times M \times P$). The resulting outputs could be subjected to 2-D pooling defined according to

$$c^{\mathrm{out}}_{n,l}(y) = \mathcal{P}^{x}_{2}\big(c^{\mathrm{in}}_{n,l}(x, y)\big) = \max_{x}\, c^{\mathrm{in}}_{n,l}(x, y), \tag{9a}$$

$$c^{\mathrm{out}}_{n,l}(x) = \mathcal{P}^{y}_{2}\big(c^{\mathrm{in}}_{n,l}(x, y)\big) = \max_{y}\, c^{\mathrm{in}}_{n,l}(x, y), \tag{9b}$$

for each $(n, l)$ and $(x, y)$, with $|x|, |y| \le L$. Note that, in both variants above, the spatial dimension of $\mathbf{c}^{\mathrm{out}}$ has been reduced to one (i.e., $\mathbf{c}^{\mathrm{out}}$ is now an array of size $M \times P$).

Proceeding analogously, one can finally define the operation of 1-D pooling as

$$c^{\mathrm{out}}_{n,l} = \mathcal{P}^{x}_{1}\big(c^{\mathrm{in}}_{n,l}(x)\big) = \max_{x}\, c^{\mathrm{in}}_{n,l}(x), \tag{10}$$

for each $(n, l)$ and $|x| \le L$. This type of pooling collapses the spatial dimension of $\mathbf{c}^{\mathrm{in}}$, resulting in a length-$P$ vector $\mathbf{c}^{\mathrm{out}}$ of modified SH coefficients.

Figure 3. Proposed network architecture.

4.4. Network Architecture.
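The architecture described in this section applies the directional pooling operators (8)–(10) in cascade, so it may help to see how the array sizes evolve. The NumPy sketch below shows only the pooling steps; the convolutions (6) and ReLU activations (7) that the network interleaves between them are deliberately omitted. The function name `pooling_cascade` and the channel-last layout are assumptions introduced here.

```python
import numpy as np

def pooling_cascade(c):
    """Apply 3-D, 2-D and 1-D directional max pooling to an (M, M, M, P) array.

    Returns three lists: 3 arrays of size (M, M, P), 6 arrays of size (M, P),
    and 6 vectors of length P. The last axis holds the P SH channels.
    """
    # Eq. (8a-c): collapse one of the three spatial axes at a time.
    level2 = [c.max(axis=ax) for ax in range(3)]
    # Eq. (9a-b): each 2-D output is pooled along either remaining spatial axis.
    level1 = [a.max(axis=ax) for a in level2 for ax in range(2)]
    # Eq. (10): collapse the last spatial axis.
    level0 = [a.max(axis=0) for a in level1]
    return level2, level1, level0
```

For $M = 3$ and $P = 28$ this produces three $3 \times 3 \times 28$ arrays, then six $3 \times 28$ arrays, then six length-28 vectors, i.e., three pairs of vectors.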

Apart from using the purpose-built operations of convolution and pooling, the proposed DNN relies on a standard feed-forward architecture, which is depicted in Fig. 3 for the case of $L = 1$ and $n_{\max} = 6$ (i.e., $M = 3$ and $P = 28$). The network consists of six neural layers which, in the figure, are shown separated by vertical dotted lines. The dimensions specified at the top of these lines indicate the size of the inputs received by each layer. Specifically, the input layer of the network receives a DC array $c$ of size $3 \times 3 \times 3 \times 28$ [...] $3 \times 3 \times 28$, each of which is processed in an analogous manner, using (9) and (6) (with a 2-D version of (5)). Similar computations are applied to all of the six inputs of the third layer, which are subjected to transformation (6) (with a 1-D version of (5)), ReLU activation, and 1-D pooling according to (10). Subsequently, each of the resulting pairs of $P$-length vectors is fused into a single 28-length vector by means of a fully connected layer (FCL). An additional FCL is used to transform its inputs into a single vector of length 28.

The final layer of the DNN consists of a linear transformation into the space of diagnostic outcomes, followed by the softmax operator. In the case of binary stratification (i.e., CN vs AD), the latter can be defined as [27]

$$\gamma = \frac{e^{\alpha}}{e^{\alpha} + e^{\beta}}, \tag{11}$$

with $\alpha$ and $\beta$ being the two outputs of the linear transformation. In the context of this paper, the score $0 \le \gamma \le 1$, thus computed, is referred to as PSIC.

4.5. Network Optimization.
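As a minimal numerical sketch of the quantities involved in training, the snippet below evaluates the softmax score (11) for pairs of linear outputs and the empirical cross-entropy between such scores and binary labels. The function names and toy inputs are assumptions made for illustration. The loss uses the standard two-term binary cross-entropy, which adds the complementary term for the $\gamma = 0$ class to the term displayed in (12). It is not the authors' implementation.

```python
import numpy as np

def psic(alpha, beta):
    """Softmax score (11): e^alpha / (e^alpha + e^beta), computed stably."""
    m = np.maximum(alpha, beta)            # shift exponents to avoid overflow
    ea, eb = np.exp(alpha - m), np.exp(beta - m)
    return ea / (ea + eb)

def cross_entropy(gamma_true, gamma_pred, eps=1e-12):
    """Empirical binary cross-entropy (standard two-term form, an assumption)."""
    p = np.clip(gamma_pred, eps, 1.0 - eps)  # guard against log(0)
    terms = gamma_true * np.log(p) + (1.0 - gamma_true) * np.log(1.0 - p)
    return float(-np.mean(terms))

# Hypothetical outputs of the final linear transformation for three samples
alpha = np.array([2.0, -1.0, 0.0])
beta = np.array([0.0, 1.0, 0.0])
labels = np.array([1.0, 0.0, 1.0])         # gamma = 1 for AD, gamma = 0 for CN

scores = psic(alpha, beta)
loss = cross_entropy(labels, scores)
print(scores, loss)
```

Equal outputs ($\alpha = \beta$) give a score of exactly 0.5, and swapping $\alpha$ and $\beta$ maps $\gamma$ to $1 - \gamma$, which is why a single scalar suffices in the binary case.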

The proposed DNN is parameterized by the values of convolution kernels, biases and weight matrices used across various components of the network. Altogether, these parameters can be gathered into a single vector $\theta$, in which case, for a given $\theta$, the DNN acts as a forward mapping $\gamma = f(c \mid \theta)$, associating each DC $c$ with its PSIC score. In this case, it is a standard practice to estimate the optimal value of $\theta$ via solving a cross-entropy minimization problem of the form

$$\theta^{*} = \arg\min_{\theta}\, \mathbb{E}_{c}\{-\gamma \log f(c \mid \theta)\}, \tag{12}$$

where the expectation $\mathbb{E}$ is computed over the empirical distribution of $c$.

Finally, let $\{\mathcal{R}_j\}_{j=1}^{M}$ be a set of relevant ROIs. Also, for each $\mathcal{R}_j$, let $\theta^{*}_{\mathcal{R}_j}$ denote the optimal values of its associated DNN parameters. Then, given an unlabelled set of DTI measurements, $f(\cdot \mid \theta^{*}_{\mathcal{R}_j})$ can be used to compute the PSIC values across the entire $\mathcal{R}_j$. Subsequently, the resulting scores can be summarized into regional statistics for the purpose of subject stratification, as described next.

5. Experimental study design

5.1. Study data.

As stated earlier, the experimental study of this work has been focused on the problem of stratification of CN and AD subjects. To this end, 20 CN and 20 AD age-matched subjects (mean age 72.6 ± [...]) [...]. The dataset of each subject consisted of $K = 41$ diffusion-encoded volumetric scans, acquired at $b = 1000$ s/mm², as well as five $b_0$-volumes (i.e., scans acquired in the absence of diffusion sensitization). In addition, each dataset was supplemented with its associated $T_1$- and $T_2$-weighted scans required for structural image alignment and segmentation.

5.2. Data preprocessing.

For each dataset, its $b_0$-volumes were first co-registered and then merged into a single (average) $b_0$-volume, which was subsequently used for normalization of the $K$ diffusion-encoded volumes. After that, the normalized data were subjected to preprocessing by means of a custom pipeline which had been designed based on the recommendations of [52]. The technical implementation of the pipeline relied on the NiPype framework of NiPy (nipype.readthedocs.io), which allowed convenient integration of many well-established tools of computational imaging, including FSL (https://fsl.fmrib.ox.ac.uk/), ANTs (picsl.upenn.edu/software/ants/), and SPM. As usual, the principal purpose of the preprocessing pipeline has been to compensate for various imaging artefacts caused by the effects of subject motion, variable magnetic susceptibility, eddy currents, etc.

Additionally, FreeSurfer [53] (surfer.nmr.mgh.harvard.edu/) was used for the purpose of segmentation of grey and white matter, with their subsequent parcellation into smaller anatomical regions. Subsequently, the anatomic labels provided by FreeSurfer were used to partition the image domain $\Omega$ of the DTI volumes into a set of predefined ROIs $\{\mathcal{R}_j\}_{j=1}^{M}$, as detailed below.

Footnote: For more details on the inclusion/exclusion criteria and other design parameters used by the ADNI-II study, please refer to adni.loni.usc.edu.

5.3. Definition of ROIs.

The two left columns of Table 1 summarize the names of the anatomical regions computed by FreeSurfer, along with their acronyms. In addition, for each ROI, the rightmost columns of the table indicate the total number of DC samples available in its respective CN ($X^{\mathcal{R}_j}_{\mathrm{CN}}$) and AD ($X^{\mathcal{R}_j}_{\mathrm{AD}}$) subsets.

The ROIs have been chosen to correspond to different parts of white matter anatomy, which are known to be implicated in the pathogenesis of AD. It should be noted, however, that FreeSurfer lacks means of direct delineation of white matter. Instead, various elements of the latter are labelled based on their proximity to the nearby anatomical structures of grey matter. Thus, for instance, all voxels designated as white matter and located within a 5 mm ribbon around the left superior frontal gyrus would be labelled as lh-SFW. This approach is obviously not without limitations, the rectification of which has been the focus of ongoing research [54]. Nevertheless, considering the comparative nature of the results reported here, the choice of a specific method of whole-brain segmentation does not seem to be particularly critical.

Using the methodology of Section 2, each of the $M = 14$ regions in Table 1 was used to assemble its associated input samples $X^{\mathcal{R}_j}_{\mathrm{CN}}$ and $X^{\mathcal{R}_j}_{\mathrm{AD}}$, corresponding to the target labels $\gamma = 0$ and $\gamma = 1$, respectively. Subsequently, the process of network training was carried out for each $j$ independently, yielding a vector of optimal network parameters for each of the chosen ROIs.

5.4. Network training.
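The randomized 4:1 partition of the labelled samples of one ROI into training and validation subsets, as used during training, can be sketched as follows. The function name `split_4_to_1`, the fixed seed, and the placeholder arrays standing in for $X^{\mathcal{R}_j}_{\mathrm{CN}}$ and $X^{\mathcal{R}_j}_{\mathrm{AD}}$ are hypothetical. Only the 4:1 proportion and the label convention ($\gamma = 0$ for CN, $\gamma = 1$ for AD) are taken from the text.

```python
import numpy as np

def split_4_to_1(samples, labels, seed=0):
    """Shuffle the samples and split them 4:1 into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train = (4 * len(samples)) // 5
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return (samples[train_idx], labels[train_idx]), (samples[val_idx], labels[val_idx])

# Placeholder DC samples of size 3 x 3 x 3 x 28 for a single ROI
x_cn = np.zeros((60, 3, 3, 3, 28))   # hypothetical CN samples, label 0
x_ad = np.ones((40, 3, 3, 3, 28))    # hypothetical AD samples, label 1

samples = np.concatenate([x_cn, x_ad])
labels = np.concatenate([np.zeros(len(x_cn)), np.ones(len(x_ad))])

(train_x, train_y), (val_x, val_y) = split_4_to_1(samples, labels)
print(train_x.shape, val_x.shape)
```

Shuffling the index vector, rather than the arrays themselves, keeps every sample paired with its label.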

Prior to training, for each $j = 1, 2, \ldots, 14$, the samples in $X^{\mathcal{R}_j}_{\mathrm{CN}}$ and $X^{\mathcal{R}_j}_{\mathrm{AD}}$ were randomized and split into a training and a validation dataset (in proportion 4:1), which were subsequently used for the purpose of estimation of $\theta^{*}_{\mathcal{R}_j}$ and final performance evaluation, respectively. In all cases, the optimization was performed by means of the adaptive moment estimation (Adam) algorithm [55], with a fixed learning rate of 0.[...] · 10^−[...], batch size of 256 samples, and 200 epochs. The network training procedure was augmented with dropout regularization, with the value of keep probability set to 0.7 to alleviate the effects of overfitting.

The convergence of optimization has been monitored in terms of empirical prediction accuracy (PA). Given a set of $N_L$ labelled DC samples $\{c_l, \gamma_l\}_{l=1}^{N_L}$, with $\gamma_l \in \{0, 1\}$, the latter can be defined as

$$\mathrm{PA}(\theta) = 1 - \frac{1}{N_L} \sum_{l=1}^{N_L} \big| \hat{\gamma}_l(c_l \mid \theta) - \gamma_l \big|, \tag{13}$$

where $\hat{\gamma}_l(c_l \mid \theta) = 1$, if $f(c_l \mid \theta) \ge 0.5$, and $\hat{\gamma}_l(c_l \mid \theta) = 0$, otherwise.

In the context of network training, the PA criterion can be a useful indication of the convergence of stochastic gradient optimization. On the other hand, when used on validation data, PA provides a more objective measure of both optimality and generalizability in view of its virtual independence of the effects of overfitting. Thus, higher values of validation PA are typically indicative of more accurate performance [27].

Table 1. Selected ROIs and their associated number of input samples in $X^{\mathcal{R}_j}_{\mathrm{CN}}$ and $X^{\mathcal{R}_j}_{\mathrm{AD}}$ (shown in the three rightmost columns). Note that the abbreviations "wm", "lh", and "rh" stand for white matter, left and right hemisphere, respectively.

j | ROI name | Acronym | $X^{\mathcal{R}_j}_{\mathrm{AD}}$ | $X^{\mathcal{R}_j}_{\mathrm{CN}}$ | Total

5.5. Binary classification.
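As a small numerical illustration of the prediction accuracy (13) and of the median-based regional decision described in this section, consider the sketch below. The function names and toy scores are hypothetical, and the decision threshold of 0.5 is an assumption (the natural midpoint for a score bounded between 0 and 1).

```python
import numpy as np

def prediction_accuracy(scores, labels, threshold=0.5):
    """PA as in (13): one minus the mean absolute error of thresholded scores."""
    hard = (scores >= threshold).astype(float)   # hard decisions in {0, 1}
    return 1.0 - float(np.mean(np.abs(hard - labels)))

def stratify_region(psic_scores, threshold=0.5):
    """Regional decision: compare the median PSIC of an ROI with a threshold."""
    return "AD" if np.median(psic_scores) > threshold else "CN"

# Hypothetical network outputs and ground-truth labels for four samples
scores = np.array([0.9, 0.2, 0.6, 0.4])
labels = np.array([1.0, 0.0, 0.0, 0.0])
pa = prediction_accuracy(scores, labels)     # three of four samples are correct

# Hypothetical PSIC scores of all DC samples inside one ROI of one subject
roi_scores = np.array([0.8, 0.7, 0.3, 0.9, 0.6])
decision = stratify_region(roi_scores)
print(pa, decision)
```

Using the median rather than the mean makes the regional marker less sensitive to a small number of voxels with extreme scores.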

Once the training is complete, and the values of $\theta^{*}_{\mathcal{R}_j}$ are available for each ROI $\mathcal{R}_j$, the optimal forward mappings $f(\cdot \mid \theta^{*}_{\mathcal{R}_j})$ can be used for stratification of unclassified subjects. In particular, an unlabelled dataset of DTI measurements can be converted into $M$ subsets of DC samples $\{c_r\}_{r \in \mathcal{R}_j}$ in accordance with the selected ROIs. Subsequently, for each $\mathcal{R}_j$, its respective DC samples can be converted into a set of PSIC scores $\{\gamma_r\}_{r \in \mathcal{R}_j}$, with $\gamma_r = f(c_r \mid \theta^{*}_{\mathcal{R}_j})$. In this case, the subject could be stratified based on

$$\operatorname*{median}_{r \in \mathcal{R}_j}(\gamma_r) \underset{\mathrm{CN}}{\overset{\mathrm{AD}}{\gtrless}} 0.5, \tag{14}$$

with the decision made independently for each $j$. In such a case, the median PSIC score on the left side of (14) can be viewed as a cumulative regional marker of AD.

Furthermore, for the same subject, the PSIC values in $\{\{\gamma_r\}_{r \in \mathcal{R}_j}\}_{j=1}^{M}$ can serve in the role of imaging contrast. In particular, these values can be superimposed on structural MRI scans, thus offering the possibility of visual exploration of the spatial variability of PSIC values.

5.6. Reference methods.
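The region-based reference classifiers considered in this section rest on a likelihood ratio test with Gaussian class densities. A minimal sketch of such a test is given below. It uses the standard difference of summed log-likelihoods as the test statistic, which may differ in form from the exact statistic employed by the authors. The metric values, the function names, and the threshold $\eta = 0$ are hypothetical, and the leave-one-out procedure is omitted for brevity.

```python
import numpy as np

def fit_gaussian(values):
    """Estimate the mean and variance of a metric within one subgroup."""
    return float(np.mean(values)), float(np.var(values))

def gaussian_loglik(values, mean, var):
    """Sum of Gaussian log-densities over all voxels of an ROI."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * var) - (values - mean) ** 2 / (2 * var)))

def llr_decision(values, cn_params, ad_params, eta=0.0):
    """Standard log-likelihood ratio test: decide AD when the ratio exceeds eta."""
    llr = gaussian_loglik(values, *ad_params) - gaussian_loglik(values, *cn_params)
    return ("AD" if llr > eta else "CN"), llr

rng = np.random.default_rng(1)
metric_cn = rng.normal(0.80, 0.05, size=500)   # hypothetical metric values, CN group
metric_ad = rng.normal(0.95, 0.05, size=500)   # hypothetical metric values, AD group

cn_params = fit_gaussian(metric_cn)
ad_params = fit_gaussian(metric_ad)

new_subject = rng.normal(0.95, 0.05, size=200)  # unlabelled ROI values
label, llr = llr_decision(new_subject, cn_params, ad_params)
print(label, llr)
```

Working with sums of log-densities rather than products of densities avoids numerical underflow when an ROI contains many voxels.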

The performance of the proposed classifier has been compared against region-based classification based on multiple diffusion metrics and their combination. The selected metrics included four standard DTI measures, viz.: mean diffusivity (MD), fractional anisotropy (FA), as well as diffusion linearity (CL) and planarity (CP) [14]. The list of metrics also included diffusion volume (DV), average sample diffusion (ASD), diffusion energy (DE), and the coefficient of variation of diffusion (CVD), which can provide more general characterization of diffusion dynamics, independent of DTI modelling [56]. The above list is by no means exhaustive, and other useful characteristics of cerebral diffusion could have been included as well [57]. However, besides covering both basic and advanced options, the selected metrics have the important advantage of being estimable from relatively small datasets, as is the case in the present study.

In this paper, the region-based classification was based on the likelihood ratio test [58]. Specifically, for each metric $\mu_i$, with $i = 1, 2, \ldots, 8$, its values inside $\mathcal{R}_j$ were assumed to be independent realizations of a random variable, with its probability densities in the CN and AD subgroups given by $p^{\mathrm{CN}}_{\mathcal{R}_j}(\mu_i)$ and $p^{\mathrm{AD}}_{\mathcal{R}_j}(\mu_i)$, respectively. Then, given an unlabelled dataset, the observed values of $\{\mu_i(r)\}_{r \in \mathcal{R}_j}$ were used to stratify the subject according to

$$\frac{\sum_{r \in \mathcal{R}_j} \log p^{\mathrm{AD}}_{\mathcal{R}_j}(\mu_i(r))}{\sum_{r \in \mathcal{R}_j} \log p^{\mathrm{CN}}_{\mathcal{R}_j}(\mu_i(r))} \underset{\mathrm{CN}}{\overset{\mathrm{AD}}{\gtrless}} \eta, \tag{15}$$

for some decision threshold $\eta$. For each $\mu_i$ and $\mathcal{R}_j$, $\eta$ had been set to its optimal value through maximizing the area under the receiver operating characteristic (ROC) curve of its corresponding classifier. In each case, the probability densities in (15) were assumed to be Gaussian, with their means and variances estimated from the available data, following a standard leave-one-out cross-validation procedure [59].

An additional reference method was based on concurrent use of multiple diffusion metrics within the framework of logistic regression (LR). Similarly to the proposed DNN-based classifier, LR was used to map the input values of $\{\mu_i\}_{i=1}^{8}$ into a scalar classification score [60].

Table 2. PA values produced by different classifiers for various $\mathcal{R}_j$. Note that, for each $j$, the two best results are outlined in bold.

j | ROI, $\mathcal{R}_j$ | MD | FA | CL | CP | DV | ASD | DE | CVD | LR | DNN

10 rh-MFWros

11 lh-PCUNW 0.72 0.59 0.59 0.44

12 rh-PCUNW 0.79 0.54 0.49 0.46 0.82 0.79

13 lh-SPW 0.56 0.64 0.62 0.49 0.54 0.56 0.56 0.54

14 rh-SPW

6. Experimental results
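The AUC criterion used to compare the classifiers can be computed directly from classification scores, without tracing the ROC curve: it equals the fraction of (AD, CN) pairs in which the AD case receives the higher score, with ties counted as one half. The sketch below illustrates this pairwise formulation on toy scores. The function name and the inputs are hypothetical, and the leave-one-out procedure is not reproduced.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # Compare every positive score with every negative score; ties count 0.5
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

perfect = auc([0.9, 0.8], [0.1, 0.2])   # all pairs ordered correctly
chance = auc([0.5, 0.5], [0.5, 0.5])    # all pairs tied
mixed = auc([0.9, 0.4], [0.5, 0.1])     # three of four pairs ordered correctly
print(perfect, chance, mixed)
```

An AUC below 0.5 therefore indicates that the score tends to be lower for AD than for CN subjects, in which case the metric still separates the groups, but with the opposite sign.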

The main experimental results of this study are summarized in Tables 2 and 3, where the former shows the PA scores produced by different classifiers for different ROIs (with the proposed method of classification denoted by "DNN"). One can see that the DNN-based classifier considerably outperforms the reference approaches across all ROIs. There is, however, a slight decrease in DNN's performance for $\mathcal{R}_j$ with $j > 10$, which is likely due to the reduction in the total number of training samples available for these ROIs (as indicated in Table 1).

As evidenced by Table 2, among the reference methods, the LR classifier showed superior performance in less than half of the cases. In other cases, the accuracy of classification was observed to depend on a particular metric/ROI combination. Interestingly enough, in about half of such cases, the most basic DTI metrics (such as MD and FA) demonstrated better performance in comparison to more advanced options.

In addition, leave-one-out cross-validation [59] was used to compare the proposed and reference classifiers in terms of their respective ROC curves. Specifically, the optimality of classification was assessed in terms of the area-under-curve (AUC) criterion, whose values are shown in Table 3. Here, higher values of AUC indicate a higher accuracy of classification, with the upper bound of 1 corresponding to the case of perfect classification. Thus, as demonstrated by the table, the proposed classifier offers a notably better solution to the problem of CN/AD stratification in comparison with the reference ones.

Table 3. AUC values produced by different classifiers for various $\mathcal{R}_j$. Note that, for each $j$, the two best results are outlined in bold.

j | ROI, $\mathcal{R}_j$ | MD | FA | CL | CP | DV | ASD | DE | CVD | LR | DNN

10 rh-MFWros

11 lh-PCUNW 0.81 0.41 0.38 0.49 0.83 0.81

12 rh-PCUNW 0.85 0.45 0.42 0.62

13 lh-SPW 0.74 0.36 0.31 0.44 0.74 0.74 0.73 0.40

14 rh-SPW 0.82 0.27 0.23 0.54 0.82 0.82 0.82 0.31

Finally, Fig. 4 and Fig. 5 exemplify the use of PSIC in its capacity as imaging contrast. It should be emphasized that, as opposed to diffusion metrics, the values of PSIC have no physiological interpretation. Instead, PSIC could be viewed as a pathology-specific risk indicator, whose higher values reflect a higher probability that the brain is affected by the disease. This property of PSIC is evident in Subplots A of Fig. 4 and Fig. 5, which show the PSIC-enhanced structural scans of two AD subjects. In this case, the spatial distribution of PSIC values appears to be both intense and spatially pervasive. On the other hand, Subplots B of the same figures show results for two CN subjects. One can see that, in this case, the magnitude and spatial spread of PSIC appear to be much more "diluted".

Due to the preliminary nature of the present paper, a detailed exploration of the spatial characteristics of PSIC, as well as its correlation with underlying brain anatomy and its possible etiological explanations, is left beyond the scope of this report. However, in view of the empirical evidence provided by Fig. 4 and Fig. 5, it is reasonable to expect the proposed contrast mechanism to be "worth a thousand words", both as an adjunct to establishing a confident diagnosis and as a means to facilitate post hoc discoveries.

7. Discussion and Conclusions

The main objective of this work has been to explore the potential of dMRI in application to early diagnosis of AD. As opposed to alternative means of medical imaging, the physics of dMRI happens to be uniquely suited for the detection and assessment of microscopic damage to brain tissue, which is known to precede the ensuing morphological changes in cortical grey matter due to AD [1, 61]. This is what endows dMRI with the unique ability to detect the presence of neurodegeneration at its earliest pathological stages.

Figure 4. (Subplot A) PSIC values of an AD subject superimposed on structural MRI scans. (Subplot B) PSIC values of a CN subject superimposed on structural MRI scans.

Figure 5. (Subplot A) PSIC values of an AD subject superimposed on structural MRI scans. (Subplot B) PSIC values of a CN subject superimposed on structural MRI scans.

The proposed method has been derived based on the concept of data-driven inference, which allows overcoming some critical limitations of model-based analysis of diffusion signals, especially in situations with relatively small DTI datasets (i.e., when $K \lesssim$ [...]) [...] direct correspondence between DTI data and a quantitative measure of health risks due to AD.

The proposed DNN has been designed to process spatially localized segments of DTI data, i.e., 4-D "diffusion cubes" corresponding to different spatial coordinates. The local definition of input DC samples has served two important purposes. First, it gave the means to use the output scores as a spatially dependent "risk indicator", which has been referred to as PSIC (in view of its purposive specificity to suspected pathology). Second, the same locality made it possible to collect tens of thousands of training samples from as few as 40 DTI datasets. More importantly, the training data thus obtained have been sufficient for reliable optimization of the network parameters. Such a result would have been impossible to attain had each of the datasets been dealt with as a single observation, in accordance with established practice. Thus, the proposed methodology can be particularly advantageous in situations when larger sets of training data are not available.

The present work has barely scratched the surface of the possibilities offered through the combination of dMRI measurements and DL, with many important questions left yet to be addressed. In particular, although sufficient as a "proof-of-concept" in comparative studies, the problem of DTI-based stratification of CN and AD subjects is clearly of limited practical importance. Thus, to further attest to its viability, the proposed method needs to be extended to the classification of multiple diagnostic groups. Along this direction of research, a particularly enticing question would be to find correspondence between PSIC and different pathological stages of AD [61]. In a similar vein, it would also be interesting to apply the proposed solution to the problem of classification of various subtypes of MCI [12].

Addressing more complex classification problems is likely to require a proportional increase in the complexity of the DNN. In the case of simple CN/AD stratification, it was necessary neither to involve more complex data nor to extend the network architecture beyond a basic feed-forward configuration. Moreover, the minimality of the employed DNN architecture (with the total number of trainable parameters equal to 50,376) has been instrumental in preventing overfitting during its training on the small set of 40 subjects. In more complicated clinical scenarios, however, the architecture could be extended through, e.g., processing the spatial neighbourhoods $\Omega_L$ of different sizes, with a corresponding increase in the number of hidden layers. Another way to enhance the predictive power of the DNN would be to take advantage of dMRI data acquired at multiple $b$-values (as has been done in the ADNI-III study). All such extensions would come with an increase in the number of network parameters, which should be accompanied by a pro-rata increase in the size of training data.

Finally, although working with SH coefficients is not the only way to "planarize" the operation of spherical convolution, it offers an important advantage in the context of between-site variability of classification scores. Specifically, due to discrepancies in the design and settings of MRI scanners, it is not uncommon for similar dMRI signals acquired at different sites to have notably different spectral characteristics (which is often the reason behind conflicting reports in clinical applications of dMRI). This problem has been addressed by a range of approaches, among which a particularly effective way to counteract the effects of between-site variability is spectral data harmonization of the SH coefficients of DTI data, as detailed in [62]. This approach suggests a straightforward way of combining the proposed method with a compatible means of normalization, which should render its performance consistent across different clinical sites. This expectation, however, still needs to be validated via proper experimental studies, which constitutes another objective of our future research.

References

[1] Anil Kumar, Arti Singh, and Ekavali. A review on Alzheimer's disease pathophysiology and its management: An update. Pharmacol. Rep., 67(2):195–203, 2015.

[2] 2018 Alzheimer's disease facts and figures. Alzheimer's & Dementia, 14(3):367–429, 2018.

[3] Giovanni B. Frisoni, Nick C. Fox, Clifford R. Jack Jr, Philip Scheltens, and Paul M. Thompson. The clinical use of structural MRI in Alzheimer disease. Nature Rev Neurol, 6:67, February 2010.

[4] M. W. Weiner, D. P. Veitch, P. S. Aisen, L. A. Beckett, N. J. Cairns, R. C. Green, D. Harvey, C. R. Jack, W. Jagust, E. Liu, J. C. Morris, R. C. Petersen, A. J. Saykin, M. E. Schmidt, L. Shaw, J. A. Siuciak, H. Soares, A. W. Toga, and J. Q. Trojanowski. The Alzheimer's Disease Neuroimaging Initiative: A review of papers published since its inception. Alzheimers Dement, 8(1):1–68, February 2012.

[5] P J Basser, J Mattiello, and D LeBihan. Estimation of the effective self-diffusion tensor from the NMR spin echo. J Magn Reson B, 103(3):247–54, Mar 1994.

[6] D Le Bihan, E Breton, D Lallemand, P Grenier, E Cabanis, and M Laval-Jeantet. MR imaging of intravoxel incoherent motions: Application to diffusion and perfusion in neurologic disorders. Radiology, 161(2):401–7, Nov 1986.

[7] Julio Acosta-Cabronero, Stephanie Alley, Guy B Williams, George Pengas, and Peter J Nestor. Diffusion tensor metrics as biomarkers in Alzheimer's disease. PLoS One, 7:e49072, 2012.

[8] I K Amlien and A M Fjell. Diffusion tensor imaging of white matter degeneration in Alzheimer's disease and mild cognitive impairment. Neuroscience, 276:206–215, September 2014.

[9] Tijn M Schouten, Marisa Koini, Frank de Vos, Stephan Seiler, Mark de Rooij, Anita Lechner, Reinhold Schmidt, Martijn van den Heuvel, Jeroen van der Grond, and Serge A R B Rombouts. Individual classification of Alzheimer's disease with diffusion magnetic resonance imaging. NeuroImage, 152:476–481, May 2017.

[10] Bing Zhang, Yun Xu, Bin Zhu, and Kejal Kantarci. The role of diffusion tensor imaging in detecting microstructural changes in prodromal Alzheimer's disease. CNS Neurosci Ther, 20:3–9, January 2014.

[11] Chantel D Mayo, Mauricio A Garcia-Barrera, Erin L Mazerolle, Lesley J Ritchie, John D Fisk, Jodie R Gawryluk, and ADNI. Relationship between DTI metrics and cognitive function in Alzheimer's disease. Front Aging Neurosci, 10:436, 2018.

[12] R. C. Petersen. Mild cognitive impairment. Continuum, 22(2):404–418, 2016.

[13] Emily C Edmonds, Carrie R McDonald, Anisa Marshall, Kelsey R Thomas, Joel Eppig, Alexandra J Weigand, Lisa Delano-Wood, Douglas R Galasko, David P Salmon, Mark W Bondi, and ADNI. Early versus late MCI: Improved MCI staging using a neuropsychological approach. Alzheimers Dement, Feb 2019.

[14] S Mori and J-Donald Tournier. Introduction to diffusion tensor imaging and higher order models. Academic Press, 2nd edition, 2014.

[15] Yaniv Assaf and Ofer Pasternak. Diffusion tensor imaging (DTI)-based white matter mapping in brain research: A review. J Mol Neurosci, 34(1):51–61, 2008.

[16] D. K. Jones and M. Cercignani. Twenty-five pitfalls in the analysis of diffusion MRI data. NMR Biomed., 23:803–820, 2010.

[17] M Bozzali, A Falini, M Franceschi, M Cercignani, M Zuffi, G Scotti, G Comi, and M Filippi. White matter damage in Alzheimer's disease assessed in vivo using diffusion tensor magnetic resonance imaging. J Neurol Neurosurg Psychiatry, 72(6):742–6, Jun 2002.

[18] W. Lee, B. Park, and K. Han. Classification of diffusion tensor images for the early detection of Alzheimer's disease. Comput. Biol. Med., 43:1313–1320, October 2013.

[19] Stefan J Teipel, Michel J Grothe, Massimo Filippi, Andreas Fellgiebel, Martin Dyrba, Giovanni B Frisoni, Thomas Meindl, Arun L W Bokde, Harald Hampel, Stefan Kloppel, Karlheinz Hauenstein, and EDSD study group. Fractional anisotropy changes in Alzheimer's disease depend on the underlying fiber tract architecture: A multiparametric DTI study using joint independent component analysis. J. Alzheimers Dis., 41:69–83, 2014.

[20] Kara M Hawkins, Aman I Goyal, and Lauren E Sergio. Diffusion tensor imaging correlates of cognitive-motor decline in normal aging and increased Alzheimer's disease risk. J. Alzheimers Dis., 44:867–878, 2015.

[21] Chantel D Mayo, Erin L Mazerolle, Lesley Ritchie, John D Fisk, Jodie R Gawryluk, and ADNI. Longitudinal changes in microstructural white matter metrics in Alzheimer's disease. NeuroImage, 13:330–338, 2017.

[22] E Englund, A Brun, and C Alling. White matter changes in dementia of Alzheimer's type: Biochemical and neuropathological correlates. Brain, 111 (Pt 6):1425–39, Dec 1988.

[23] Hui Zhang, Torben Schneider, Claudia A Wheeler-Kingshott, and Daniel C Alexander. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage, 61(4):1000–1016, Jul 2012.

[24] N Colgan, B Siow, J M O'Callaghan, I F Harrison, J A Wells, H E Holmes, O Ismail, S Richardson, D C Alexander, E C Collins, E M Fisher, R Johnson, A J Schwarz, Z Ahmed, M J O'Neill, T K Murray, H Zhang, and M F Lythgoe. Application of neurite orientation dispersion and density imaging (NODDI) to a tau pathology model of Alzheimer's disease. NeuroImage, 125:739–744, Jan 2016.

[25] Catherine F. Slattery, Jiaying Zhang, Ross W. Paterson, Alexander J. M. Foulkes, Laura Mancini, David L. Thomas, Marc Modat, Nicolas Toussaint, David M. Cash, John S. Thornton, Daniel C. Alexander, Sebastien Ourselin, Nick C. Fox, Hui Zhang, and Jonathan M. Schott. Neurite orientation dispersion and density imaging (NODDI) in young-onset Alzheimer's disease and its syndromic variants. Alzheimers Dement, 11(7):P91, 2015.

[26] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, 2015.

[27] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.

[28] N Bhagwat, J Pipitone, AN Voineskos, and MM Chakravarty. An artificial neural network model for clinical score prediction in Alzheimer's disease using structural neuroimaging measures. J Psychiatry Neurosci, 44(2):1–15, 2019.

[29] Rachna Jain, Nikita Jain, Akshay Aggarwal, and D Jude Hemanth. Convolutional neural network based Alzheimer's disease classification from magnetic resonance brain images. Cogn. Syst. Res., 2019.

[30] Hongfei Wang, Yanyan Shen, Shuqiang Wang, Tengfei Xiao, Liming Deng, Xiangyu Wang, and Xinyan Zhao. Ensemble of 3D densely connected convolutional network for diagnosis of mild cognitive impairment and Alzheimer's disease. Neurocomputing, 333:145–156, 2019.

[31] Jun-Sik Choi, Eunho Lee, and Heung-Il Suk. Regional abnormality representation learning in structural MRI for AD/MCI diagnosis. In International Workshop on Machine Learning in Medical Imaging, pages 64–72. Springer, 2018.

[32] Chunfeng Lian, Mingxia Liu, Jun Zhang, and Dinggang Shen. Hierarchical fully convolutional network for joint atrophy localization and Alzheimer's disease diagnosis using structural MRI. IEEE Trans Pattern Anal Mach Intell, 2018.

[33] Mostafa Mehdipour Ghazi, Mads Nielsen, Akshay Pai, M Jorge Cardoso, Marc Modat, Sébastien Ourselin, Lauge Sørensen, Alzheimer's Disease Neuroimaging Initiative, et al. Training recurrent neural networks robust to incomplete data: Application to Alzheimer's disease progression modeling. Med Image Anal, 53:39–46, 2019.

[34] Garam Lee, Kwangsik Nho, Byungkon Kang, Kyung-Ah Sohn, and Dokyoon Kim. Predicting Alzheimer's disease progression using multi-modal deep learning approach. Scientific Reports, 9(1), 2019.

[35] Mingliang Wang, Daoqiang Zhang, Dinggang Shen, and Mingxia Liu. Multi-task exclusive relationship learning for Alzheimer's disease progression prediction with longitudinal data. Med Image Anal, 53:111–122, 2019.

[36] Maryamossadat Aghili, Solale Tabarestani, Malek Adjouadi, and Ehsan Adeli. Predictive modeling of longitudinal data for Alzheimer's disease diagnosis using RNNs. In International Workshop on Predictive Intelligence in Medicine, pages 112–119. Springer, 2018.

[37] Weiming Lin, Tong Tong, Qinquan Gao, Di Guo, Xiaofeng Du, Yonggui Yang, Gang Guo, Min Xiao, Min Du, Xiaobo Qu, et al. Convolutional neural networks-based MRI image analysis for the Alzheimer's disease prediction from mild cognitive impairment. Front Neurosci, 12, 2018.

[38] Jeremy Kawahara, Colin J Brown, Steven P Miller, Brian G Booth, Vann Chau, Ruth E Grunau, Jill G Zwicker, and Ghassan Hamarneh. BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage, 146:1038–1049, 2017.

[39] Tao Zhou, Kim-Han Thung, Xiaofeng Zhu, and Dinggang Shen. Effective feature learning and fusion of multimodality data using stage-wise deep neural network for dementia diagnosis. Hum Brain Mapp, 40(3):1001–1016, 2019.

[40] Parisa Forouzannezhad, Alireza Abbaspour, Chunfei Li, Mercedes Cabrerizo, and Malek Adjouadi. A deep neural network approach for early diagnosis of mild cognitive impairment using multiple features. In [...], pages 1341–1346. IEEE, 2018.

[41] Donghuan Lu, Karteek Popuri, Gavin Weiguang Ding, Rakesh Balachandar, and Mirza Faisal Beg. Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer's disease using structural MR and FDG-PET images. Scientific Reports, 8(1), 2018.

[42] Vladimir Golkov, Alexey Dosovitskiy, Jonathan I Sperl, Marion I Menzel, Michael Czisch, Philipp Sämann, Thomas Brox, and Daniel Cremers. Q-space deep learning: Twelve-fold shorter and model-free diffusion MRI scans. IEEE Trans Med Imaging, 35(5):1344–1351, 2016.

[43] Simon Koppers and Dorit Merhof. Direct estimation of fiber orientations using deep learning in diffusion imaging. In International Workshop on Machine Learning in Medical Imaging, pages 53–60. Springer, 2016.

[44] McCrackin L. Crowley M. Rathi Y. Maryam, S. and O. Michailovich. Application of probabilistically-weighted graphs to image-based diagnosis of alzheimers disease using diffusion mri. In Proceed SPIE Medical Imaging, pages 1–10, 2017.

[45] D. S. Tuch. Q-ball imaging. Magn. Reson. Med., 52:1358–1372, 2004.

[46] A. Poulenard, M. Rakotosaona, Y. Ponty, and M. Ovsjanikov. Effective rotation-invariant point cnn with spherical harmonics kernels. In [...], pages 47–56, 2019.

[47] Ayman Mukhaimar, Ruwan B. Tennakoon, C. Y. Lai, R. Hoseinnezhad, and A. Bab-Hadiashar. Robust object classification approach using spherical harmonics. ArXiv, abs/2009.01369, 2020.

[48] S. Fitzgibbons R. Deriche M. Descoteaux, E. Angelino. Apparent diffusion coefficients from high angular resolution diffusion images: Estimation and applications. Magn. Reson. Med., 56(2):395–410, 2006.

[49] P. J. Basser S. Marenco C. Pierpaoli G. K. Rohde, A. S. Barnett. Comprehensive approach for correction of motion and distortion in diffusion-weighted mri. Magn. Reson. Med., 51:103–114, 2004.

[50] D. K. Jones. The effect of gradient sampling schemes on measures derived from Diffusion Tensor MRI: A Monte Carlo study. Magn. Reson. Med., 51:807–815, 2004.

[51] M. Lazar A. S. Field A. L. Alexander, J. E. Lee. Diffusion tensor imaging of the brain. Neurotherapeutics, 4:316–329, 2007.

[52] Matthew F Glasser, Stamatios N Sotiropoulos, J Anthony Wilson, Timothy S Coalson, Bruce Fischl, Jesper L Andersson, Junqian Xu, Saad Jbabdi, Matthew Webster, Jonathan R Polimeni, David C Van Essen, Mark Jenkinson, and WU-Minn HCP Consortium. The minimal preprocessing pipelines for the human connectome project. Neuroimage, 80:105–24, Oct 2013.

[53] Schmansky N.J. Rosas H.D. Fischl B. Reuter, M. Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage, 61(5):1402–1418, 2012.

[54] Yuankai Huo, Shunxing Bao, Prasanna Parvathaneni, and Bennett A Landman. Improved stability of whole brain surface parcellation with multi-atlas segmentation. Proc SPIE Int Soc Opt Eng, 10574, Mar 2018.

[55] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv.org, 1412.6980, 2017.

[56] S. Aja-Fernandez, T. Pieciak, A. Tristan-Vega, G. Vegas-Sanchez-Ferrero, V. Molinaf, and R. de Luis-Garcia. Scalar diffusion-MRI measures invariant to acquisition parameters: A first step towards imaging biomarkers. Magn. Reson. Imag., 54:194–213, 2018.

[57] D. S. Novikov, J. Veraart, I. O. Jelescu, and E. Fieremans. Rotationally-invariant mapping of scalar and orientational metrics of neuronal microstructure with diffusion MRI. NeuroImage, 174(1):518–538, 2018.

[58] Philippe Rigollet and Xin Tong. Neyman-Pearson classification, convexity and stochastic constraints. J. Mach. Learn. Res., 12:2831–2855, 2011.

[59] S. Konishi and M. Honda. Bootstrap methods for error rate estimation in discriminant analysis. Japanese Society of Applied Statistics, 21(2):67–100, 1992.

[60] A. Agresti. Categorical Data Analysis. Wiley-Interscience, New York, 2002.

[61] H. Braak and E. Braak. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol, 82:239–259, 1991.

[62] Ning L. Savadjiev P. et al. Mirzaalian, H. Multi-site harmonization of diffusion mri data in a registration framework.