Towards a quantitative assessment of neurodegeneration in Alzheimer's disease
OLEG MICHAILOVICH AND RINAT MUKHOMETZIANOV
The Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3GL, Canada
Submitted to
IEEE Access
Abstract
Alzheimer’s disease (AD) is an irreversible neurodegenerative disorder that progressively destroys memory and other cognitive domains of the brain. While effective therapeutic management of AD is still in development, it seems reasonable to expect its prospective outcomes to depend on the severity of baseline pathology. For this reason, substantial research efforts have been invested in the development of effective means of non-invasive diagnosis of AD at its earliest possible stages. In pursuit of the same objective, the present paper addresses the problem of the quantitative diagnosis of AD by means of Diffusion Magnetic Resonance Imaging (dMRI). In particular, the paper introduces the notion of a pathology specific imaging contrast (PSIC), which, in addition to supplying a valuable diagnostic score, can serve as a means of visual representation of the spatial extent of neurodegeneration. The values of PSIC are computed by a dedicated deep neural network (DNN), which has been specially adapted to the processing of dMRI signals. Once available, such values can be used for several important purposes, including stratification of study subjects. In particular, experiments confirm that the DNN-based classification can outperform a wide range of alternative approaches in application to the basic problem of stratification of cognitively normal (CN) and AD subjects. Notwithstanding its preliminary nature, this result suggests a strong rationale for further extension and improvement of the explorative methodology described in this paper.
Keywords: diffusion MRI, deep learning, convolutional neural networks, early diagnosis, Alzheimer’s disease.
1. Introduction
The world population is steadily ageing, and with advanced age comes a higher risk of dementia. At the present time, the dementia of Alzheimer’s type, or Alzheimer’s disease (AD), accounts for almost two-thirds of all prevalent cases of dementia in the elderly. AD is an irreversible, progressive disease that slowly destroys memory and other cognitive domains, eventually leaving the patient bedridden. The course of AD pathology is likely to span around two to three decades [1]. Unfortunately, by the time the first symptoms emerge, it is usually too late to save the brain. For this reason, over the last two decades, considerable efforts have been directed towards finding effective means of the earliest possible diagnosis of AD [2].
The current arsenal of methods for quantitative diagnosis of AD is impressively broad, ranging from advanced proteomics to state-of-the-art neuroimaging. In the latter case, particularly promising results have been demonstrated by both nuclear and Magnetic Resonance Imaging (MRI) [2–4].
Among various methods of MRI, diffusion
MRI (dMRI) is exceptional for its unique ability to generate imaging contrast based on the microscopic (rather than macroscopic) properties of neurological tissue, which makes it singularly fit for the task of detecting the earliest signs of neurodegeneration [5, 6]. This ability of dMRI has been investigated in a number of studies [7–11], which predominantly focused on the problem of classification (aka stratification) of three groups of subjects, viz. cognitively normal (CN) subjects, AD subjects, and the subjects diagnosed with mild cognitive impairment (MCI). Note that the latter is broadly recognized as a prodromal condition that frequently heralds the onset of “full-blown” AD [12, 13].
In virtually all earlier studies on dMRI-based stratification of AD, MCI, and CN subjects, the protocol of choice has been Diffusion Tensor Imaging (DTI) [5, 6, 14]. The latter is known to provide an adequate characterization of diffusion dynamics in the white matter associated with non-crossing bundles of neural fibre tracts. Unfortunately, its dependence on Gaussian modelling curbs the ability of DTI to delineate more complex diffusion processes, e.g., within crossing fibres [15, 16]. It is thus no wonder that, even though many DTI metrics have demonstrated considerable sensitivity across multiple brain regions, the most consistent findings have been confined to the corpus callosum [7–9, 11, 17–21]. At the same time, few DTI studies have been able to stratify CN, AD, and MCI subjects based on DTI analysis of the medial-temporal white matter, which is known to be abundant in both crossing and “kissing” fibre tracts. The problem here has obviously been in the intrinsic modelling limitations of DTI, which is a rather discouraging out-turn in view of the known involvement of the above region in the early stages of neuropathological AD [22].
The limitations of DTI have prompted the development of more advanced methods of dMRI, among which Neurite Orientation Dispersion and Density Imaging (NODDI) is considered to be one of the most comprehensive approaches to quantitative characterization of cerebral diffusion [23]. Naturally, several studies have investigated the applicability of NODDI to early diagnosis of AD. However, when trying to correlate the spatial distribution of NODDI metrics with histopathological evidence of AD, it was observed in [24] that NODDI offered somewhat marginal
advantages over DTI. A similar conclusion was reached in [25], in which NODDI was used in application to diagnosis of young-onset AD.
Needless to say, DTI and NODDI are only two specific examples among a wide range of methods available under the umbrella of dMRI. However, regardless of their specific modelling assumptions, all these methods share a tendency to produce more accurate results at the expense of higher complexity of parametrization. At the same time, the use of parametric spaces of progressively higher dimensionality requires a proportional increase in the number of data points, which might not always be possible due to practical constraints. Thus, given the melange of available protocols and models, the question of which of the existing dMRI methods is “the best” for early diagnosis of AD appears to be rather non-trivial.
Before going any further, it is important to note that not all methods of dMRI are equally feasible from the viewpoint of clinical implementation. In particular, for practical reasons, the typical duration of a clinical dMRI examination rarely exceeds 15-20 mins. This constraint puts a strict upper bound on the amount of acquirable data, and, consequently, on the maximal order of numerically stable parametrization. For this reason, most of the studies on the dMRI-based diagnosis of AD have predominantly relied on DTI. Despite its numerous limitations, DTI remains “the method of choice” in many ongoing studies thanks to the minimality of its technical requirements and its time efficiency.
It is also worthwhile noting that, in quotidian exchanges, the term “DTI” is usually used in two different connections. In particular, it could refer to the Gaussian (i.e., 2nd-order tensor) diffusion model which lies in the foundation of DTI analysis. This model is described by a total of seven parameters, which could be estimated based on a minimum of seven independent measurements.
Although the practical number of diffusion measurements normally exceeds this lower bound, their acquisition still rarely requires more than 15 mins of scanning time, which makes such imaging protocols clinically feasible. It is probably due to the association between the low dimensionality of DTI modelling and its dependence on a relatively small number of measurements that the term “DTI” has also been used to refer to diffusion data comprised of a comparatively small number of diffusion encodings. Hence, to avoid misconception, the terms DTI data and DTI modelling need to be distinguished.
For the reasons explained above, virtually all clinical dMRI data could be characterized as “DTI”. From the practical point of view, therefore, it seems reasonable to restrict the scope of available dMRI models to only those which could be reliably fitted based on clinical DTI data. Unfortunately, this would have mainly left us with low-parametric models of a descriptive power similar to that of DTI. This out-turn reveals a critical methodological predicament, where the use of more advanced models is fraught with estimation artefacts, whereas suppressing such artefacts by including additional measurements is precluded by practical constraints. In such conditions, the use of low-parametric dMRI models becomes the only option, which, unfortunately, comes at the cost of reduced accuracy.
Footnote: The DTI model is parametrized by a symmetric 3 × […]
A particularly promising way to improve over the performance of low-parametric dMRI modelling is offered by data-driven inference and, in particular, by its recent realization in the theory of
Deep Learning (DL) [26, 27]. The modern methods of DL make it possible to discern subtle and complex dependencies in experimental data, which would have been impossible to describe in mechanistic terms. In this case, the actual (unknown) model is replaced by its phenomenological representation in terms of a Deep Neural Network (DNN) that is capable of “learning” to predict future outcomes based on past observations.
Although the idea of using DL in the imaging-based diagnosis of AD is not original, much of the work along this direction has mainly focused on structural MRI. The latter has proven instrumental in assessing the cerebral atrophy due to AD in both cortical and subcortical regions of the brain [28–32]. In longitudinal studies, the substantial potential for prediction of ensuing cognitive decline in MCI and AD subjects has been demonstrated through the use of recurrent DNNs [33–38], both with and without augmenting the structural MRI data with other sources of diagnostic information [34, 39–41]. However, the application of DL to a dMRI-based diagnosis of AD remains a barely tapped area of research, notwithstanding the abundant evidence of its successful use in other applications [42, 43]. Accordingly, the primary goal of this work has been to leverage the combined power of dMRI and DL towards the early diagnosis of AD.
The key idea of the proposed methodology is built around the notion of pathology specific imaging contrast (PSIC). Similarly to other imaging-based markers, PSIC is a scalar score that indicates the presence of a suspected pathology (such as, e.g., AD). However, instead of characterising an entire dataset, the values of PSIC are computed at each spatial coordinate within a specified region-of-interest (ROI). In this way, PSIC can serve as a local indicator of the degree to which AD pathology affects various anatomical sites.
Once available, the values of PSIC can be converted into regional statistics, which can, in turn, be used for the purpose of subject classification. In the case of multiple ROIs, the performance of such classification could be further improved through exploring the interdependencies between different regional statistics, which are known to undergo sizeable changes in AD [44]. Furthermore, the local PSIC values can be displayed in superposition with anatomical images in the form of a contrast.
In this way, PSIC also offers a means for visual analysis of the extent and severity of suspected pathology.
In this work, the values of PSIC have been generated by a dedicated DNN. Although the overall architecture of the DNN is based on a standard feed-forward configuration, its principal operations (such as, e.g., convolution, pooling, etc.) have been properly adjusted to the physical and analytical properties of diffusion signals. The adjustment made it possible to minimize the number of network parameters and, consequently, to reduce the amount of training data substantially. In particular, the results of this paper have been obtained based on only 40 dMRI datasets.
It is important to emphasize that the results reported here should be regarded as neither exhaustive nor final, but rather as describing a novel concept which admits many possible extensions and improvements. For this reason, no attempts have been made to “push the limits” of the proposed method by testing its performance in deliberately difficult scenarios. Instead, the present experimental study has been deliberately limited to the basic task of stratification of CN and AD subjects, while focusing on a comparative performance between the proposed method and a range of existing alternatives.
The remainder of this paper is organized as follows. The principal idea of the proposed method is described in Section 2, while Section 3 introduces some necessary technical preliminaries, followed by a description of the proposed network design in Section 4. Subsequently, Section 5 provides details on the experimental setup, study data, and performance metrics used in this work, while experimental results are summarized in Section 6. Finally, Section 7 concludes the paper with a discussion of its main findings along with an outline of possible directions for future research.
2. Principal idea and data structure
The experimental study of this work has been based on the dMRI data available through the continuing efforts of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Over the last few years, the ADNI database has been extended to include dMRI data acquired by means of relatively advanced protocols. For the purposes of this paper, however, we used the dMRI data collected during an earlier phase of the ADNI study, known as ADNI-II. At that time, the data acquisition relied on more standard protocols which are more common in present-day clinical DTI. Thus, working with the earlier data should provide a more objective demonstration of the practical value of the proposed methodology.
In what follows, we consider a typical setting in which a DTI dataset consists of $K$ diffusion-encoded MRI volumes which encode the values of apparent diffusivity along different spatial orientations. Such data are usually acquired at a fixed level of diffusion sensitization controlled by the $b$-value [14]. In particular, in the case of ADNI-II data, the number of diffusion encodings was set to $K = 41$, with $b = 1000$ s/mm$^2$.
Footnote: For more details on various ADNI programs, visit adni.loni.usc.edu.
Footnote: For relatively large $K$, this acquisition scheme is also known as High Angular Resolution Diffusion Imaging (HARDI) [45].
Formally, DTI signals can be considered to be functions of both spatial and spherical coordinates. In practice, the spatial coordinate is sampled over a regular Cartesian grid $\Omega := \{ \mathbf{n} = (n_1, n_2, n_3) \mid 0 \le n_i < N_i,\ i = 1, 2, 3 \}$ which represents the (anatomical) image domain. On the other hand, the process of diffusion encoding restricts the signal values to $K$ points $\{ \mathbf{u}_k \mid \|\mathbf{u}_k\| = 1 \}_{k=1}^{K}$ over the unit sphere $\mathbb{S}^2$ in the diffusion $q$-space. Thus, from a practical point of view, a DTI dataset can be viewed as a 4-D numerical array of size $N_1 \times N_2 \times N_3 \times K$, with three “anatomical” and one “diffusion” dimension.
For some $\mathbf{r} \in \Omega$, let $\mathcal{N}_{\mathbf{r}}$ be a symmetric neighbourhood of $\mathbf{r}$ consisting of all $\mathbf{n}$ such that $\|\mathbf{n} - \mathbf{r}\|_\infty = \max_{1 \le i \le 3} |n_i - r_i| \le L$ for some (small) radius $L > 0$. The term “diffusion cube (DC) at $\mathbf{r}$” will be used below to refer to a segment of DTI data spatially restricted to $\mathcal{N}_{\mathbf{r}}$. Formally, the DC at $\mathbf{r}$ is defined as
$$ s_{\mathbf{r}} = s_{\mathbf{r}}(\mathbf{n}, \mathbf{u}_k) := \{ s(\mathbf{n}, \mathbf{u}_k) \mid \mathbf{n} \in \mathcal{N}_{\mathbf{r}},\ 1 \le k \le K \}, $$
where $s(\mathbf{n}, \mathbf{u}_k)$ denotes the signal value at position $\mathbf{n} \in \mathcal{N}_{\mathbf{r}}$ and diffusion-encoding orientation $\mathbf{u}_k$. Thus, similarly to the entire dataset, $s_{\mathbf{r}}$ can be viewed as a 4-D array of size $M \times M \times M \times K$, with $M = 2L + 1$.
Footnote: For example, with $L = 1$, $\mathcal{N}_{\mathbf{r}}$ represents a standard 27-connected neighbourhood of $\mathbf{r}$.
As the next step, we refer to Fig. 1 that illustrates various stages of computing the PSIC values in application to the stratification of CN and AD subjects. Suppose that, for a given study subject (Subplot A), we are interested in making an inference based on DTI data confined to some prescribed ROI $\mathcal{R} \subset \Omega$ (Subplot B). Then, for each $\mathbf{r} \in \mathcal{R}$ such that $\mathcal{N}_{\mathbf{r}} \subset \mathcal{R}$ (Subplot C), its associated DC $s_{\mathbf{r}}$ (Subplot D) is passed to a dedicated DNN $f(s_{\mathbf{r}} \mid \theta)$ that yields a PSIC score $0 \le \gamma_{\mathcal{R}}(\mathbf{r}) \le 1$. For the sake of argument, at the moment, the network parameters $\theta$ are assumed to be set to their optimal values $\theta^*_{\mathcal{R}}$. In this case, by virtue of network design and training, $\gamma_{\mathcal{R}}(\mathbf{r})$ is set up to scale proportionally to the likelihood of the neural tissue at $\mathbf{r} \in \mathcal{R}$ to be affected by AD. Thus, in particular, the CN cases would be associated with relatively low values of PSIC (e.g., $\gamma_{\mathcal{R}}(\mathbf{r}) = 0.$[…]), whereas the AD cases would be associated with relatively high values (e.g., $\gamma_{\mathcal{R}}(\mathbf{r}) = 0.$[…]). Moreover, the values of $\gamma_{\mathcal{R}}(\mathbf{r})$ computed at all $\mathbf{r} \in \mathcal{R}$ can also be used as an imaging contrast, which can be superimposed over a structural display of underlying anatomy. Such contrast is expected to be comparatively weak and scarce in CN (Subplot F) while being intense and densely concentrated in AD (Subplot G).
Figure 1. Computation of PSIC: (A) subject under examination; (B) DTI dataset with a specified ROI $\mathcal{R} \subset \Omega$; (C) “inner” point $\mathbf{r}$ and its neighbourhood $\mathcal{N}_{\mathbf{r}}$; (D) corresponding DC $s_{\mathbf{r}}$; (F) and (G) regional PSIC values in the case of CN and AD, respectively.
In practice, the optimal network parameters are estimated from training data. To this end, we hypothesize that, for a proper choice of $\mathcal{R}$, the effects of neurodegeneration are manifested in some hidden diffusion characteristics which persist throughout the entire ROI. Moreover, such hidden characteristics are likely to be shared between different subjects within the same diagnostic group.
Under the above hypothesis, the training data can be formed by stockpiling the DCs from all voxels $\mathbf{r}$ within a specified ROI across all subjects within the same diagnostic group. Specifically, in application to the binary problem of stratification of CN and AD subjects, such training data would consist of a set of DCs obtained from all available CN subjects, on the one hand, and a similar set of DCs coming from all available AD subjects, on the other hand.
More formally, let $N_{\mathrm{CN}}$ and $N_{\mathrm{AD}}$ be the number of subjects in the CN and AD groups, respectively. Also, let $\{ s^{\mathrm{CN},i} \}_{i=1}^{N_{\mathrm{CN}}}$ (resp., $\{ s^{\mathrm{AD},i} \}_{i=1}^{N_{\mathrm{AD}}}$) denote the diffusion datasets collected from the CN (resp., AD) subjects. In this case, the training data would consist of two subsets of samples defined as
$$ \mathcal{X}^{\mathcal{R}}_{\mathrm{CN}} := \big\{ \{ s^{\mathrm{CN},i}_{\mathbf{r}} \}_{\mathbf{r} \in \mathcal{R}} \big\}_{i=1}^{N_{\mathrm{CN}}}, \qquad \mathcal{X}^{\mathcal{R}}_{\mathrm{AD}} := \big\{ \{ s^{\mathrm{AD},i}_{\mathbf{r}} \}_{\mathbf{r} \in \mathcal{R}} \big\}_{i=1}^{N_{\mathrm{AD}}}, $$
along with their corresponding (target) labels $\gamma = 0$ and $\gamma = 1$, respectively. A conceptual illustration of the process of formation of the training data is shown in Fig. 2.
In general, the above procedure can be applied to $M$ different ROIs $\{\mathcal{R}_j\}_{j=1}^{M}$, in which case one would have $M$ different training sets, $\{\mathcal{X}^{\mathcal{R}_j}_{\mathrm{CN}}\}_{j=1}^{M}$ and $\{\mathcal{X}^{\mathcal{R}_j}_{\mathrm{AD}}\}_{j=1}^{M}$. Each such set could then be used to estimate the optimal parameters $\theta^*_{\mathcal{R}_j}$ for its related $\mathcal{R}_j$ independently.
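The assembly of training data described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`diffusion_cube`, `inner_voxels`, `training_set`), the nested-list data layout, and the representation of an ROI as a set of voxel coordinates are all hypothetical and are not taken from the paper. Only "inner" voxels, whose whole neighbourhood lies inside the ROI, contribute a DC.

```python
from itertools import product


def diffusion_cube(data, r, L=1):
    """Return the DC at voxel r as an M x M x M x K nested list (M = 2L + 1).

    `data` is a 4-D nested list indexed as data[n1][n2][n3][k] (assumed layout).
    """
    r1, r2, r3 = r
    return [[[list(data[n1][n2][n3])
              for n3 in range(r3 - L, r3 + L + 1)]
             for n2 in range(r2 - L, r2 + L + 1)]
            for n1 in range(r1 - L, r1 + L + 1)]


def inner_voxels(roi, L=1):
    """Voxels r of the ROI whose whole neighbourhood N_r lies inside the ROI."""
    offsets = list(product(range(-L, L + 1), repeat=3))
    return [r for r in sorted(roi)
            if all((r[0] + d1, r[1] + d2, r[2] + d3) in roi
                   for d1, d2, d3 in offsets)]


def training_set(datasets, rois, label, L=1):
    """Stockpile (DC, label) pairs over all subjects of one diagnostic group."""
    samples = []
    for data, roi in zip(datasets, rois):
        for r in inner_voxels(roi, L):
            samples.append((diffusion_cube(data, r, L), label))
    return samples


# Toy example: one 4 x 4 x 4 volume with K = 2 encodings and a 3 x 3 x 3 ROI.
N, K = 4, 2
toy = [[[[float(n1 + n2 + n3 + k) for k in range(K)]
         for n3 in range(N)] for n2 in range(N)] for n1 in range(N)]
roi = set(product(range(3), repeat=3))
x_cn = training_set([toy], [roi], label=0)  # only (1, 1, 1) is an inner voxel
```

In this toy setting the CN set contains a single DC of size 3 × 3 × 3 × 2; with real data, the same loop would be run once per diagnostic group and once per ROI.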
Figure 2. Conceptual illustration of the process of formation of the training data for a specified ROI.
3. Analytical tools and principal operations
3.1. Composite Convolution.
At a conceptual level, the DNN proposed in this paper is based on a standard feed-forward configuration, consisting of a typical succession of convolutional operations interleaved with nonlinearities and resizing. However, when defining such operations, it would be amiss to disregard the physical properties of diffusion signals which offer a number of important advantages.
As discussed in Section 2, at its input, the network receives a single DC, which can be viewed as a 4-D numeric array of size $M \times M \times M \times K$. Alternatively, the DC could be considered to be an array of the discrete values of a diffusion signal $S(\mathbf{x}, \mathbf{u})$ measured over $\Omega_L = \{ \mathbf{n} \mid \|\mathbf{n}\|_\infty \le L \}$ and spherical orientations $\{\mathbf{u}_k\}_{k=1}^{K}$. Formally, $S(\mathbf{x}, \mathbf{u})$ can be defined over the combined domain $\bar{\Omega} := \mathrm{Conv}(\Omega_L) \times \mathbb{S}^2$, where $\mathrm{Conv}(\Omega_L)$ stands for the convex hull of $\Omega_L$. Thus, in order to incorporate convolution into the network design, one needs a proper definition of such operation for the signals defined over $\bar{\Omega}$.
Generally speaking, the convolution of $\bar{\Omega}$-domain signals is defined over the special Euclidean group $SE(3)$, working with which might be excessively complicated from a computational point of view. In what follows, we introduce a particular definition of this operation, which offers a number of important practical advantages.
In this way, the present results expand on the simpler case of spherical signals over $\mathbb{S}^2$, which have been successfully addressed in a number of recent studies [46, 47].
The proposed method relies on a simplified interpretation of $SE(3)$ convolution, which is derived under the assumptions of separability and zonality. In particular, the former requires the convolution kernel to be a separable product of a spatially-dependent and a spherically-dependent component. The assumption of zonality, on the other hand, requires the spherical component to be a zonal function, which implies its invariance to azimuthal rotations. Such functions admit representation in terms of Legendre polynomials $\{p_n\}_{n=0}^{\infty}$ as given by
$$ \xi(t) = \sum_{n=0}^{\infty} \frac{2n+1}{4\pi}\, \xi_n\, p_n(t), \quad t \in [-1, +1], \tag{1} $$
with $\xi_n = 2\pi \int_{-1}^{1} \xi(t)\, p_n(t)\, dt$ known as the Legendre coefficient of degree $n$.
The use of zonal kernels considerably simplifies the definition of spherical convolution. To see that, we first assume that, at each $\mathbf{x} \in \Omega_L$, $S(\mathbf{x}, \mathbf{u})$ can be closely approximated by its truncated Spherical Harmonic (SH) expansion of the form
$$ S(\mathbf{x}, \mathbf{u}) \simeq \sum_{n=0,2,\ldots}^{n_{\max}} \sum_{l=-n}^{n} c_{n,l}(\mathbf{x})\, Y_{n,l}(\mathbf{u}), \tag{2} $$
with $Y_{n,l}(\mathbf{u})$ and $c_{n,l}(\mathbf{x})$ being the $l$-th order SH of degree $n$ and its corresponding expansion coefficient, respectively. Note that, due to the spherical symmetry of DTI signals, the summation in (2) is restricted to the even values of $n$, in which case the total number of expansion coefficients is equal to $P = 0.5\,(n_{\max} + 1)(n_{\max} + 2)$, for a predefined maximal degree $n_{\max} > 0$.
At each $\mathbf{x} \in \Omega_L$, the availability of the SH coefficients $c_{n,l}(\mathbf{x})$ and the knowledge of the Legendre coefficients $\xi_n$ of the zonal kernel $\xi$ can be used to define the spherical convolution of $S(\mathbf{x}, \cdot)$ and $\xi$ according to
$$ \big( S(\mathbf{x}, \cdot) *_{\mathbf{u}} \xi \big)(\mathbf{u}) = \sum_{n=0,2,\ldots}^{n_{\max}} \sum_{l=-n}^{n} \tilde{c}_{n,l}(\mathbf{x})\, Y_{n,l}(\mathbf{u}), \tag{3} $$
where $\tilde{c}_{n,l}(\mathbf{x}) = \xi_n\, c_{n,l}(\mathbf{x})$, for all $n = 0, 2, \ldots, n_{\max}$ and $|l| \le n$.
Next, we note that, at each $\mathbf{x} \in \Omega_L$, the coefficients $c_{n,l}(\mathbf{x})$ form a real vector of length $P$. When computed over the entire $\Omega_L$, all such vectors can be conveniently assembled into a 4-D array $c$ of size $M \times M \times M \times P$. Alternatively, $c$ can be viewed as a collection of $P$ volumetric “coefficient images” $c_{n,l} \in \mathbb{R}^{M \times M \times M}$, where each $c_{n,l}$ comprises the spatially-dependent values of the SH coefficient of degree $n$ and order $l$.
It is also convenient to split $c$ into $(n_{\max} + 2)/2$ subsets $c_k$ of 3-D arrays according to the value of their degree $n$, i.e., $c_k = \{ c_{k,l} \}_{l=-k}^{k}$, with $k = 0, 2, \ldots, n_{\max}$. In this case, the spherical convolution in (3) amounts to scaling each “coefficient image” in $c_n$ by the same scalar $\xi_n$. While computationally efficient, however, this operation has the disadvantage of ignoring the spatial behaviour of input signals.
The coordinate-wise spherical convolution in (3) can be generalized into a composite spatial-spherical convolution as follows. For $n \le n_{\max}$, let $w_n = \{ w^{k}_{n,l} \}_{|l| \le n,\, |k| \le n}$ be a set of $P_n$ spatial-domain filters of size $J \times J \times J$. Then, based on the assumption of separability, the operation of $n$th-band spatio-spherical filtering can be defined as
$$ \hat{c}_{n,l} = \sum_{k=-n}^{n} c_{n,k} *_{\mathbf{r}} w^{k}_{n,l}, \quad \text{for all } |l| \le n, \tag{4} $$
where $*_{\mathbf{r}}$ stands for discrete convolution in the spatial domain. The definition in (4) suggests that, for each spherical order $l$, the output $\hat{c}_{n,l}$ is computed as a linear convolutional combination of the “input images” in $c_n$ with filters $w^{-n}_{n,l}, \ldots, w^{0}_{n,l}, \ldots, w^{n}_{n,l}$. In this way, the action of (4) extends across the entire $\Omega_L$, as opposed to the case of (3).
Extending the $n$th-band spatio-spherical filtering in (4) to all $n = 0, 2, \ldots, n_{\max}$ gives rise to a convolution-type operator that takes effect across both the spherical and spatial domains. In what follows, this operation will be referred to as composite convolution.
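The two operations in (3) and (4) can be illustrated with a minimal plain-Python sketch. Several simplifications are assumptions made here for brevity and are not part of the paper: the spatial domain is taken to be 1-D rather than 3-D, coefficients are stored in dictionaries keyed by $(n, l)$, and the names `sh_index`, `zonal_convolution`, `conv1d_same` and `band_filter` are hypothetical.

```python
def sh_index(n_max):
    """Enumerate (n, l) for even degrees n <= n_max and orders |l| <= n."""
    return [(n, l) for n in range(0, n_max + 1, 2) for l in range(-n, n + 1)]


def zonal_convolution(c, xi):
    """Eq. (3) at one voxel: scale each coefficient of degree n by xi[n]."""
    return {(n, l): xi[n] * value for (n, l), value in c.items()}


def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution with an odd-length, centred kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + half - j          # kernel tap j acts on signal[i - (j - half)]
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def band_filter(c_n, w_n, n):
    """Eq. (4) for one degree n: c_hat[l] = sum over k of c_n[k] * w_n[(l, k)]."""
    c_hat = {}
    for l in range(-n, n + 1):
        total = [0.0] * len(c_n[l])
        for k in range(-n, n + 1):
            filtered = conv1d_same(c_n[k], w_n[(l, k)])
            total = [t + f for t, f in zip(total, filtered)]
        c_hat[l] = total
    return c_hat


index = sh_index(6)                       # (n, l) pairs for n_max = 6
c = {nl: 1.0 for nl in index}
xi = {0: 1.0, 2: 0.5, 4: 0.25, 6: 0.125}  # arbitrary Legendre coefficients
c_tilde = zonal_convolution(c, xi)

# Band n = 2 with "identity" filters: a unit impulse when k == l, zeros otherwise.
n = 2
c_n = {l: [float(l), 1.0, 2.0] for l in range(-n, n + 1)}
w_n = {(l, k): [0.0, 1.0 if k == l else 0.0, 0.0]
       for l in range(-n, n + 1) for k in range(-n, n + 1)}
c_hat = band_filter(c_n, w_n, n)          # reproduces c_n for this filter bank
```

With the identity filter bank the band filter returns its input unchanged, which is a convenient sanity check; choosing $w^{k}_{n,l}$ as an impulse scaled by $\xi_n$ for $k = l$ would recover the purely spherical convolution of (3).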
Formally, for a 4-D array of SH coefficients $c$ and a bank of filters $w = \{ w_0, w_2, \ldots, w_{n_{\max}} \}$, their composite convolution
$$ \tilde{c} = c *_{\mathbf{u},\mathbf{r}} w \tag{5} $$
is defined according to (4) for each $n = 0, 2, \ldots, n_{\max}$ and $|l| \le n$.
3.2. SH coefficients.
The composite convolution in (5) implies the availability of SH coefficients $c$, which can be estimated directly from DTI data, as described, e.g., in [48]. The estimation is carried out independently on each of the $M^3$ signals comprising a given DC, resulting in an $M \times M \times M \times P$ array of associated SH coefficients $c$. The proposed DNN has been designed to work directly with such $c$, which are assumed to be precomputed prior to network training. For the sake of notational simplicity, such input arrays will also be referred to below as “DCs”.
Finally, the question of setting $n_{\max}$ should not be overlooked. Usually, $n_{\max}$ is defined in accordance with a required $b$-value. In particular, for $b = 1000$ s/mm$^2$ (as used in ADNI-II), setting $n_{\max} = 6$ seems to be a conventional choice [49–51]. Note that, in this case, the total number of SHs is equal to $P = 28$.
4. Proposed network architecture
4.1. Convolutional layers.
The proposed DNN consists of several convolutional layers, each of which is parameterized by (convolutional) weights $w = \{ w_n \}_{n=0,2,\ldots}$ and a vector of $P$ (scalar) biases $b = \{ b_{n,l} \}$, with $n = 0, 2, \ldots, n_{\max}$ and $|l| \le n$. Given an input array $c^{\mathrm{in}}$, each such layer computes its output $c^{\mathrm{out}}$ according to
$$ c^{\mathrm{out}} = \big( c^{\mathrm{in}} *_{\mathbf{u},\mathbf{r}} w \big) + b, \tag{6} $$
where the plus sign is assumed to broadcast the values of $b$ so that every “coefficient image” in $( c^{\mathrm{in}} *_{\mathbf{u},\mathbf{r}} w )$ is summed with a different $b_{n,l}$. Note that the affine operation in (6) is not exclusive to volumetric data, since it can be reduced to its 2-D and 1-D versions by merely replacing all $w^{k}_{n,l}$ in $w$ by their 2-D and 1-D counterparts, respectively.
Footnote: For each $n$, the computation of $\tilde{c}_n$ can be identified with the action of multi-channel convolution that is a standard computational routine included in many existing DL frameworks, such as, e.g., TensorFlow® (which has been used in this study).
4.2. Activation.
The scope of activation functions currently used in DL is broad. In this work, all activation functions have had the form of a basic rectified linear unit (ReLU) [26], with its input-output relation given by
$$ c^{\mathrm{out}} = \mathrm{ReLU}( c^{\mathrm{in}} ) = \max(0, c^{\mathrm{in}}), \tag{7} $$
where the maximization is carried out independently for all values in the input array. Note that, over the last years, (7) has been modified in several ways (resulting in, e.g., leaky ReLU, noisy ReLU, and exponential ReLU). In our experiments, however, the basic definition in (7) was observed to work more than adequately.
4.3. Pooling.
Pooling is a standard method of data aggregation which can be used to suppress redundancies in input data, to reduce the number of network parameters, and to minimize the risk of overfitting [27]. The special structure of DC samples $c$, however, requires a proper adaptation of this operation. Specifically, in this work, the operation of pooling consisted of max pooling applied along a single spatial dimension. Depending on the direction of maximization, the pooling can be defined in three possible ways as given by
$$ c^{\mathrm{out}}_{n,l}(y, z) = \mathcal{P}^{x}_{3}\big( c^{\mathrm{in}}_{n,l}(x, y, z) \big) := \max_{x}\, c^{\mathrm{in}}_{n,l}(x, y, z), \tag{8a} $$
$$ c^{\mathrm{out}}_{n,l}(x, z) = \mathcal{P}^{y}_{3}\big( c^{\mathrm{in}}_{n,l}(x, y, z) \big) := \max_{y}\, c^{\mathrm{in}}_{n,l}(x, y, z), \tag{8b} $$
$$ c^{\mathrm{out}}_{n,l}(x, y) = \mathcal{P}^{z}_{3}\big( c^{\mathrm{in}}_{n,l}(x, y, z) \big) := \max_{z}\, c^{\mathrm{in}}_{n,l}(x, y, z), \tag{8c} $$
for each $(n, l)$ and $(x, y, z) \in \Omega_L$. Above, $\mathcal{P}$ denotes the pooling operator, with its sub- and superscripts indicating the spatial dimensionality of $c^{\mathrm{in}}_{n,l}$ and the direction of maximization, respectively.
It is important to emphasize that, while $c^{\mathrm{in}}_{n,l}$ in (8) depends on three spatial variables, the number of spatial dimensions of $c^{\mathrm{out}}_{n,l}$ is reduced to two. Consequently, $c^{\mathrm{out}}$ is now an array of size $M \times M \times P$, in which each of the $P$ “coefficient images” is of size $M \times M$. The resulting outputs could be subjected to 2-D pooling defined according to
$$ c^{\mathrm{out}}_{n,l}(y) = \mathcal{P}^{x}_{2}\big( c^{\mathrm{in}}_{n,l}(x, y) \big) = \max_{x}\, c^{\mathrm{in}}_{n,l}(x, y), \tag{9a} $$
$$ c^{\mathrm{out}}_{n,l}(x) = \mathcal{P}^{y}_{2}\big( c^{\mathrm{in}}_{n,l}(x, y) \big) = \max_{y}\, c^{\mathrm{in}}_{n,l}(x, y), \tag{9b} $$
for each $(n, l)$ and $(x, y)$, with $|x|, |y| \le L$. Note that, in both variants above, the spatial dimension of $c^{\mathrm{out}}$ has been reduced to one (i.e., $c^{\mathrm{out}}$ is now an array of size $M \times P$).
Proceeding analogously, one can finally define the operation of 1-D pooling as
$$ c^{\mathrm{out}}_{n,l} = \mathcal{P}^{x}_{1}\big( c^{\mathrm{in}}_{n,l}(x) \big) = \max_{x}\, c^{\mathrm{in}}_{n,l}(x), \tag{10} $$
for each $(n, l)$ and $|x| \le L$. This type of pooling collapses the spatial dimension of $c^{\mathrm{in}}$, resulting in a length-$P$ vector $c^{\mathrm{out}}$ of modified SH coefficients.
Figure 3. Proposed network architecture.
4.4. Network Architecture.
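The directional max pooling of (8)-(10), on which the architecture relies to collapse the spatial dimensions one at a time, can be sketched as follows. This is a plain-Python illustration acting on a single “coefficient image” stored as a nested list; the function name `max_pool` and the data layout are assumptions made for this example and do not come from the paper.

```python
def max_pool(arr, axis=0):
    """Max pooling along one axis of a nested list; that axis is removed."""
    if axis > 0:
        return [max_pool(sub, axis - 1) for sub in arr]
    if not isinstance(arr[0], list):
        return max(arr)                  # 1-D case, as in (10)
    # Element-wise maximum across the slices arr[0], arr[1], ...
    return [max_pool([sl[i] for sl in arr], 0) for i in range(len(arr[0]))]


# Toy 3 x 3 x 3 coefficient image with img[x][y][z] = 9x + 3y + z.
M = 3
img = [[[9 * x + 3 * y + z for z in range(M)] for y in range(M)] for x in range(M)]

p_x = max_pool(img, axis=0)   # (8a): function of (y, z)
p_y = max_pool(img, axis=1)   # (8b): function of (x, z)
p_z = max_pool(img, axis=2)   # (8c): function of (x, y)

p_2d = max_pool(p_x, axis=0)  # (9a) applied to the output of (8a)
p_1d = max_pool(p_2d)         # (10): a single scalar per (n, l)
```

Applying the same three steps to each of the $P$ coefficient images would turn an $M \times M \times M \times P$ array into a length-$P$ vector.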
Apart from using the purpose-built operations of convolution and pooling, the proposed DNN relies on a standard feed-forward architecture, which is depicted in Fig. 3 for the case of $L = 1$ and $n_{\max} = 6$ (i.e., $M = 3$ and $P = 28$). The network consists of six neural layers which, in the figure, are shown separated by vertical dotted lines. The dimensions specified at the top of these lines indicate the size of the inputs received by each layer. Specifically, the input layer of the network receives a DC array $c$ of size $3 \times 3 \times 3 \times 28$ […] $3 \times 3 \times 28$, each of which is processed in an analogous manner, using (9) and (6) (with a 2-D version of (5)). Similar computations are applied to all of the six inputs of the third layer, which are subjected to transformation (6) (with a 1-D version of (5)), ReLU activation, and 1-D pooling according to (10). Subsequently, each of the resulting pairs of $P$-length vectors is fused into a single 28-length vector by means of a fully connected layer (FCL). An additional FCL is used to transform its inputs into a single vector of length 28.
The final layer of the DNN consists of a linear transformation into the space of diagnostic outcomes, followed by the softmax operator. In the case of binary stratification (i.e., CN vs AD), the latter can be defined as [27]
$$ \gamma = \frac{e^{\alpha}}{e^{\alpha} + e^{\beta}}, \tag{11} $$
with $\alpha$ and $\beta$ being the two outputs of the linear transformation. In the context of this paper, the score $0 \le \gamma \le 1$, thus computed, is referred to as PSIC.
4.5. Network Optimization.
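As a concrete illustration of the quantities involved in the optimization, the sketch below computes the softmax score of (11) and an empirical cross-entropy between target labels and PSIC scores. It assumes the usual two-class form of the cross-entropy, which is one common way of writing such a criterion and is not necessarily the exact expression used by the authors; the function names `psic` and `cross_entropy` and the clipping constant `eps` are hypothetical.

```python
import math


def psic(alpha, beta):
    """Eq. (11): two-class softmax, shifted by the maximum for numerical stability."""
    m = max(alpha, beta)
    e_a, e_b = math.exp(alpha - m), math.exp(beta - m)
    return e_a / (e_a + e_b)


def cross_entropy(targets, scores, eps=1e-12):
    """Mean two-class cross-entropy between labels in {0, 1} and scores in [0, 1]."""
    total = 0.0
    for t, s in zip(targets, scores):
        s = min(max(s, eps), 1.0 - eps)   # keep the logarithms finite
        total -= t * math.log(s) + (1 - t) * math.log(1.0 - s)
    return total / len(targets)


scores = [psic(0.0, 0.0), psic(2.0, -2.0)]   # an undecided and a confident output
loss = cross_entropy([0, 1], scores)
```

In an actual training loop this loss would be minimized with respect to the network parameters by a gradient-based optimizer of a DL framework; the sketch only evaluates it for fixed outputs.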
The proposed DNN is parameterized by the valuesof convolution kernels, biases and weight matrices used across various componentsof the network. Altogether, these parameters can be gathered into a single vector θ , in which case, for a given θ , the DNN acts as a forward mapping γ = f ( c | θ ),associating each DC c with its PSIC score. In this case, it is a standard practice toestimate the optimal value of θ via solving a cross-entropy minimization problemof the form(12) θ ∗ = arg min θ E c {− γ log f ( c | θ ) } , where expectation E is computed over the empirical distribution of c .Finally, let {R j } Mj =1 be a set of relevant ROIs. Also, for each R j , let θ ∗R j denotethe optimal values of its associated DNN parameters. Then, given an unlabelledset of DTI measurements, f ( · | θ ∗R j ) can be used to compute the PSIC valuesacross the entire R j . Subsequently, the resulting scores can be summarized intoregional statistics for the purpose of subject stratification, as described next.5. Experimental study design
5.1. Study data.
As stated earlier, the experimental study of this work has been focused on the problem of stratification of CN and AD subjects. To this end, 20 CN and 20 AD age-matched subjects (mean age 72.6 ± . The dataset of each subject consisted of $K = 41$ diffusion-encoded volumetric scans, acquired at $b = 1000$ s/mm$^2$, as well as five $b_0$-volumes (i.e., scans acquired in the absence of diffusion sensitization). In addition, each dataset was supplemented with its associated $T_1$- and $T_2$-weighted scans required for structural image alignment and segmentation.

5.2. Data preprocessing.
For each dataset, its $b_0$-volumes were first co-registered and then merged into a single (average) $b_0$-volume, which was subsequently used for normalization of the $K$ diffusion-encoded volumes. After that, the normalized data were subjected to preprocessing by means of a custom pipeline which had been designed based on the recommendations of [52]. The technical implementation of the pipeline relied on the NiPype framework of NiPy (nipype.readthedocs.io), which allowed convenient integration of many well-established tools of computational imaging, including FSL (https://fsl.fmrib.ox.ac.uk/), ANTs (picsl.upenn.edu/software/ants/), and SPM. As usual, the principal purpose of the preprocessing pipeline has been to compensate for various imaging artefacts caused by the effects of subject motion, variable magnetic susceptibility, eddy currents, etc.

Additionally, FreeSurfer [53] (surfer.nmr.mgh.harvard.edu/) was used for the purpose of segmentation of grey and white matter, with their subsequent parcellation into smaller anatomical regions. Subsequently, the anatomic labels provided by FreeSurfer were used to partition the image domain $\Omega$ of the DTI volumes into a set of predefined ROIs $\{\mathcal{R}_j\}_{j=1}^{M}$, as detailed below.

Footnote: For more details on the inclusion/exclusion criteria and other design parameters used by the ADNI-II study, please refer to adni.loni.usc.edu.

5.3. Definition of ROIs.
The two left columns of Table 1 summarize the names of the anatomical regions computed by FreeSurfer, along with their acronyms. In addition, for each ROI, the rightmost columns of the table indicate the total number of DC samples available in its respective CN ($X^{\mathcal{R}_j}_{\mathrm{CN}}$) and AD ($X^{\mathcal{R}_j}_{\mathrm{AD}}$) subsets.

The ROIs have been chosen to correspond to different parts of white matter anatomy, which are known to be implicated in the pathogenesis of AD. It should be noted, however, that FreeSurfer lacks means of direct delineation of white matter. Instead, various elements of the latter are labelled based on their proximity to the nearby anatomical structures of grey matter. Thus, for instance, all voxels designated as white matter and located within a 5 mm ribbon around the left superior frontal gyrus would be labelled as lh-SFW. This approach is obviously not without limitations, the rectification of which has been the focus of ongoing research [54]. Nevertheless, considering the comparative nature of the results reported here, the choice of a specific method of whole-brain segmentation does not seem to be particularly critical.

Using the methodology of Section 2, each of the $M = 14$ regions in Table 1 was used to assemble its associated input samples $X^{\mathcal{R}_j}_{\mathrm{CN}}$ and $X^{\mathcal{R}_j}_{\mathrm{AD}}$, corresponding to the target labels $\gamma = 0$ and $\gamma = 1$, respectively. Subsequently, the process of network training was carried out for each $j$ independently, yielding a vector of optimal network parameters for each of the chosen ROIs.

5.4. Network training.
Prior to training, for each $j = 1, 2, \ldots, 14$, the samples in $X^{\mathcal{R}_j}_{\mathrm{CN}}$ and $X^{\mathcal{R}_j}_{\mathrm{AD}}$ were randomized and split into a training and a validation dataset (in proportion 4:1), which were subsequently used for the purpose of estimation of $\theta^{*}_{\mathcal{R}_j}$ and final performance evaluation, respectively. In all cases, the optimization was performed by means of the adaptive moment estimation (Adam) algorithm [55], with a fixed learning rate of 0 . · − , batch size of 256 samples, and 200 epochs. The network training procedure was augmented with dropout regularization, with the value of keep probability set to 0.7 to alleviate the effects of overfitting.

Table 1. Selected ROIs and their associated number of input samples in $X^{\mathcal{R}_j}_{\mathrm{CN}}$ and $X^{\mathcal{R}_j}_{\mathrm{AD}}$ (shown in the three rightmost columns). Note that the abbreviations “wm”, “lh”, and “rh” stand for white matter, left and right hemisphere, respectively.
Columns: $j$ | ROI name | Acronym | $X^{\mathcal{R}_j}_{\mathrm{AD}}$ | $X^{\mathcal{R}_j}_{\mathrm{CN}}$ | Total

The convergence of optimization has been monitored in terms of empirical prediction accuracy (PA). Given a set of $N_L$ labelled DC samples $\{c_l, \gamma_l\}_{l=1}^{N_L}$, with $\gamma_l \in \{0, 1\}$, the latter can be defined as

(13)  $\mathrm{PA}(\theta) = 1 - \dfrac{1}{N_L} \sum_{l=1}^{N_L} \left| \hat{\gamma}_l(c_l \mid \theta) - \gamma_l \right|$,

where $\hat{\gamma}_l(c_l \mid \theta) = 1$, if $f(c_l \mid \theta) \ge$ . , and $\hat{\gamma}_l(c_l \mid \theta) = 0$, otherwise.

In the context of network training, the PA criterion can be a useful indication of the convergence of stochastic gradient. On the other hand, when used on validation data, PA provides a more objective measure of both optimality and generalizability in view of its virtual independence of the effects of overfitting. Thus, higher values of validation PA are typically indicative of more accurate performance [27].

5.5. Binary classification.
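As a brief illustration of the PA criterion in (13), the following sketch thresholds the network outputs and counts the resulting label mismatches. The function name `prediction_accuracy` and, in particular, the default threshold of 0.5 are assumptions of this example.

```python
from typing import Sequence


def prediction_accuracy(scores: Sequence[float],
                        labels: Sequence[int],
                        threshold: float = 0.5) -> float:
    """PA of (13): one minus the mean absolute label error.

    scores:    network outputs f(c_l | theta) in [0, 1].
    labels:    true labels gamma_l in {0, 1}.
    threshold: decision level (0.5 is an assumption of this sketch).
    """
    if len(scores) != len(labels) or len(labels) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")
    errors = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0  # hard decision gamma_hat
        errors += abs(predicted - label)
    return 1.0 - errors / len(labels)
```

Evaluating the same function on the held-out fifth of the samples gives the validation PA discussed above.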
Once the training is complete, and the values of $\theta^{*}_{\mathcal{R}_j}$ are available for each ROI $\mathcal{R}_j$, the optimal forward mappings $f(\cdot \mid \theta^{*}_{\mathcal{R}_j})$ can be used for stratification of unclassified subjects. In particular, an unlabelled dataset of DTI measurements can be converted into $M$ subsets of DC samples $\{c_r\}_{r \in \mathcal{R}_j}$ in accordance with the selected ROIs. Subsequently, for each $\mathcal{R}_j$, its respective DC samples can be converted into a set of PSIC scores $\{\gamma_r\}_{r \in \mathcal{R}_j}$, with $\gamma_r = f(c_r \mid \theta^{*}_{\mathcal{R}_j})$. In this case, the subject could be stratified based on

(14)  $\operatorname{median}_{r \in \mathcal{R}_j}(\gamma_r) \;\underset{\mathrm{CN}}{\overset{\mathrm{AD}}{\gtrless}}\;$ . ,

with the decision made independently for each $j$. In such a case, the median PSIC score on the left side of (14) can be viewed as a cumulative regional marker of AD.

Furthermore, for the same subject, the PSIC values in $\{\{\gamma_r\}_{r \in \mathcal{R}_j}\}_{j=1}^{M}$ can serve as imaging contrast. In particular, these values can be superimposed on structural MRI scans, thus offering the possibility of visual exploration of the spatial variability of PSIC values.

5.6. Reference methods.
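Before turning to the reference methods, the regional decision rule of (14) can be sketched as follows. The function name `stratify_by_roi`, the dictionary layout, the threshold of 0.5, and the assignment of ties to CN are all assumptions of this example.

```python
from statistics import median
from typing import Dict, Sequence


def stratify_by_roi(psic_by_roi: Dict[str, Sequence[float]],
                    threshold: float = 0.5) -> Dict[str, str]:
    """Apply the median rule of (14) to each ROI independently.

    psic_by_roi: maps an ROI acronym to the PSIC scores of its DC samples.
    threshold:   decision level (0.5 is an assumption of this sketch).
    """
    decisions = {}
    for roi, scores in psic_by_roi.items():
        # The regional marker is the median PSIC over all samples in the ROI.
        marker = median(scores)
        # Ties are assigned to CN here, which is a choice made for this sketch.
        decisions[roi] = "AD" if marker > threshold else "CN"
    return decisions
```

With toy scores such as `{"lh-SPW": [0.9, 0.8, 0.1], "rh-SPW": [0.2, 0.3, 0.9]}`, the medians are 0.8 and 0.3, so the first region is labelled AD and the second CN; a single outlying sample does not change either decision, which is one practical reason for preferring the median to the mean.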
The performance of the proposed classifier has been compared against region-based classification based on multiple diffusion metrics and their combination. The selected metrics included four standard DTI measures, viz.: mean diffusivity (MD), fractional anisotropy (FA), as well as diffusion linearity (CL) and planarity (CP) [14]. The list of metrics also included diffusion volume (DV), average sample diffusion (ASD), diffusion energy (DE), and the coefficient of variation of diffusion (CVD), which can provide more general characterization of diffusion dynamics, independent of DTI modelling [56]. Needless to say, the above list is by no means exhaustive, and other useful characteristics of cerebral diffusion could have been included as well [57]. However, besides covering both basic and advanced options, the selected metrics have the important advantage of being estimable from relatively small datasets, as is the case in the present study.

In this paper, the region-based classification was based on the likelihood ratio test [58]. Specifically, for each metric $\mu_i$, with $i = 1, 2, \ldots, 8$, its values inside $\mathcal{R}_j$ were assumed to be independent realizations of a random variable, with its probability densities in the CN and AD subgroups given by $p^{\mathrm{CN}}_{\mathcal{R}_j}(\mu_i)$ and $p^{\mathrm{AD}}_{\mathcal{R}_j}(\mu_i)$, respectively. Then, given an unlabelled dataset, the observed values of $\{\mu_i(r)\}_{r \in \mathcal{R}_j}$ were used to stratify the subject according to

(15)  $\dfrac{\sum_{r \in \mathcal{R}_j} \log p^{\mathrm{AD}}_{\mathcal{R}_j}(\mu_i(r))}{\sum_{r \in \mathcal{R}_j} \log p^{\mathrm{CN}}_{\mathcal{R}_j}(\mu_i(r))} \;\underset{\mathrm{CN}}{\overset{\mathrm{AD}}{\gtrless}}\; \eta$,

for some decision variable $\eta$. For each $\mu_i$ and $\mathcal{R}_j$, $\eta$ had been set to its optimal value through maximizing the area under the receiver operating characteristic (ROC) curve of its corresponding classifier. In each case, the probability densities in (15) were assumed to be Gaussian, with their means and variances estimated from the available data, following a standard leave-one-out cross-validation procedure [59].

An additional reference method was based on concurrent use of multiple diffusion metrics within the framework of logistic regression (LR). Similarly to the proposed DNN-based classifier, LR was used to map the input values of $\{\mu_i\}_{i=1}^{8}$ into a scalar classification score [60].

Table 2. PA values produced by different classifiers for various $\mathcal{R}_j$. Note that, for each $j$, the two best results are outlined in bold.
Columns: $j$ | ROI, $\mathcal{R}_j$ | MD | FA | CL | CP | DV | ASD | DE | CVD | LR | DNN
10 rh-MFWros
11 lh-PCUNW 0.72 0.59 0.59 0.44
12 rh-PCUNW 0.79 0.54 0.49 0.46 0.82 0.79
13 lh-SPW 0.56 0.64 0.62 0.49 0.54 0.56 0.56 0.54
14 rh-SPW

6. Experimental results
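Before presenting the results, the likelihood computations behind (15) can be sketched under the Gaussian assumption stated in Section 5.6. The function names, the use of the population variance, and the toy numbers mentioned afterwards are assumptions of this example; the leave-one-out procedure and the selection of $\eta$ are deliberately omitted.

```python
import math
from statistics import mean, pvariance
from typing import Sequence, Tuple


def fit_gaussian(samples: Sequence[float]) -> Tuple[float, float]:
    """Estimate the mean and (population) variance of a metric in one subgroup."""
    return mean(samples), pvariance(samples)


def gaussian_log_likelihood(values: Sequence[float],
                            mu: float, var: float) -> float:
    """Sum of Gaussian log-densities over all observed values in an ROI."""
    if var <= 0.0:
        raise ValueError("variance must be positive")
    const = -0.5 * math.log(2.0 * math.pi * var)
    return sum(const - (v - mu) ** 2 / (2.0 * var) for v in values)


def regional_log_likelihoods(values: Sequence[float],
                             cn_params: Tuple[float, float],
                             ad_params: Tuple[float, float]) -> Tuple[float, float]:
    """Return the AD and CN sums of log-densities that enter (15)."""
    ll_ad = gaussian_log_likelihood(values, *ad_params)
    ll_cn = gaussian_log_likelihood(values, *cn_params)
    return ll_ad, ll_cn
```

For instance, with hypothetical CN training values `[0.70, 0.72, 0.74, 0.76]` and AD training values `[0.90, 0.92, 0.94, 0.96]` of some metric, test observations `[0.91, 0.93]` receive a larger summed log-density under the AD model than under the CN model. The test statistic of (15) is then formed from these two sums and compared with $\eta$.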
The main experimental results of this study are summarized in Tables 2 and 3, where the former shows the PA scores produced by different classifiers for different ROIs (with the proposed method of classification denoted by “DNN”). One can see that the DNN-based classifier considerably outperforms the reference approaches across all ROIs. There is, however, a slight decrease in DNN’s performance for $\mathcal{R}_j$ with $j > 10$, which is likely due to the reduction in the total number of training samples available for these ROIs (as indicated in Table 1).

As evidenced by Table 2, among the reference methods, the LR classifier showed superior performance in less than half of the cases. In other cases, the accuracy of classification was observed to depend on a particular metric/ROI combination. Interestingly enough, in about half of such cases, the most basic DTI metrics (such as MD and FA) demonstrated better performance in comparison to more advanced options.

In addition, leave-one-out cross-validation [59] was used to compare the proposed and reference classifiers in terms of their respective ROC curves. Specifically, the optimality of classification was assessed in terms of the area-under-curve (AUC) criterion, whose values are shown in Table 3. Here, higher values of AUC indicate a higher accuracy of classification, with the upper bound of 1 corresponding to the case of perfect classification. Thus, as demonstrated by the table, the proposed classifier offers a notably better solution to the problem of CN/AD stratification in comparison with the reference ones.

Table 3. AUC values produced by different classifiers for various $\mathcal{R}_j$. Note that, for each $j$, the two best results are outlined in bold.
Columns: $j$ | ROI, $\mathcal{R}_j$ | MD | FA | CL | CP | DV | ASD | DE | CVD | LR | DNN
10 rh-MFWros
11 lh-PCUNW 0.81 0.41 0.38 0.49 0.83 0.81
12 rh-PCUNW 0.85 0.45 0.42 0.62
13 lh-SPW 0.74 0.36 0.31 0.44 0.74 0.74 0.73 0.40
14 rh-SPW 0.82 0.27 0.23 0.54 0.82 0.82 0.82 0.31

Finally, Fig. 4 and Fig. 5 exemplify the use of PSIC in its capacity as imaging contrast. It should be emphasized that, as opposed to diffusion metrics, the values of PSIC have no physiological interpretation. Instead, PSIC could be viewed as a pathology-specific risk indicator, whose higher values reflect a higher probability that the brain is affected by the disease. This property of PSIC is evident in Subplots A of Fig. 4 and Fig. 5, which show the PSIC-enhanced structural scans of two AD subjects. In this case, the spatial distribution of PSIC values appears to be both intense and spatially pervasive. On the other hand, Subplots B of the same figures show results for two CN subjects. One can see that, in this case, the magnitude and spatial spread of PSIC appear to be much more “diluted”.

Due to the preliminary nature of the present paper, a detailed exploration of the spatial characteristics of PSIC, as well as its correlation with underlying brain anatomy and its possible etiological explanations, is left beyond the scope of this report. However, in view of the empirical evidence provided by Fig. 4 and Fig. 5, it is reasonable to expect the proposed contrast mechanism to be “worth a thousand words”, both as an adjunct to establishing a confident diagnosis and as a means to facilitate post hoc discoveries.

7. Discussion and Conclusions
The main objective of this work has been to explore the potential of dMRI in application to early diagnosis of AD. As opposed to alternative means of medical imaging, the physics of dMRI happens to be uniquely suited for the detection and assessment of microscopic damage to brain tissue, which is known to precede the ensuing morphological changes in cortical grey matter due to AD [1, 61]. This is what endows dMRI with the unique ability to detect the presence of neurodegeneration at its earliest pathological stages.

Figure 4. (Subplot A) PSIC values of an AD subject superimposed on structural MRI scans. (Subplot B) PSIC values of a CN subject superimposed on structural MRI scans.

Figure 5. (Subplot A) PSIC values of an AD subject superimposed on structural MRI scans. (Subplot B) PSIC values of a CN subject superimposed on structural MRI scans.

The proposed method has been derived based on the concept of data-driven inference, which allows overcoming some critical limitations of model-based analysis of diffusion signals, especially in situations with relatively small DTI datasets (i.e., when $K \lesssim$ direct correspondence between DTI data and a quantitative measure of health risks due to AD.

The proposed DNN has been designed to process spatially localized segments of DTI data, i.e., 4-D “diffusion cubes” corresponding to different spatial coordinates. The local definition of input DC samples has served two important purposes. First, it gave the means to use the output scores as a spatially dependent “risk indicator”, which has been referred to as PSIC (in view of its purposive specificity to suspected pathology). Second, the same locality made it possible to collect tens of thousands of training samples from as few as 40 DTI datasets. More importantly, the training data thus obtained have been sufficient for reliable optimization of the network parameters. Such a result would have been impossible to attain had each of the datasets been treated as a single observation, in accordance with established practice. Thus, the proposed methodology can be particularly advantageous in situations when larger sets of training data are not available.

It goes without saying that the present work has barely scratched the surface of the possibilities offered through the combination of dMRI measurements and DL, with many important questions left yet to be addressed.
In particular, although sufficient as a “proof-of-concept” in comparative studies, the problem of DTI-based stratification of CN and AD subjects is clearly of limited practical importance. Thus, to further attest to its viability, the proposed method needs to be extended to the classification of multiple diagnostic groups. Along this direction of research, a particularly enticing question would be to find correspondence between PSIC and different pathological stages of AD [61]. In a similar vein, it would also be interesting to apply the proposed solution to the problem of classification of various subtypes of MCI [12].

Addressing more complex classification problems is likely to require a proportional increase in the complexity of the DNN. In the case of simple CN/AD stratification, it was necessary neither to involve more complex data nor to extend the network architecture beyond a basic feed-forward configuration. Moreover, the minimality of the employed DNN architecture (with the total number of trainable parameters equal to 50,376) has been instrumental in preventing overfitting during its training on the small set of 40 subjects. In more complicated clinical scenarios, however, the architecture could be extended through, e.g., processing the spatial neighbourhoods $\Omega_L$ of different sizes, with a corresponding increase in the number of hidden layers. Another way to enhance the predictive power of the DNN would be to take advantage of dMRI data acquired at multiple $b$-values (as has been done in the ADNI-III study). All such extensions would come with an increase in the number of network parameters, which should be accompanied by a pro-rata increase in the size of training data.

Finally, although working with SH coefficients is not the only way to “planarize” the operation of spherical convolution, it offers an important advantage in the context of between-site variability of classification scores. Specifically, due to discrepancies in the design and settings of MRI scanners, it is not uncommon for similar dMRI signals acquired at different sites to have notably different spectral characteristics (which is often the reason behind conflicting reports in clinical applications of dMRI). This problem has been addressed by a range of approaches, among which a particularly effective way to counteract the effects of between-site variability is provided by spectral data harmonization of the SH coefficients of DTI data, as detailed in [62]. This approach suggests a straightforward way of combining the proposed method with compatible means of normalization, which should render its performance consistent across different clinical sites. This expectation, however, still needs to be validated via proper experimental studies, which constitutes another objective of our future research.

References

[1] Anil Kumar, Arti Singh, and Ekavali. A review on Alzheimer’s disease pathophysiology and its management: An update.
Pharmacol. Rep., 67(2):195–203, 2015.
[2] 2018 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia, 14(3):367–429, 2018.
[3] Giovanni B. Frisoni, Nick C. Fox, Clifford R. Jack Jr, Philip Scheltens, and Paul M. Thompson. The clinical use of structural MRI in Alzheimer disease. Nature Rev Neurol, 6:67, February 2010.
[4] M. W. Weiner, D. P. Veitch, P. S. Aisen, L. A. Beckett, N. J. Cairns, R. C. Green, D. Harvey, C. R. Jack, W. Jagust, E. Liu, J. C. Morris, R. C. Petersen, A. J. Saykin, M. E. Schmidt, L. Shaw, J. A. Siuciak, H. Soares, A. W. Toga, and J. Q. Trojanowski. The Alzheimer’s Disease Neuroimaging Initiative: A review of papers published since its inception. Alzheimers Dement, 8(1):1–68, February 2012.
[5] P J Basser, J Mattiello, and D LeBihan. Estimation of the effective self-diffusion tensor from the NMR spin echo. J Magn Reson B, 103(3):247–54, Mar 1994.
[6] D Le Bihan, E Breton, D Lallemand, P Grenier, E Cabanis, and M Laval-Jeantet. MR imaging of intravoxel incoherent motions: Application to diffusion and perfusion in neurologic disorders. Radiology, 161(2):401–7, Nov 1986.
[7] Julio Acosta-Cabronero, Stephanie Alley, Guy B Williams, George Pengas, and Peter J Nestor. Diffusion tensor metrics as biomarkers in Alzheimer’s disease. PLoS One, 7:e49072, 2012.
[8] I K Amlien and A M Fjell. Diffusion tensor imaging of white matter degeneration in Alzheimer’s disease and mild cognitive impairment. Neuroscience, 276:206–215, September 2014.
[9] Tijn M Schouten, Marisa Koini, Frank de Vos, Stephan Seiler, Mark de Rooij, Anita Lechner, Reinhold Schmidt, Martijn van den Heuvel, Jeroen van der Grond, and Serge A R B Rombouts. Individual classification of Alzheimer’s disease with diffusion magnetic resonance imaging. NeuroImage, 152:476–481, May 2017.
[10] Bing Zhang, Yun Xu, Bin Zhu, and Kejal Kantarci. The role of diffusion tensor imaging in detecting microstructural changes in prodromal Alzheimer’s disease. CNS Neurosci Ther, 20:3–9, January 2014.
[11] Chantel D Mayo, Mauricio A Garcia-Barrera, Erin L Mazerolle, Lesley J Ritchie, John D Fisk, Jodie R Gawryluk, and ADNI. Relationship between DTI metrics and cognitive function in Alzheimer’s disease. Front Aging Neurosci, 10:436, 2018.
[12] R. C. Petersen. Mild cognitive impairment. Continuum, 22(2):404–418, 2016.
[13] Emily C Edmonds, Carrie R McDonald, Anisa Marshall, Kelsey R Thomas, Joel Eppig, Alexandra J Weigand, Lisa Delano-Wood, Douglas R Galasko, David P Salmon, Mark W Bondi, and ADNI. Early versus late MCI: Improved MCI staging using a neuropsychological approach. Alzheimers Dement, Feb 2019.
[14] S Mori and J-Donald Tournier. Introduction to diffusion tensor imaging and higher order models. Academic Press, 2nd edition, 2014.
[15] Yaniv Assaf and Ofer Pasternak. Diffusion tensor imaging (DTI)-based white matter mapping in brain research: A review. J Mol Neurosci, 34(1):51–61, 2008.
[16] D. K. Jones and M. Cercignani. Twenty-five pitfalls in the analysis of diffusion MRI data. NMR Biomed., 23:803–820, 2010.
[17] M Bozzali, A Falini, M Franceschi, M Cercignani, M Zuffi, G Scotti, G Comi, and M Filippi. White matter damage in Alzheimer’s disease assessed in vivo using diffusion tensor magnetic resonance imaging. J Neurol Neurosurg Psychiatry, 72(6):742–6, Jun 2002.
[18] W. Lee, B. Park, and K. Han. Classification of diffusion tensor images for the early detection of Alzheimer’s disease. Comput. Biol. Med., 43:1313–1320, October 2013.
[19] Stefan J Teipel, Michel J Grothe, Massimo Filippi, Andreas Fellgiebel, Martin Dyrba, Giovanni B Frisoni, Thomas Meindl, Arun L W Bokde, Harald Hampel, Stefan Kloppel, Karlheinz Hauenstein, and EDSD study group. Fractional anisotropy changes in Alzheimer’s disease depend on the underlying fiber tract architecture: A multiparametric DTI study using joint independent component analysis. J. Alzheimers Dis., 41:69–83, 2014.
[20] Kara M Hawkins, Aman I Goyal, and Lauren E Sergio. Diffusion tensor imaging correlates of cognitive-motor decline in normal aging and increased Alzheimer’s disease risk. J. Alzheimers Dis., 44:867–878, 2015.
[21] Chantel D Mayo, Erin L Mazerolle, Lesley Ritchie, John D Fisk, Jodie R Gawryluk, and ADNI. Longitudinal changes in microstructural white matter metrics in Alzheimer’s disease. NeuroImage, 13:330–338, 2017.
[22] E Englund, A Brun, and C Alling. White matter changes in dementia of Alzheimer’s type: Biochemical and neuropathological correlates. Brain, 111 (Pt 6):1425–39, Dec 1988.
[23] Hui Zhang, Torben Schneider, Claudia A Wheeler-Kingshott, and Daniel C Alexander. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage, 61(4):1000–1016, Jul 2012.
[24] N Colgan, B Siow, J M O’Callaghan, I F Harrison, J A Wells, H E Holmes, O Ismail, S Richardson, D C Alexander, E C Collins, E M Fisher, R Johnson, A J Schwarz, Z Ahmed, M J O’Neill, T K Murray, H Zhang, and M F Lythgoe. Application of neurite orientation dispersion and density imaging (NODDI) to a tau pathology model of Alzheimer’s disease. NeuroImage, 125:739–744, Jan 2016.
[25] Catherine F. Slattery, Jiaying Zhang, Ross W. Paterson, Alexander J. M. Foulkes, Laura Mancini, David L. Thomas, Marc Modat, Nicolas Toussaint, David M. Cash, John S. Thornton, Daniel C. Alexander, Sebastien Ourselin, Nick C. Fox, Hui Zhang, and Jonathan M. Schott. Neurite orientation dispersion and density imaging (NODDI) in young-onset Alzheimer’s disease and its syndromic variants. Alzheimers Dement, 11(7):P91, 2015.
[26] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, 2015.
[27] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
[28] N Bhagwat, J Pipitone, AN Voineskos, and MM Chakravarty. An artificial neural network model for clinical score prediction in Alzheimer’s disease using structural neuroimaging measures. J Psychiatry Neurosci, 44(2):1–15, 2019.
[29] Rachna Jain, Nikita Jain, Akshay Aggarwal, and D Jude Hemanth. Convolutional neural network based Alzheimer’s disease classification from magnetic resonance brain images. Cogn. Syst. Res., 2019.
[30] Hongfei Wang, Yanyan Shen, Shuqiang Wang, Tengfei Xiao, Liming Deng, Xiangyu Wang, and Xinyan Zhao. Ensemble of 3D densely connected convolutional network for diagnosis of mild cognitive impairment and Alzheimer’s disease. Neurocomputing, 333:145–156, 2019.
[31] Jun-Sik Choi, Eunho Lee, and Heung-Il Suk. Regional abnormality representation learning in structural MRI for AD/MCI diagnosis. In International Workshop on Machine Learning in Medical Imaging, pages 64–72. Springer, 2018.
[32] Chunfeng Lian, Mingxia Liu, Jun Zhang, and Dinggang Shen. Hierarchical fully convolutional network for joint atrophy localization and Alzheimer’s disease diagnosis using structural MRI. IEEE Trans Pattern Anal Mach Intell, 2018.
[33] Mostafa Mehdipour Ghazi, Mads Nielsen, Akshay Pai, M Jorge Cardoso, Marc Modat, Sébastien Ourselin, Lauge Sørensen, Alzheimer’s Disease Neuroimaging Initiative, et al. Training recurrent neural networks robust to incomplete data: Application to Alzheimer’s disease progression modeling. Med Image Anal, 53:39–46, 2019.
[34] Garam Lee, Kwangsik Nho, Byungkon Kang, Kyung-Ah Sohn, and Dokyoon Kim. Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Scientific Reports, 9(1), 2019.
[35] Mingliang Wang, Daoqiang Zhang, Dinggang Shen, and Mingxia Liu. Multi-task exclusive relationship learning for Alzheimer’s disease progression prediction with longitudinal data. Med Image Anal, 53:111–122, 2019.
[36] Maryamossadat Aghili, Solale Tabarestani, Malek Adjouadi, and Ehsan Adeli. Predictive modeling of longitudinal data for Alzheimer’s disease diagnosis using RNNs. In International Workshop on Predictive Intelligence in Medicine, pages 112–119. Springer, 2018.
[37] Weiming Lin, Tong Tong, Qinquan Gao, Di Guo, Xiaofeng Du, Yonggui Yang, Gang Guo, Min Xiao, Min Du, Xiaobo Qu, et al. Convolutional neural networks-based MRI image analysis for the Alzheimer’s disease prediction from mild cognitive impairment. Front Neurosci, 12, 2018.
[38] Jeremy Kawahara, Colin J Brown, Steven P Miller, Brian G Booth, Vann Chau, Ruth E Grunau, Jill G Zwicker, and Ghassan Hamarneh. BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage, 146:1038–1049, 2017.
[39] Tao Zhou, Kim-Han Thung, Xiaofeng Zhu, and Dinggang Shen. Effective feature learning and fusion of multimodality data using stage-wise deep neural network for dementia diagnosis. Hum Brain Mapp, 40(3):1001–1016, 2019.
[40] Parisa Forouzannezhad, Alireza Abbaspour, Chunfei Li, Mercedes Cabrerizo, and Malek Adjouadi. A deep neural network approach for early diagnosis of mild cognitive impairment using multiple features. In , pages 1341–1346. IEEE, 2018.
[41] Donghuan Lu, Karteek Popuri, Gavin Weiguang Ding, Rakesh Balachandar, and Mirza Faisal Beg. Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer’s disease using structural MR and FDG-PET images. Scientific Reports, 8(1), 2018.
[42] Vladimir Golkov, Alexey Dosovitskiy, Jonathan I Sperl, Marion I Menzel, Michael Czisch, Philipp Sämann, Thomas Brox, and Daniel Cremers. Q-space deep learning: Twelve-fold shorter and model-free diffusion MRI scans. IEEE Trans Med Imaging, 35(5):1344–1351, 2016.
[43] Simon Koppers and Dorit Merhof. Direct estimation of fiber orientations using deep learning in diffusion imaging. In International Workshop on Machine Learning in Medical Imaging, pages 53–60. Springer, 2016.
[44] S. Maryam, L. McCrackin, M. Crowley, Y. Rathi, and O. Michailovich. Application of probabilistically-weighted graphs to image-based diagnosis of Alzheimer’s disease using diffusion MRI. In Proceed SPIE Medical Imaging, pages 1–10, 2017.
[45] D. S. Tuch. Q-ball imaging. Magn. Reson. Med., 52:1358–1372, 2004.
[46] A. Poulenard, M. Rakotosaona, Y. Ponty, and M. Ovsjanikov. Effective rotation-invariant point CNN with spherical harmonics kernels. In , pages 47–56, 2019.
[47] Ayman Mukhaimar, Ruwan B. Tennakoon, C. Y. Lai, R. Hoseinnezhad, and A. Bab-Hadiashar. Robust object classification approach using spherical harmonics. ArXiv, abs/2009.01369, 2020.
[48] M. Descoteaux, E. Angelino, S. Fitzgibbons, and R. Deriche. Apparent diffusion coefficients from high angular resolution diffusion images: Estimation and applications. Magn. Reson. Med., 56(2):395–410, 2006.
[49] G. K. Rohde, A. S. Barnett, P. J. Basser, S. Marenco, and C. Pierpaoli. Comprehensive approach for correction of motion and distortion in diffusion-weighted MRI. Magn. Reson. Med., 51:103–114, 2004.
[50] D. K. Jones. The effect of gradient sampling schemes on measures derived from Diffusion Tensor MRI: A Monte Carlo study. Magn. Reson. Med., 51:807–815, 2004.
[51] A. L. Alexander, J. E. Lee, M. Lazar, and A. S. Field. Diffusion tensor imaging of the brain. Neurotherapeutics, 4:316–329, 2007.
[52] Matthew F Glasser, Stamatios N Sotiropoulos, J Anthony Wilson, Timothy S Coalson, Bruce Fischl, Jesper L Andersson, Junqian Xu, Saad Jbabdi, Matthew Webster, Jonathan R Polimeni, David C Van Essen, Mark Jenkinson, and WU-Minn HCP Consortium. The minimal preprocessing pipelines for the human connectome project. Neuroimage, 80:105–24, Oct 2013.
[53] M. Reuter, N. J. Schmansky, H. D. Rosas, and B. Fischl. Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage, 61(5):1402–1418, 2012.
[54] Yuankai Huo, Shunxing Bao, Prasanna Parvathaneni, and Bennett A Landman. Improved stability of whole brain surface parcellation with multi-atlas segmentation. Proc SPIE Int Soc Opt Eng, 10574, Mar 2018.
[55] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv.org, 1412.6980, 2017.
[56] S. Aja-Fernandez, T. Pieciak, A. Tristan-Vega, G. Vegas-Sanchez-Ferrero, V. Molinaf, and R. de Luis-Garcia. Scalar diffusion-MRI measures invariant to acquisition parameters: A first step towards imaging biomarkers. Magn. Reson. Imag., 54:194–213, 2018.
[57] D. S. Novikov, J. Veraart, I. O. Jelescu, and E. Fieremans. Rotationally-invariant mapping of scalar and orientational metrics of neuronal microstructure with diffusion MRI. NeuroImage, 174(1):518–538, 2018.
[58] Philippe Rigollet and Xin Tong. Neyman-Pearson classification, convexity and stochastic constraints. J. Mach. Learn. Res., 12:2831–2855, 2011.
[59] S. Konishi and M. Honda. Bootstrap methods for error rate estimation in discriminant analysis. Japanese Society of Applied Statistics, 21(2):67–100, 1992.
[60] A. Agresti. Categorical Data Analysis. Wiley-Interscience, New York, 2002.
[61] H. Braak and E. Braak. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol, 82:239–259, 1991.
[62] H. Mirzaalian, L. Ning, P. Savadjiev, et al. Multi-site harmonization of diffusion MRI data in a registration framework.