Representing Alzheimer's Disease Progression via Deep Prototype Tree
Lu Zhang, Li Wang, Dajiang Zhu
Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA
Mathematics, University of Texas at Arlington, Arlington, TX, USA
[email protected]
Abstract.
For decades, a variety of predictive approaches have been proposed and evaluated in terms of their capability to predict Alzheimer's Disease (AD) and its precursor, mild cognitive impairment (MCI). Most of them focused on prediction or on identifying statistical differences among different clinical groups or phases (e.g., longitudinal studies). The continuous nature of AD development and the transition states between successive AD-related stages have been overlooked, especially in binary or multi-class classification. Though a few progression models of AD have been studied recently, they were mainly designed to determine and compare the order of specific biomarkers. How to effectively predict an individual patient's status within the wide spectrum of AD progression has been understudied. In this work, we developed a novel structure learning method to computationally model the continuum of AD progression as a tree structure. By conducting novel prototype learning in a deep manner, we are able to capture the intrinsic relations among different clinical groups as prototypes and represent them as a continuous process of AD development. We named this method Deep Prototype Learning and the learned tree structure the Deep Prototype Tree (DPTree). DPTree represents the different clinical stages as a trajectory reflecting AD progression and predicts clinical status by projecting individuals onto this continuous trajectory. In this way, DPTree can not only perform efficient prediction for patients at any stage of AD development (77.8% accuracy for five groups), but also provide more information by examining the projected locations within the entire AD progression process.
Keywords:
AD progression, Prototype Learning, Deep Prototype Tree.

Introduction
Alzheimer’s disease (AD) is the most common cause of dementia that cannot be prevented, cured, or even slowed. Earlier studies have shown that AD pathogenesis involves widespread alterations in brain structure and/or function, such as hippocampal atrophy [1], gray matter atrophy [2], white matter disruption [3] and abnormal functional connectivity in the default mode network (DMN) [4]. Based on these brain alterations, many approaches have been developed for early diagnosis of AD and its prodromal stage – mild cognitive impairment (MCI), such as voxel-based analysis [5], tract-based spatial statistics [6], machine learning-based algorithms [7] and recently developed deep learning based models [8]. However, as a neurodegenerative disorder with a long preclinical period, the spectrum of AD spans from clinically asymptomatic to severely impaired [9]. For example, heterogeneity in clinical presentation, rate of atrophy and cognitive decline [10] may occur in the prodromal stage of AD [11]. Furthermore, individual variations may also contribute to the heterogeneity of AD: earlier studies suggested that the gap between cognitive function and brain pathology (i.e. cognitive reserve) is typically larger in highly educated individuals [12]. In general, traditional predictive approaches (e.g., classification based models) may be limited in describing the continuum of AD development and individual variations in clinical prediction. To address this potential limitation, hypothetical models [13] for AD progression have been proposed and followed by various progression studies using cross-sectional or short-term follow-up datasets. These attempts include regression-based models [14], event-based models [15] and other computational models [16]. Nevertheless, most of them were designed to determine and compare the orders of specific biomarkers.
Fig. 1.
Training: we adopted the Destrieux Atlas, and the whole brain was labeled with 148 ROIs. We used functional connectivity as input and learned a Deep Prototype Tree (DPTree) to model the entire progression of AD in the latent space. In the tree structure, each small bubble represents a single subject and the colors indicate the different clinical groups, including normal control – NC (green), significant memory concern – SMC (yellow), early MCI – EMCI (orange), late MCI – LMCI (pink) and AD (red). Each edge in the DPTree indicates that the two connected nodes have higher similarity in the latent space. The five bigger bubbles represent the learned prototypes.
Prediction: during prediction, a new patient is projected into the latent space and represented by a bubble around the tree. The color of the bubble indicates the true label, the location of the bubble shows the patient's state in the entire development process from NC to AD, and the prediction is assigned according to the nearest prototype.
In this work, we aimed to develop a more comprehensive computational framework that integrates AD progression modeling and individual prediction. Specifically, we proposed a novel structure learning method to model the entire progression of AD as a tree structure by Deep Prototype Learning in the latent space. Figure 1 illustrates the main idea. During the training process, we used functional magnetic resonance imaging (fMRI) derived functional connectivity as the input and transformed it into a latent space. Then, we parameterized and learned several representative instances – prototypes – for the different disease stages to capture the intrinsic relations among different clinical groups and represent them as a continuous process of AD development. To reveal complex and non-linear relations buried in the data, the prototype learning was conducted in a deep manner. We named this method Deep Prototype Learning. By deep prototype learning, we obtained a Deep Prototype Tree (DPTree) which represents the different clinical stages as a trajectory reflecting AD progression. During the prediction process, by projecting individuals onto this continuous trajectory, the learned DPTree can not only assign a clinical group to new patients but also show their clinical status within the entire development process from NC to AD. Moreover, our model achieves a high classification accuracy of 77.8% for multi-class classification (NC vs SMC vs EMCI vs LMCI vs AD).

Method
Data

Participants.
We used structural MRI (T1-weighted) and resting-state fMRI (rs-fMRI) data from the ADNI dataset (http://adni.loni.usc.edu/). We began with 490 subjects (284 NC, 34 SMC, 51 EMCI, 62 LMCI and 59 AD) that have both T1 and rs-fMRI modalities. In order to keep the sample balanced among the different groups, we chose 60 gender- and age-matched participants from the 284 NCs. In total, we have 266 subjects, and the proposed analysis was conducted on these 266 subjects.
Data Description and Preprocessing.
The T1-weighted MRI data has 240 × 256 × 208 voxels with voxel size = 1.0 mm × 1.0 mm × 1.0 mm and TR = 2.3 s. The rs-fMRI data has 197 volumes; each volume has 64 × 64 × 48 voxels with voxel size = 3.4375 mm × 3.4375 mm × 3.4 mm, TE = 30 ms, TR = 3 s and flip angle = 90°. The first 6 volumes were discarded during preprocessing to ensure magnetization equilibrium. We applied skull removal for both T1 and rs-fMRI modalities. For the rs-fMRI images, we applied spatial smoothing, slice timing correction, temporal pre-whitening, global drift removal and band-pass filtering (0.01-0.1 Hz). All of these preprocessing steps were implemented using FEAT in the FMRIB Software Library (FSL) (https://fsl.fmrib.ox.ac.uk/fsl/fsl-wiki/). For the T1 images, we conducted segmentation with the FreeSurfer package (https://surfer.nmr.mgh.harvard.edu/). After segmentation, we adopted the Destrieux Atlas for ROI labeling, and the brain cortex was partitioned into 148 regions.
Generation of Functional Connectivity.
We calculated the averaged fMRI signal for each brain region. Previous studies [17] suggested that for rs-fMRI, 14 time points (when TR = 2 s) are sufficient to capture functional dynamic patterns. In order to enlarge the dataset, we divided the signal into four non-overlapping segments, each with 45 time points. We used the Pearson correlation coefficient to calculate functional connectivity for each of the four signal segments and obtained four functional connectivity matrices per subject. These functional connectivity matrices were vectorized and used as the input of our model.
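As a minimal sketch of this step (the four-way split and the segment length of 45 follow the text; the function names, the 191 retained volumes, and the use of `numpy.corrcoef` are our assumptions):

```python
import numpy as np

def functional_connectivity(ts, n_segments=4, seg_len=45):
    """Split a (time x ROI) signal matrix into non-overlapping segments
    and compute one Pearson-correlation FC matrix per segment."""
    return [np.corrcoef(ts[s * seg_len:(s + 1) * seg_len].T)
            for s in range(n_segments)]

def vectorize_upper(fc):
    """Keep only the upper triangle (excluding the diagonal), since the
    FC matrix is symmetric."""
    r, c = np.triu_indices(fc.shape[0], k=1)
    return fc[r, c]

# e.g. 191 retained rs-fMRI volumes over the 148 Destrieux ROIs
ts = np.random.default_rng(0).standard_normal((191, 148))
mats = functional_connectivity(ts)        # four 148 x 148 matrices
features = vectorize_upper(mats[0])       # 148*147/2 = 10878 features
```

Vectorizing only the upper triangle yields 10,878 features per segment, which matches the redundancy-removal step described later in the experimental settings.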
Method Overview
We proposed a Deep Prototype Tree (DPTree) model to represent the continuum of the AD development process. The progression of AD is modeled as a tree structure embedded in a latent space using Deep Prototype Learning. Here, a prototype is an abstract representation defined in the latent space that is associated with a specific clinical stage. We parameterized a group of prototypes as hidden variables in the latent space (Section 2.3) and used the order information of the clinical groups (NC < SMC < EMCI < LMCI < AD) to guide both the prototype learning and the tree construction (Section 2.4). By stacking multiple perceptron layers, the prototypes and the non-linear relationship between the input features and the latent space can be represented and learned efficiently. In general, the DPTree model aims to learn a deep representation of the input signals in a latent space that is jointly optimized for two tasks: classification and AD progression learning. As a result, on the learned tree structure, patients in the same clinical group are close to each other and distant otherwise. Moreover, DPTree is able to predict the clinical group of a new patient by projecting it to the proper location on the learned tree structure (Section 2.4). Next, we present the details of DPTree and its predictive capability for new patients.
Deep Prototype Learning
Let $\{(x_i, y_i)\}_{i=1}^{n}$ be the training data consisting of $n$ labeled samples, where the $i$-th input $x_i \in \mathbb{R}^d$ lies in $d$-dimensional space and the class label $y_i \in \{1, \dots, C\}$ indexes the $C$ disease stages. To maintain representative instances for the different disease stages, we parameterized and learned several prototypes in a latent space and used prototype matching for classification. First, we considered a non-linear function $h(x, \theta): \mathbb{R}^d \to \mathbb{R}^k$ that transforms any given input $x \in \mathbb{R}^d$ into a latent space $\mathbb{R}^k$ with model parameters $\theta$; a multi-layer perceptron (MLP) network is a suitable choice for this function. We then defined a set of prototypes $\mathcal{P} = \{p_{i,j} \in \mathbb{R}^k : i = 1, \dots, C;\ j = 1, \dots, K\}$, where $K$ is the number of prototypes in each class. With the non-linear transformation function and the set of prototypes in the latent space, we can make a prediction for any given input. Specifically, given an input $x \in \mathbb{R}^d$, we first obtain its representation (latent feature) $h(x, \theta)$ in the latent space, then compare the latent feature with all prototypes and classify $x$ into the category $\hat{y}$ of the nearest prototype:

$$\hat{y} = \arg\min_{i \in \{1,\dots,C\}} \min_{j \in \{1,\dots,K\}} \| h(x, \theta) - p_{i,j} \|_2^2. \quad (1)$$

The network parameters $\theta$ and the prototypes $\mathcal{P}$ can be trained jointly in an end-to-end manner, which lets the MLP model and the prototypes interact with each other for better performance. To train the model, we need a proper loss function such that 1) it is differentiable with respect to $\theta$ and $\mathcal{P}$, and 2) it is closely related to the classification accuracy.

Prototype Learning Based Cross Entropy Loss.
In our DPTree model, we used distance to measure the similarity between input samples and prototypes. Denote by $y_{i,j}$ the class label of prototype $p_{i,j}$. The probability of an input $x$ belonging to prototype $p_{i,j}$ is formulated as:

$$P(y_{i,j} \mid x) = \frac{\exp\{-\alpha \| h(x,\theta) - p_{i,j} \|_2^2\}}{\sum_{i'=1}^{C} \sum_{j'=1}^{K} \exp\{-\alpha \| h(x,\theta) - p_{i',j'} \|_2^2\}}, \quad (2)$$

where $\alpha$ is a hyper-parameter that controls the hardness of the distance-based probability assignment. Given the definition of $P(y_{i,j} \mid x)$, we can further define the probability of an input $x$ belonging to category $c \in \{1, \dots, C\}$ as:

$$P(c \mid x) = \sum_{j=1}^{K} P(y_{c,j} \mid x). \quad (3)$$
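A minimal numeric sketch of the nearest-prototype rule (1) and the soft class probabilities (2)-(3), using toy prototypes (the array shapes, values and function names are ours):

```python
import numpy as np

def predict_class(z, prototypes):
    """Eq. (1): return the class index of the nearest prototype.
    z: latent feature h(x, theta), shape (k,); prototypes: (C, K, k)."""
    d = np.sum((prototypes - z) ** 2, axis=-1)   # squared distances, (C, K)
    return int(np.argmin(d.min(axis=1)))

def class_probabilities(z, prototypes, alpha=1.0):
    """Eqs. (2)-(3): distance-based soft assignment to every prototype,
    normalized over all prototypes, then summed within each class."""
    d = np.sum((prototypes - z) ** 2, axis=-1)
    p = np.exp(-alpha * d)
    p /= p.sum()                                 # Eq. (2): P(y_ij | x)
    return p.sum(axis=1)                         # Eq. (3): P(c | x)

# toy setting: C = 3 classes, K = 1 prototype per class, k = 2 latent dims
protos = np.array([[[0.0, 0.0]], [[5.0, 5.0]], [[10.0, 0.0]]])
z = np.array([4.5, 5.2])
```

For this toy input, the nearest prototype is the one of class 1, and the soft probabilities of Eq. (3) concentrate on the same class, illustrating why the two predictions agree.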
Then, we defined a classification loss function based on the probability $P(c \mid x)$ and named it the prototype learning based cross entropy loss:

$$\mathcal{L}_{\mathcal{P}}\big((x, y); \theta; \mathcal{P}\big) = -\frac{1}{C} \sum_{c=1}^{C} \mathbb{I}(c = y) \log P(c \mid x), \quad (4)$$

where the indicator function $\mathbb{I}(c = y)$ is 1 if the predicate $c = y$ is true and 0 otherwise. From (2), (3) and (4), we can see that minimizing the prototype learning based cross entropy loss essentially decreases the distance between the latent feature $h(x, \theta)$ and the prototype of the genuine category of the input sample. In this way, two input samples at the same disease stage will be close in the latent space, and the disease-related representative prototypes can be learned automatically from the data. To improve the generalization performance and prevent over-fitting, we also proposed a new prototype-based regularization term:

$$\mathcal{L}_{\mathcal{PR}}\big((x, y); \theta; \mathcal{P}\big) = \| h(x, \theta) - p_{y} \|_2^2, \quad (5)$$

where $p_{y}$ is the prototype with class label $y$ that is closest to $h(x, \theta)$. This regularization term pulls the latent feature $h(x, \theta)$ of input sample $x$ towards its corresponding prototype, making the latent features within the same class more compact, which benefits classification.

Ordered Prototypes Learning
The class labels $y$ provide not only the separability of the inputs, but also an ordering of the classes, which corresponds to the different stages of AD progression. It is generally assumed that the ordering of the classes is NC < SMC < EMCI < LMCI < AD. Even though the ordering of individual input samples is unknown, the ordering of the prototypes can still provide valuable information to guide the prototype learning. Based on this prior knowledge, we can construct an affinity matrix $\mathcal{A} = [a_{(i,j),(i',j')}] \in \mathbb{R}^{N \times N}$ for the similarity among prototype class labels, where $N = C \times K$ is the total number of prototypes: $a_{(i,j),(i',j')} = 1$ if the $(i,j)$-th prototype and the $(i',j')$-th prototype are from the same class (i.e., $y_i = y_{i'}$), $a_{(i,j),(i',j')} = 0.5$ if $y_i$ is a neighbor of $y_{i'}$ in the ordering of class labels, and $0$ otherwise. To leverage this prior information for learning the tree path of AD progression, we added an additional linear layer with a softmax function on top of the prototypes to link the latent features of the prototypes and the labels. The output probability for class $c$ is:

$$O_c(p_{i,j}; W, b) = \frac{\exp\{W_c\, p_{i,j} + b_c\}}{\sum_{c'=1}^{C} \exp\{W_{c'}\, p_{i,j} + b_{c'}\}}, \quad (6)$$

where $\{W, b\}$ are the coefficients of the linear layer. The final prediction is obtained as:

$$\hat{y}_{i,j} = \arg\max_{c}\, O_c(p_{i,j}; W, b). \quad (7)$$

The classification loss of a prototype is defined as:

$$\mathcal{L}_{\mathcal{O}}(p_{i,j}; W, b) = -\frac{1}{C} \sum_{c=1}^{C} \mathbb{I}(c = i) \log O_c(p_{i,j}; W, b). \quad (8)$$

Then, we proposed the following regularization term to incorporate the ordering information of the class labels, in terms of the affinity matrix $\mathcal{A}$, based on the manifold assumption: if two labels are similar, their predicted probabilities should be close. The regularization term is formulated as:

$$\mathcal{L}_{\mathcal{OR}}(\mathcal{P}; W, b) = \mathrm{trace}\big(O L O^{\top}\big), \quad (9)$$

where $O = [O_1; O_2; \cdots; O_C] \in \mathbb{R}^{C \times (C \times K)}$ with $O_c = [O_c(p_{i,j}; W, b)]_{\{i,j\}} \in \mathbb{R}^{1 \times (C \times K)}$ for each $c$, $L = \mathcal{D} - \mathcal{A}$ is the graph Laplacian of $\mathcal{A}$, and $\mathcal{D}$ is the degree matrix of $\mathcal{A}$. With (4), (5), (8) and (9) in hand, we formulate our deep prototype tree model with the loss function:

$$\mathcal{L} = \sum_{i=1}^{n} \big[ \mathcal{L}_{\mathcal{P}}\big((x_i, y_i); \theta; \mathcal{P}\big) + \beta\, \mathcal{L}_{\mathcal{PR}}\big((x_i, y_i); \theta; \mathcal{P}\big) \big] + \gamma \sum_{i=1}^{C} \sum_{j=1}^{K} \mathcal{L}_{\mathcal{O}}(p_{i,j}; W, b) + \delta\, \mathcal{L}_{\mathcal{OR}}(\mathcal{P}; W, b). \quad (10)$$

The loss function (10) is differentiable with respect to $\theta$, $\mathcal{P}$, $W$ and $b$, so the whole model can be trained in an end-to-end manner. After the model was trained, we used Kruskal's algorithm to create a minimum spanning tree over the training dataset in the latent space. This tree was learned by prototype learning, and we named it the Deep Prototype Tree. For a new patient $x$, DPTree provides two kinds of predictions. First, the probability of assigning $x$ to each clinical group can be obtained via (3), and the best prediction can be made according to (1). Second, the location of the patient on the learned DPTree, given by $h(x, \theta)$, reflects the stage of the patient in the progression of the disease.

Results
Experimental Setting

Data Setting.
In this work, we used 266 subjects (60 NC, 34 SMC, 51 EMCI, 62 LMCI, 59 AD) to conduct the experiments. Each subject has four functional connectivity matrices, giving 1064 data samples in total. Because each functional connectivity matrix is symmetric, to reduce the redundancy we used the vectorized upper triangle of each matrix as input features, and we conducted five-fold cross-validation.
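One plausible way to form the five folds over the 1064 samples is sketched below. The paper does not state whether the split was done at the subject level; splitting by subject (so that the four matrices of one subject never straddle train and test) is our assumption, as is every name in the snippet:

```python
import numpy as np

def five_fold_indices(n_subjects=266, samples_per_subject=4, seed=0):
    """Split at the subject level so the four FC matrices of one subject
    never end up in both the training and the test fold."""
    rng = np.random.default_rng(seed)
    subj = rng.permutation(n_subjects)
    folds = np.array_split(subj, 5)
    # expand each subject index into its four consecutive sample indices
    return [np.concatenate([s * samples_per_subject + np.arange(samples_per_subject)
                            for s in f]) for f in folds]

folds = five_fold_indices()   # five disjoint index sets covering all 1064 samples
```

A per-sample split would be a one-liner, but it risks leaking subject identity across folds, which inflates accuracy estimates.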
Model Setting.
In this work, the non-linear function $h(x, \theta)$ was implemented by a six-layer MLP. The dimensions of the MLP are 1024-512-256-64-16-$k$, where $k$ is the dimension of the latent space (Section 2.3). We tested $k$ = 5, 10, 15, 20, 25. We report the results of $k$ = 25, which gives the best classification performance, in Section 3.2, and compare the results of $k$ = 5, 10, 15, 20 and 25 in Section 3.4. $C = 5$ is the number of classes (NC vs. SMC vs. EMCI vs. LMCI vs. AD). The ReLU activation function and batch normalization were used at each layer. The other four hyper-parameters are $\alpha = 1.0$, $\beta = 0.001$, $\gamma = 1.0$ and $\delta = 1.0$. The number of prototypes in each class is $K = 1$. The entire model was trained in an end-to-end manner using the Adam optimizer with a standard learning rate of 0.001, weight decay of 0.01, and momentum rates (0.9, 0.999).
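A NumPy sketch of the forward pass of an encoder with the stated layer widths (weight training is omitted; the 10878-dimensional input, the He-style initialization, and applying ReLU before the batch normalization are our assumptions, not details given in the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batchnorm(x, eps=1e-5):
    # simple inference-style normalization over the batch dimension
    return (x - x.mean(0)) / np.sqrt(x.var(0) + eps)

def make_encoder(dims, rng):
    """Random weights for h(x, theta) with the paper's layer widths,
    e.g. dims = [10878, 1024, 512, 256, 64, 16, 25] (input + six layers)."""
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def encode(x, params):
    for W, b in params[:-1]:
        x = batchnorm(relu(x @ W + b))
    W, b = params[-1]
    return x @ W + b          # linear output into the k-dim latent space

rng = np.random.default_rng(0)
params = make_encoder([10878, 1024, 512, 256, 64, 16, 25], rng)
z = encode(rng.standard_normal((8, 10878)), params)   # latent batch (8, 25)
```

In a real implementation the same architecture would be written in a deep learning framework so that $\theta$, $\mathcal{P}$, $W$ and $b$ can all receive gradients from the loss (10).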
Classification Performance
In this section, we show the classification performance of the proposed DPTree. For fair comparison, we used two strategies to compare the proposed method with other widely used methods. First, we used the same dataset and five-fold cross-validation to conduct experiments with five broadly used machine learning methods: support vector machine (SVM) with a Gaussian kernel, linear SVM, k-nearest neighbors (KNN), logistic regression and random forest. The classification performance was measured by class-specific $F_1$ scores, $F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$, and accuracy (Acc). The results are shown in Table 1. The class-specific $F_1$ scores of the DPTree model are all over 0.75, and for some classes reach 0.80, which is outstanding for multi-class classification of AD and significantly outperforms the other five methods. Second, we compared the multi-class classification performance with five recent deep learning methods for AD and report the results in Table 2. [18] obtains a very high $F_1$ score for the AD group; however, the $F_1$ scores for the other groups are much lower. The total accuracies of [21] and [8] are very close to our results; however, they only included three classes, while we considered five classes in this work.

Table 1.
Classification Performance of DPTree and Other Five Machine Learning Methods.
Method | F1(whole) | F1(AD) | F1(LMCI) | F1(EMCI) | F1(SMC) | F1(CN) | Acc(whole)
SVM (Gaussian) | 0.621 | — | — | — | — | — | —
Linear SVM | — | — | — | — | — | — | —
KNN | — | — | — | — | — | — | —
Logistic regression | — | — | — | — | — | — | —
Random forest | — | — | — | — | — | — | —
DPTree | 0.777 | 0.785 | 0.762 | 0.801 | 0.765 | 0.773 | 0.778

Table 2.
Classification Performance of DPTree and Other Deep Learning Methods.
Work | Modality | Subjects | Method | Performance
Amoroso et al. (2018) [18] | Predefined features | 60 AD, 60 HC, 60 cMCI, 60 MCI | Random forest, deep learning | AD: F1 = 0.805; MCI: F1 = 0.305; cMCI: F1 = 0.518; NC: F1 = 0.525
Zhou et al. (2018) [19] | MRI, PET, SNP | 190 AD, 226 HC, 157 pMCI, 205 sMCI | Multimodality data fusion, deep learning | AD: Acc = 0.574; pMCI: Acc = 0.622; sMCI: Acc = 0.342; NC: Acc = 0.625
Brand et al. (2019) [20] | MRI, SNP | 412 (AD + MCI + NC) | Deep learning, joint regression-classification | AD: F1 = 0.566; MCI: F1 = 0.513; NC: F1 = 0.683
Lei et al. (2020) [21] | MRI | 192 AD, 402 MCI, 220 NC | Multiple template, adaptive feature selection | Total: Acc = 0.775 (NC/MCI/AD)
Wang et al. (2020) [8] | rs-fMRI | 253 CN, 45 EMCI, 88 LMCI | Autoencoder, deep learning | Total: Acc = 0.73 (NC/EMCI/LMCI)
Ours | rs-fMRI | 59 AD, 62 LMCI, 51 EMCI, 34 SMC, 60 NC | Deep Prototype Tree | AD: F1 = 0.785; LMCI: F1 = 0.762; EMCI: F1 = 0.801; SMC: F1 = 0.765; NC: F1 = 0.773; Total: Acc = 0.778
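For reference, the class-specific $F_1$ scores reported in the tables follow the standard definition given earlier; a minimal computation (the function name and the toy labels are ours):

```python
import numpy as np

def class_f1(y_true, y_pred, c):
    """F1 = 2 * precision * recall / (precision + recall) for class c."""
    tp = np.sum((y_pred == c) & (y_true == c))
    pred_c, true_c = np.sum(y_pred == c), np.sum(y_true == c)
    if tp == 0:
        return 0.0
    precision, recall = tp / pred_c, tp / true_c
    return 2 * precision * recall / (precision + recall)

# toy three-class example
y_true = np.array([0, 0, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2])
```

Here class 1 has precision 2/3 and recall 1, giving $F_1 = 0.8$; reporting each class separately, as the tables do, avoids the majority-class bias of plain accuracy.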
The Learned Deep Prototype Tree
An important contribution of our DPTree is using the learned tree structure to represent the entire spectrum of AD progression. As mentioned in the model setting section, the learned DPTree is in a 25-dimensional latent space. In order to simultaneously visualize the learned tree structure and maintain the intrinsic relations among the different clinical groups, we used the adjacency matrix weighted by distance to draw a connection DPTree. We used the NetworkX software (https://networkx.org/) to display the tree structure: it focuses on the connection relationships and the relative distances of the vertices but ignores their actual coordinates in the latent space. We show the connection DPTree in Fig. 2. The first row of Fig. 2 shows the five learned connection DPTrees, each from one of the five-fold cross-validation results. In the tree structure, each small bubble represents a single subject and the colors indicate the different AD related groups. Each edge in the connection DPTree indicates that the two connected nodes have higher similarity in the latent space. The five bigger bubbles represent the learned prototypes, and their colors indicate the classes they belong to. The learned DPTree structure displays a trajectory of AD progression: it starts from NC, goes through SMC, EMCI and LMCI, and eventually ends with AD. In the second row, to visualize the prediction results, we projected the new patients onto the connection DPTree with the following steps: 1) use the well-trained MLP model ($h(x, \theta)$) to project all new samples into the latent space and obtain their latent features; 2) assign the predicted group to each new sample with formula (1); 3) place each new sample within a neighborhood of the corresponding prototype. The location is randomly assigned, but the relative distances to the other prototypes and samples are maintained.
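The tree-construction step can be sketched with SciPy (the paper uses Kruskal's algorithm; SciPy's `minimum_spanning_tree` is a stand-in that produces the same tree whenever the pairwise distances are distinct, and the function name and toy data below are ours):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def connection_tree(latent):
    """Build a minimum spanning tree over latent features of shape (n, k),
    using pairwise Euclidean distance as the edge weights."""
    dist = squareform(pdist(latent))        # dense (n, n) distance matrix
    return minimum_spanning_tree(dist)      # sparse matrix with n - 1 edges

# toy stand-in for the 25-dimensional latent features of 30 subjects
latent = np.random.default_rng(1).standard_normal((30, 25))
mst = connection_tree(latent)
```

The resulting sparse matrix can be handed to NetworkX as a weighted graph for the spring-layout style visualization described above.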
Fig. 2.
First Row: visualization of connection DPTree from multiple clinical groups including NC, SMC, EMCI, LMCI and AD. Second Row: visualization of the prediction of new patients by projecting on the connection DPTree.
Fig. 3.
Top connectivities that contribute most to the learned DPTree structure. In each subfigure, the top and bottom rows display the involved brain regions and connectivities, respectively.
Fig. 4.
Classification performance with different dimensions of the latent space.
Fig. 5.
Connection DPTree structures using different dimensions of the latent space. The red circles are used to highlight the prototypes from different clinical groups when their distance is small.
Classification Performance under Different Prototype Dimensions

Reproducibility of Deep Prototype Tree Learning
In our DPTree model, the hyper-parameter that has the most important influence on the DPTree structure is $k$, the dimension of the latent space. We tested $k$ = 5, 10, 15, 20 and 25, and report the classification performance as well as the connection DPTree structures for the different $k$ in Fig. 4 and Fig. 5, respectively. Fig. 4 shows that the classification performance improves when increasing $k$ and peaks at values between 20 and 25. From Fig. 5, we can see that if $k$ is too small (corresponding to a lower-dimensional latent space), the distances among the different prototypes tend to be small. Insufficient dissimilarity between prototypes may limit the capability of DPTree to represent multiple clinical stages in AD progression and compromise the prediction performance on new samples. We used red circles to highlight the different prototypes in Fig. 5.

Conclusion and Discussion
Here we proposed a novel DPTree framework to represent the continuum of AD development. The learned DPTree structure displays a trajectory of AD progression and achieves a high prediction performance of 77.8% for multiple AD-related stages. The learned DPTree can not only predict the clinical status of an individual patient, but also provide more information about the patient's state within the entire spectrum of AD progression. We summarize some advantages of the DPTree model as follows:
DPTree can model the continuum of diseases development.
In this work, we only applied DPTree to Alzheimer's disease, but it is a general framework that can be applied to a wide range of disease-related studies. Any disease that presents multiple clinical stages in its development can use the DPTree framework by feeding suitable features into the model and adjusting the MLP structure according to the input features and tasks. More importantly, by modifying the affinity matrix $\mathcal{A}$, prior knowledge about the disease can easily be introduced into the DPTree model.

DPTree is not limited to the classification task.
By adjusting the additional linear layer in Section 2.4, DPTree can be applied to regression problems. For example, by replacing the discrete clinical labels with continuous clinical scores (e.g., the Mini-Mental State Exam (MMSE) score), the proposed classification framework can be converted into a regression-based model.
The number of prototypes in each class is flexible.
In this work, we set the number of prototypes for each clinical group to 1 ($K = 1$) based on our application. However, by modifying the value of $K$, the proposed DPTree can be extended to allow multiple prototypes in each group. For instance, we can introduce multiple prototypes to represent subtypes [10] of AD, where each subtype may be associated with different clinical presentations (features).

References

1. Henneman, W. et al.: Hippocampal atrophy rates in Alzheimer disease: added value over whole brain volume measures. Neurology (11), 999–1007 (2009)
2. Karas, G. et al.: Global and local gray matter loss in mild cognitive impairment and Alzheimer's disease. NeuroImage (2), 708–716 (2004)
3. Li, S. et al.: Regional white matter decreases in Alzheimer's disease using optimized voxel-based morphometry. Acta Radiologica (1), 84–90 (2008)
4. Greicius, M.D. et al.: Default-mode network activity distinguishes Alzheimer's disease from healthy aging: evidence from functional MRI. Proceedings of the National Academy of Sciences (13), 4637–4642 (2004)
5. Tong, T. et al.: Multi-modal classification of Alzheimer's disease using nonlinear graph fusion. Pattern Recognition 63, 171–181 (2017)
6. Smith, S.M. et al.: Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. NeuroImage 31, 1487–1505 (2006)
7. Ashburner, J. et al.: Voxel-based morphometry – the methods. NeuroImage 11(6), 805–821 (2000)
8. Wang, L. et al.: Learning latent structure over deep fusion model of mild cognitive impairment. In: 17th International Symposium on Biomedical Imaging, pp. 1039–1043. IEEE (2020)
9. Aisen, P.S. et al.: On the path to 2025: understanding the Alzheimer's disease continuum. Alzheimer's Research & Therapy (1), 60 (2017)
10. Ten Kate, M. et al.: Atrophy subtypes in prodromal Alzheimer's disease are associated with cognitive decline. Brain (12), 3443–3456 (2018)
11. Vos, S.J. et al.: Prevalence and prognosis of Alzheimer's disease at the mild cognitive impairment stage. Brain (5), 1327–1338 (2015)
12. Stern, Y.: Cognitive reserve in ageing and Alzheimer's disease. The Lancet Neurology (11), 1006–1012 (2012)
13. Jack Jr, C.R. et al.: Hypothetical model of dynamic biomarkers of the Alzheimer's pathological cascade. The Lancet Neurology (1), 119–128 (2010)
14. Mouiha, A. et al.: Toward a dynamic biomarker model in Alzheimer's disease. Journal of Alzheimer's Disease (1), 91–100 (2012)
15. Oxtoby, N.P. et al.: Data-driven models of dominantly-inherited Alzheimer's disease progression. Brain (5), 1529–1544 (2018)
16. Li, D. et al.: Bayesian latent time joint mixed effect models for multicohort longitudinal data. Statistical Methods in Medical Research (3), 835–845 (2019)
17. Zhang, X. et al.: Characterization of task-free and task-performance brain states via functional connectome patterns. Medical Image Analysis 17(8), 1106–1122 (2013)
18. Amoroso, N. et al.: Deep learning reveals Alzheimer's disease onset in MCI subjects: results from an international challenge. Journal of Neuroscience Methods 302, 3–9 (2018)
19. Zhou, T. et al.: Effective feature learning and fusion of multimodality data using stage-wise deep neural network for dementia diagnosis. Human Brain Mapping 40(3), 1001–1016 (2019)
20.