Cognitive Biomarker Prioritization in Alzheimer's Disease using Brain Morphometric Data
Bo Peng, Xiaohui Yao, Shannon L. Risacher, Andrew J. Saykin, Li Shen, Xia Ning
RESEARCH

Abstract

Background:
Cognitive assessments represent the most common clinical routine for the diagnosis of Alzheimer's Disease (AD). Given a large number of cognitive assessment tools and time-limited office visits, it is important to determine a proper set of cognitive tests for different subjects. Most current studies create guidelines of cognitive test selection for a targeted population, but they are not customized for each individual subject. In this manuscript, we develop a machine learning paradigm enabling personalized cognitive assessment prioritization.
Method:
We adapt a newly developed learning-to-rank approach, PLTR, to implement our paradigm. This method learns a latent scoring function that pushes the most effective cognitive assessments onto the top of the prioritization list. We also extend PLTR to better separate the most effective cognitive assessments from the less effective ones.
Results:
Our empirical study on the ADNI data shows that the proposed paradigm outperforms the state-of-the-art baselines in identifying and prioritizing individual-specific cognitive biomarkers. We conduct experiments in cross-validation and leave-out validation settings. In the two settings, our paradigm significantly outperforms the best baselines, with improvements of as much as 22.1% and 19.7%, respectively, in prioritizing cognitive features.
Conclusions:
The proposed paradigm achieves superior performance in prioritizing cognitive biomarkers. The cognitive biomarkers prioritized on top have great potential to facilitate personalized diagnosis, disease subtyping, and ultimately precision medicine in AD.
Keywords:
Alzheimer's Disease; Learning to Rank; Bioinformatics; Machine Learning

*Correspondence: [email protected]. The Ohio State University, Columbus, US. A full list of author information is available at the end of the article.

†Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: https://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Data_Use_Agreement.pdf. This work has been accepted by BMC MIDM. Copyright may be transferred without notice, after which this version may no longer be accessible.

Background
Identifying structural brain changes related to cognitive impairments is an important research topic in Alzheimer's Disease (AD) study. Regression models have been extensively studied to predict cognitive outcomes using morphometric measures extracted from structural magnetic resonance imaging (MRI) scans [1, 2]. These studies advance our understanding of the neuroanatomical basis of cognitive impairments. However, they are not designed to have direct impacts on clinical practice. To bridge this gap, in this manuscript we develop a novel learning paradigm to rank cognitive assessments based on their relevance to AD using brain MRI data.

Cognitive assessments represent the most common clinical routine for AD diagnosis. Given a large number of cognitive assessment tools and time-limited office visits, it is important to determine a proper set of cognitive tests for the subjects. Most current studies create guidelines of cognitive test selection for a targeted population [3, 4], but they are not customized for each individual subject. In this work, we develop a novel learning paradigm that incorporates the ideas of precision medicine and customizes the cognitive test selection process to the characteristics of each individual patient. Specifically, we conduct a novel application of a newly developed learning-to-rank approach, denoted as PLTR [5], to the structural MRI and cognitive assessment data of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort [6]. Using structural MRI measures as the individual characteristics, we are able not only to identify individual-specific cognitive biomarkers but also to prioritize them and their corresponding assessment tasks according to AD-specific abnormality. We also extend PLTR to PLTR_h using a hinge loss [7] to more effectively prioritize individual-specific cognitive biomarkers. The study presented in this manuscript is a substantial extension of our preliminary study [8].

Our study is unique and innovative from the following two perspectives. First, conventional regression-based studies for cognitive performance prediction using MRI data focus on identifying relevant imaging biomarkers at the population level; our proposed model instead aims to identify AD-relevant cognitive biomarkers customized to each individual patient. Second, the identified cognitive biomarkers and assessments are prioritized based on the individual's brain characteristics. Therefore, they can be used to guide the selection of cognitive assessments in a personalized manner in clinical practice, which has the potential to enable personalized diagnosis and disease subtyping.

Learning-to-Rank (
LETOR) [9] is a popular technique used in information retrieval [10], web search [11] and recommender systems [12]. Existing LETOR methods can be classified into three categories [9]. The first category is point-wise methods [13], in which a function is learned to score individual instances, and the instances are then sorted/ranked based on their scores. The second category is pair-wise methods [14], which maximize the number of correctly ordered pairs in order to learn the optimal ranking structure among instances. The last category is list-wise methods [15], in which a ranking function is learned to explicitly model the entire ranking. Generally, pair-wise and list-wise methods outperform point-wise methods due to their ability to leverage the order structure among instances during learning [9]. Recently, LETOR has also been applied in drug discovery and drug selection [16-19]. For example, Agarwal et al. [20] developed a bipartite ranking method to prioritize drug-like compounds. He et al. [5] developed a joint push and learning-to-rank method to select cancer drugs for each individual patient. These studies demonstrate the great potential of LETOR in computational biology and computational medicine, particularly for biomarker prioritization.

The importance of using big data to enhance AD biomarker study has been widely recognized [6]. As a result, numerous data-driven machine learning models have been developed for early AD detection and AD-relevant biomarker identification, including cognitive measures. These models are often designed to accomplish tasks such as classification (e.g., [21]), regression (e.g., [1, 2, 22]) or both (e.g., [23, 24]), where imaging and other biomarker data are used to predict diagnostic, cognitive and/or other outcome(s) of interest. A drawback of these methods is that, although outcome-relevant biomarkers can be identified, they are identified at the population level and are not specific to any individual subject. To bridge this gap, we adapt the PLTR method for biomarker prioritization at the individual level, which has greater potential to directly impact personalized diagnosis.
Method

The imaging and cognitive data used in our study were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database [6]. ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI, a prodromal stage of AD) and early AD. Please refer to [25] for more detailed, up-to-date information.

Participants include 819 ADNI-1 subjects: 229 healthy control (HC), 397 MCI and 193 AD participants. We consider both MCI and AD subjects as patients, and thus have 590 cases and 229 controls. We downloaded the 1.5T baseline MRI scans and cognitive assessment data from the ADNI website [25]. We processed the MRI scans using FreeSurfer version 5.1 [26], extracting volumetric and cortical thickness measures of 101 regions relevant to AD to characterize brain morphometry.

We focus our analysis on 151 scores assessed in 15 neuropsychological tests. For convenience, we denote these measures as cognitive features and these tests as cognitive tasks. The 15 studied tasks include the Alzheimer's Disease Assessment Scale (ADAS), Clinical Dementia Rating Scale (CDR), Functional Assessment Questionnaire (FAQ), Geriatric Depression Scale (GDS), Mini-Mental State Exam (MMSE), Modified Hachinski Scale (MODHACH), Neuropsychiatric Inventory Questionnaire (NPIQ), Boston Naming Test (BNT), Clock Drawing Test (CDT), Digit Span Test (DSPAN), Digit Symbol Test (DSYM), Category Fluency Test (FLUENCY), Weschler's Logical Memory Scale (LOGMEM), Rey Auditory Verbal Learning Test (RAVLT) and Trail Making Test (TRAIL).
PLTR
We use the joint push and learning-to-rank method that we developed in He et al. [5], denoted as PLTR, for personalized cognitive feature prioritization. PLTR has also been successfully applied in our preliminary study [8]. We aim to prioritize, for each individual patient, the cognitive features that are most relevant to his/her disease diagnosis. We use patients' brain morphometric measures extracted from their MRI scans for the cognitive feature prioritization. The cognitive features are the scores or answers in the cognitive tasks that the patients take. The prioritization outcomes can potentially be used in clinical practice to suggest the most relevant cognitive features or tasks that can most effectively facilitate the diagnosis of an individual subject.

In order to prioritize MCI/AD cognitive features,
PLTR learns and uses patient latent vector representations and their imaging features to score each cognitive feature for each individual patient. Then, PLTR ranks the cognitive features based on their scores. Patients with similar imaging feature profiles will have similar latent vectors and thus similar rankings of cognitive features [27, 28]. During learning, PLTR explicitly pushes the most relevant cognitive features on top of the less relevant features for each patient, and therefore optimizes the latent patient vectors and cognitive feature vectors in a way that reproduces the feature ranking structures [9]. In PLTR, these latent vectors are learned by solving the following optimization problem:

\min_{U,V} L_s = (1-\alpha) P_s^{\uparrow} + \alpha O_s^{+} + \beta R_{uv} + \gamma R_{csim},  (1)

where \alpha, \beta and \gamma \in [0, 1] are the coefficients of the O_s^{+}, R_{uv} and R_{csim} terms, respectively; U = [u_1, u_2, \cdots, u_m] and V = [v_1, v_2, \cdots, v_n] are the latent matrices for patients and features, respectively (u_p and v_i are column latent patient and feature vectors, respectively); and L_s is the overall loss function. In Problem (1), P_s^{\uparrow} measures the average number of relevant cognitive features ranked below an irrelevant cognitive feature, defined as follows:

P_s^{\uparrow} = \sum_{p=1}^{m} \frac{1}{n_p^+ n_p^-} \sum_{f_i^- \in P_p^-} \sum_{f_j^+ \in P_p^+} I\big(s_p(f_j^+) \le s_p(f_i^-)\big),  (2)

where m is the number of patients, f_j^+ and f_i^- are the relevant and irrelevant features of patient P_p, n_p^+ and n_p^- are their respective numbers, and I(x) is the indicator function (I(x) = 1 if x is true, otherwise 0). In Equation (2), s_p(f_i) is a scoring function defined as follows:

s_p(f_i) = u_p^T v_i,  (3)

that is, it calculates the score of feature f_i on patient P_p using their respective latent vectors u_p and v_i [29]. By minimizing P_s^{\uparrow}, PLTR learns to assign higher scores to relevant features than to irrelevant features so as to rank the relevant features at the top of the final ranking list. Note that PLTR learns different latent vectors and ranking lists for different subjects, and therefore enables personalized feature prioritization. In Problem (1), O_s^{+} measures the ratio of mis-ordered feature pairs among the relevant features over all the subjects, defined as follows:

O_s^{+} = \sum_{p=1}^{m} \frac{1}{|\{f_i^+ \succ_{P_p} f_j^+\}|} \sum_{f_i^+ \succ_{P_p} f_j^+} I\big(s_p(f_i^+) < s_p(f_j^+)\big),  (4)

where f_i \succ_{P_p} f_j represents that f_i is ranked higher than f_j for patient P_p. By minimizing O_s^{+}, PLTR learns to push the most relevant features on top of the less relevant features; thus, the most relevant features are pushed to the very top of the ranking list. In Problem (1), R_{uv} is a regularizer on U and V to prevent overfitting, defined as

R_{uv} = \frac{1}{m}\|U\|_F^2 + \frac{1}{n}\|V\|_F^2,  (5)

where \|X\|_F is the Frobenius norm of matrix X. R_{csim} is a regularizer that constrains the patient latent vectors, defined as

R_{csim} = \frac{1}{m}\sum_{p=1}^{m}\sum_{q=1}^{m} w_{pq}\,\|u_p - u_q\|^2,  (6)

where w_{pq} is the similarity between subjects P_p and P_q, calculated from the subjects' imaging features. The assumption here is that patients who are similar in terms of imaging features could also be similar in terms of cognitive features.
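To make the optimization concrete, the scoring function in Equation (3) and the push loss in Equation (2) can be sketched in NumPy as follows. This is an illustrative sketch, not our released implementation: the array shapes, the boolean `relevant` mask, and the toy data are assumptions, and the hinged variant shown last anticipates the margin-based loss used by PLTR_h (Equation (8)).

```python
import numpy as np

def score_features(U, V):
    """Eq. (3): S[p, i] = u_p^T v_i for every patient p and feature i."""
    return U.T @ V  # U: (d, m) latent patient vectors; V: (d, n) latent feature vectors

def push_loss(S, relevant):
    """Eq. (2): per patient, the fraction of (relevant, irrelevant) feature
    pairs in which the relevant feature is NOT scored strictly higher."""
    total = 0.0
    for p in range(S.shape[0]):
        pos = S[p, relevant[p]]        # scores of relevant features
        neg = S[p, ~relevant[p]]       # scores of irrelevant features
        if pos.size and neg.size:
            total += np.mean(pos[:, None] <= neg[None, :])
    return total

def hinge_push_loss(S, relevant, t_p=0.3):
    """Hinged variant (used by PLTR_h, Eq. (8)): penalize pairs whose score
    gap falls short of the margin t_p."""
    total = 0.0
    for p in range(S.shape[0]):
        pos = S[p, relevant[p]]
        neg = S[p, ~relevant[p]]
        if pos.size and neg.size:
            total += np.mean(np.maximum(0.0, t_p - (pos[:, None] - neg[None, :])))
    return total

# toy data: 2 patients, 4 cognitive features, latent dimension 3
rng = np.random.default_rng(0)
U, V = rng.normal(size=(3, 2)), rng.normal(size=(3, 4))
relevant = np.array([[True, True, False, False],
                     [True, False, False, True]])
S = score_features(U, V)
print(push_loss(S, relevant), hinge_push_loss(S, relevant))
```

Minimizing `push_loss` drives the scores of each patient's relevant features above those of the irrelevant ones; the hinged version additionally requires a gap of at least t_p.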
PLTR_h

The objective of PLTR is to score relevant features higher than less relevant features, as shown in Equations (2) and (4). In some cases, however, the score of relevant features is expected to be higher than that of less relevant features by a large margin; for example, patients can be very sensitive to a few cognitive tasks but much less sensitive to many others. To incorporate such information, we propose a new hinge loss [7] based PLTR, denoted as PLTR_h. In PLTR_h, the overall loss function is very similar to Equation (1), defined as follows:

\min_{U,V} L_h = (1-\alpha) P_h^{\uparrow} + \alpha O_h^{+} + \beta R_{uv} + \gamma R_{csim},  (7)

where L_h is the overall loss function; U, V, R_{uv} and R_{csim} are identical to those in Equation (1). In PLTR_h, P_h^{\uparrow} measures the average loss between the relevant and irrelevant features using a hinge loss, as follows:

P_h^{\uparrow} = \sum_{p=1}^{m} \frac{1}{n_p^+ n_p^-} \sum_{f_i^- \in P_p^-} \sum_{f_j^+ \in P_p^+} \max\big(0,\, t_p - (s_p(f_j^+) - s_p(f_i^-))\big),  (8)

where \max(0, t_p - (s_p(f_j^+) - s_p(f_i^-))) is the hinge loss (\max(0, x) = x if x > 0, otherwise 0) between the relevant feature f_j^+ and the irrelevant feature f_i^-, and t_p is a pre-defined margin. Specifically, a pair induces no loss during optimization only when s_p(f_j^+) - s_p(f_i^-) \ge t_p; otherwise, the hinge loss is positive and increases as s_p(f_j^+) - s_p(f_i^-) falls further below t_p. Thus, the hinge loss forces the scores of relevant features to be higher than those of irrelevant features by at least t_p, so that relevant features are ranked above irrelevant features in the ranking list. Similarly, O_h^{+} measures the average loss among the relevant features, also using a hinge loss, as follows:

O_h^{+} = \sum_{p=1}^{m} \frac{1}{|\{f_i^+ \succ_{P_p} f_j^+\}|} \sum_{f_i^+ \succ_{P_p} f_j^+} \max\big(0,\, t_o - (s_p(f_i^+) - s_p(f_j^+))\big),  (9)

where t_o is also a pre-defined margin.

Data processing

Following the protocol in our preliminary study [8], we selected all the MCI and AD patients from ADNI and conducted the following data normalization for these patients. We first performed a t-test on each cognitive feature between patients and controls, and selected a feature if there is a significant difference between patients and controls on that feature. Then, we converted the selected features into [0,
1] by shifting and scaling the feature values. We also converted all the normalized feature values according to the Cohen's d of the features between patients and controls, so that smaller values always indicate a higher AD possibility. After that, we filtered out features with values 0, 1 or 0.

Through the normalization and filtering steps in Section 2.4.1, 86 normalized imaging features remain. We represent each patient using a vector of these features, denoted as r_p = [r_{p1}, r_{p2}, \cdots, r_{p86}], in which r_{pi} (i = 1, \cdots, 86) is an imaging feature of patient p. We calculate the patient similarity from the imaging features using the radial basis function (RBF) kernel, that is, w_{pq} = \exp(-\|r_p - r_q\|^2/\sigma^2), where w_{pq} is the patient similarity used in R_{csim}.
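The patient-patient similarity used in R_csim can be sketched as follows. This is an illustrative sketch: the bandwidth σ is a hyperparameter of the kernel, and the toy feature vectors are assumptions (in our study each patient has 86 imaging features).

```python
import numpy as np

def patient_similarity(R, sigma=1.0):
    """RBF-kernel similarity between patients from their imaging feature
    vectors (rows of R): w_pq = exp(-||r_p - r_q||^2 / sigma^2)."""
    sq_dists = np.sum((R[:, None, :] - R[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / sigma ** 2)

# toy example: 3 patients, 2 imaging features (in practice, 86 features)
R = np.array([[0.2, 0.5],
              [0.2, 0.5],
              [0.9, 0.1]])
W = patient_similarity(R)
print(W[0, 1])  # identical feature vectors -> similarity 1.0
```

The resulting similarity matrix is symmetric, with values in (0, 1] that decay as the imaging profiles of two patients grow apart.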
Baseline methods

We compare PLTR and PLTR_h with two baseline methods: the Bayesian Multi-Task Multi-Kernel Learning (BMTMKL) method [30] and the Kernelized Rank Learning (KRL) method [31].
BMTMKL

BMTMKL is a state-of-the-art baseline for biomarker prioritization. It was originally proposed to rank cell lines for drugs and won the DREAM 7 challenge [32]. In our study, BMTMKL uses multi-task and multi-kernel learning within kernelized regression to predict cognitive feature values, and learns its parameters via Bayesian inference. We use the patient similarity matrix calculated from the FreeSurfer features as the kernels in BMTMKL.

KRL

KRL represents another state-of-the-art baseline for biomarker prioritization. In our study, KRL uses kernelized regression with a ranking loss to learn the ranking structure of patients and to predict the cognitive feature values. The objective of KRL is to maximize the hits among the top k of the ranking list. We use the patient similarity matrix calculated from the FreeSurfer features as the kernels in
KRL.

Experimental settings

Following the protocol in our preliminary study [8], we test our methods in two different settings: cross validation (CV) and leave-out validation (LOV). In CV, we randomly split each patient's cognitive tasks into 5 folds; all the features of a cognitive task go into either the training set or the testing set. We use 4 folds for training and the remaining fold for testing, and repeat this 5 times, each time with one of the 5 folds as the testing set. The overall performance of each method is averaged over the 5 testing sets. This setting corresponds to the goal of prioritizing additional cognitive tasks that a patient should complete. In LOV, we split patients (not patient tasks) into training and testing sets; a given patient and all of his/her cognitive features are either entirely in the training set or entirely in the testing set. This corresponds to the use scenario of identifying the most relevant cognitive tasks that a new patient needs to take, based on the patient's existing imaging information, when the patient has not yet completed any cognitive tasks. Figure 1 and Figure 2 illustrate the CV and LOV data split processes, respectively (Figure 1: data split for cross validation (CV); Figure 2: data split for leave-out validation (LOV); in both figures, rows are patients and columns are cognitive features).

Please note that, as presented in Section 2.4.1, smaller normalized cognitive feature values always indicate a higher AD possibility. Thus, in both settings, we use the ranking list of normalized cognitive features of each patient as the ground truth for training and testing.
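The two validation settings differ only in what is held out: whole cognitive tasks within each patient (CV) or whole patients (LOV). A minimal sketch, with the task and patient identifiers as illustrative assumptions:

```python
import random

def cv_split(task_ids, n_folds=5, seed=0):
    """CV: split cognitive tasks into folds; all features of a task stay
    together, so a task is entirely in training or entirely in testing."""
    ids = list(task_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]

def lov_split(patient_ids, n_test, seed=0):
    """LOV: hold out whole patients; a held-out patient contributes no
    cognitive features to training (only imaging features at test time)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_test:], ids[:n_test]   # (training patients, testing patients)

folds = cv_split(range(15))                          # the 15 cognitive tasks
train_p, test_p = lov_split(range(819), n_test=26)   # e.g., 26 hold-out patients
print(len(folds), len(train_p), len(test_p))         # → 5 793 26
```

In CV a patient appears in both training and testing (with disjoint tasks), whereas in LOV the two patient sets are disjoint.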
We conduct a grid search to identify the best parameters on each evaluation metric for each model. We use 0.3 and 0.1 as the values of t_p and t_o, respectively. In the experimental results, we report the parameter combinations that achieve the best performance on the evaluation metrics. We implement PLTR and PLTR_h using Python 3.7.3 and NumPy 1.16.2, and run the experiments on a Xeon E5-2680 v4 with 128 GB of memory.
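The parameter selection can be sketched as an exhaustive search over the coefficient grid. The candidate values and the toy evaluation function below are hypothetical stand-ins for retraining the model and measuring a metric such as QH@5:

```python
import itertools

def grid_search(evaluate, alphas, betas, gammas):
    """Return the (alpha, beta, gamma) combination maximizing `evaluate`."""
    best, best_score = None, float("-inf")
    for a, b, g in itertools.product(alphas, betas, gammas):
        score = evaluate(a, b, g)
        if score > best_score:
            best, best_score = (a, b, g), score
    return best, best_score

# hypothetical smooth evaluation surface standing in for a trained model's QH@5
params, score = grid_search(
    lambda a, b, g: -(a - 0.4) ** 2 - (b - 1.0) ** 2 - 0.01 * g,
    alphas=[0.2, 0.4, 0.6], betas=[0.5, 1.0, 2.0], gammas=[0.1, 1.0])
print(params)  # → (0.4, 1.0, 0.1)
```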
PLTR h using Python 3.7.3 and Numpy 1.16.2, and run theexperiments on Xeon E5-2680 v4 with 128G memory. We use a metric named average feature hit at k (QH@ k ) as in our preliminary study [8] to evaluatethe ranking performance,QH@ k ( τ q , ˜ τ q ) = (cid:88) ki =1 I (˜ τ qi ∈ τ q (1 : k )) , (10)where τ q is the ground-truth ranking list of all thefeatures in all the tasks, τ q (1 : k ) is the top k featuresin the list, ˜ τ q is the predicted ranking list of all thefeatures, and ˜ τ qi is the i -th ranked features in ˜ τ q . Thatis, QH@ k calculates the number of features among top k in the predicted feature lists that are also in theground truth (i.e., hits). Higher QH@ k values indicatebetter prioritization performance.We use a second evaluation metric weighted averagefeature hit at k (WQH@ k ) as follows:WQH@ k ( τ q , ˜ τ q ) = (cid:88) kj =1 QH @ j ( τ q , ˜ τ q ) /k, (11)that is, WQH@ k is a weighted version of QH@ k thatcalculates the average of QH@ j ( j = 1 , · · · , k ) overtop k . Higher WQH@ k indicates more feature hits andthose hits are ranked on top in the ranking list. In in Peng et al. [8], we use the mean of the top- g nor-malized ground-truth scores/predicted scores on thefeatures of each cognitive task for a patient as the scoreof that task for that patient. For each patient, we rankthe tasks using their ground-truth scores and use theranking as the ground-truth ranking of these tasks.Thus, these scores measure how much relevant to AD the task indicates for the patients. We use the pre-dicted scores to rank cognitive tasks into the predictedranking of the tasks. We define a third evaluation met-ric task hit at k (NH g @ k ) as follows to evaluate theranking performance in terms of tasks,NH g @ k ( τ ng , ˜ τ ng ) = (cid:88) ki =1 I (˜ τ ngi ∈ τ ng (1 : k )) , (12)where τ ng /˜ τ ng is the ground-truth/predicted rankinglist of all the tasks using top- g question scores. CV Table 1 presents the performance of
PLTR , PLTR h andtwo baseline methods in the CV setting. Note that over-all, PLTR and
PLTR h have similar standard deviations; KRL and
BMTMKL have higher standard deviations com-pared to
PLTR and
PLTR h . This indicates that PLTR and
PLTR h are more robust than KRL and
BMTMKL forthe prioritization tasks.
For cognitive features from all tasks,
PLTR is ableto identify on average 2.665 ± ± PLTR h achieves similar performance as PLTR , and identi-fies on average 2.599 ± ± PLTR and
PLTR h significantly out-perform the baseline methods in terms of all the eval-uation metrics on cognitive feature level (i.e., QH@5and WQH@5). Specifically, PLTR outperforms the bestbaseline method
BMTMKL at 9.1 ± ± PLTR h also out-performs BMTMKL at 6.4 ± ± PLTR and
PLTR h are able to rank morerelevant features on top than the two state-of-the-art eng et al. Page 8 of 15
Table 1: Overall performance in CV. The column "d" corresponds to the latent dimension. The numbers in the form of x ± y represent the mean (x) and standard deviation (y). The best performance of each method is in bold. The best performance under each evaluation metric is underlined.

baseline methods, and the positions of those hits are also higher than those in the baseline methods.

For the scenario of prioritizing the cognitive tasks that each patient should take,
PLTR and PLTR_h are able to identify the top-1 most relevant task for 72.5% and 74.3% of the patients, respectively (NH_g@1 = 0.725 for PLTR and NH_g@1 = 0.743 for PLTR_h). This indicates the strong power of PLTR and PLTR_h in prioritizing cognitive features and in recommending relevant cognitive tasks for real clinical applications. We also find that PLTR and PLTR_h outperform the baseline methods on most of the metrics at the cognitive task level (i.e., NH_g@1). PLTR outperforms the best baseline method by 11.6% on NH_g@1. PLTR_h performs even better than PLTR on two of the NH_g@1 metrics, and outperforms the best baseline performance by 13.7% on NH_g@1. PLTR and PLTR_h perform slightly worse than the baseline methods on two metrics, with 0.760 on one NH_g@1 metric and 0.707 on NH_all@1. These experimental results indicate that PLTR and PLTR_h are better able than the baseline methods to push the most relevant task to the top of the ranking list when a small number of features is used to score the cognitive tasks. Note that in CV, each patient has only a few cognitive tasks in the testing set; therefore, we only consider the evaluation at the top task in the predicted task rankings (i.e., only NH_g@1 in Table 1).
Table 1 also shows that PLTR_h outperforms PLTR on most of the metrics at the cognitive task level (i.e., NH_g@1), by 1.9% and more on four of the task-level metrics (NH_g@1 and NH_all@1). This indicates that PLTR_h is generally better than PLTR at ranking cognitive tasks in the CV setting. The reason could be that the hinge-based loss functions with pre-defined margins enforce a significant difference between the scores of relevant and irrelevant features, and thus effectively push relevant features above irrelevant ones.

LOV
Table 2 and Table 3 present the performance of PLTR, PLTR_h and the two baseline methods in the LOV setting. Due to space limitations, we do not present the standard deviations in these tables, but they show trends similar to those in Table 1. We hold out 26 (Table 2) and 52 (Table 3) AD patients as testing patients, respectively. We select as hold-out AD patients those that have more than 10 similar AD patients in the training set, with corresponding patient similarities higher than 0.67 and 0.62, respectively.
Table 2 and Table 3 show that PLTR and PLTR_h significantly outperform the baseline methods in terms of all the evaluation metrics at the cognitive feature level (i.e., QH@5 and WQH@5), consistent with the experimental results in the CV setting. When 26 patients are held out for testing, PLTR with its best parameter setting (α, β, γ) outperforms the best baseline method KRL by 13.4% and 1.3% on QH@5 and WQH@5, respectively. The performance of PLTR_h is very comparable with that of PLTR: PLTR_h outperforms KRL by 13.4% and 0.5% on QH@5 and WQH@5, respectively. When 52 patients are held out for testing, PLTR with its best parameter setting outperforms the best baseline method KRL by 18.1% and 7.8% on QH@5 and WQH@5, respectively. PLTR_h performs even better than PLTR in this setting; it outperforms KRL by 19.7% and 9.5% on QH@5 and WQH@5, respectively. These experimental results demonstrate that, for new patients, PLTR and PLTR_h rank more relevant features toward the top of the ranking list than the two baseline methods. They also indicate that, for new patients, ranking-based methods (e.g., PLTR and PLTR_h) are more effective than regression-based methods (e.g., KRL and BMTMKL) for biomarker prioritization.
Table 2 also shows that when 26 patients are held out for testing, PLTR and PLTR_h are both able to identify the top most relevant questionnaire for 84.6% of the testing patients (i.e., 22 patients) under NH_g@1. Table 3 shows that when 52 patients are held out for testing, PLTR and PLTR_h are both able to do so for 80.8% of the testing patients (i.e., 42 patients) under NH_g@1. Note that the hold-out testing patients in LOV do not have any cognitive features; the performance of PLTR and PLTR_h therefore demonstrates their strong capability of identifying the most AD-related cognitive features based on imaging features only.

Table 2: Overall performance in LOV on 26 testing patients. The best performance of each method is in bold. The best performance under each evaluation metric is underlined.

We also find that PLTR and PLTR_h achieve similar or even better results compared to the baseline methods in terms of the evaluation metrics at the cognitive task level (i.e., NH_g@1 and NH_g@5). When 26 patients are held out for testing, PLTR and PLTR_h outperform the baseline methods in terms of NH_g@1, but are slightly worse than KRL at ranking relevant tasks within their top-5 predictions when g = 1 or g = 5 (3.308 vs 3.423 on NH_1@5 and 3.808 vs 3.962 on NH_5@5). When 52 patients are held out for testing, PLTR and PLTR_h also achieve the best performance on most of the evaluation metrics; they are only slightly worse than KRL on one NH_g@1 metric and one NH_g@5 metric (0.423 vs 0.481 on NH_g@1 and 3.712 vs 3.808 on NH_g@5). These experimental results demonstrate that, among the top-5 tasks in the ranking list, PLTR and
PLTR_h rank more relevant tasks on top than KRL.

It is notable that in Table 2 and Table 3, as the number of features used to score cognitive tasks (i.e., g in NH_g@k) increases, the performance of all the methods on NH_g@1 first declines and then increases. This may indicate that as g increases, irrelevant features that happen to have relatively high scores are included when scoring tasks, which degrades performance on NH_g@1. Generally, however, the scores of irrelevant features are considerably lower than those of relevant ones; as more features are included, the task scores become dominated by the scores of relevant features, and the performance increases. We also find that BMTMKL performs poorly on NH_g@1 in both Table 2 and Table 3, indicating that BMTMKL, a regression-based method, cannot rank relevant features above irrelevant features well. It is also notable that the best performance for the 26 testing patients is generally better than that for the 52 testing patients. This may be because the similarities between the 26 testing patients and their top-10 similar training patients are higher than those for the 52 testing patients; higher similarities enable more accurate latent vectors for the testing patients.

Table 3: Overall performance in LOV on 52 testing patients. The best performance of each method is in bold. The best performance under each evaluation metric is underlined.

Table 2 and Table 3 also show that PLTR_h is better than PLTR at ranking cognitive tasks in the LOV setting. When 26 patients are held out for testing, PLTR_h outperforms PLTR on two NH_g@5 metrics and NH_all@5, and achieves very comparable performance on the remaining metrics. When 52 patients are held out for testing, PLTR_h achieves better performance than PLTR on QH@5, WQH@5, two NH_g@1 metrics, one NH_g@5 metric and NH_all@5, and also achieves very comparable performance on the remaining metrics. Generally,
PLTR_h outperforms PLTR in terms of the metrics at the cognitive task level. This demonstrates the effectiveness of hinge loss-based methods in separating relevant and irrelevant features during modeling.

Discussion
Our experimental results show that when NH@1 achieves its best performance of 0.846 for the 26 testing patients in the LOV setting (i.e., the first row block in Table 2), the task most commonly prioritized for the testing patients is the Rey Auditory Verbal Learning Test (RAVLT), including the following cognitive features: 1) trial 1 total number of words recalled; 2) trial 2 total number of words recalled; 3) trial 3 total number of words recalled; 4) trial 4 total number of words recalled; 5) trial 5 total number of words recalled; 6) total score; 7) trial 6 total number of words recalled; 8) list B total number of words recalled; 9) 30-minute delay total; and 10) 30-minute delay recognition score. RAVLT is also the most relevant task in the ground truth if tasks are scored correspondingly. RAVLT assesses learning and memory, and has shown promising performance in early detection of AD [33]. A number of studies have reported high correlations between various RAVLT scores and different brain regions [34]. For instance, RAVLT recall is associated with the medial prefrontal cortex and hippocampus, while RAVLT recognition is highly correlated with the thalamic and caudate nuclei. In addition, genetic analyses of the APOE ε4 allele, a well-known genetic risk factor for AD, have linked it to cognitive performance in early MCI [26].
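The hit-style evaluation used in these results can be sketched as follows. This is an assumption about the general form of a hit@k measure (the count of ground-truth-relevant items among the top-k predictions); the paper's precise definitions of QH@k, WQH@k and NH@k may differ. The function and example task names are illustrative.

```python
def hit_at_k(predicted, relevant, k=5):
    """Illustrative hit@k (an assumed form, not the paper's exact metric):
    count how many of the top-k predicted items are in the relevant set."""
    relevant_set = set(relevant)
    return sum(1 for item in predicted[:k] if item in relevant_set)

# mirror the example in the text: top-5 predictions exactly match the ground truth
predicted = ["RAVLT", "LOGMEM", "FAQ", "NPIQ", "CDT", "BNT"]
relevant = {"RAVLT", "LOGMEM", "FAQ", "NPIQ", "CDT"}
print(hit_at_k(predicted, relevant, k=5))  # prints 5: a perfect top-5 reconstruction
```

Averaging such per-subject counts over all testing patients yields values like the reported NH@5 of 3.731.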
These results indicate that PLTR is powerful in prioritizing cognitive features to assist AD diagnosis.

Similarly, we find the top-5 most frequent cognitive tasks corresponding to the performance at NH@5 = 3.731 for the 26 hold-out testing patients. They are: the Functional Assessment Questionnaire (FAQ), the Clock Drawing Test (CDT), Wechsler's Logical Memory Scale (LOGMEM), the Rey Auditory Verbal Learning Test (RAVLT), and the Neuropsychiatric Inventory Questionnaire (NPIQ). In addition to RAVLT, discussed above, the other top-prioritized cognitive tasks have also been reported to be associated with AD or its progression. In an MCI-to-AD conversion study, FAQ, NPIQ and RAVLT showed significant differences between MCI-converter and MCI-stable groups [35]. We also notice that for some testing subjects, PLTR is able to reconstruct their ranking structures very well. For example, when NH@5 achieves its optimal performance of 3.731, for a certain testing subject, her top-5 predicted cognitive tasks (RAVLT, LOGMEM, FAQ, NPIQ and CDT) are exactly the top-5 cognitive tasks in the ground truth. This evidence further demonstrates the diagnostic power of our method.

Conclusions

We have proposed a novel machine learning paradigm to prioritize cognitive assessments based on their relevance to AD at the individual patient level. The paradigm tailors the cognitive biomarker discovery and cognitive assessment selection process to the brain morphometric characteristics of each individual patient. It has been implemented using the newly developed learning-to-rank methods PLTR and PLTR_h. Our empirical study on the ADNI data has produced promising results on identifying and prioritizing individual-specific cognitive biomarkers, as well as cognitive assessment tasks, based on each individual's structural MRI data. In addition, PLTR_h shows better performance than PLTR on ranking cognitive assessment tasks. The resulting top-ranked cognitive biomarkers and assessment tasks have the potential to aid personalized diagnosis and disease subtyping, and to make progress towards enabling precision medicine in AD.

Abbreviations

AD: Alzheimer's Disease
MRI: Magnetic Resonance Imaging
ADNI: Alzheimer's Disease Neuroimaging Initiative
LETOR: Learning-to-Rank
PET: Positron Emission Tomography
MCI: Mild Cognitive Impairment
HC: Healthy Control
ADAS: Alzheimer's Disease Assessment Scale
CDR: Clinical Dementia Rating Scale
FAQ: Functional Assessment Questionnaire
GDS: Geriatric Depression Scale
MMSE: Mini-Mental State Exam
MODHACH: Modified Hachinski Scale
NPIQ: Neuropsychiatric Inventory Questionnaire
BNT: Boston Naming Test
CDT: Clock Drawing Test
DSPAN: Digit Span Test
DSYM: Digit Symbol Test
FLUENCY: Category Fluency Test
LOGMEM: Wechsler's Logical Memory Scale
RAVLT: Rey Auditory Verbal Learning Test
TRAIL: Trail Making Test
RBF: Radial Basis Function
PLTR: Joint Push and Learning-to-Rank Method
PLTR_h: Joint Push and Learning-to-Rank Method using Hinge Loss
BMTMKL: Bayesian Multi-Task Multi-Kernel Learning
KRL: Kernelized Rank Learning
CV: Cross Validation
LOV: Leave-Out Validation
QH@k: Average Feature Hit at k
WQH@k: Weighted Average Feature Hit at k
NH_g@k: Task Hit at k
APOE: Apolipoprotein E
EMCI: Early Mild Cognitive Impairment
Availability of data and materials

The dataset supporting the conclusions of this article is available in the Alzheimer's Disease Neuroimaging Initiative (ADNI) [25]. ADNI data can be requested by all interested investigators via the ADNI website; investigators must agree to acknowledge ADNI and its funders in papers that use the data. There are also some other reporting requirements: the PI must give an annual report of what the data have been used for, and of any publications arising. More details are available at http://adni.loni.usc.edu/data-samples/access-data/.
Ethics approval and consent to participate

Not applicable.
Competing interests

The authors declare that they have no competing interests.
Funding

This work was supported in part by NIH R01 EB022574, R01 AG019771, and P30 AG010133; and NSF IIS 1837964 and 1855501. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
Authors' contributions

XN and LS designed the research study. BP and XY contributed to the conduct of the study: XY extracted and processed the data from ADNI; BP conducted the model development and data analysis. The results were analyzed, interpreted and discussed by BP, XY, SLR, AJS, LS and XN. BP and XN drafted the manuscript, and all co-authors revised and approved the final version of the manuscript.
Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org).
Author details

The Ohio State University, Columbus, US; University of Pennsylvania, Philadelphia, US; Indiana University, Indianapolis, US.
References
1. Wan, J., Zhang, Z., et al.: Identifying the neuroanatomical basis of cognitive impairment in Alzheimer's disease by correlation- and nonlinearity-aware sparse Bayesian learning. IEEE Trans Med Imaging (7), 1475–87 (2014). doi:10.1109/TMI.2014.2314712
2. Yan, J., Li, T., et al.: Cortical surface biomarkers for predicting cognitive outcomes using group l2,1 norm. Neurobiol Aging 36 Suppl 1, 185–93 (2015). doi:10.1016/j.neurobiolaging.2014.07.045
3. Cordell, C.B., Borson, S., et al.: Alzheimer's Association recommendations for operationalizing the detection of cognitive impairment during the Medicare annual wellness visit in a primary care setting. Alzheimers Dement (2), 141–50 (2013). doi:10.1016/j.jalz.2012.09.011
4. Scott, J., Mayo, A.M.: Instruments for detection and screening of cognitive impairment for older adults in primary care settings: A review. Geriatr Nurs (3), 323–329 (2018). doi:10.1016/j.gerinurse.2017.11.001
5. He, Y., Liu, J., Ning, X.: Drug selection via joint push and learning to rank. IEEE/ACM Transactions on Computational Biology and Bioinformatics (1), 110–123 (2020)
6. Weiner, M.W., Veitch, D.P., et al.: The Alzheimer's Disease Neuroimaging Initiative 3: Continued innovation for clinical trial improvement. Alzheimers Dement (5), 561–571 (2017)
7. Gentile, C., Warmuth, M.K.: Linear hinge loss and average margin. In: Advances in Neural Information Processing Systems, pp. 225–231 (1999)
8. Peng, B., Yao, X., Risacher, S.L., Saykin, A.J., Shen, L., Ning, X.: Prioritization of cognitive assessments in Alzheimer's disease via learning to rank using brain morphometric data. In: 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 1–4 (2019). doi:10.1109/BHI.2019.8834618
9. Liu, T.-Y.: Learning to Rank for Information Retrieval, pp. 1–285. Springer, Berlin Heidelberg (2011)
10. Li, H.: Learning to Rank for Information Retrieval and Natural Language Processing (2011)
11. Agichtein, E., Brill, E., Dumais, S.: Improving web search ranking by incorporating user behavior. In: Proceedings of SIGIR 2006 (2006)
12. Karatzoglou, A., Baltrunas, L., Shi, Y.: Learning to rank for recommender systems. In: Proceedings of the 7th ACM Conference on Recommender Systems. RecSys '13, pp. 493–494. ACM, New York, NY, USA (2013). doi:10.1145/2507157.2508063
13. Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., Li, H.: Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the 24th International Conference on Machine Learning, pp. 129–136. ACM (2007)
14. Burges, C.J.C., Ragno, R., Le, Q.V.: Learning to rank with nonsmooth cost functions. In: Proceedings of the 19th International Conference on Neural Information Processing Systems. NIPS'06, pp. 193–200. MIT Press, Cambridge, MA, USA (2006)
15. Lebanon, G., Lafferty, J.: Cranking: Combining rankings using conditional probability models on permutations. In: ICML, vol. 2, pp. 363–370. Citeseer (2002)
16. Liu, J., Ning, X.: Multi-assay-based compound prioritization via assistance utilization: a machine learning framework. Journal of Chemical Information and Modeling (3), 484–498 (2017)
17. Zhang, W., Ji, L., Chen, Y., Tang, K., Wang, H., Zhu, R., Jia, W., Cao, Z., Liu, Q.: When drug discovery meets web search: learning to rank for ligand-based virtual screening. Journal of Cheminformatics (1), 5 (2015)
18. Liu, J., Ning, X.: Differential compound prioritization via bi-directional selectivity push with power. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM-BCB '17, pp. 394–399. ACM, New York, NY, USA (2017). doi:10.1145/3107411.3107486
19. Liu, J., Ning, X.: Differential compound prioritization via bi-directional selectivity push with power. Journal of Chemical Information and Modeling (12), 2958–2975 (2017). doi:10.1021/acs.jcim.7b00552. PMID: 29178784
20. Agarwal, S., Dugar, D., Sengupta, S.: Ranking chemical structures for drug discovery: a new machine learning approach. Journal of Chemical Information and Modeling (5), 716–731 (2010)
21. Wang, X., Liu, K., Yan, J., Risacher, S.L., Saykin, A.J., Shen, L., Huang, H., et al.: Predicting interrelated Alzheimer's disease outcomes via new self-learned structured low-rank model. In: International Conference on Information Processing in Medical Imaging, pp. 198–209. Springer (2017)
22. Yan, J., Deng, C., Luo, L., Wang, X., Yao, X., Shen, L., Huang, H.: Identifying imaging markers for predicting cognitive assessments using Wasserstein distances based matrix regression. Front Neurosci, 668 (2019). doi:10.3389/fnins.2019.00668
23. Wang, H., Nie, F., Huang, H., Risacher, S.L., Saykin, A.J., Shen, L., Alzheimer's Disease Neuroimaging Initiative: Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. Bioinformatics (12), 127–36 (2012). doi:10.1093/bioinformatics/bts228
24. Brand, L., Wang, H., Huang, H., Risacher, S., Saykin, A., Shen, L., et al.: Joint high-order multi-task feature learning to predict the progression of Alzheimer's disease. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 555–562. Springer (2018)
25. Weiner, M.W., Veitch, D.P., et al.: Alzheimer's Disease Neuroimaging Initiative (accessed 22 July 2020). adni.loni.usc.edu
26. Risacher, S., Kim, S., et al.: The role of apolipoprotein E (APOE) genotype in early mild cognitive impairment (E-MCI). Frontiers in Aging Neuroscience, 11 (2013)
27. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. WWW '01, pp. 285–295. Association for Computing Machinery, New York, NY, USA (2001). doi:10.1145/371920.372071
28. Wang, J., De Vries, A.P., Reinders, M.J.: Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 501–508 (2006)
29. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer (8), 30–37 (2009)
30. Costello, J.C., Heiser, L.M., Georgii, E., Gönen, M., Menden, M.P., Wang, N.J., Bansal, M., Hintsanen, P., Khan, S.A., Mpindi, J.-P., et al.: A community effort to assess and improve drug sensitivity prediction algorithms. Nature Biotechnology (12), 1202 (2014)
31. He, X., Folkman, L., Borgwardt, K.: Kernelized rank learning for personalized drug recommendation. Bioinformatics (16), 2808–2816 (2018)
32. DREAM Challenges: DREAM 7 NCI-DREAM Drug Sensitivity Prediction Challenge (accessed 23 July 2020). http://dreamchallenges.org/project/dream-7-nci-dream-drug-sensitivity-prediction-challenge/
33. Moradi, E., Hallikainen, I., et al.: Rey's Auditory Verbal Learning Test scores can be predicted from whole brain MRI in Alzheimer's disease. NeuroImage: Clinical, 415–427 (2017)
34. Balthazar, M.L.F., Yasuda, C.L., et al.: Learning, retrieval, and recognition are compromised in aMCI and mild AD: Are distinct episodic memory processes mediated by the same anatomical structures? J Int Neuropsychol Soc (1), 205–209 (2010)
35. Risacher, S.L., Saykin, A.J., et al.: Baseline MRI predictors of conversion from MCI to probable AD in the ADNI cohort. Current Alzheimer Research 6