Age-Net: An MRI-Based Iterative Framework for Brain Biological Age Estimation
Karim Armanious, Sherif Abdulatif, Wenbin Shi, Shashank Salian, Thomas Küstner, Daniel Weiskopf, Tobias Hepp, Sergios Gatidis, Bin Yang
Abstract: The concept of biological age (BA), although important in clinical practice, is hard to grasp, mainly due to the lack of a clearly defined reference standard. For specific applications, especially in pediatrics, medical image data are used for BA estimation in a routine clinical context. Beyond this young age group, BA estimation is restricted to whole-body assessment using non-imaging indicators such as blood biomarkers, genetic and cellular data. However, various organ systems may exhibit different aging characteristics due to lifestyle and genetic factors. Thus, a whole-body assessment of the BA does not reflect the deviations of aging behavior between organs. To this end, we propose a new imaging-based framework for organ-specific BA estimation. As a first step, we introduce a chronological age (CA) estimation framework using deep convolutional neural networks (Age-Net). We quantitatively assess the performance of this framework in comparison to existing CA estimation approaches. Furthermore, we expand upon Age-Net with a novel iterative data-cleaning algorithm to segregate atypical-aging patients (BA ≉ CA) from the given population. In this manner, we hypothesize that the remaining population should approximate the true BA behaviour. For this initial study, we apply the proposed methodology on a brain magnetic resonance image (MRI) dataset containing healthy individuals as well as Alzheimer's patients with different dementia ratings. We demonstrate the correlation between the predicted BAs and the expected cognitive deterioration in Alzheimer's patients. A statistical and visualization-based analysis has provided evidence regarding the potential and current challenges of the proposed methodology.
Index Terms: Biological age estimation, deep learning, chronological age, magnetic resonance imaging.
I. INTRODUCTION
Age is one of the most important parameters describing individuals in a medical context. For instance, age has a significant impact on the establishment of working diagnoses and the choice of appropriate diagnostic tests [2]. Similarly, age is also an important parameter influencing therapeutic decisions in a wide range of clinical situations [3], [4].
K. Armanious, S. Abdulatif, W. Shi and B. Yang are with the Institute of Signal Processing and System Theory, University of Stuttgart, 70569 Stuttgart, Germany (e-mail: [email protected]).
S. Salian and D. Weiskopf are with the Visualization Research Center, University of Stuttgart, 70569 Stuttgart, Germany.
T. Küstner and S. Gatidis are with the Department of Diagnostic and Interventional Radiology, University Hospital Tübingen, 72076 Tübingen, Germany.
T. Hepp is with the Empirical Inference Department, Max Planck Institute for Intelligent Systems, 72076 Tübingen, Germany.
This paper was accepted in part at the IEEE European Signal Processing Conference (EUSIPCO), 2020 [1].
The first two authors contributed equally to this work.
However, age-related biological phenotypes can deviate significantly between individuals within the same age group. These observations have motivated the concept of biological age (BA) in contrast to chronological age (CA) [5]. CA is described as the amount of time since the birth of an individual. Unlike CA, BA is not clearly defined. It can be described as a measure of the extent of genetic, metabolic and functional changes in an individual that occur during the process of aging. Thus, BA can be considered as an extension to the traditional concept of CA in addition to any organ-specific accelerated or delayed aging characteristics [6]–[8]. Despite this relatively imprecise definition, the potential impact of the concept of BA on patient management is easily conceivable. It is common practice for clinicians to assess the overall condition of patients as part of the clinical examination relative to their respective age group and to incorporate this impression into their medical decisions. However, these personal estimates are subjective and not easily quantifiable.
As an expansion to the concept of BA, the notion of organ-specific BA has been proposed, aiming to describe changes in morphology, biology and function of organ systems that occur with aging [9]. This concept is based on the rationale that single organs or organ systems can be affected by different genetic or environmental factors and, thus, display different courses of aging. As an example, parameters of pulmonary function were proposed as measures for lung BA [10].
A large body of research attempts the quantification of BA using non-imaging data. More specifically, age-dependent variables such as genetic [5], [6], cellular [7], phenotypic [8], [11] and epidemiological data [12], blood biomarkers [13], [14] and physical activity [15] have been used as indicators for the BA.
Traditional approaches rely on classical techniques such as multiple linear regression (MLR) [16] or the Klemera and Doubal (KD) method [17], whereas recent works adopt deep neural networks [18], [19]. The majority of these approaches utilize large cohort datasets for the prediction of the mortality risk [20]–[22]. Other methods incorporate CA as ground-truth labels and examine the relation between the predicted ages and other health indicators, such as the work ability index (WAI) [23] and frailty index (FI) [24], for assessing BA.
Nonetheless, the above non-imaging approaches lead to a whole-body assessment of BA. In that sense, they are not capable of recognizing the differences in aging characteristics between individual organ systems. In this context, medical imaging may potentially provide significant information allowing for non-invasive estimation of organ-specific BA.
Fig. 1: An overview of the proposed chronological age estimation network (Age-Net) with the detailed architecture of each block. (a) Hybrid 3D network model with a stem network followed by a basic building block of two inception blocks and a fire module. (b) Inception block basic architecture. (c) Fire module basic architecture. The network takes as an input either the whole organ volume or volume chunks of segmented brain gray matter (GM). The dimensions of the separate blocks are the same regardless of the feeding strategy (full/chunks).
The use of medical imaging for age estimation in a clinical setting is limited mainly to skeletal age estimation in infants and adolescents using conventional radiography and MRI [25]–[28]. Beyond skeletal age estimation, first studies have introduced the concept of brain age based on changes in brain morphology (e.g. changes in subvolumes) and provided evidence for associations between premature brain aging and cognitive function [29], [30].
Brain age can be defined by comparing an individual's morphological brain features to a reference database of the underlying population. This is possible using brain MRI due to the highly constant anatomy of the central nervous system, which allows for detection of age-dependent morphological deviations. Similarly, Alzheimer's disease was found to correlate directly with abnormal brain aging [31].
More recent efforts have incorporated the use of machine learning (ML) and deep learning (DL) techniques for medical imaging-based age estimation. Traditional approaches, such as kernel methods [32] and support vector machines (SVMs) [33], were initially utilized for brain CA estimation using T1-weighted MRI volumes. Also, atlas-based methods were employed to extract effective local features for the same task [34]. However, in recent years, the use of convolutional neural networks (CNNs) has become more prevalent due to strong results in a multitude of medical tasks, such as classification and segmentation [35]–[38]. For instance, a deep CNN has been utilized for prediction of brain age using 2D T1-weighted MR images [39]. This framework was then expanded upon to incorporate manually extracted features in addition to the 2D CNN architecture [40]. Recent advances have attempted the use of relatively shallow 3D CNN architectures to incorporate the spatial information between slices in the brain age estimation procedure [41], [42]. For forensic applications, a large body of research has utilized hand and skeletal MRI volumes to estimate the BA [42]–[45].
As stated clearly in the most recent survey on this topic [46], all DL approaches, whether imaging or non-imaging based, rely on the CA as ground-truth labels for BA prediction [41], [47]. Thus, the predicted ages cannot be used to assess the true aging characteristics of the test subjects.
To the best of our knowledge, defining BA ground-truth labels is still not possible and remains an open research question.
The purpose of this study is to bridge the gap between chronological and biological age estimation. This is achieved by, first, introducing a new DL framework for organ-specific CA estimation using 3D medical imaging data. The performance of this framework is validated by a quantitative comparison with other existing CA estimation networks for brain age estimation. Additionally, a novel iterative training strategy is presented as an initial solution for approximating organ-specific BA labels. This is achieved by identifying outliers who exhibit atypical-aging characteristics. These outliers are then segregated from the training dataset. A serious challenge is how to validate the accuracy of the utilized training approach. To this end, we apply the proposed methodology on a dataset containing both healthy and Alzheimer's-affected patients. Subsequently, we quantify the number of Alzheimer's patients detected as atypically-aging patients by the iterative strategy. Statistical and visual analysis of the results is conducted to illustrate the merit and limitations of the proposed methodology.
TABLE I: Age-Net Architecture. [Columns: Layer, Output shape. The layer names and output shapes were lost in extraction. The recoverable entries are an inception stem network and four fire modules with parameters (α = 16, β = 64), (α = 16, β = 64), (α = 16, β = 64) and (α = 32, β = 256).]
This paper is organized as follows: Sec. II describes the proposed CA estimation network together with the conducted comparative study and the corresponding results. Sec. III presents the iterative data-cleaning strategy for BA estimation and describes the conducted experimental evaluations. Finally, Sec. IV presents the results and discussions for the BA framework, followed by the conclusion in Sec. V.
II. CHRONOLOGICAL AGE ESTIMATION
In this section, the proposed DL architecture for organ-specific CA estimation is introduced. This network is later utilized as the foundation of the proposed iterative training strategy for BA estimation. A detailed description of the experiments conducted to validate the proposed architecture is presented.
A. Architectural Details
The proposed network for CA estimation, hereby referred to as Age-Net, is illustrated in Fig. 1a. Based on an extensive assessment of different state-of-the-art DL structures, including ResNeXt [48] and DenseNets [49] among others, the proposed regression network was constructed out of a hybrid combination of inception v1 [50] and SqueezeNet [51] architectures.
The inception modules are based on the split-transform-merge strategy, where each module is comprised of parallel filters with different kernel dimensionality, which results in a network growing wider instead of deeper. This enables learning deeper feature representations by increasing the capacity of the network while mitigating the increased computation budget associated with network depth [50]. In this work, we utilize the inflated inception v1 architecture, which is a 3D realization of conventional inception modules achieved via inflating all filters and pooling kernels into their 3D counterparts [52]. An inflated inception module is illustrated in Fig. 1b.
Additionally, fire modules proposed in SqueezeNet are also incorporated in the Age-Net architecture [51]. They consist of squeeze-and-expand layers comprised of a combination of [...] × [...] × [...] and [...] × [...] × [...] convolutions that help reduce the total number of trainable parameters while enhancing the representation capacity. The fire modules are depicted in Fig. 1c.
The final architecture for Age-Net is composed of an initial stem network consisting of [...] × [...] × [...] convolutions and max-pooling layers. This is followed by four modules concatenated together in an end-to-end manner. Each module consists of two inception blocks followed by a single fire module. In the above modules, each convolutional layer is followed by batch normalization and a ReLU activation function. L2 regularization is additionally utilized for each convolutional layer. A global average pooling layer is then applied as a structural regularizer to reduce the four-dimensional tensor to a one-dimensional output vector of 512 features. Finally, three dense layers combine additional gender information before a final regression layer outputs the predicted age. The complete architectural details of the Age-Net are outlined in Table I. Kernel parameters for the different inception blocks can be found in the original inception v1 publication [50].
Fig. 2: Distribution of the open-source brain IXI MR dataset (x-axis: age [years]; y-axis: number of patients). The dataset covers diverse age categories with a mean of µ ≈ [...] years and a standard deviation of σ ≈ [...] years. Legend: male µ = 46.[...], σ = 16.[...] yrs; female µ = 50.[...], σ = 16.[...] yrs.
B. Input Pipeline
Empirically, the method of feeding the input MR volumes to the Age-Net was found to have a significant impact on the network performance. Thus, two different approaches of feeding the 3D volumes are investigated in this work. First, entire MR volumes are fed as inputs to the network, which produces a single predicted age for each test subject. Due to the relatively limited number of training patients, data augmentation is essential to prevent network overfitting. Accordingly, horizontal flipping and translating the input volumes within a pre-defined voxel-range ([...] × [...]) in the axial orientation were incorporated in the input pipeline. Despite its simplicity, this approach comes at a significant cost to training efficiency due to the large memory space required as well as the on-the-fly data augmentations.
The second data feeding strategy entails dividing the input MR volumes into smaller 3D chunks and then feeding each chunk separately. This implicitly augments the training process by expanding the number of input samples to the network, thus negating the need for on-the-fly data augmentations. The final predicted score for a test subject is then given as the mean of the predicted ages for the different input chunks. In addition to being advantageous from a training efficiency perspective, we hypothesize that this approach could assist in the unsupervised localization of anomalies and lesions per chunk. This can be achieved by investigating irregularities in the prediction scores for individual chunks compared to the CA ground-truth label for each test subject. However, this hypothesis will be further investigated in the future.
Fig. 3: Examples of input brain GM from the axial orientation for (a) a 23-year-old subject and (b) an 86-year-old subject.
C. Dataset and Pre-processing
The proposed Age-Net architecture is evaluated on the task of estimating organ-specific CA for the brain region. For this purpose, we utilize the open-source IXI dataset for brain MR scans [53]. T1-weighted MR scans with an original matrix size of [...] × [...] × [...] voxels from 562 subjects were utilized. The scans were acquired with a flip angle of 8°, an echo time of 4.6 ms and a repetition time of 9.8 ms. Scans from 420, 92 and 50 subjects were used for training, validation and testing, respectively. This was achieved while maintaining a balanced number of scans from all age groups in all data subsets. For pre-processing, we utilize the steps recommended in [54], [55]. As a first step, all 3D volumes were realigned to provide a common orientation for brain visualization. This was achieved by utilizing the canonical Montreal Neurological Institute (MNI) 152 template adopted by the International Consortium for Brain Mapping (ICBM) [54]. Since the gray matter (GM) content of the brain was previously shown to be a strong indicator of the brain age [56], the GM content of the aligned brain volumes was segmented using the Statistical Parametric Mapping 12 (SPM12) software [57]. The resulting tissue maps were registered using DARTEL [58], followed by normalization and modulation using a Jacobian deformation map. The resultant final GM volumes are of size [...] × [...] × [...] voxels. The CA histogram for the utilized brain MR volumes is depicted in Fig. 2 and example images are illustrated in Fig. 3.

TABLE II: Quantitative comparison for CA estimation.
Model             MAE     SD      Bias    RMSE    Corr.
2D-Huang [39]     3.529   4.302   1.250   4.480   0.969
3D-Ueda [59]      3.705   4.298   1.268   4.481   0.969
Age-Net-Volume    2.658   3.532   0.608   3.584   0.979
Age-Net-Chunk     2.283   3.546   0.902   3.659   0.978
Age-Net-Gender    [...]   [...]   [...]   [...]   [...]
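The columns of Table II (MAE, SD, bias, RMSE and correlation) can be illustrated with a short sketch. It assumes, since the text does not spell this out, that the bias is the mean signed error (predicted minus true age) and that the SD is the population standard deviation of those signed errors. Under that reading RMSE² = SD² + bias², which the recoverable rows of Table II satisfy (for Age-Net-Volume, 3.532² + 0.608² ≈ 3.584²). The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(predicted, actual):
    """Return MAE, SD, bias, RMSE and Pearson correlation for age estimates.

    Assumptions (not confirmed by the paper): bias is the mean signed error
    and SD is the population standard deviation of the signed errors.
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    n = len(predicted)
    errors = [p - a for p, a in zip(predicted, actual)]

    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n
    sd = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Pearson correlation between predictions and ground-truth labels.
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    corr = cov / math.sqrt(var_p * var_a)

    return {"mae": mae, "sd": sd, "bias": bias, "rmse": rmse, "corr": corr}


# Toy example with made-up ages (not data from the paper).
metrics = regression_metrics([30, 42, 55, 61], [28, 40, 56, 60])
```

With these toy values the signed errors are 2, 2, -1 and 1, so the MAE is 1.5 years and the bias is 1.0 year.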
D. Experiments
The proposed Age-Net architecture for CA estimation is investigated for the two different data feeding strategies described above. The first variant (Age-Net-volume) involves feeding the entire MR volume as input to the network. In contrast, the second variant (Age-Net-chunk) feeds smaller 3D chunks as inputs instead. Each volume is divided into 20 non-overlapping chunks of matrix size [...] × [...] × [...]. Additionally, we hypothesize that including additional meta-information about the test subjects, i.e. gender, would assist in enhancing the age-regression performance. As such, an additional experiment was conducted to investigate the effect of including the gender (Age-Net-gender) with the chunk data feeding strategy.
To demonstrate the performance of Age-Net, quantitative comparisons were conducted against other regression networks previously proposed for brain MRI CA estimation. First, we compared against the framework provided in [39] (2D-Huang), which consists of a modified deep VGG-Net [60]. Also, comparisons were conducted against a state-of-the-art brain age regression framework (3D-Ueda) [59]. This network is comprised of four 3D convolutional blocks together with max pooling and dense layers. All networks were trained until convergence to minimize the mean absolute error (MAE) loss function on a single NVIDIA Titan-X GPU using the Adam optimizer [61] with Nesterov momentum (0.9) and a learning rate of [...]. All implementations of Age-Net will be made publicly available upon the publication of this work. Several metrics were calculated for the quantitative comparisons: the MAE, standard deviation (SD), bias, root mean square error (RMSE) and the correlation coefficient (Corr.) between the predicted ages and the ground-truth CA labels.
E. Results
The results of the CA estimation for the IXI brain dataset are presented in Table II and Fig. 4. The existing approaches by Huang [39] and Ueda [59] exhibit comparable performance. The proposed 3D Age-Net architecture outperformed these two approaches across the utilized metrics. Feeding an entire MR volume as input improved the MAE to 2.658 years, with a lower bias of 0.608 years and an improved RMSE of 3.584 years (Table II). Additionally, adapting the input pipeline to accommodate smaller 3D chunks instead of a full volume improved the MAE further to 2.283 years, albeit with an increased bias of 0.902 years. Furthermore, including the gender information with the chunk data feeding strategy resulted in the best quantitative scores: the lowest MAE, the lowest systematic error (bias) and the smallest RMSE. An interesting observation regarding the quantitative metrics is that all CA estimation approaches result in positive bias values. This suggests a tendency of the investigated test subjects to exhibit accelerated aging characteristics.
Fig. 4: Statistical analysis and comparison between the networks proposed in literature and the proposed Age-Net with the chunk-based feeding strategy. The Age-Net approach outperforms the concurrent MRI-based CA estimation approaches across the investigated metrics. (Three scatter panels, the last titled Age-Net-Gender; x-axis: chronological age (CA) [yrs]; y-axis: estimated chronological age [yrs]; each panel is annotated with a fitted line, ±SD limits, a 95% confidence interval and the MAE.)
Fig. 5: An overview of the iterative data-cleaning strategy, where atypical outliers (BA ≉ CA) are segregated from the training data. Thus, the Age-Net (with the chunk data feeding strategy) is trained with only a typical-aging dataset (BA ≈ CA).
Footnote (code availability, Sec. II-D): https://github.com/KarimArmanious
III. BIOLOGICAL AGE ESTIMATION
The main challenge for image-based BA estimation is the lack of ground-truth labels, since BA is not clearly defined [18], [19]. As such, previous approaches attempting this task had to rely instead on CA labels [43]–[45]. However, aging is an organ-specific process affected by a multitude of factors such as lifestyle and genetics. Thus, utilizing CA as ground truth will not provide results that are indicative of the true aging features of the organs in question. Another option is to rely on subjective evaluations by radiologists. Nevertheless, this time-consuming and subjective process is challenging for large datasets and depends on the relative experience of the radiologists, as it is not possible to accurately label the BA.
To resolve this challenge, we propose an iterative data-cleaning strategy to approximate BA labels. This is achieved by iteratively identifying and subsequently removing outliers that exhibit atypical-aging characteristics, whether accelerated (BA > CA) or delayed aging (BA < CA). The rationale behind this approach is to arrive at a typical-aging dataset in which the CA labels resemble the true BA labels (BA ≈ CA). We hypothesize that training on this dataset should help to bridge the gap from chronological to biological age estimation. Since the Age-Net-Gender framework was shown in the previous section to result in the best quantitative CA estimation, we hereby utilize this approach as the baseline for the following BA estimation approach. For simplicity, we hereby refer to this architecture as "Age-Net". A basic outline of this strategy is depicted in Fig. 5. In the next sections, the introduced iterative data-cleaning strategy and the outlier detection procedure are further defined.
A. Iterative Data-Cleaning Strategy
A step-wise overview of the iterative data-cleaning strategy for BA estimation is illustrated in Fig. 6. Initially, the available brain MR scans were divided into training and testing datasets. Care was taken so that the training subjects represent the entirety of the available age spectrum in a balanced manner.
For each iteration, the first step is to shuffle and split the training data into two subsets. The first subset is used to train an Age-Net architecture utilizing the chunk data feeding strategy with gender information (Age-Net-Gender) until convergence. This input pipeline was chosen as it showed the best results for CA estimation, as described above in Sec. II. The trained model is then validated on the second subset and the estimated CAs ($\widehat{CA}$) for the different patients are used to calculate a patient-dependent threshold $\gamma_{BA}$. This threshold is then utilized for the detection of outliers who exhibit atypical-aging characteristics in the validation subset. The process of threshold calculation and outlier detection is explained in more detail in the next subsection. The identified patients are then flagged as outliers. A new iteration then starts by merging the validation samples with the training subset, reshuffling and repeating the process by training the Age-Net framework from scratch.
Fig. 6: A detailed flow chart of the proposed iterative data-cleaning strategy for the extraction of BA labels.
At the end of an iteration, if no outliers are detected in the validation subset, two explanations are possible. First, the dataset has been thoroughly filtered, with all atypically-aging patients identified as outliers. Thus, no further refinement of the dataset is possible. Another explanation is that, despite the lack of outliers in the validation data, some could still exist in the training subset.
To protect against this possibility, an empirical stopping condition is enabled, which states that three consecutive data-cleaning iterations trained with different initializations must yield no new outliers before the iterative strategy can be terminated.
Upon termination of the data-cleaning algorithm, all patients who were flagged as outliers in more than one iteration are removed from the training dataset. This serves to ensure that no typically-aging patient is wrongfully detected as an outlier. Also, this process assists in maintaining the training data distribution during the data-cleaning strategy. Finally, an Age-Net architecture is trained on the cleaned dataset (after the removal of the outliers), where the CA labels should correspond approximately to the true BA labels (BA ≈ CA).
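The control flow of the strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `train_fn` and `detect_fn` are hypothetical callables standing in for training an Age-Net from scratch and for the outlier test of the next subsection, the 50/50 split ratio is an assumption (the text does not give one), and `max_iterations` is a safety guard that the text does not mention.

```python
import random
from collections import Counter


def iterative_data_cleaning(patient_ids, train_fn, detect_fn,
                            patience=3, min_flags=2,
                            max_iterations=100, seed=0):
    """Sketch of the iterative data-cleaning loop.

    patient_ids : hashable identifiers of the training patients
    train_fn    : hypothetical callable, train_fn(train_ids) -> model
    detect_fn   : hypothetical callable, detect_fn(model, val_ids) -> outlier ids
    patience    : consecutive iterations without new outliers before stopping
    min_flags   : a patient is removed if flagged at least this many times
    """
    rng = random.Random(seed)
    flag_counts = Counter()      # how often each patient was flagged
    quiet_iterations = 0         # consecutive iterations without new outliers

    for _ in range(max_iterations):
        if quiet_iterations >= patience:
            break
        # Merge, reshuffle and split the training data (50/50 is assumed).
        pool = list(patient_ids)
        rng.shuffle(pool)
        half = len(pool) // 2
        train_subset, val_subset = pool[:half], pool[half:]

        model = train_fn(train_subset)               # train from scratch
        flagged = set(detect_fn(model, val_subset))  # outliers in validation

        new_outliers = [p for p in flagged if flag_counts[p] == 0]
        flag_counts.update(flagged)
        quiet_iterations = 0 if new_outliers else quiet_iterations + 1

    # Remove only patients flagged in more than one iteration.
    removed = {p for p, count in flag_counts.items() if count >= min_flags}
    cleaned = [p for p in patient_ids if p not in removed]
    return cleaned, flag_counts
```

Keeping flagged patients in the pool between iterations is also an assumption; it follows from the rule that only patients flagged in more than one iteration are removed at the end.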
B. Outlier Detection
In this work, we utilize a chunk data feeding strategy where each input MR volume is divided into K smaller chunks before being fed as input to the Age-Net. Outlier detection is initiated by first calculating a consolidated CA estimate for each patient in the validation dataset. This is achieved by averaging the predicted ages ($\widehat{CA}_{n,i}$) for each chunk i in the MR volume of patient n as:

$$\widehat{CA}_n = \frac{1}{K} \sum_{i=1}^{K} \widehat{CA}_{n,i} \tag{1}$$

Additionally, the standard deviation of the different chunk predictions is calculated for each patient as:

$$\sigma_n = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \left( \widehat{CA}_{n,i} - \widehat{CA}_n \right)^2} \tag{2}$$

This is repeated for all patients to obtain the vectors:

$$\widehat{\mathbf{CA}} = \begin{bmatrix} \widehat{CA}_1 \\ \widehat{CA}_2 \\ \vdots \\ \widehat{CA}_N \end{bmatrix}, \quad \boldsymbol{\sigma} = \begin{bmatrix} \sigma_1 \\ \sigma_2 \\ \vdots \\ \sigma_N \end{bmatrix} \tag{3}$$

where N is the total number of patients in the validation dataset. For outlier detection, we compare the predicted age deviations ($\mathbf{D}$) against a patient-dependent threshold ($\boldsymbol{\gamma}_{BA}$), both defined as:

$$\mathbf{D} = \left| \widehat{\mathbf{CA}} - \mathbf{CA} \right|, \quad \boldsymbol{\gamma}_{BA} = R \cdot \boldsymbol{\sigma} \tag{4}$$

where R is a pre-defined constant value. The n-th patient is flagged as an outlier only if the age deviation exceeds the corresponding threshold value:

$$D_n > \gamma_{BA,n} \tag{5}$$

Assuming a normal distribution for the chunk predictions, the constant R was set to [...] to reflect the [...]% confidence interval of the mean predicted age of each patient, as illustrated in Fig. 7. At the end of each iteration, the training and validation datasets are reshuffled and a new iteration commences until the stopping condition is reached. Upon termination of the iterative data-cleaning, all patients detected as outliers are removed from the final training dataset only if they were flagged in more than one iteration. The final framework is then trained on a dataset containing only patients exhibiting typical-aging characteristics.
Fig. 7: Illustration of the outlier detection threshold $\gamma_{BA,n}$. (Sketch of a confidence interval of ±[...]σ around $\widehat{CA}$, with CA marked.)
Fig. 8: Histogram of the utilized subset from the OASIS-3 MR dataset (x-axis: age [years]; y-axis: number of patients; legend: before balancing, after balancing). Data balancing was performed with respect to the number of data samples in the different age groups to ensure the Age-Net is trained on a class-balanced dataset. The remaining data samples were allocated to the test dataset.
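The outlier test of Eqs. (1), (2), (4) and (5) can be written out per patient as in the sketch below. The function name `flag_outliers` is hypothetical, and R is passed in as the parameter `r` because its numeric value is not legible in this copy of the text. The value 1.96 used in the example is the two-sided 95% quantile of a standard normal distribution and is an assumption for illustration, not a confirmed setting of the paper.

```python
import math


def flag_outliers(chunk_predictions, chronological_ages, r):
    """Flag patients whose age deviation exceeds r times the chunk spread.

    chunk_predictions  : one list of K chunk-wise age predictions per patient
    chronological_ages : the CA label of each patient
    r                  : the constant R of Eq. (4); its value is an assumption
    """
    flags = []
    for preds, ca in zip(chunk_predictions, chronological_ages):
        k = len(preds)
        ca_hat = sum(preds) / k                                        # Eq. (1)
        sigma = math.sqrt(sum((p - ca_hat) ** 2 for p in preds) / k)   # Eq. (2)
        deviation = abs(ca_hat - ca)                                   # Eq. (4), D_n
        threshold = r * sigma                                          # Eq. (4), gamma_BA,n
        flags.append(deviation > threshold)                            # Eq. (5)
    return flags


# Toy example with made-up predictions for two patients (K = 3 chunks each).
flags = flag_outliers([[60, 62, 64], [70, 71, 72]], [61, 60], r=1.96)
```

In this toy example the first patient deviates by 1 year against a threshold of about 3.2 years and is kept, while the second deviates by 11 years against a threshold of about 1.6 years and is flagged.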
C. Dataset
Due to the lack of reference ground-truth BA labels, the validation of the proposed iterative data-cleaning strategy poses a key challenge. For this purpose, we investigate the performance of the introduced training strategy on a class-balanced subset from the OASIS-3 brain dataset [62]. This dataset encompasses T1-weighted MR scans from anonymized cognitively healthy individuals as well as patients suffering from dementia due to Alzheimer's disease. The scans were acquired with a flip angle of 10°, an echo time of 4 ms and a repetition time of 9.7 ms. The degree of cognitive deterioration in the Alzheimer's patients is indicated by the clinical dementia rating (CDR), which distinguishes between questionable, mild and moderate dementia by the CDR scores of 0.5, 1 and 2, respectively [63].
In total, we utilize a subset of 1230 MRI scans from 950 patients in the age range of 48-97 years. To ensure that the Age-Net is trained on a class-balanced dataset, we allocate 565 MR scans from 405 healthy patients and 185 scans from 165 Alzheimer's patients for training the proposed framework. The remaining 490 scans from 380 patients (270 healthy, 110 Alzheimer's) are assigned as the test set. The histogram of the utilized OASIS-3 data is depicted in Fig. 8. The same pre-processing pipeline described previously in Sec. II-C was also applied, with MR chunks of matrix size [...] × [...] × [...] being fed to the framework as inputs.
D. Experiments
In previous studies, it has been reported that Alzheimer's disease correlates directly with abnormal brain characteristics, particularly accelerated aging [31]. We apply this observation as an attempt to evaluate the capability of the iterative data-cleaning strategy in detecting atypically-aging individuals. More specifically, we count the number of patients flagged as outliers by the proposed training strategy. Further, we analyze the percentage of cognitively healthy individuals (CDR = 0) and Alzheimer's patients (CDR = 0.5, 1, 2) detected as outliers with respect to their corresponding populations in the training dataset. We hypothesize that the proposed training strategy should be capable of accurately detecting patients with mild and moderate dementia, as those theoretically exhibit the most pronounced atypical-aging characteristics.
Fig. 9: The cumulative number of outliers detected during the iterative data-cleaning strategy with respect to the total number of patients in the training dataset from the corresponding clinical dementia rating (CDR) levels (x-axis: iteration; y-axis: cumulative outliers [%]; curves: Normal, CDR 0.5, CDR 1, CDR 2). Annotated is the number of outliers / total number of patients in each CDR level: [...]/405 = 25%, [...]/110 = 54%, [...]/49 = 73%, [...]/[...] = [...]%.
Additionally, we compare the final predicted BAs, after applying the iterative data-cleaning framework, against the age estimates from an Age-Net trained by using CA as ground-truth labels. We illustrate, and subsequently analyze, the distributions of the resultant age estimates of the two frameworks. This was conducted separately for both cognitively healthy and Alzheimer's patients.
Motivated by the recent interest in providing explainable DL frameworks, we attempt to shed light on the reasons behind the BA predictions of our proposed network. For this purpose, we employ recent DL visualization techniques to highlight the most significant brain regions contributing to the network's prediction for patients labelled as outliers or healthy by the iterative strategy. Specifically, we utilize at inference a combination of saliency-map visualizations [64], [65] together with GradCAM++ [66] for more fine-grained visualization maps. The resultant outputs of each visualization technique are combined via a product operation to obtain the final visualization maps.
We compare the differences between the visualization maps from healthy and Alzheimer’s patients in the same age groups. Also, we analyze the visualizations of cognitively healthy individuals (CDR = 0) who were deemed by the framework as exhibiting atypical-aging characteristics and were thus flagged as outliers.

It is important to note that the results of these visualization techniques do not refer directly to the actual biological activity of the brain. Rather, they are utilized as a means to explain the network’s decisions and to highlight the differences between the results of healthy and outlier patients.

Fig. 10: The PDF of the deviations (in years) between the estimated ages and the ground-truth CA labels, shown for the CA network (estimated CA − CA) and the BA network (estimated BA − CA). (a) Cognitively healthy patients (CDR = 0). (b) Alzheimer’s patients (CDR > 0). The depicted lines represent the best-fit distributions for the CA and BA networks, respectively.
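The deviation distributions summarized in Fig. 10 could be computed along the following lines. This is a minimal sketch under stated assumptions: the helper names (`age_deviations`, `fit_normal`, `normal_pdf`), the toy ages, and the choice of a single Gaussian fit are illustrative and are not taken from the paper, which does not specify how its best-fit curves were obtained.

```python
import numpy as np


def age_deviations(predicted, chronological):
    """Deviation of each estimated age from its CA label, in years."""
    predicted = np.asarray(predicted, dtype=float)
    chronological = np.asarray(chronological, dtype=float)
    return predicted - chronological


def fit_normal(deviations):
    """Sample mean and sample standard deviation of the deviations."""
    return float(np.mean(deviations)), float(np.std(deviations, ddof=1))


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Hypothetical ages for three patients (not data from the study).
ca_labels = [70, 66, 75]
estimates = [72, 65, 80]

dev = age_deviations(estimates, ca_labels)
mu, sigma = fit_normal(dev)

# Evaluate the fitted density on an arbitrary grid of deviations.
grid = np.linspace(-12.0, 12.0, 5)
density = normal_pdf(grid, mu, sigma)
print(mu, sigma)
```

A single Gaussian is only appropriate when the deviations are unimodal; a multimodal set of deviations would call for a mixture model or a kernel density estimate instead.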
IV. RESULTS AND DISCUSSION
The first step towards analyzing the proposed iterative data-cleaning strategy is to examine the detected outliers across consecutive iterations. As depicted in Fig. 9, a total of 18 training iterations were conducted before termination, at which point the pre-defined stopping condition of no outliers detected in three successive iterations was satisfied. Upon examining the population of patients with moderate dementia (CDR = 2), it is observed that the proposed strategy detects all of these patients after 11 iterations. For Alzheimer’s patients with mild dementia (CDR = 1), 36 out of 49 patients were flagged as outliers in 16 iterations, amounting to 73% of the CDR 1 training population. For questionable dementia (CDR = 0.5), 54% of this population were detected as outliers in 15 iterations. Conversely, for cognitively healthy individuals (CDR = 0), a substantially smaller percentage of patients (25%) were flagged as outliers. Compared to the number of outliers detected among Alzheimer’s patients, it is plausible that cognitively healthy individuals exhibit atypical-aging characteristics less frequently. The above findings indicate that this training algorithm is capable of detecting atypical-aging characteristics whether Alzheimer’s-related or not. This suggests that the proposed training strategy is not restricted to Alzheimer’s detection and has the potential to generalize further. For instance, it can potentially be extended to different sources of brain deterioration such as tumours and lesions.

Subsequently, all flagged outliers are removed from the training samples to create a typical-aging dataset (BA ≈ CA). Thereafter, an Age-Net framework is trained for the task of BA estimation using this filtered dataset.

Additionally, we compare the predicted ages of the proposed BA estimation framework (after iterative data-cleaning) against those from a conventionally trained CA estimation Age-Net.
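The stopping condition and the cumulative outlier percentages reported above can be illustrated with a short Python sketch. It is not the authors' implementation: the outlier criterion is not restated in this section, so it is passed in as a hypothetical callable `find_outliers`, and the names `iterative_cleaning`, `cumulative_percent`, `toy_deviation`, and `toy_detector` are invented for this example. Dropping flagged patients between iterations is likewise an assumption of the sketch.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def iterative_cleaning(
    patient_ids: Iterable[Hashable],
    find_outliers: Callable[[Set[Hashable]], Iterable[Hashable]],
    patience: int = 3,
    max_iterations: int = 100,
) -> Tuple[Set[Hashable], List[Set[Hashable]]]:
    """Repeat outlier detection until `patience` consecutive iterations
    flag nobody (the stopping condition described in the text)."""
    remaining = set(patient_ids)
    history: List[Set[Hashable]] = []
    clean_streak = 0
    while clean_streak < patience and len(history) < max_iterations:
        # In practice this step would retrain the network and apply the
        # outlier criterion; here it is an opaque, hypothetical callable.
        flagged = set(find_outliers(remaining)) & remaining
        history.append(flagged)
        remaining -= flagged  # assumption: outliers are dropped each round
        clean_streak = 0 if flagged else clean_streak + 1
    return remaining, history


def cumulative_percent(history: List[Set[Hashable]], total: int) -> List[float]:
    """Cumulative share of flagged patients after each iteration, in %."""
    running, curve = 0, []
    for flagged in history:
        running += len(flagged)
        curve.append(100.0 * running / total)
    return curve


# Toy stand-in: flag the single worst patient if its deviation exceeds 5.
toy_deviation = {"a": 1.0, "b": 9.0, "c": 7.0, "d": 2.0}


def toy_detector(remaining: Set[Hashable]) -> List[Hashable]:
    if not remaining:
        return []
    worst = max(remaining, key=lambda pid: abs(toy_deviation[pid]))
    return [worst] if abs(toy_deviation[worst]) > 5.0 else []


remaining, history = iterative_cleaning(toy_deviation, toy_detector)
curve = cumulative_percent(history, total=len(toy_deviation))
print(len(history), sorted(remaining), curve)
```

In this toy run two patients are flagged in the first two iterations and the loop stops after three further outlier-free iterations. The same percentage arithmetic gives the figure quoted in the text for mild dementia: 36 of 49 patients is approximately 73%.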
The probability distribution functions (PDFs) of the deviations between the predicted ages and ground-truth CA labels for both frameworks are presented in Fig. 10. For the cognitively healthy population, only minor differences can be observed between the distributions of the two frameworks. This is illustrated in Fig. 10a, with both adopting a normal distribution with a mean-deviation shift of 0.92 years. However, in Fig. 10b the predicted BAs for Alzheimer’s patients manifest a substantially different behavior, with a multimodal distribution as opposed to the normal distribution produced by the CA framework. Since Alzheimer’s disease was found to correspond with abnormal brain aging [31], it follows that the predicted BA scores for patients with dementia should exhibit a significant deviation from the corresponding CA labels. This desired behavior is reflected in the proposed BA estimation framework, with the majority of the predicted ages exhibiting an over-aging of approximately 8.2 years compared to the CA framework. We hypothesize that this reflects the capability of the proposed framework to recognize true BA behavior. Also, a relatively smaller population of Alzheimer’s patients reveals under-aging behavior.

For the final set of experimental evaluations, we analyze the visualization maps from patients identified by the data-cleaning strategy as exhibiting atypical-aging characteristics. These patients were therefore removed from the final BA training dataset. It is important to note that these visualization maps reflect the network predictions rather than the actual brain voxel intensities. We also compare the resultant outlier visualization maps against those extracted from healthy individuals. As shown in Fig. 11, cognitively healthy individuals exhibit strong activations in the amygdala, hippocampus and thalamus in the axial plane, as marked in the figure. This agrees with prior work investigating voxel-based morphometry of brain MRI of healthy patients [55].
In contrast, patients suffering from dementia (CDR > 0) identified as outliers manifest relatively weaker activations in the same regions, as marked in Fig. 11. However, it is interesting to point out that cognitively healthy patients who were flagged as outliers (CDR = 0) reveal a noteworthy behavior. In some instances, the visualization maps from these patients closely resemble the maps from dementia patients, with similar highlighted regions of low intensity. For other outlier patients of CDR 0, the resultant visualization maps exhibit unconventional behavior, with different highlighted regions of high intensity compared to cognitively healthy patients. For example, the regions in and around the fusiform gyrus depict high activations in the outlier patients, as marked in Fig. 11, in contrast to healthy individuals. On the whole, Alzheimer’s patients generally exhibit weakened network activations compared to healthy individuals. Also, the CDR 0 outlier patients are either similar to dementia patients or display unique activations. We are of the opinion that the study of the activated brain regions for outliers could prove beneficial for radiologists, as it may assist in the early detection of disorders along with various other applications.

Fig. 11: Examples of the brain visualization maps for 3 axial chunks from healthy and outlier patients as detected by the proposed iterative strategy. Each column represents scans from a patient of the annotated CA value. Separate markers highlight the prominent activations in healthy patients, the corresponding activations in Alzheimer’s patients, and the uncommon or unique activations detected in cognitively healthy outliers.

This initial study reveals the potential of utilizing MRI scans for BA estimation. The concept of incorporating an iterative training algorithm for approximating brain BA labels shows merit by detecting the majority of patients with moderate and mild dementia as outliers.
This does not come at the expense of over-flagging cognitively healthy patients as atypically aging. Additionally, a visualization study highlighted the possibility of utilizing the proposed BA estimation framework to discover the deviation of seemingly healthy patients from their respective age groups.

As this study is among the first of its kind, our work has raised more questions than it has provided answers. Further clinical assessments by radiologists are necessary for the validation of the introduced framework. More specifically, a key question is the applicability of the BA framework to expanded MRI datasets from various organ systems with different disorders. This could be a step towards achieving an organ-specific BA assessment for patients using whole-body MRI scans, which would serve to evaluate the accelerated or delayed-aging characteristics of various organs within patients. The correlation between the deviations of the estimated BA scores and the presence of different disorders should be studied by radiologists together with the resultant visualization maps. Additionally, in the future, we plan to investigate the possibility of utilizing the BA visualization maps for anomaly detection of lesions and disorders in an unsupervised setting.

V. CONCLUSION
In this work, we present an initial study for organ-specific BA estimation using MRI scans. As an initial step, we develop a CA estimation framework capable of outperforming the current state-of-the-art MRI-based regression networks. Furthermore, we introduce a novel iterative training algorithm for excluding outlier patients exhibiting atypical-aging characteristics. This leads to the creation of a reference dataset where the available CA labels reflect the BA behavior. Upon validating the proposed BA framework on an Alzheimer’s dataset, the majority of patients with mild or moderate dementia were accurately detected as outliers. Moreover, the framework was found to be effective in detecting accelerated aging in Alzheimer’s patients in comparison to conventional CA estimation. Finally, an analysis was performed using established DL visualization techniques, which reflected the potential of the introduced framework in discovering deviations of seemingly healthy patients from their respective age groups. In the future, we plan to expand the proposed framework via the utilization of recent advances in Bayesian neural networks and uncertainty detection techniques to enhance the outlier detection procedure. Also, the extension to different organ systems with whole-body MRI data will be investigated.

REFERENCES
[1] K. Armanious et al., “Organ-based Chronological Age Estimation based on 3D MRI Scans,” https://arxiv.org/abs/1910.06271, 2020, accepted at IEEE European Signal Processing Conference (EUSIPCO).
[2] T. Niccoli and L. Partridge, “Ageing as a Risk Factor for Disease,”
Current Biology, vol. 22, no. 17, pp. 741–752, 2012.
[3] L. Repetto, “Greater risks of chemotherapy toxicity in elderly patients with cancer,” The Journal of Supportive Oncology, vol. 1, no. 4 Suppl. 2, pp. 18–24, 2003.
[4] D. A. Story, “Postoperative complications in elderly patients and their significance for long-term prognosis,” Current Opinion in Anaesthesiology, vol. 21, no. 3, pp. 375–379, 2008.
[5] L. Jia, W. Zhang, and X. Chen, “Common methods of biological age estimation,” Clinical Interventions in Aging, vol. 12, pp. 759–772, 2017.
[6] B. H. Chen et al., “DNA methylation-based measures of biological age: meta-analysis predicting time to death,” Aging, vol. 8, no. 9, pp. 1844–1865, 2016.
[7] V. Ignjatovic et al., “Age-related differences in plasma proteins: how plasma proteins change from neonates to adults,” PLoS One, vol. 6, no. 2, p. e17213, 2011.
[8] E. Nakamura and K. Miyao, “A method for identifying biomarkers of aging and constructing an index of biological age in humans,” The Journals of Gerontology. Series A, Biological Sciences and Medical Sciences, vol. 62, no. 10, pp. 1096–1105, 2007.
[9] E. Albrecht et al., “Telomere length in circulating leukocytes is associated with lung function and disease,” European Respiratory Journal, vol. 43, no. 4, pp. 983–992, 2014.
[10] S. Karrasch et al., “Heterogeneous pattern of differences in respiratory parameters between elderly with either good or poor FEV1,” BMC Pulmonary Medicine, vol. 18, no. 1, p. 27, 2018.
[11] J. Park, B. Cho, H. Kwon, and C. Lee, “Developing a biological age assessment equation using principal component analysis and clinical biomarkers of aging in Korean men,” Archives of Gerontology and Geriatrics, vol. 49, no. 1, pp. 7–12, 2009.
[12] J. Jylhava, N. L. Pedersen, and S. Hagg, “Biological Age Predictors,” EBioMedicine, vol. 21, pp. 29–36, 2017.
[13] S. A. Rahman and D. A. Adjeroh, “Centroid of age neighborhoods: A new approach to estimate biological age,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 4, pp. 1226–1234, 2020.
[14] D. Belsky et al., “Eleven Telomere, Epigenetic Clock, and Biomarker-Composite Quantifications of Biological Aging: Do They Measure the Same Thing?” American Journal of Epidemiology, vol. 187, no. 6, pp. 1220–1230, 11 2017.
[15] S. A. Rahman and D. Adjeroh, “Deep Learning using Convolutional LSTM estimates Biological Age from Physical Activity,” Scientific Reports, vol. 9, pp. 1–15, 08 2019.
[16] J. Krøll and O. Saxtrup, “On the use of regression analysis for the estimation of human biological age,” Biogerontology, vol. 1, pp. 363–368, 02 2000.
[17] P. Klemera and S. Doubal, “A new approach to the concept and computation of biological age,” Mechanisms of Ageing and Development, vol. 127, pp. 240–248, 04 2006.
[18] P. Fedichev et al., “Extracting biological age from biomedical data via deep learning: Too much of a good thing?” Scientific Reports, vol. 8, 03 2018.
[19] E. Putin et al., “Deep biomarkers of human aging: Application of deep neural networks to biomarker development,” Aging, vol. 8, no. 5, pp. 1021–1033, 2016.
[20] J. H. Cole et al., “Brain Age Predicts Mortality,” Molecular Psychiatry, vol. 23, pp. 1385–1392, 04 2017.
[21] M. Levine, “Modeling the Rate of Senescence: Can Estimated Biological Age Predict Mortality More Accurately Than Chronological Age?” The Journals of Gerontology. Series A, Biological Sciences and Medical Sciences, vol. 68, 12 2012.
[22] Z. Liu et al., “A new aging measure captures morbidity and mortality risk across diverse subpopulations from NHANES IV: A cohort study,” PLOS Medicine, vol. 15, 12 2018.
[23] I. Cho, K. Park, and C. Lim, “An empirical comparative study on biological age estimation algorithms with an application of Work Ability Index (WAI),” Mechanisms of Ageing and Development, vol. 131, pp. 69–78, 12 2009.
[24] A. Mitnitski, S. Howlett, and K. Rockwood, “Heterogeneity of Human Aging and Its Assessment,” The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, vol. 72, 2016.
[25] A. Manzoor Mughal, N. Hassan, and A. Ahmed, “Bone Age Assessment Methods: A Critical Review,” Pakistan Journal of Medical Sciences, vol. 30, no. 1, pp. 211–215, 2014.
[26] E. Tomei et al., “Value of MRI of the hand and the wrist in evaluation of bone age: Preliminary results,” Journal of Magnetic Resonance Imaging, vol. 39, no. 5, pp. 1198–1205, 2014.
[27] D. Štern et al., “Automated Age Estimation from Hand MRI Volumes Using Deep Learning,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2016, pp. 194–202.
[28] D. Štern, C. Payer, N. Giuliani, and M. Urschler, “Automatic age estimation and majority age classification from multi-factorial MRI data,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 4, pp. 1392–1403, July 2019.
[29] J. H. Cole, “Neuroimaging-derived Brain-Age: An Ageing Biomarker?” Aging, vol. 9, no. 8, pp. 1861–1862, 2017.
[30] S. G. Popescu et al., “Deep Learning Methods for Estimating ‘Brain Age’ from Structural MRI Scans,” in Medical Imaging with Deep Learning (MIDL), 2018.
[31] K. Franke et al., “Longitudinal Changes in Individual BrainAGE in Healthy Aging, Mild Cognitive Impairment, and Alzheimer’s Disease,” The Journal of Gerontopsychology and Geriatric Psychiatry, vol. 25, pp. 235–245, 12 2012.
[32] K. Franke, G. Ziegler, S. Klöppel, and C. Gaser, “Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: Exploring the influence of various parameters,” NeuroImage, vol. 50, no. 3, pp. 883–892, 2010.
[33] Z. Lao et al., “Morphological classification of brains via high-dimensional shape transformations and machine learning methods,” NeuroImage, vol. 21, no. 1, pp. 46–57, 1 2004.
[34] R. Fujimoto et al., “Brain age estimation from T1-weighted images using effective local features,” in The 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), July 2017, pp. 3028–3031.
[35] K. A. Bhawar and N. K. Bhil, “Brain Tumor Classification using Neural Network based Methods,” International Journal of Engineering Sciences & Research Technology, vol. 5, no. 6, pp. 721–727, 2016.
[36] K. Armanious et al., “Independent Brain 18F-FDG PET Attenuation Correction Using a Deep Learning Approach With Generative Adversarial Networks,” Hellenic Journal of Nuclear Medicine, vol. 22, no. 3, pp. 179–186, 2019.
[37] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
[38] K. Armanious et al., “Unsupervised Medical Image Translation Using Cycle-MedGAN,” 2019, pp. 1–5.
[39] T. Huang et al., “Age estimation from brain MRI images using deep learning,” in IEEE 14th International Symposium on Biomedical Imaging (ISBI), April 2017, pp. 849–852.
[40] K. Ito et al., “Performance evaluation of age estimation from T1-weighted images using brain local features and CNN,” in The Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), July 2018, pp. 694–697.
[41] J. H. Cole et al., “Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker,” NeuroImage, vol. 163, pp. 115–124, 2016.
[42] M. Urschler, S. Grassegger, and D. Štern, “What automated age estimation of hand and wrist MRI data tells us about skeletal maturation in male adolescents,” Annals of Human Biology, vol. 42, no. 4, pp. 358–367, 2015.
[43] D. Štern and M. Urschler, “From individual hand bone age estimates to fully automated age estimation via learning-based information fusion,” in IEEE 13th International Symposium on Biomedical Imaging (ISBI), April 2016, pp. 150–154.
[44] B. Neumayer et al., “Reducing acquisition time for MRI-based forensic age estimation,” Scientific Reports, vol. 8, 02 2018.
[45] D. Štern, C. Payer, and M. Urschler, “Automated age estimation from MRI volumes of the hand,” Medical Image Analysis, vol. 58, 2019.
[46] S. A. Rahman et al., “Deep learning for biological age estimation,” Briefings in Bioinformatics, 05 2020.
[47] E. Bobrov et al., “PhotoAgeClock: Deep learning algorithms for development of noninvasive visual biomarkers of aging,” Aging, vol. 10, no. 11, pp. 3249–3259, 11 2018.
[48] S. Xie et al., “Aggregated residual transformations for deep neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 07 2017, pp. 5987–5995.
[49] G. Huang, Z. Liu, and K. Q. Weinberger, “Densely connected convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269.
[50] C. Szegedy et al., “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[51] F. N. Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < …
[52] … pp. 4724–4733, 2017.
[53] “IXI dataset,” https://brain-development.org/ixi-dataset/.
[54] C. D. Good et al., “A voxel-based morphometric study of ageing in 465 normal adult human brains,” NeuroImage, vol. 14, no. 1, pp. 21–36, 2001.
[55] H. Matsuda, “Voxel-based morphometry of brain MRI in normal aging and Alzheimer’s disease,” Aging and Disease, vol. 4, no. 1, 2013.
[56] Y. Taki et al., “Correlations among brain gray matter volumes, age, gender, and hemisphere in healthy individuals,” PLOS ONE …
[57] …
[58] … NeuroImage, vol. 38, no. 1, pp. 95–113, 2007.
[59] M. Ueda et al., “An age estimation method using 3D-CNN from brain MRI images,” in IEEE 16th International Symposium on Biomedical Imaging (ISBI), April 2019, pp. 380–383.
[60] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” http://arxiv.org/abs/1409.1556, 2014, arXiv preprint.
[61] D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations (ICLR), 12 2014.
[62] P. LaMontagne et al., “OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease,” Alzheimer’s & Dementia, vol. 14, 07 2018.
[63] K. Schmidt, “Clinical dementia rating scale,” in Encyclopedia of Quality of Life and Well-Being Research, 2014, pp. 957–960.
[64] S. Srinivas and F. Fleuret, “Full-gradient representation for neural network visualization,” in Advances in Neural Information Processing Systems (NeurIPS), 2019.
[65] T. Mundhenk, B. Chen, and G. Friedland, “Efficient Saliency Maps for Explainable AI,” https://arxiv.org/abs/1911.11293, 2019, arXiv preprint.
[66] A. Chattopadhay, A. Sarkar, P. Howlader, and V. N. Balasubramanian, “Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks,” in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV)