Discriminative analysis of the human cortex using spherical CNNs - a study on Alzheimer's disease diagnosis
Xinyang Feng*, Jie Yang*, Andrew F. Laine, Elsa D. Angelini
Department of Biomedical Engineering, Columbia University, NY, USA
NIHR Imperial BRC, ITMAT Data Science Group, Imperial College, London, UK
ABSTRACT
In neuroimaging studies, the human cortex is commonly modeled as a sphere to preserve the topological structure of the cortical surface. Cortical neuroimaging measures can hence be modeled in a spherical representation. In this work, we explore analyzing the human cortex using spherical CNNs in an Alzheimer’s disease (AD) classification task using cortical morphometric measures derived from structural MRI. Our results show superior performance in classifying AD versus cognitively normal and in predicting MCI progression within two years, using structural MRI information only. This work demonstrates for the first time the potential of the spherical CNNs framework in the discriminative analysis of the human cortex and could be extended to other modalities and other neurological diseases.
Index Terms: Spherical CNNs, cortex, Alzheimer’s disease, structural MRI
1. INTRODUCTION
Deep learning has achieved great success in image recognition [1] using convolutional neural networks (CNNs) and has been widely explored in the neuroimaging field [2]. Most previous neuroimaging studies either analyze features extracted from predefined regions of interest (ROIs) [3] or feed the 3D imaging volume directly into 3D convolutional neural networks. The former approach potentially introduces too much prior knowledge into the model and limits the input representation. While the latter approach has the advantage of being agnostic and prior-free, adequate priors from previous neuroimaging studies could help regularize the input information.
The human cortex is commonly modeled as a 2D manifold, a sheet-like structure, despite the presence of sulcal and gyral folds. Therefore, 2D CNNs can, in principle, be applied to the cortical sheet after flattening it onto a 2D plane [4]. However, inevitable distortions in the flattening process affect the data representation. Surface cutting has been proposed to alleviate distortions caused by the intrinsic curvature of the cortical surface, but this in turn introduces artificial changes to the topology of the surface [4]. Modeling the cortical surface of each hemisphere as a sphere is more accurate and desirable [4], and a spherical coordinate system is common practice in the neuroimaging field, as it preserves the topological structure of the cortical surface. However, 2D CNNs cannot be directly applied on a sphere.
A spherical CNNs framework was recently introduced [5] and is explored for the first time in this study to analyze the human cortex in a spherical representation. Spherical CNNs were proposed to model spherical data in applications such as molecular modeling and 3D shape analysis [5, 6] and have shown promising performance.
Alzheimer’s disease (AD) is a neurodegenerative disease impacting a large population and is the most common cause of dementia. Accurate diagnosis of AD and mild cognitive impairment (MCI) is of increasing importance. Cortical morphometric measures such as cortical thickness derived from T1-weighted structural MRI have been demonstrated to be important biomarkers for the diagnosis of AD and MCI, which are characterized by cortical gray matter atrophy. In this work, we apply a spherical CNNs based framework to cortical thickness data derived from structural MRI in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, for the AD versus cognitively normal (CN) classification task and for MCI conversion prediction within two years.
To the best of our knowledge, this is the first work applying spherical CNNs to human cortex data, and it demonstrates the potential for diverse studies on discriminative analyses of human cortex neuroimaging data.
* denotes equal contribution.
2. METHOD
2.1. Cortical Modeling
The cortical surfaces were reconstructed using FreeSurfer [7] and morphed to the spherical representation by minimizing areal and distance distortions. All individual cortical surfaces were registered to a spherical atlas in the fsaverage space by matching cortical folding patterns [4]. At each vertex of the atlas cortical surface, multiple measures, including thickness, surface area, volume, curvature, sulc, and the Jacobian determinant (of the warp to the atlas), can be derived from FreeSurfer. The sensitivity of each measure varies across diseases, and any measure can naturally be treated as a channel in the data representation. In this study, we used cortical thickness, as it has previously been demonstrated to be highly sensitive for AD diagnosis [8, 9].
We used a sampling grid with a bandwidth of … to sample the cortical surfaces, generating a … × … matrix for each hemisphere. For each point in the sampling grid, we queried the closest vertices in the cortical surface in geodesic distance and used the average measure as the matrix value.
Fig. 1. Illustration of the spherical CNNs framework proposed for AD diagnosis based on cortical morphometric data. The basic operation blocks are denoted as arrows and listed under the network structure: S2 Convolution (S2Conv), SO(3) Convolution (SO3Conv), Batch Normalization + ReLU, weighted Global Average Pooling (wGAP), Concatenation, and Fully-connected (FC) with softmax.
2.2. Spherical CNNs
Spherical CNNs extend the regular CNN formulation on the plane to spherical data, replacing translational equivariance with rotational equivariance. Hence, specially designed convolution operations are reformulated on the sphere space S2 and the 3D rotation group space SO(3) (SO = ‘special orthogonal group’).
More theoretical underpinnings can be found in [5]. Elements in the SO(3) space are represented in the Euler ZYZ format as:
Z(α) Y(β) Z(γ)    (1)
where Z(·) denotes rotation around the Z axis, Y(·) denotes rotation around the Y axis, and α ∈ [0, 2π], β ∈ [0, π], γ ∈ [0, 2π] are the rotation angles. Elements in the S2 space can be similarly represented as:
Z(α) Y(β) Z(0)    (2)
The network architecture is similar to regular CNNs, with spherical convolutional blocks hierarchically layered. The main parameters are the bandwidth b, which is analogous to the spatial dimension in regular CNNs, and the number of channels c at each convolution block.
In this work, we use a simple network structure with three convolutional layers interleaved with 3D batch normalization (BN) and rectified linear units (ReLU). The network structure is illustrated in Fig. 1. The number of channels doubles and the spatial dimensions are reduced by a factor of two at each successive layer. Specifically, we denote the S2 convolution with bandwidth b and c channels as S2Conv(b, c), and the SO(3) convolution with bandwidth b and c channels as SO3Conv(b, c). The fully convolutional part of the network is sequenced as: S2Conv(32, 32) - BN - ReLU - SO3Conv(16, 64) - BN - ReLU - SO3Conv(8, 128) - BN - ReLU. The three dimensions α, β, γ of the feature maps at each layer are all b.
Then we apply a weighted global average pooling (wGAP) step, which integrates over the spatial dimensions of the convolutional feature maps while correcting for the non-uniformity of the grid along the Y axis.
The two hemispheres of the human cortex are considered as two sets of spherical data sharing the same diagnosis label. We therefore share the fully convolutional part of the network between the left and right hemispheres.
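The wGAP step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes an equiangular grid whose β samples sit at cell midpoints, and it uses normalized sin(β) weights (the area element of the sphere) as the correction for grid non-uniformity. The exact quadrature weights used in the paper are not given in the text, and the names `beta_weights` and `weighted_gap` are hypothetical.

```python
import math


def beta_weights(n_beta):
    """Normalized sin(beta) weights for n_beta equiangular samples.

    Assumption: beta_j = pi * (2j + 1) / (2 * n_beta), i.e. cell midpoints
    in (0, pi). Samples near the poles cover less area and get less weight.
    """
    betas = [math.pi * (2 * j + 1) / (2 * n_beta) for j in range(n_beta)]
    raw = [math.sin(beta) for beta in betas]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_gap(feature_map):
    """Pool one channel to a scalar.

    feature_map[j][i][k] is indexed by (beta, alpha, gamma). Each beta
    slice is averaged uniformly over alpha and gamma, then the slices are
    combined with the area-correcting beta weights.
    """
    weights = beta_weights(len(feature_map))
    pooled = 0.0
    for w, beta_slice in zip(weights, feature_map):
        values = [v for row in beta_slice for v in row]
        pooled += w * sum(values) / len(values)
    return pooled
```

Applying `weighted_gap` to each of the 128 output channels of one hemisphere would give one feature vector per hemisphere. With uniform weights the same function reduces to ordinary global average pooling, which over-counts samples near the poles.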
The integrated features from the left and right hemispheres are concatenated and fed into the last fully connected layer with a softmax activation function for the final disease classification.
We also compared against regular CNNs on the same sampled input, using the same architecture with regular 2D convolutions, replacing 3D BN with 2D BN, and doubling the channel dimensions to ensure approximately the same number of parameters. Denoting the convolution operation with c channels as Conv(c), the fully convolutional part of the network tested for comparison is: Conv(64) - BN - ReLU - Conv(128) - BN - ReLU - Conv(256) - BN - ReLU. The convolution layers have a stride of … and a kernel size of … .
For model training, we used the cross-entropy loss and optimized using stochastic gradient descent (SGD) with momentum 0.9. We used a batch size of 8 and ran the algorithm for 200 epochs, with a learning rate of 0.1 for the first 100 epochs and 0.01 for the last 100 epochs.
2.3. Activation Maps
Spatial localization of the features used by CNNs can be explored using class activation maps [10], which have been applied in the medical imaging field [11]. In this study, we extend the class activation map to spherical CNNs, generating class activation maps on the sphere. The activation maps in spherical CNNs are defined in SO(3) space. According to Equations 1 and 2, we selected the activation maps at γ = 0 to explore the activation map patterns and corrected for the non-uniformity of the grid along the Y axis, similar to the practice in wGAP. We computed a weighted average of the corrected activation maps using the weights from the fully-connected layer.
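As a rough illustration of the spherical class activation map described above, the sketch below takes the γ = 0 slice of each channel, applies a grid correction, and combines channels with the fully-connected weights of one class. It rests on assumptions that the text does not confirm: the correction is taken to be a sin(β) factor on an equiangular midpoint grid, and channels are combined by a weighted sum as in the original CAM formulation [10]. The function name and data layout are hypothetical.

```python
import math


def class_activation_map(features, fc_weights, gamma_index=0):
    """Return a 2D map cam[j][i] over (beta, alpha) for one class.

    features[c][j][i][k]: last-layer feature maps indexed by
    (channel, beta, alpha, gamma).
    fc_weights[c]: fully-connected weight of channel c for the class.
    """
    n_beta = len(features[0])
    n_alpha = len(features[0][0])
    cam = [[0.0] * n_alpha for _ in range(n_beta)]
    for weight, fmap in zip(fc_weights, features):
        for j in range(n_beta):
            # Assumed correction: area element sin(beta) at the cell midpoint.
            beta = math.pi * (2 * j + 1) / (2 * n_beta)
            area = math.sin(beta)
            for i in range(n_alpha):
                # Keep only the gamma = 0 slice, which corresponds to a
                # point on S2 according to Equations 1 and 2.
                cam[j][i] += weight * area * fmap[j][i][gamma_index]
    return cam
```

A population-average map would then be obtained by averaging the per-subject maps element-wise.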
3. RESULT
3.1. Data and Setup
We used data from the ADNI-1 cohort. We screened subjects per diagnosis group as follows: CN subjects stayed cognitively normal during a follow-up period of at least two years, MCI stable (MCI-s) subjects stayed MCI during a follow-up period of at least two years, MCI progression (MCI-p) subjects converted to AD within two years, and AD patients. Subject information for each diagnosis group can be found in Table 1.
Table 1. Subject Information
Diagnosis    CN            MCI-s         MCI-p         AD            Total
N            151           114           136           188           589
Age (std)    75.64 (5.25)  74.90 (7.33)  74.69 (6.95)  75.18 (7.50)  75.13 (6.82)
Gender M/F   74/77         72/42         85/51         99/89         330/259
We used the baseline T1-weighted MRI scans acquired using 1.5 T MRI scanners, pre-processed with the standard Mayo Clinic pipeline (http://adni.loni.usc.edu/methods/mri-analysis/mri-pre-processing/) and post-processed by UCSF using FreeSurfer 4.3 [12].
We performed two binary classification tasks: AD vs. CN and MCI-p vs. MCI-s. In each classification task, we performed 10-fold cross-validation with the fold split generated from random stratified sampling, ensuring a similar distribution of diagnosis, age, and gender in each split.
In each experiment, we set aside one fold as the test set, one fold as the validation set, and the remaining folds as the training set. For each fold, the model with the maximum validation accuracy is selected as the optimal model. The probability outputs of the optimal models on all test sets are aggregated together. We report the area under the curve (AUC) of the receiver operating characteristic (ROC) curve in Fig. 2, together with the accuracy, sensitivity and specificity.
Fig. 2. ROC of (left) AD vs. CN classification and (right) MCI-p vs. MCI-s classification.
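The fold rotation in the cross-validation setup above can be sketched as follows. The text does not say which fold serves as the validation set in each experiment, so this sketch assumes, purely for illustration, that it is the fold following the test fold (cyclically). The stratified assignment of subjects to folds is not shown, and `fold_roles` is a hypothetical helper.

```python
def fold_roles(n_folds=10):
    """Yield (test_fold, val_fold, train_folds) for each experiment.

    Assumption: the validation fold is the one after the test fold,
    wrapping around at the end. All remaining folds form the training set.
    """
    for test in range(n_folds):
        val = (test + 1) % n_folds
        train = [f for f in range(n_folds) if f not in (test, val)]
        yield test, val, train


if __name__ == "__main__":
    for test, val, train in fold_roles():
        print(f"test={test} val={val} train={train}")
```

Because every fold is used as the test set exactly once, aggregating the test-set probabilities over the ten experiments gives one prediction per subject, which is what the reported ROC curves are computed from.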
The ROC curve for AD vs. CN classification can be found in Fig. 2 (left). The AUC values for spherical vs. standard CNNs are 0.915 vs. 0.895. The accuracy (ACC), sensitivity (SEN) and specificity (SPE) values (with 0.5 as threshold) for spherical vs. standard CNNs are 90.0% vs. 84.6%, 89.9% vs. 84.0%, and 90.1% vs. 85.4%, respectively. The performance is higher than that of a previous study also using cortical thickness patterns in the ADNI cohort (ACC: 84.5%, SEN: 79.4%, SPE: 88.9%, AUC: 0.905) [9].
We further tested our model on a more challenging MCI progression prediction task using the same network setting. The ROC curve can be found in Fig. 2 (right). The ROC AUC values for spherical vs. standard CNNs are 0.707 vs. 0.657. The accuracy, sensitivity and specificity values (with 0.5 as threshold) for spherical vs. standard CNNs are 71.6% vs. 66.4%, 80.2% vs. 69.9%, and 61.4% vs. 62.3%, respectively. The performance is higher than that of a previous study on 2-year MCI progression prediction also using cortical thickness patterns in the ADNI cohort (ACC: 66.7%, SEN: 59.0%, SPE: 70.2%, AUC: 0.673) [9].
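For reference, the four reported metrics can be computed from the aggregated test-set probabilities as in the minimal sketch below. It assumes that label 1 is the positive class (AD or MCI-p) and that a probability equal to the threshold counts as positive; neither convention is stated in the text. The AUC is computed with the pairwise ranking (Mann-Whitney) formulation, which matches the area under the ROC curve.

```python
def binary_metrics(labels, probs, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at the given threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0  # assumption: ties count as positive
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(labels)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def roc_auc(labels, probs):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5  # ties count as half a correct ranking
    return score / (len(pos) * len(neg))
```

Accuracy, sensitivity and specificity depend on the 0.5 threshold, while the AUC summarizes all thresholds, which is why the two can rank methods differently (as with the specificity values in the MCI task).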
A population-average AD class activation map of the left hemisphere at γ = 0, generated with the spherical CNNs, is shown in Fig. 3 together with a reference label map from the Desikan-Killiany atlas [13] sampled in the same way as the thickness measures. The colors and order of the regions in the reference label map follow the FreeSurfer color lookup table. We observed two blobs of AD-predictive regions: the lower left blob corresponds to regions around the medial temporal lobe, and the upper blob corresponds to regions in the vicinity of the supramarginal gyrus. Both regions are implicated in AD, according to [14].
4. DISCUSSION
Despite the promising results obtained by applying spherical CNNs to cortical measures, there are several limitations and potential future improvements to be considered. The input omits subcortical structures such as the hippocampus, which is one of the brain structures affected by AD and a sensitive biomarker for AD diagnosis [8, 15]. In future work, the hippocampus can be modeled in the same way as general 3D structures [5, 6] and incorporated into the classification model. We can also use multi-channel input including other measures, such as volume, to obtain a multi-faceted characterization of the cortex.
We shared the fully convolutional part between the left and right hemispheres. Two different sets of parameters could be used instead, although this doubles the number of parameters for the network. The left and right hemispheres could also be registered into the same space and concatenated as two channels. By doing so, the asymmetry in the input information could be embedded and utilized by the CNNs for the diagnosis or prediction tasks.
Since the spherical CNNs formulation is still new to the field, there are still variant architectures to test, such as [6]. A more thorough exploration of parameters (bandwidth, channels), architectures, and properties (such as the fully-convolutional property) is still necessary to fully exploit its potential.
Fig. 3. (Left) Class activation map for AD classification task from the proposed spherical CNN; (Right) Desikan-Killiany atlas in the same space [13].
5. CONCLUSION
In this study, we demonstrate for the first time that the newly introduced spherical CNNs formulation can be an effective deep learning framework for modeling the human cortex and performing the AD diagnosis task using MRI-based cortical measures. Our results on the ADNI cohort show state-of-the-art classification performance using structural MRI information only. The spherical CNNs formulation has the potential to be applied to further structural MRI studies, to other neurological diseases, and to other modalities such as fMRI and PET, as long as the measures can be projected onto the cortical sphere.
Acknowledgments:
Thanks for funding from NIH/NHLBI R01-HL121270.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
6. REFERENCES
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034.
[2] Sandra Vieira, Walter H. L. Pinaya, and Andrea Mechelli, “Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications,” Neuroscience & Biobehavioral Reviews, vol. 74, pp. 58–75, 2017.
[3] Heather Cody Hazlett, Hongbin Gu, Brent C Munsell, Sun Hyung Kim, Martin Styner, Jason J Wolff, Jed T Elison, Meghan R Swanson, Hongtu Zhu, Kelly N Botteron, et al., “Early brain development in infants at high risk for autism spectrum disorder,” Nature, vol. 542, no. 7641, p. 348, 2017.
[4] Bruce Fischl, Martin I Sereno, and Anders M Dale, “Cortical surface-based analysis: II: inflation, flattening, and a surface-based coordinate system,” NeuroImage, vol. 9, no. 2, pp. 195–207, 1999.
[5] Taco S Cohen, Mario Geiger, Jonas Köhler, and Max Welling, “Spherical CNNs,” in International Conference on Learning Representations (ICLR), 2018.
[6] Carlos Esteves, Christine Allen-Blanchette, Ameesh Makadia, and Kostas Daniilidis, “Learning SO(3) equivariant representations with spherical CNNs,” in The European Conference on Computer Vision (ECCV), September 2018.
[7] Anders M Dale, Bruce Fischl, and Martin I Sereno, “Cortical surface-based analysis: I. segmentation and surface reconstruction,” NeuroImage, vol. 9, no. 2, pp. 179–194, 1999.
[8] Rémi Cuingnet, Emilie Gerardin, Jérôme Tessieras, Guillaume Auzias, Stéphane Lehéricy, Marie-Odile Habert, Marie Chupin, Habib Benali, et al., “Automatic classification of patients with Alzheimer’s disease from structural MRI: a comparison of ten methods using the ADNI database,” NeuroImage, vol. 56, no. 2, pp. 766–781, 2011.
[9] Simon F Eskildsen, Pierrick Coupé, Daniel García-Lorenzo, Vladimir Fonov, Jens C Pruessner, D Louis Collins, Alzheimer’s Disease Neuroimaging Initiative, et al., “Prediction of Alzheimer’s disease in subjects with mild cognitive impairment from the ADNI cohort using patterns of cortical thinning,” NeuroImage, vol. 65, pp. 511–521, 2013.
[10] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba, “Learning deep features for discriminative localization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[11] Xinyang Feng, Jie Yang, Andrew F. Laine, and Elsa D. Angelini, “Discriminative localization in CNNs for weakly-supervised segmentation of pulmonary nodules,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2017, pp. 568–576.
[12] Clifford R Jack, Matt A Bernstein, Nick C Fox, et al., “The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods,” Journal of Magnetic Resonance Imaging, vol. 27, no. 4, pp. 685–691, 2008.
[13] Rahul S Desikan, Florent Ségonne, Bruce Fischl, Brian T Quinn, Bradford C Dickerson, Deborah Blacker, Randy L Buckner, Anders M Dale, R Paul Maguire, et al., “An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest,” NeuroImage, vol. 31, no. 3, pp. 968–980, 2006.
[14] Rahul S Desikan, Howard J Cabral, Christopher P Hess, William P Dillon, Christine M Glastonbury, Michael W Weiner, Nicholas J Schmansky, Douglas N Greve, David H Salat, et al., “Automated MRI measures identify individuals with mild cognitive impairment and Alzheimer’s disease,” Brain, vol. 132, no. 8, pp. 2048–2057, 2009.
[15] Xinyang Feng, Jie Yang, Andrew F Laine, and Elsa D Angelini, “Alzheimer’s disease diagnosis based on anatomically stratified texture analysis of the hippocampus in structural MRI,” in