DEEPMIR: A DEEP neural network for differential detection of cerebral Microbleeds and IRon deposits in MRI

Abstract

Lobar cerebral microbleeds (CMBs) and localized non-hemorrhage iron deposits in the basal ganglia have been associated with brain aging, vascular disease and neurodegenerative disorders. Particularly, CMBs are small lesions and require multiple neuroimaging modalities for accurate detection. Quantitative susceptibility mapping (QSM) derived from in vivo magnetic resonance imaging (MRI) is necessary to differentiate between iron content and mineralization. We set out to develop a deep learning-based segmentation method suitable for segmenting both CMBs and iron deposits. We included a convenience sample of 24 participants from the MESA cohort and used T2-weighted images, susceptibility weighted imaging (SWI), and QSM to segment the two types of lesions. We developed a protocol for simultaneous manual annotation of CMBs and non-hemorrhage iron deposits in the basal ganglia. This manual annotation was then used to train a deep convolution neural network (CNN). Specifically, we adapted the U-Net model with a higher number of resolution layers to be able to detect small lesions such as CMBs from standard resolution MRI. We tested different combinations of the three modalities to determine the most informative data sources for the detection tasks. In the detection of CMBs using single class and multiclass models, we achieved an average sensitivity and precision of between 0.84-0.88 and 0.40-0.59, respectively. The same framework detected non-hemorrhage iron deposits with an average sensitivity and precision of about 0.75-0.81 and 0.62-0.75, respectively. Our results showed that deep learning could automate the detection of small vessel disease lesions and including multimodal MR data (particularly QSM) can improve the detection of CMB and non-hemorrhage iron deposits with sensitivity and precision that is compatible with use in large-scale research studies.

DEEPMIR: A Deep Neural Network for Differential Detection of Cerebral Microbleeds and IRon Deposits in MRI for Large-Scale Cohort Based Studies

Tanweer Rashid, Ahmed Abdulkadir, Ilya M. Nasrallah, Jeffrey B. Ware, Hangfan Liu, Pascal Spincemaille, J. Rafael Romero, R. Nick Bryan, Susan R. Heckbert, Mohamad Habes

1 Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, USA
2 Neuroimage Analytics Laboratory (NAL) and the Biggs Institute Neuroimaging Core (BINC), Glenn Biggs Institute for Neurodegenerative Disorders, University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, USA
3 University Hospital of Old Age Psychiatry and Psychotherapy, University of Bern, Bern, Switzerland
4 Department of Radiology, Hospital of University of Pennsylvania, Perelman School of Medicine of the University of Pennsylvania, Philadelphia, PA, USA
5 Department of Radiology, Weill Cornell Medical College, New York, NY, USA
6 Department of Neurology, School of Medicine, Boston University, Boston, MA, USA
7 Department of Diagnostic Medicine, Dell Medical School, University of Texas at Austin, Austin, TX, USA
8 Department of Epidemiology and Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA

Corresponding author: [email protected]

Abstract:

Background

Materials and Methods

We included a convenience sample of 24 participants from the MESA cohort and used T2-weighted images, susceptibility weighted imaging (SWI), and QSM to segment the two types of lesions. We developed a protocol for simultaneous manual annotation of CMBs and non-hemorrhage iron deposits in the basal ganglia. This manual annotation was then used to train a deep convolution neural network (CNN). Specifically, we adapted the U-Net model with a higher number of resolution layers to be able to detect small lesions such as CMBs from standard resolution MRI. We tested different combinations of the three modalities to determine the most informative data sources for the detection tasks.

Results

In the detection of CMBs using single class and multiclass models, we achieved an average sensitivity of 0.84-0.88 and an average precision of 0.40-0.59. The same framework detected non-hemorrhage iron deposits with an average sensitivity of about 0.75-0.81 and an average precision of about 0.62-0.75.

Conclusions

Our results showed that deep learning could automate the detection of small vessel disease lesions, and that including multimodal MR data (particularly QSM) can improve the detection of CMBs and non-hemorrhage iron deposits with a sensitivity and precision that are compatible with use in large-scale research studies.

Keywords: cerebral microbleeds, iron deposits, brain lesions, deep learning, small vessel disease, MRI, dementia, SWI, QSM.

Word count: 3958

List of Abbreviations

ANTs – Advanced Normalization Tools
BOMBS – Brain Observer MicroBleed Scale
CI – confidence interval
CMB – cerebral microbleed
EPVS – enlarged perivascular spaces
FLIRT – FMRIB's Linear Image Registration Tool
FN – false negative
FP – false positive
FSL – FMRIB Software Library
GRE – gradient recalled echo
IoU – intersection over union
MARS – Microbleed Anatomical Rating Scale
MESA – Multi-Ethnic Study of Atherosclerosis
MRI – magnetic resonance imaging
QSM – quantitative susceptibility mapping
SEM – standard error of the mean
SWI – susceptibility weighted imaging
TP – true positive
TN – true negative

Introduction

The aging brain is subject to various irreversible changes, some driven by the aging process itself and others associated with pathologies, including vascular lesions and neurodegeneration (1-4). On magnetic resonance imaging (MRI), particularly with sequences tuned to be sensitive to differences in magnetic susceptibility, focal accumulations of iron can be visible. These include iron-containing lesions such as cerebral microbleeds (CMBs) and non-hemorrhage iron deposits in the basal ganglia. CMBs are small hemorrhages that can occur sporadically throughout the brain (5). CMBs have been associated with cognitive decline and dementia (6), and are considered a biomarker for small vessel diseases. The presence of lobar CMBs is also a marker for cerebral amyloid angiopathy (7-9). Non-hemorrhage iron deposits are located in the deep structures of the brain, particularly in the basal ganglia. While an increase in iron concentration in the basal ganglia is expected in healthy aging (10), focal accumulation of iron has been associated with neurodegenerative disorders in small-scale studies (11-13). Our knowledge of iron toxicity in the aging brain is limited by the fact that CMBs and iron deposits can be difficult to distinguish from each other and from other similar lesions, including calcification, using conventional MRI techniques (14). T2* gradient-recalled echo (GRE) and susceptibility-weighted imaging (SWI) are often used to clinically characterize CMBs, with the latter being more sensitive for detecting CMBs (15, 16). CMBs can occur anywhere and appear as small rounded or ellipsoidal hypo-intense regions with a diameter of ten millimeters or less (7, 14, 17). Non-hemorrhage iron deposits in the basal ganglia have irregular shapes and can be larger than CMBs (14). 
Because hypo-intensities in SWI are not specific to CMBs and non-hemorrhage iron deposits, images with other tissue contrasts are required in order to identify other lesion types that can have a similarly low signal on SWI, such as calcification (5, 18, 19). The specificity for CMB detection can be increased by post-processing SWI magnitude and phase data to derive quantitative susceptibility maps (QSM) (20, 21). In QSM, paramagnetic tissue appears different from diamagnetic materials, and therefore this contrast is particularly useful for distinguishing non-hemorrhage iron deposits from calcifications (22, 23). While previous efforts have been made to automate the detection of microbleeds, all previous work neglected the detection of iron in such an automated framework (24-30). No work has been published to date on segmenting iron deposits in the brain using QSM with either a semi- or a fully automatic method. The advances made in MRI technology with QSM for iron content recognition are gaining more attention as cohort-based studies such as the Multi-Ethnic Study of Atherosclerosis (MESA) (31-33) include QSM in their imaging protocol, and thus exploit its advantages in delivering specific insights on iron toxicity in the aging brain. The focus in MESA is on utilizing non-invasive methods to investigate common risk factors, preclinical disease states and manifest diseases using a standardized imaging protocol, which is applied to all participants (34). On the one hand, this provides a unique opportunity to study widely ignored lesions such as iron deposits in vivo using MRI; on the other hand, it comes with additional challenges, as such cohorts naturally include largely cognitively normal participants with a low lesion load, which makes the task very challenging to automate. 
In order to tackle the challenges inherent in the detection of these lesions, we developed a robust and fully automated deep learning-based method to detect CMBs and non-hemorrhage iron deposits in a cohort without extensive apparent brain tissue damage and with a low load of CMBs and non-hemorrhage basal ganglia iron deposits. We experimented with both single class and multiclass segmentation models using multiple MR sequences. Our experiments show that using multi-sequence MRI (especially QSM) improves the accuracy of detection. The main contributions of this study include the following:

1. We tackled the challenging problem of simultaneously detecting CMBs and non-hemorrhage iron deposits. To our knowledge this is one of the first reports to detect both types of lesions simultaneously. Iron accumulation in the brain has often been understudied due to the lack of appropriate techniques for detecting it in vivo in large-scale epidemiological studies;
2. We identified the most suitable pulse sequence combination to automate the detection tasks by exploiting imaging information jointly;
3. We developed an effective and flexible neural network model that is specially tailored to the differential detection task. The proposed model can be easily adapted to segment additional lesions;
4. We achieved highly competitive detection performance on real-life data, demonstrating the effectiveness of the proposed approach in practical applications.

Materials and Methods

MRI Acquisition and Pre-Processing

The MESA Exam 6 Atrial Fibrillation Ancillary Study’s (34) brain MRI protocol included T1-weighted and T2-weighted sequences, and a susceptibility weighted imaging (SWI) sequence with 4 different, equally spaced echo times. SWI is a high-resolution, 3D imaging sequence where the image contrast is enhanced by combining magnitude and phase image data (35, 36). The scans were acquired at 6 sites using the same acquisition parameters. All scans were performed on Siemens MR scanners (2 Skyra with a 20-channel head coil and 4 Prisma Fit with a 32-channel head coil) at a static magnetic field strength of 3 Tesla and with identical imaging sequence parameters, as shown in Table 1. Multiple SWI phase and magnitude images were acquired with varying echo times (Table 1). SWI data were generated following the method of Haacke et al. (35, 37). A phase mask was generated from the phase images using a high-pass filter of size 64 x 64 in order to remove artifacts. The SWI was generated by multiplying the magnitude image with the phase mask. For creation of the reference annotation and subsequent deep learning-based inference, only the SWI image with the shortest echo time (TE = 7.5 ms) was used because longer echo times have more noise due to increasingly pronounced blooming effects near the sinus cavity and cerebellum. In addition, SWI images with longer echo times are also more prone to showing false positive CMBs, especially when veins are perpendicular to the imaging plane. Section 2 in the supplementary materials discusses this issue in more detail. The T1 and T2 images underwent N4 bias correction (38) with default parameters using the implementation in the Advanced Normalization Tools (ANTs) suite and were rigidly registered to the participants’ SWI image using FSL’s FLIRT (39-41). Anatomical parcellation and brain masks were generated with a multi-atlas segmentation method using the bias-corrected T1 images (42). These brain masks were used in the generation of the QSM images. 
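The SWI generation step described above (high-pass filtering of the phase, deriving a phase mask, and multiplying it with the magnitude) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The homodyne form of the high-pass filter, the Hann window, the negative-phase mask convention and the `n_multiplications=4` default are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np


def homodyne_highpass_phase(magnitude, phase, filter_size=64):
    """High-pass filter the phase of one 2D slice (assumed homodyne approach)."""
    complex_img = magnitude * np.exp(1j * phase)
    ny, nx = complex_img.shape
    kspace = np.fft.fftshift(np.fft.fft2(complex_img))

    # Low-pass window: filter_size x filter_size Hann window at the k-space centre.
    window = np.zeros((ny, nx))
    y0 = ny // 2 - filter_size // 2
    x0 = nx // 2 - filter_size // 2
    hann = np.hanning(filter_size)
    window[y0:y0 + filter_size, x0:x0 + filter_size] = np.outer(hann, hann)

    lowpass = np.fft.ifft2(np.fft.ifftshift(kspace * window))
    # Removing the low-pass phase leaves only the rapidly varying phase.
    return np.angle(complex_img * np.conj(lowpass))


def phase_mask(filtered_phase):
    """Scale negative phase linearly from 1 (phase 0) to 0 (phase -pi); else 1."""
    mask = np.ones_like(filtered_phase)
    negative = filtered_phase < 0
    mask[negative] = 1.0 + filtered_phase[negative] / np.pi
    return mask


def make_swi(magnitude, phase, filter_size=64, n_multiplications=4):
    """Multiply the magnitude by the phase mask n_multiplications times (assumed value)."""
    mask = phase_mask(homodyne_highpass_phase(magnitude, phase, filter_size))
    return magnitude * mask ** n_multiplications
```

With a flat phase the mask is 1 everywhere and the SWI equals the magnitude; a focal negative phase offset darkens the corresponding pixel, which is the contrast enhancement the text describes. The slice must be at least `filter_size` pixels along each axis.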
QSM maps were generated from the entire multi-echo SWI dataset using the Morphology Enabled Dipole Inversion (MEDI) method (21, 43) implemented in MATLAB.

Manual Annotation

Manual annotation was performed according to a protocol developed with a focus on highly specific differential detection of CMBs and non-hemorrhage iron deposits based on multiple modalities including QSM. The detailed protocol is described in Section 2 of the supplementary materials, and a flowchart of the manual annotation process is shown in Supplementary Figure 4. Panel A in Figure 1 shows an example of a CMB in the thalamus and non-hemorrhage iron deposits in the interior section of the globus pallidus on SWI (for TE = 7.5 ms and for TE = 22.5 ms), QSM and T2 MRI, and Panel B shows the expert segmentation of the lesions based on the annotation protocol. Panel C shows an example of a larger CMB located in the occipital lobe and Panel D shows its respective expert segmentation.

Study Participants

We included imaging data from participants in the MESA Exam 6 Atrial Fibrillation Ancillary Study (31-33). A subset of the MESA cohort participated in an ancillary study of cardiac arrhythmias and brain imaging during the 2016-2018 exam (Exam 6) (34). From 1061 participants who underwent MR brain scans, we selected a convenience sample of 34 scans based on visual identification of possible CMBs by two experienced readers (IMN and TR). These 34 participants are not representative of the MESA cohort in terms of prevalence of CMBs and non-hemorrhage iron deposits, and additional participants in the MESA cohort likely have CMBs and/or non-hemorrhage iron deposits.

Software footnotes: ANTs, http://stnava.github.io/ANTs/; FSL, https://fsl.fmrib.ox.ac.uk; QSM, http://weill.cornell.edu/mri/pages/qsm.html

A total of 10 participants’ scans were excluded due to poor image quality (n = 4) and the presence of distortions/artefacts or motion-related effects (n = 6). The demographics summary and lesion loads for the 24 included participants are presented in Table 2. Of these 24 participants, 13 were male and 11 were female, with ages ranging from 65 to 94 years. Based on the expert annotation of these 24 participants, 4 participants had no microbleeds, 13 participants had 1 or 2 microbleeds (with an average size of 10.85 mm³), 6 participants had between 3 and 8 microbleeds (with an average size of 10.21 mm³) and 1 participant had more than 100 microbleeds (with an average size of 4.76 mm³). In certain circumstances, the participant with more than 100 microbleeds may be considered an outlier in terms of the number of CMBs. An examination of this is presented in Section 5 of the supplementary materials. Of the 24 participants, 5 participants did not have any voxels labeled as non-hemorrhage iron deposits and the remaining had between 2 (each having a single voxel or 1.5 mm³) and 13 lesions (one participant had 4 non-hemorrhage iron deposit lesions with a total of 326 voxels or 489 mm³) labeled as non-hemorrhage iron deposits in the basal ganglia. The distribution of CMBs and iron deposits pooled over all participants is illustrated in Figure 2. The average size (± SEM, or standard error of the mean) of CMB lesions in this sample was 6.27 ± 0.51 mm³ (4.18 ± 0.34 voxels). Among the 20 participants with CMBs, 70% (n = 14) had two or fewer CMBs, 25% (n = 5) had between three and eight CMBs, and the remaining participant had 120 CMBs. The average size of non-hemorrhage iron deposit labels (± SEM) was 26.15 ± 4.76 mm³ (17.43 ± 3.17 voxels). Approximately 21% (n = 5) had no discernible basal ganglia non-hemorrhage iron deposits and half (n = 12) had fewer than 100 voxels (150 mm³) labeled as non-hemorrhage iron deposits. The remaining 29% (n = 7) had more than 100 voxels labeled as non-hemorrhage iron deposits.

Method Overview for Automated Processing

We developed a deep learning framework for automatic segmentation of CMBs and non-hemorrhage iron deposits based on the U-Net (44, 45), a widely used deep learning architecture for image segmentation. Our architecture, however, employed padded instead of unpadded convolutions and operated on six instead of five spatial resolutions, and was used for both single class and multiclass segmentation experiments. The larger number of resolution layers enables the model to detect small CMBs. The detailed description of our implementation can be found in Section 3 of the supplementary materials. The overall system pipeline is shown in Figure 3. After the initial step of co-registration, the MR volumes were preprocessed to have zero mean and unit variance, as detailed in Section 4 of the supplementary materials. The normalized MR volumes were then sliced along the z-axis (axial slices) and edge-padded to obtain 2D slices with 256 x 256 voxels. We evaluated the performance using leave-one-out cross-validation for the 24 participants listed in Table 2 to assess how well the results generalize. In each fold, a single participant’s data was kept separate for testing (test dataset), and the MR data and labels from the remaining 23 participants were randomly split into a training dataset (75%, consisting of 17 participants) and a validation dataset (25%, consisting of 6 participants). Both training and validation datasets were augmented to improve the robustness of the deep learning models (for more details on data augmentation see Section 4 of the supplementary materials). The training dataset was used to train the model for a single epoch, after which the validation dataset was used to compute a commonly used evaluation metric known as intersection-over-union (IoU), which quantifies the amount of overlap between the predicted and ground-truth segmentations. Each model was trained for a maximum of 30 epochs, and the best model was determined as the model with the maximum IoU. 
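The cross-validation scheme and model selection rule described above can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: the function names, the fixed random seed, the representation of masks as sets of voxel coordinates, and the convention that two empty masks have an IoU of 1 are all assumptions.

```python
import random


def leave_one_out_folds(participant_ids, n_val=6, seed=0):
    """Yield (train, val, test) participant lists, one fold per participant."""
    rng = random.Random(seed)
    for test_id in participant_ids:
        rest = [p for p in participant_ids if p != test_id]
        rng.shuffle(rest)
        # With 24 participants: 17 for training, 6 for validation, 1 for testing.
        yield rest[n_val:], rest[:n_val], [test_id]


def iou(pred_voxels, ref_voxels):
    """Intersection over union of two collections of voxel coordinates."""
    pred, ref = set(pred_voxels), set(ref_voxels)
    union = pred | ref
    if not union:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return len(pred & ref) / len(union)


def best_epoch(val_iou_per_epoch):
    """Index of the epoch with the highest validation IoU (first one on ties)."""
    return max(range(len(val_iou_per_epoch)), key=val_iou_per_epoch.__getitem__)
```

For example, `list(leave_one_out_folds(range(24)))` produces 24 folds with no participant shared between the training, validation and test lists of a fold, and `best_epoch` would be applied to the validation IoU values recorded after each of the up to 30 epochs.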
This best model was then used to predict the labels of the test dataset. The set of predictions used for evaluating model performance thus consisted of 24 segmentation masks that were predicted with 24 different models, with no overlap between training, validation and testing datasets. These cross-validated evaluations were done for both single class and multiclass experiments. For all experiments, four combinations of MR modalities were considered: (1) SWI only, (2) SWI and QSM, (3) SWI and T2, and (4) SWI, QSM and T2. For single class experiments, separate models were trained and evaluated for (1) CMBs only and (2) non-hemorrhage iron deposits only. For multiclass experiments, CMBs and iron deposits had separate labels and were segmented simultaneously. For multiclass segmentations, a larger number of augmentations was used than for single class segmentations.

Evaluation of Performance

We evaluated performance in terms of the rate of detected and missed CMBs and non-hemorrhage iron deposit lesions. For each participant, the numbers of true positives (TP), false positives (FP) and false negatives (FN) were counted. The centroid of each lesion in both the predicted segmentation and the reference annotation was computed. TP, FP and FN were determined by whether the Euclidean distance between the centroid of each predicted lesion and that of a reference lesion was below a specified tolerance. A tolerance of 3 was used for evaluating CMBs and a tolerance of 5 was used for evaluating non-hemorrhage iron deposits, since iron deposits have a more dispersed pattern than CMBs, which are spherical. The sensitivity $S$ (or true positive rate) was computed as the ratio of TP to the number of lesions in the ground truth (TP + FN) for each participant:

$$S = \frac{TP}{TP + FN} \qquad (1)$$

The precision (or positive predictive value) $P$ was computed as the ratio of TP to the number of lesions in the predicted mask:

$$P = \frac{TP}{TP + FP} \qquad (2)$$

When the number of true negatives (TN) is available, the typical measure of performance is the accuracy, determined by

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$

In our application, TN is difficult to estimate because it encompasses all regions of the volume that are not CMBs or non-hemorrhage iron deposits. To evaluate the performance of each model, we therefore report the average sensitivity across all participants and the average precision across all participants, as well as a combined metric (magnitude accuracy) computed as $\sqrt{\bar{S}^2 + \bar{P}^2}$, where $\bar{S}$ and $\bar{P}$ are the average sensitivity and precision, respectively.

Statistical Analysis
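The per-participant values that enter the statistical comparison come from the centroid-based counts and from Equations (1) and (2). A minimal Python sketch of that computation follows. It is not the authors' evaluation code: the greedy one-to-one matching rule is an assumption (the text only states that a distance below the tolerance counts as a detection), returning NaN for participants without lesions or predictions is an assumed convention, and the magnitude accuracy is taken here to be the Euclidean norm of the two averages.

```python
import math


def match_lesions(pred_centroids, ref_centroids, tolerance):
    """Count TP, FP, FN by greedy one-to-one centroid matching (assumed rule)."""
    unmatched_ref = list(ref_centroids)
    tp = 0
    for p in pred_centroids:
        # Closest reference lesion that has not been matched yet, if any.
        best = min(unmatched_ref, key=lambda r: math.dist(p, r), default=None)
        if best is not None and math.dist(p, best) < tolerance:
            unmatched_ref.remove(best)
            tp += 1
    fp = len(pred_centroids) - tp
    fn = len(unmatched_ref)
    return tp, fp, fn


def sensitivity(tp, fn):
    """Equation (1); NaN when the participant has no reference lesions."""
    return tp / (tp + fn) if tp + fn else float("nan")


def precision(tp, fp):
    """Equation (2); NaN when nothing was predicted."""
    return tp / (tp + fp) if tp + fp else float("nan")


def magnitude_accuracy(mean_sensitivity, mean_precision):
    """Euclidean norm of average sensitivity and average precision (assumed form)."""
    return math.hypot(mean_sensitivity, mean_precision)
```

For instance, with predicted centroids `[(10, 10, 10), (40, 40, 40)]`, reference centroids `[(11, 10, 10), (20, 20, 20)]` and a tolerance of 3, the sketch counts one TP, one FP and one FN, giving a sensitivity and a precision of 0.5 each.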

Due to the small sample size and potentially non-uniform distribution of the models’ sensitivity, precision and magnitude accuracy, we utilized the non-parametric two-tailed Wilcoxon signed rank test (46) to check for differences between the performance of the various models. In all experimental evaluations, the model trained with only SWI was considered the baseline model for comparison. Statistical significance was considered at p < 0.05. All statistical analyses were performed in MATLAB R2017b.

Results

We performed leave-one-out cross-validated evaluations for both single class and multiclass segmentation experiments using the 24 participants listed in Table 2. Panel A in Figure 4 shows an example of the automated segmentation of a CMB (indicated by the red arrow). Panel B in Figure 4 shows the segmentation of the focal iron deposits in the basal ganglia. In this figure, the model correctly segmented the iron deposit lesions (indicated by the green arrow) while rejecting an instance of calcification (indicated by the yellow arrow). The results of these experiments are reported in Supplementary Table 4.1 and Supplementary Table 4.2 in Section 4 of the supplementary materials. Overall, our experiments show that incorporating QSM in model training can increase the accuracy of CMB and iron deposit detection. In the case of segmenting CMBs, the best performance in terms of average magnitude accuracy is seen with the model trained with SWI and QSM in both single class and multiclass experiments. For non-hemorrhage iron deposits, the highest average magnitude accuracy among single class models was achieved by the model trained with all three modalities, and among multiclass models by the model trained with SWI and QSM. Figure 5 shows a joint scatterplot of the single class experimental results and Figure 6 shows a joint scatterplot of the multiclass experiments. In our dataset, we identified as an outlier a single individual with exceptionally many CMBs. A comparative analysis was done by removing this outlier from the dataset and repeating a similar cross-validated evaluation by retraining both single class and multiclass models. The results are detailed in Supplementary Table 5.1 and Supplementary Table 5.2 in Section 5 of the supplementary materials. With the exception of multiclass CMBs, we note that the best result in terms of magnitude accuracy is seen when the model training includes QSM. 
An additional leave-one-out cross-validated evaluation was done for the 24 participants using an implementation of the original U-Net (45). The results of this experiment are reported in Supplementary Table 6.1 in Section 6 of the supplementary materials. In these experiments, we note that the models trained with SWI, QSM and T2 had the best performance in terms of magnitude accuracy for both CMBs and iron deposits.

Discussion

We developed a deep learning framework for simultaneous segmentation of cerebral microbleeds and non-hemorrhage iron deposits using multi-modal MRI. To date, previously published methods for automated or semi-automated CMB detection have ignored iron deposits. In this study, we consider the iron deposits in the basal ganglia, seen as hypo-intense lesions on SWI and confirmed by QSM, to be iron-specific rather than mineralization. Those lesions may typically be labelled as possible or uncertain microbleeds on MARS (18) and BOMBS (19), mainly because T2* and SWI cannot differentiate iron content from mineralization. We overcome this limitation by including QSM in our study, which was shown to improve the overall accuracy of automated detection. To our knowledge, there are no studies that attempted to segment these focal iron deposits automatically using SWI and/or QSM. The deep learning-based segmentation method presented here fills this gap. We have undertaken several experiments using both single class and multiclass models with different combinations of the available MR pulse sequences. We noted that the models that included QSM in training consistently performed better and the resulting predictions had statistically high correlation when compared to the reference annotation. Our approach has several advantages over the current state-of-the-art methods for CMB detection. First, by using deep learning our model is capable of learning and generalizing features rather than relying on feature vectors derived with conventional image processing algorithms (28-30), Fourier shape descriptors (47) or probabilistic models (27). Second, we employ end-to-end learning by using a single model (or network). 
Previously published methods that used deep learning employed multiple stages consisting of (a) a candidate generation stage, which uses either conventional image processing methods (24, 26) or an initial (and separate) deep learning-based model (25) for identifying possible CMBs, and (b) a false positive reduction stage in the form of a CNN-based network (24-26). Our single-stage design allows for greater flexibility, for example in retraining with different or larger data sets, adding class labels, or using different modalities, while achieving sensitivity and precision comparable to published results. Third, we trained with different sets of input imaging modalities. Combinations of imaging modalities allowed our models to reject mimics such as calcifications without explicit provisions (as shown in Figure 4, Panel B). The models in (25, 26) used SWI only and therefore may not be capable of recognizing and rejecting mimics. The method in (24) utilizes SWI phase and magnitude images along with QSM, but did not consider iron deposits in the basal ganglia. Lastly, we investigated the performance of the CNN architecture for the simultaneous differentiation and labelling of both CMBs and iron deposits. One of the major challenges was the small size of the lesions and their potential presence throughout the brain. The average size of four voxels (or 6 mm³) per CMB, together with the generally low lesion burden of the study participants, meant that a participant had on average only two CMB lesions of about 4 voxels each, which gives a single lesion or error a high weight in the evaluation. In other words, missing a single lesion would result in a drop of sensitivity from one to 0.5, and a single false positive for a given participant would result in a drop of that participant’s precision from one to 0.5 or 0.67. Similarly, a small number of false positives, in absolute terms, can lower the average precision substantially. 
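The weight of a single error can be made explicit with a short worked example based on Equations (1) and (2). The participant-level counts below are hypothetical and chosen only to match the typical load of two CMBs per participant quoted above.

$$S = \frac{TP}{TP + FN} = \frac{1}{1 + 1} = 0.5 \quad \text{(two reference CMBs, one missed)}$$

$$P = \frac{TP}{TP + FP} = \frac{1}{1 + 1} = 0.5 \quad \text{(one correct detection, one false positive)}$$

$$P = \frac{TP}{TP + FP} = \frac{2}{2 + 1} \approx 0.67 \quad \text{(two correct detections, one false positive)}$$

Because the reported figures are averages of these per-participant ratios, a handful of such cases is enough to shift the average noticeably.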
In general, our models over-segmented the data. In all experiments using the aforementioned combinations of available imaging modalities, most of the lesions were detected and the average sensitivity was consistently above 0.75. Notably, the sample used to train the model was a convenience sample of MESA participants without a particular clinical profile and without apparent brain disorders such as dementia, depression, or traumatic brain injury. Given the low number of lesions on average, our method achieved sensitivities that are comparable to state-of-the-art CMB segmentation/detection methods trained with large datasets. We expect that including more samples with more lesions would improve the precision. In general, most studies incorporating automated methods for large-scale abnormality detection or brain region segmentation include a segmentation quality control step that could result in corrections or exclusions (1, 48, 49). Thanks to the flexibility of our method, it is straightforward to increase the sample size. In clinical terms, a larger number of CMBs is more likely to be clinically relevant. The proposed DEEPMIR method was trained and evaluated on a relatively small population and outputs the number of lesions and lesion segmentation maps for each participant. The next step would be to rigorously test and evaluate the proposed model on a larger sample to ensure viable sensitivity, precision and accuracy, before applying it to a large cohort to determine the prevalence of lesions in the population. An adequately trained model can be used as a screening tool to flag participants with a high lesion load. DEEPMIR can also be used to generate an initial segmentation of lesions to accelerate manual annotation. QSM is a good, non-invasive technique to distinguish between iron content and mineralization in the brain and showed a clear advantage in improving the accuracy of CMB and iron deposit detection in the current study. 
While QSM is being recognized and integrated in more population-based studies, large studies with QSM data acquisition such as MESA are still ongoing. This left us with a relatively small amount of imaging data for training. For our experiments, we used a ratio of validation to training data of 25:75, which proved reasonable for ensuring that a maximal amount of the available data is used in model training while a sufficient amount is reserved for within-training validation. The use of similar sample sizes for training and evaluation is not unprecedented in such small lesion detection (27, 29, 50, 51). We have presented a framework for the automated detection of cerebral microbleeds and non-hemorrhage iron deposits in the basal ganglia. While SWI remains the preferred modality for CMB detection, few studies have leveraged QSM as an additional source of information to improve detection accuracy, and to date there have been no attempts to include iron deposits in the basal ganglia as an item of interest. We have utilized QSM in this study to confirm that these focal lesions in the basal ganglia are in fact iron depositions, rather than mineralization such as calcifications. Our deep learning neural network model is flexible and at the same time scalable to include additional modalities and/or class labels while maintaining comparably high sensitivity and precision.

Acknowledgements

Availability of data and materials

Project name: DEEPMIR
Project home page: nallab.org, github.com/NAL-UTHSCSA/CMB_NHID_Segmentation
Archived version: https://arxiv.org/abs/2010.00148
Operating system(s): e.g. Linux
Programming language: Python 3
Other requirements: TensorFlow, Keras

References

1. Habes M, Erus G, Toledo JB, Zhang T, Bryan N, Launer LJ, et al. White matter hyperintensities and imaging patterns of brain ageing in the general population. Brain. 2016;139(4):1164-79.
2. Debette S, Beiser A, DeCarli C, Au R, Himali JJ, Kelly-Hayes M, et al. Association of MRI markers of vascular brain injury with incident stroke, mild cognitive impairment, dementia, and mortality: the Framingham Offspring Study. Stroke. 2010;41(4):600-6.
3. Ge Y, Grossman RI, Babb JS, Rabin ML, Mannon LJ, Kolson DL. Age-related total gray matter and white matter changes in normal adult brain. Part I: volumetric MR imaging analysis. American Journal of Neuroradiology. 2002;23(8):1327-33.
4. Scahill RI, Frost C, Jenkins R, Whitwell JL, Rossor MN, Fox NC. A longitudinal study of brain volume changes in normal aging using serial registered magnetic resonance imaging. Archives of Neurology. 2003;60(7):989-94.
5. Charidimou A, Jäger HR, Werring DJ. Cerebral microbleed detection and mapping: principles, methodological aspects and rationale in vascular dementia. Experimental Gerontology. 2012;47(11):843-52.
6. Akoudad S, Wolters FJ, Viswanathan A, de Bruijn RF, van der Lugt A, Hofman A, et al. Association of cerebral microbleeds with cognitive decline and dementia. JAMA Neurology. 2016;73(8):934-43.
7. Haller S, Vernooij MW, Kuijer JP, Larsson E-M, Jäger HR, Barkhof F. Cerebral microbleeds: imaging and clinical significance. Radiology. 2018;287(1):11-28.
8. Vernooij M, van der Lugt A, Ikram MA, Wielopolski P, Niessen W, Hofman A, et al. Prevalence and risk factors of cerebral microbleeds: the Rotterdam Scan Study. Neurology. 2008;70(14):1208-14.
9. Cordonnier C, van der Flier WM. Brain microbleeds and Alzheimer’s disease: innocent observation or key player? Brain. 2011;134(2):335-44.
10. Acosta-Cabronero J, Betts MJ, Cardenas-Blanco A, Yang S, Nestor PJ. In vivo MRI mapping of brain iron deposition across the adult lifespan. Journal of Neuroscience. 2016;36(2):364-74.
11. Ward RJ, Zucca FA, Duyn JH, Crichton RR, Zecca L. The role of iron in brain ageing and neurodegenerative disorders. The Lancet Neurology. 2014;13(10):1045-60.
12. Bartzokis G, Tishler TA, Lu PH, Villablanca P, Altshuler LL, Carter M, et al. Brain ferritin iron may influence age-and gender-related risks of neurodegeneration. Neurobiology of Aging. 2007;28(3):414-23.
13. House MJ, Pierre TS, Foster J, Martins R, Clarnette R. Quantitative MR imaging R2 relaxometry in elderly participants reporting memory loss. American Journal of Neuroradiology. 2006;27(2):430-9.
14. Greenberg SM, Vernooij MW, Cordonnier C, Viswanathan A, Salman RA-S, Warach S, et al. Cerebral microbleeds: a guide to detection and interpretation. The Lancet Neurology. 2009;8(2):165-74.
15. Nandigam R, Viswanathan A, Delgado P, Skehan M, Smith E, Rosand J, et al. MR imaging detection of cerebral microbleeds: effect of susceptibility-weighted imaging, section thickness, and field strength. American Journal of Neuroradiology. 2009;30(2):338-43.
16. Ayaz M, Boikov AS, Haacke EM, Kido DK, Kirsch WM. Imaging cerebral microbleeds using susceptibility weighted imaging: one step toward detecting vascular dementia. Journal of Magnetic Resonance Imaging. 2010;31(1):142-8.
17. Cordonnier C, Al-Shahi Salman R, Wardlaw J. Spontaneous brain microbleeds: systematic review, subgroup analyses and standards for study design and reporting. Brain. 2007;130(8):1988-2003.
18. Gregoire S, Chaudhary U, Brown M, Yousry T, Kallis C, Jäger H, et al. The Microbleed Anatomical Rating Scale (MARS): reliability of a tool to map brain microbleeds. Neurology. 2009;73(21):1759-66.
19. Cordonnier C, Potter GM, Jackson CA, Doubal F, Keir S, Sudlow CL, et al. Improving interrater agreement about brain microbleeds: development of the Brain Observer MicroBleed Scale (BOMBS). Stroke. 2009;40(1):94-9.
20. de Rochefort L, Liu T, Kressler B, Liu J, Spincemaille P, Lebon V, et al. Quantitative susceptibility map reconstruction from MR phase data using bayesian regularization: validation and application to brain imaging. Magnetic Resonance in Medicine. 2010;63(1):194-206.
21. Wang Y, Liu T. Quantitative susceptibility mapping (QSM): decoding MRI data for a tissue magnetic biomarker. Magnetic Resonance in Medicine. 2015;73(1):82-101.
22. Liu T, Surapaneni K, Lou M, Cheng L, Spincemaille P, Wang Y. Cerebral microbleeds: burden assessment by using quantitative susceptibility mapping. Radiology. 2012;262(1):269-78.
23. Schweser F, Deistung A, Lehr BW, Reichenbach JR.
Differentiation between diamagnetic and paramagnetic cerebral lesions based on magnetic susceptibility mapping. Medical physics. 2010;37(10):5165-78. 24. Liu S, Utriainen D, Chai C, Chen Y, Wang L, Sethi SK, et al. Cerebral microbleed detection using Susceptibility Weighted Imaging and deep learning. NeuroImage. 2019;198:271-82. 25. Dou Q, Chen H, Yu L, Zhao L, Qin J, Wang D, et al. Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks. IEEE transactions on medical imaging. 2016;35(5):1182-95. 26. Chen Y, Villanueva-Meyer JE, Morrison MA, Lupo JM. Toward Automatic Detection of Radiation-Induced Cerebral Microbleeds Using a 3D Deep Residual Network. Journal of digital imaging. 2018:1-7. 27. Seghier ML, Kolanko MA, Leff AP, Jäger HR, Gregoire SM, Werring DJ. Microbleed detection using automated segmentation (MIDAS): a new method applicable to standard clinical MR images. PloS one. 2011;6(3):e17547. 28. Bian W, Hess CP, Chang SM, Nelson SJ, Lupo JM. Computer-aided detection of radiation-induced cerebral microbleeds on susceptibility-weighted MR images. NeuroImage: clinical. 2013;2:282-90. 29. Kuijf HJ, de Bresser J, Geerlings MI, Conijn MM, Viergever MA, Biessels GJ, et al. Efficient detection of cerebral microbleeds on 7.0 T MR images using the radial symmetry transform. NeuroImage. 2012;59(3):2266-73. 30. Roy S, Jog A, Magrath E, Butman JA, Pham DL, editors. Cerebral microbleed segmentation from susceptibility weighted images. Medical Imaging 2015: Image Processing; 2015: International Society for Optics and Photonics. 31. Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, et al. Multi-ethnic study of atherosclerosis: objectives and design. American journal of epidemiology. 2002;156(9):871-81. 32. Olson JL, Bild DE, Kronmal RA, Burke GL. Legacy of MESA. Global heart. 2016;11(3):269-74. 33. Winston-Salem N, Irvine C. The multiethnic study of atherosclerosis. Global heart. 2016;11(3):267.

34. Heckbert SR, Austin TR, Jensen PN, Floyd JS, Psaty BM, Soliman EZ, et al. Yield and consistency of arrhythmia detection with patch electrocardiographic monitoring: The Multi-Ethnic Study of Atherosclerosis. Journal of electrocardiology. 2018;51(6):997-1002. 35. Haacke EM, Mittal S, Wu Z, Neelavalli J, Cheng Y-C. Susceptibility-weighted imaging: technical aspects and clinical applications, part 1. American Journal of Neuroradiology. 2009;30(1):19-30. 36. Mittal S, Wu Z, Neelavalli J, Haacke EM. Susceptibility-weighted imaging: technical aspects and clinical applications, part 2. American Journal of neuroradiology. 2009;30(2):232-52. 37. Haacke EM, Xu Y, Cheng YCN, Reichenbach JR. Susceptibility weighted imaging (SWI). Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine. 2004;52(3):612-8. 38. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, et al. N4ITK: improved N3 bias correction. IEEE transactions on medical imaging. 2010;29(6):1310. 39. Jenkinson M, Bannister P, Brady M, Smith S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage. 2002;17(2):825-41. 40. Jenkinson M, Smith S. A global optimisation method for robust affine registration of brain images. Medical image analysis. 2001;5(2):143-56. 41. Greve DN, Fischl B. Accurate and robust brain image alignment using boundary-based registration. Neuroimage. 2009;48(1):63-72. 42. Doshi J, Erus G, Ou Y, Resnick SM, Gur RC, Gur RE, et al. MUSE: MUlti-atlas region Segmentation utilizing Ensembles of registration algorithms and parameters, and locally optimal atlas selection. Neuroimage. 2016;127:186-95. 43. Liu J, Liu T, de Rochefort L, Ledoux J, Khalidov I, Chen W, et al. Morphology enabled dipole inversion for quantitative susceptibility mapping using structural consistency between the magnitude image and the susceptibility map. Neuroimage. 2012;59(3):2560-8. 44. 
Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O, editors. 3D U-Net: learning dense volumetric segmentation from sparse annotation. International conference on medical image computing and computer-assisted intervention; 2016: Springer. 45. Ronneberger O, Fischer P, Brox T, editors. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical image computing and computer-assisted intervention; 2015: Springer. 46. Wilcoxon F. Individual comparisons by ranking methods. Breakthroughs in statistics: Springer; 1992. p. 196-202. 47. Liu H, Rashid T, Habes M, editors. Cerebral Microbleed Detection Via Fourier Descriptor with Dual Domain Distribution Modeling. 2020 IEEE 17th International Symposium on Biomedical Imaging Workshops (ISBI Workshops); 2020: IEEE. 48. Habes M, Sotiras A, Erus G, Toledo JB, Janowitz D, Wolk DA, et al. White matter lesions: Spatial heterogeneity, links to risk factors, cognition, genetics, and atrophy. Neurology. 2018;91(10):e964-e75. 49. Nasrallah IM, Pajewski NM, Auchus AP, Chelune G, Cheung AK, Cleveland ML, et al. Association of intensive vs standard blood pressure control with cerebral white matter lesions. Jama. 2019;322(6):524-34. 50. Barnes SR, Haacke EM, Ayaz M, Boikov AS, Kirsch W, Kido D. Semiautomated detection of cerebral microbleeds in magnetic resonance images. Magnetic resonance imaging. 2011;29(6):844-52. 51. Fazlollahi A, Meriaudeau F, Villemagne VL, Rowe CC, Yates P, Salvado O, et al., editors. Efficient machine learning framework for computer-aided detection of cerebral microbleeds using the radon transform. 2014 IEEE 11th international symposium on biomedical imaging (ISBI); 2014: IEEE. Table 1: Scanner parameters

Image Type | Echo Time (TE) (ms) | Repetition Time (TR) (ms) | Pixel Bandwidth (Hz/pixel) | Flip Angle (FA) | Slice Thickness (mm) | Acquisition Matrix | In-plane voxel size (mm)
T1 MPRAGE
T2 | 408 | 3200 | 750 | 120 | 1 | 256x256 | 1x1
SWI

Table 2: Summary demographics for the included MESA participants (n=24).

Participants | Age | Sex | Number of CMBs (Average Size) | Iron deposits (voxel count)
 | | | | 4 participants had 96 – 326 voxels
13 participants | 66-94 | 6 females, 7 males | 1 or 2 CMBs (7.24 voxels or 10.85 mm³) | 11 participants had 9 – 283 voxels, 2 participants had 0 voxels
6 participants | 65-89 | 2 females, 4 males | 3 to 8 CMBs (7.1 voxels or 10.21 mm³) | 4 participants had 2 – 146 voxels, 2 participants had 0 voxels
1 participant | 67 | 1 male | 120 (3.175 voxels or 4.76 mm³) | 0 voxels

Figure 1: This figure shows examples of cerebral microbleeds and basal ganglia iron deposition in SWI for TE=7.5ms (left column), SWI for TE=22.5ms (middle left column), QSM (middle right column) and T2 (right column). Panels A and C show the lesions in two different brains, and Panels B and D show the corresponding human expert labelling of the CMBs (red) and iron deposits (green).

Figure 2: Histogram of the size of cerebral microbleeds (left panel) and iron deposits (right panel) lesions in mm³ pooled over all participants.

Figure 3: Overview of the split for one fold of the cross-validation process that is repeated n times. In each fold, the model that was used to predict the test participant was trained on the remaining n-1 examples in order to avoid data leakage. Within the training stage, 25 percent of the n-1 participants were used as validation set. The model with the highest validation accuracy was chosen to predict the left-out participant example.

Figure 4: Panel A: (Top) An example of the correct segmentation of a small microbleed (red arrow). (Bottom) Magnified view of microbleed with segmentation mask (single red pixel). Panel B: An example of QSM being used to distinguish iron deposits from calcifications in the basal ganglia. (Top row) The SWI for TE=7.5ms, SWI for TE=22.5ms and QSM of the basal ganglia. The yellow arrow points to hypo-intense voxels which are likely calcifications and the green arrow points to basal ganglia iron deposits. (Bottom row) The segmentation mask (green labels) of the iron deposits.

Figure 5: Joint scatterplots of the sensitivity vs precision of all single class experiments predicting CMBs and non-hemorrhage iron deposits. (Left) all CMB only experiments and (Right) all iron deposits only experiments. In each subplot, the round points indicate the individual participants’ sensitivity and precision evaluated with leave-one-out cross-validation, and the X indicates the mean sensitivity and precision. The legend at the upper left corner of each subplot shows the coordinates of X. In each subplot, histograms of the sensitivity and precision are displayed along the upper and right axes.

[Figure 5 panel titles: Single Class CMB Only (left), Single Class Iron Deposits Only (right)]

Figure 6: Joint scatterplots of the sensitivity vs precision of all multiclass experiments predicting CMBs and non-hemorrhage iron deposits. (Left) all evaluations for CMBs and (Right) all evaluations for iron deposits. In each subplot, the round points indicate the individual participants’ sensitivity and precision evaluated with leave-one-out cross-validation, and the X indicates the mean sensitivity and precision. The legend at the upper left corner of each subplot shows the coordinates of X. In each subplot, histograms of the sensitivity and precision are displayed along the upper and right axes.

[Figure 6 panel titles: Multiclass CMB (left), Multiclass Iron Deposits (right)]


Supplementary Materials

Title: DEEPMIR: A Deep Neural Network for Differential Detection of Cerebral Microbleeds and IRon Deposits in MRI for Large-Scale Cohort Based Studies

Authors: Tanweer Rashid, Ahmed Abdulkadir, Ilya M. Nasrallah, Jeffrey B. Ware, Hangfan Liu, Pascal Spincemaille, J. Rafael Romero, R. Nick Bryan, Susan R. Heckbert, Mohamad Habes

Section 1. Multi-Echo SWI

Susceptibility weighted imaging (SWI) is a high-resolution 3D imaging technique in which the magnitude image is combined with phase data during post-processing. It can detect small CMBs because of its high sensitivity to haemosiderin (1). The choice of the echo time (TE) has a significant impact on the visibility and detection of CMBs. Longer echo times allow increased proton dephasing (also known as the blooming effect) and may increase a CMB’s hypo-intensity and/or apparent size. However, longer TEs also have disadvantages. Blood vessels that are perpendicular to the imaging plane may take on the appearance of CMBs (i.e. hypo-intense and rounded in shape), and at longer TEs even small blood vessels may become visible, which increases the chance of false positive identifications by an automated segmentation/detection algorithm. Supplementary Figure 1 shows an example of a CMB mimic becoming more visible at longer TEs. Another side effect of longer TEs is diminished image quality. Air-tissue interfaces in regions such as the sinus cavity and skull base can cause distortions/artifacts in the image and create uncertainty in distinguishing CMBs from noise. Supplementary Figure 2 shows an example of increasing distortions in the region above the sinus cavity at longer TEs, and Supplementary Figure 3 shows an example of poor image quality in the temporal lobe and cerebellum at longer TEs.

Supplementary Figure 1: Example of a blood vessel having more pronounced visibility at longer TEs. There is a true CMB (red arrow) and a potential CMB mimic (yellow arrow) which is not visible at short TEs but becomes more visible at longer TEs.

Supplementary Figure 2: Example of distortions in the region above the sinus cavity for longer TEs.

Supplementary Figure 3: An example of increasing distortions in the temporal lobes and cerebellum for longer TEs.

Section 2. Manual Annotation

Current rating scales for CMBs such as MARS (2) and BOMBS (3) provide only coarse information about the spatial location and shape of CMBs. They were not designed to detect and annotate individual CMBs on the MR image itself, which limits their usefulness for training machine learning models. To address this, we implemented a protocol (shown in Supplementary Figure 4) to simultaneously annotate CMBs and non-hemorrhage iron deposits in the basal ganglia using SWI, QSM and T2 MRI. Our protocol was inspired by the systemic approach of the Brain Observer MicroBleed Scale (BOMBS) (3). CMBs and non-hemorrhage iron deposits were annotated by experts (IMN and TR) with integer values 1 and 2, respectively. The manual annotation is based on the following observations:

1. CMBs and iron deposits are hypo-intense on SWI, hyper-intense on QSM and have some hypo-intensity on T2.

2. CMBs are round in shape and can appear anywhere in the brain.

3. Non-hemorrhage iron deposits do not have any specific shape but are generally larger than CMBs and are mainly located in the gray matter of the basal ganglia, particularly the globus pallidus.

4. Similar to the characteristics of CMBs and iron deposits described in MARS, if a hypo-intensity on SWI occurs unilaterally, i.e. on one side of the basal ganglia, then it is more likely to be a CMB, and if the hypo-intensity is bilateral, then it is assumed to be non-hemorrhage iron deposits.

The annotation protocol is as follows. For each axial SWI slice having a round hypo-intense region (candidate region) similar to CMBs:

1. Inspect the previous and next few slices to ensure that the round hypo-intense region is not part of another structure such as a blood vessel or sulcus. If it is part of a blood vessel or similar elongated linear structure, then the rounded hypo-intense region will persist across several adjacent axial slices (more than 5 slices). If the rounded hypo-intense region is part of a sulcus, then it will appear to join with the sulcus in subsequent slices. It may be necessary to inspect the candidate region in sagittal and coronal slices to verify.

2. Once it has been confirmed that the round hypo-intense region is separate and not part of other structures, check the region’s corresponding intensities on the T2 and QSM images.

   a. If the corresponding region is hyper-intense on the T2 image and shows no appreciable change in intensity on the QSM compared to surrounding voxels, then the region likely represents an enlarged perivascular space.

   b. If the corresponding region is hypo-intense or does not show any discernible change in intensity on the T2 compared to surrounding voxels, and is hyper-intense on the QSM image, then the region likely represents a CMB.

3. If the region within the globus pallidus section of the basal ganglia is hypo-intense on the SWI, hyper-intense on the QSM, and shows some hypo-intensity on the T2 image compared to surrounding voxels, then the region represents non-hemorrhage iron deposits. On the other hand, if the region is hypo-intense on the QSM, then the region likely represents calcium deposits.
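The decision rules of the protocol can be sketched as a small function. This is only an illustration of the written rules, not the annotation tool used by the raters: the function name, the "hypo"/"iso"/"hyper" encoding of relative intensity, and the way the unilateral/bilateral observation is used as a tie-break inside the globus pallidus are all assumptions made for this example.

```python
from enum import Enum


class Label(Enum):
    NONE = 0          # nothing annotated (vessel, sulcus, EPVS, calcification)
    CMB = 1           # cerebral microbleed (integer label 1 in the protocol)
    IRON_DEPOSIT = 2  # non-hemorrhage iron deposit (integer label 2)


def classify_candidate(swi, t2, qsm, in_globus_pallidus=False,
                       bilateral=False, separate=True):
    """Classify one candidate region from its intensity relative to neighbours.

    swi, t2 and qsm are each "hypo", "iso" or "hyper" (hypothetical encoding).
    separate is True when the region is not part of a vessel or sulcus.
    """
    if swi != "hypo" or not separate:
        return Label.NONE
    if in_globus_pallidus:
        if qsm == "hypo":
            return Label.NONE  # likely calcium deposits
        # Assumption: bilateral hypo-intensity favours iron deposits.
        if qsm == "hyper" and t2 == "hypo" and bilateral:
            return Label.IRON_DEPOSIT
    if t2 == "hyper" and qsm == "iso":
        return Label.NONE  # likely an enlarged perivascular space
    if t2 in ("hypo", "iso") and qsm == "hyper":
        return Label.CMB
    return Label.NONE


print(classify_candidate("hypo", "iso", "hyper"))
```

In practice the inputs to such a function come from visual inspection by an expert, so the sketch is mainly useful for checking that the written rules are mutually consistent.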

Supplementary Figure 4: Flowchart for labeling cerebral microbleeds, iron deposits, basal ganglia calcifications and enlarged perivascular spaces. [Flowchart decision nodes: found hypo-intense voxels/region in slice; within basal ganglia; hypo-intensity unilateral or bilateral; has rounded shape (check neighboring slices); hypo-intense region separate from neighboring structures; separable from sulcus (check sagittal and coronal slices); intensity on T2; intensity on QSM. Outcomes: label region as cerebral microbleed; label region as iron deposit; enlarged perivascular space (do nothing); basal ganglia calcification (do nothing).]

Section 3. 2D U-Net

Our lesion prediction models are based on the U-Net (4). Both single and multi-class models consist of an analysis path (down-sampling operations) with five stages of convolution blocks and pooling, followed by a synthesis path (up-sampling) with five stages of up-convolutions, plus a convolutional block. Each downsampling block consists of two padded 2D convolution layers with kernel size 3x3 and stride 1x1, followed by Batch Normalization and ReLU activation. The downsampling block ends with a 2x2 max pooling layer which halves the resolution of the feature map in every spatial direction. The central block consists of two instances of padded 2D convolution with kernel size 3x3 and stride 1x1, followed by Batch Normalization and ReLU activation. Each upsampling block passes its input through a 2D transpose convolution with kernel size 2x2 and stride 2x2 in order to double the size of the feature map. This doubled feature map is then concatenated with the feature map (same size) of the corresponding analysis stage (i.e. the feature map before the max pooling layer), followed by two instances of a padded 2D convolution layer with kernel size 3x3 and stride 1x1, followed by Batch Normalization and ReLU activation. Due to the use of padded convolutions throughout the model, the input and output image sizes are the same (256x256). The smallest downsampled image size is 8x8 in the central convolution block.

In the case of the single class prediction model, the output of the final upsampling stage passes through a 2D convolution layer with kernel size 1x1, stride 1x1 and Sigmoid activation function. For the multi-class prediction model, the output of the final upsampling block is passed through a 2D convolution layer with kernel size 1x1, stride 1x1 and ReLU activation function, and then through a SoftMax layer to generate class probabilities. The model architecture is depicted in Supplementary Figure 5.

We employed random translations, random rotation, and flipping along the left-right axis during training. The network was trained with the cross-entropy loss.
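As a rough illustration of the architecture described in this section, the following Keras sketch builds a five-stage padded U-Net. It is not the released DEEPMIR code. The number of filters per stage (`base_filters`), the placement of Batch Normalization and ReLU after each convolution, and the use of three output classes for the multi-class case are assumptions, since the text does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_block(x, filters):
    """Two padded 3x3 convolutions, each followed by BatchNorm and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, strides=1, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x


def build_unet(n_channels=3, n_classes=1, base_filters=16, size=256):
    """Five-stage padded U-Net; filter counts are illustrative assumptions."""
    inputs = layers.Input((size, size, n_channels))
    x, skips = inputs, []
    for stage in range(5):                        # analysis path
        x = conv_block(x, base_filters * 2 ** stage)
        skips.append(x)                           # feature map before pooling
        x = layers.MaxPooling2D(2)(x)             # halves each spatial dimension
    x = conv_block(x, base_filters * 2 ** 5)      # central block (8x8 for 256 input)
    for stage in reversed(range(5)):              # synthesis path
        filters = base_filters * 2 ** stage
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[stage]])
        x = conv_block(x, filters)
    if n_classes == 1:                            # single class: sigmoid output
        outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    else:                                         # multi-class: ReLU, then softmax
        x = layers.Conv2D(n_classes, 1, activation="relu")(x)
        outputs = layers.Softmax()(x)
    return tf.keras.Model(inputs, outputs)


model = build_unet(n_channels=3, n_classes=3)
print(model.output_shape)
```

Because every convolution is padded, the output has the same spatial size as the input, which matches the 256x256 input and output sizes stated above.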

Supplementary Figure 5: U-Net architecture using padded convolutions for both single class and multi-class predictions.

Section 4. Additional Details on Experimental Pipeline

Each input image was normalized to have zero mean and unit variance. For QSM images, an additional prior step truncated the intensities such that they were within the range $-k\,\sigma_{QSM} \le V_{QSM} \le k\,\sigma_{QSM}$, where $V_{QSM}$ is the QSM voxel intensity, $k = 5$ and $\sigma_{QSM}$ is the standard deviation of the QSM image. This step is necessary because QSM images contain high intensity noise (especially around the boundary of the brain and the region proximate to the sinus cavity) which may de-emphasize the intensity of the rest of the brain.
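A minimal NumPy sketch of these two preprocessing steps is given below. The function names are hypothetical, and computing the standard deviation over the whole array (rather than, for instance, within a brain mask) is an assumption; the text does not say which voxels enter the statistic.

```python
import numpy as np


def clip_qsm(qsm, k=5.0):
    """Truncate QSM intensities to [-k * sigma, +k * sigma]."""
    sigma = qsm.std()  # assumption: standard deviation over the whole image
    return np.clip(qsm, -k * sigma, k * sigma)


def zscore(image):
    """Normalize an image to zero mean and unit variance."""
    return (image - image.mean()) / image.std()


# Synthetic example: Gaussian background with one extreme noise voxel.
rng = np.random.default_rng(0)
qsm = rng.normal(0.0, 0.05, size=(64, 64))
qsm[0, 0] = 10.0
normalized = zscore(clip_qsm(qsm))
print(normalized.mean(), normalized.std())
```

Clipping before normalization keeps a few extreme voxels from dominating the mean and variance used for the z-score.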

To improve the robustness of the deep learning network and include more training data, we enriched the training and validation datasets with augmentation. Axial slices containing CMBs and iron deposits are, for the most part, few compared to the remaining slices in a given brain volume. This type of class imbalance may bias the training process. To address this, data augmentation was performed on selected slices instead of all slices, inspired by the concepts of random over-sampling (ROS) and random under-sampling (RUS) (5). First, all slices containing the labels of interest (i.e. CMBs and/or iron deposits) are augmented. Then a number of the remaining slices are randomly selected and augmented in the same manner until the total number of slices containing the labels of interest and the total number of slices that do not contain any labels of interest are similar. Data augmentation consisted of geometric transforms such as translations, rotations and image mirroring. In each experiment, the axial SWI slice (along with the corresponding axial QSM and T2 slices) and the corresponding axial reference annotation slice were augmented. For translations, a set of two random integers tx and ty (representing the amount of shift per axis) was generated within the range [-45, 45] and used to translate the image slice(s) and the corresponding slice of the reference annotation. This range was chosen empirically so that most of the brain would be visible in the translated image. A total of 10 random integers per axis were generated for multi-class experiments. For rotations, a set of random integers d (representing the rotation in degrees) was generated within the range [1, 60], and the image slice(s) and the slices with reference annotations were rotated using both +d and -d. The regions of the crops that were located outside the image matrix were padded with edge values. A total of 16 random integers were used for multi-class experiments.
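The selective augmentation can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: the function names are hypothetical, the rule "pick as many unlabeled slices as there are labeled slices" is an assumption about what a similar total means, and edge-value padding is applied to translations here although the text states it explicitly only after describing rotations. Rotation and mirroring are omitted for brevity.

```python
import numpy as np


def pick_slices(labels, rng):
    """Return indices of labeled slices and an equally sized random set of unlabeled ones.

    labels is an integer annotation volume of shape (n_slices, height, width).
    """
    has_lesion = np.any(labels > 0, axis=(1, 2))
    lesion_idx = np.flatnonzero(has_lesion)
    empty_idx = np.flatnonzero(~has_lesion)
    n_extra = min(len(lesion_idx), len(empty_idx))
    extra_idx = np.sort(rng.choice(empty_idx, size=n_extra, replace=False))
    return lesion_idx, extra_idx


def translate(img, tx, ty):
    """Shift a 2D slice by (tx, ty) pixels, filling exposed borders with edge values."""
    h, w = img.shape
    pad = max(abs(tx), abs(ty))
    padded = np.pad(img, pad, mode="edge")
    return padded[pad - ty:pad - ty + h, pad - tx:pad - tx + w]


def random_translation_pair(image, annotation, rng, max_shift=45):
    """Apply the same random shift to an image slice and its annotation slice."""
    tx, ty = rng.integers(-max_shift, max_shift + 1, size=2)
    return translate(image, tx, ty), translate(annotation, tx, ty)


rng = np.random.default_rng(0)
labels = np.zeros((10, 8, 8), dtype=int)
labels[2, 3, 3] = 1   # a CMB voxel on slice 2
labels[5, 4, 4] = 2   # an iron deposit voxel on slice 5
print(pick_slices(labels, rng))
```

Using the same (tx, ty) for every modality and for the annotation keeps the lesion labels aligned with the image content after augmentation.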
Supplementary Table 4.1: Experimental results using the single class model for the number of predicted CMB and iron deposit lesions evaluated against the reference annotation.

Experiments | Avg Sensitivity | Avg Precision | Avg Magnitude Accuracy | Correlation Coeff (Pearson) | Bland-Altman Plot (md, [lower, upper])
Single Class CMB
SWI | 0.85 ± 0.06 CI: [0.74, 0.97] | 0.22 ± 0.04 CI: [0.14, 0.31] | 0.91 ± 0.06 CI: [0.80, 1.03] | 0.96 a | md=-4.54 CI: [-24.77, 15.69]
SWI and QSM | 0.88 ± 0.06 CI: [0.77, 0.99] | 0.40* ± 0.07 CI: [0.27, 0.54] | 1.09* ± 0.04 CI: [1.00, 1.17] | 0.97 a | md=-2.21 CI: [-19.45, 15.04]
SWI and T2 | 0.84 ± 0.07 CI: [0.70, 0.97] | 0.29 ± 0.06 CI: [0.17, 0.42] | 0.95 ± 0.08 CI: [0.80, 1.10] | 0.76 a | md=-3.04 CI: [-36.59, 30.51]
SWI, QSM and T2 | 0.87 ± 0.06 CI: [0.76, 0.98] | 0.50* ± 0.07 CI: [0.35, 0.64] | 1.08* ± 0.07 CI: [0.94, 1.22] | 0.98 a | md=-0.71 CI: [-15.26, 13.84]
Single Class Iron Deposits
SWI | 0.81 ± 0.06 CI: [0.68, 0.94] | 0.51 ± 0.07 CI: [0.37, 0.65] | 1.06 ± 0.08 CI: [0.91, 1.21] | 0.88 a | md=-4.08 CI: [-102.63, 94.46]
SWI and QSM | 0.77 ± 0.06 CI: [0.65, 0.89] | 0.60 ± 0.07 CI: [0.46, 0.75] | 1.09 ± 0.05 CI: [0.99, 1.20] | 0.92 a | md=-2.54 CI: [-80.42, 75.34]
SWI and T2 | 0.77 ± 0.06 CI: [0.64, 0.89] | 0.56 ± 0.07 CI: [0.42, 0.70] | 1.04 ± 0.08 CI: [0.88, 1.19] | 0.85 a | md=9.04 CI: [-91.48, 109.56]
SWI, QSM and T2 | 0.81 ± 0.05 CI: [0.71, 0.92] | 0.62 ± 0.07 CI: [0.47, 0.76] | 1.11 ± 0.05 CI: [1.02, 1.21] | 0.81 a | md=2.58 CI: [-114.33, 119.50]
a Significant differences indicated: p < 0.05. md = mean difference

Supplementary Table 4.2: Experimental result using the multiclass model for the number of predicted CMB and iron deposit lesions evaluated against the reference annotation.

Experiments | Avg Sensitivity | Avg Precision | Avg Magnitude Accuracy | Correlation Coeff (Pearson) | Bland-Altman Plot (md, [lower, upper])
Multi-class CMB
SWI | 0.82 ± 0.07 CI: [0.68, 0.96] | 0.36 ± 0.06 CI: [0.24, 0.49] | 0.99 ± 0.06 CI: [0.87, 1.12] | 0.96 a | md=0.04 CI: [-26.61, 26.70]
SWI and QSM | 0.84 ± 0.07 CI: [0.70, 0.98] | 0.59 ± 0.08 CI: [0.43, 0.75] | 1.15 ± 0.07 CI: [1.00, 1.29] | 0.99 a | md=0.67 CI: [-18.26, 19.59]
SWI and T2 | 0.76 ± 0.08 CI: [0.60, 0.91] | 0.43 ± 0.06 CI: [0.31, 0.56] | 1.00 ± 0.07 CI: [0.86, 1.13] | 0.97 a | md=0.75 CI: [-25.47, 26.97]
SWI, QSM and T2 | 0.89 ± 0.05 CI: [0.79, 1.00] | 0.49 ± 0.06 CI: [0.37, 0.61] | 1.07 ± 0.06 CI: [0.95, 1.19] | 0.98 a | md=0.92 CI: [-25.49, 27.33]
Multi-class Iron Deposits
SWI | 0.76 ± 0.06 CI: [0.63, 0.88] | 0.70 ± 0.08 CI: [0.55, 0.86] | 1.13 ± 0.07 CI: [0.99, 1.28] | 0.91 a | md=13.83 CI: [-62.08, 89.75]
SWI and QSM | 0.75 ± 0.07 CI: [0.62, 0.88] | 0.75 ± 0.08 CI: [0.60, 0.91] | 1.20 ± 0.05 CI: [1.11, 1.30] | 0.91 a | md=6.71 CI: [-74.77, 88.18]
SWI and T2 | 0.76 ± 0.06 CI: [0.64, 0.89] | 0.60 ± 0.08 CI: [0.44, 0.75] | 1.08 ± 0.07 CI: [0.93, 1.22] | 0.87 a | md=18.29 CI: [-71.54, 108.12]
SWI, QSM and T2 | 0.81 ± 0.05 CI: [0.71, 0.92] | 0.64 ± 0.08 CI: [0.49, 0.79] | 1.17 ± 0.05 CI: [1.08, 1.27] | 0.90 a | md=16.29 CI: [-62.23, 94.82]
a Significant differences indicated: p < 0.05. md = mean difference

Section 5. Experimental Results Excluding Outlier

In our training and testing dataset, there is a single participant with more than 100 CMBs. As shown in Supplementary Figure 6, a beta distribution fitted to the number of CMBs in this dataset indicates that the 99th percentile of the distribution is approximately 16 CMBs. The participant with more than 100 CMBs is clearly an outlier by this definition. It should be noted that under different circumstances this participant may not be considered an outlier in terms of the number of CMBs: large numbers of CMBs have been observed in patients with vascular pathologies such as cerebral amyloid angiopathy (CAA) (6) or hypertension (7). In this series of experiments, model training and testing using leave-one-out cross-validation were conducted with the outlier participant excluded from the dataset. All other training and testing parameters were kept the same as in the experiments in the main paper. The non-parametric two-tailed Wilcoxon signed rank test was used to check for statistically significant differences in the average sensitivity, precision and magnitude accuracy in all experiments. Comparisons were made against the model trained with only SWI. All statistical testing was performed using MATLAB R2017b.

The results of all single class experiments are reported in Supplementary Table 5.1. For these experiments, we note that the model trained with SWI, QSM and T2 had the best performance in terms of average magnitude accuracy for detecting CMBs. For detecting iron deposits, the model trained with SWI and QSM had the highest average magnitude accuracy. Supplementary Figure 7 shows the scatterplots of the average sensitivity and precision for all single class experiments.

The results of all multiclass experiments are reported in Supplementary Table 5.2. Interestingly, the model trained with SWI and T2 had the best performance in terms of average magnitude accuracy for both CMBs and iron deposits. Supplementary Figure 8 shows the scatterplots of the average sensitivity and precision for all multiclass experiments.
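The paired comparison against the SWI-only model can be sketched in Python as shown below. The statistical testing in this work was performed in MATLAB R2017b, so SciPy's `wilcoxon` is a substituted tool here, and the per-participant scores are synthetic placeholders rather than values from this study.

```python
import numpy as np
from scipy.stats import wilcoxon


def compare_to_baseline(metric, baseline, alpha=0.05):
    """Two-tailed Wilcoxon signed-rank test on paired per-participant scores."""
    metric = np.asarray(metric, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    result = wilcoxon(metric, baseline, alternative="two-sided")
    return result.pvalue, result.pvalue < alpha


# Synthetic placeholder scores for 23 participants (NOT values from the paper).
baseline_precision = np.linspace(0.10, 0.54, 23)
candidate_precision = baseline_precision + 0.01 * np.arange(1, 24)
p_value, significant = compare_to_baseline(candidate_precision, baseline_precision)
print(f"p = {p_value:.2g}, significant at 0.05: {significant}")
```

The test is paired because each participant is evaluated by both models in the leave-one-out scheme, so the two score vectors share the same ordering of participants.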

Supplementary Figure 6: A histogram of the number of CMB lesions. The blue columns represent the number of lesions. A beta distribution is fitted to this histogram (red dashed curve). The 99th percentile of this distribution (orange dotted line) is approximately 16 CMBs. The blue bar towards the right of the figure is the participant with more than 100 CMBs.

Supplementary Table 5.1: Experimental result using the single class model for the number of predicted CMB and iron deposit lesions evaluated against the reference annotation when the outlier is excluded.

Experiments | Avg Sensitivity | Avg Precision | Avg Magnitude Accuracy | Correlation Coeff (Pearson) | Bland-Altman Plot (md, [lower, upper])
Single Class CMB
SWI | 0.76 ± 0.09 [0.59, 0.93] | 0.30 ± 0.06 [0.18, 0.43] | 0.86 ± 0.10 [0.67, 1.05] | 0.27 | md=-4.09 CI: [-15.22, 7.04]
SWI and QSM | 0.81 ± 0.08 [0.65, 0.96] | 0.43 ± 0.07 [0.29, 0.57] | 0.97 ± 0.09 [0.79, 1.14] | 0.51 a | md=-1.52 CI: [-5.88, 2.84]
SWI and T2 | 0.86 ± 0.07 [0.72, 1.00] | 0.35 ± 0.07 [0.22, 0.49] | 0.98 ± 0.09 [0.81, 1.15] | 0.27 | md=-4.04 CI: [-14.69, 6.61]
SWI, QSM and T2 | 0.85 ± 0.07 [0.71, 0.99] | 0.45 ± 0.08 [0.29, 0.60] | 1.09 ± 0.07 [0.94, 1.23] | 0.47 a | md=-2.74 CI: [-9.58, 4.10]
Single Class Iron Deposits
SWI | 0.80 ± 0.06 [0.68, 0.93] | 0.54 ± 0.07 [0.40, 0.68] | 1.05 ± 0.08 [0.90, 1.20] | 0.92 a | md=0.52 CI: [-77.23, 78.27]
SWI and QSM | 0.80 ± 0.06 [0.68, 0.92] | 0.64 ± 0.07 [0.50, 0.77] | 1.12 ± 0.05 [1.01, 1.22] | 0.94 a | md=1.35 CI: [-64.33, 67.02]
SWI and T2 | 0.74 ± 0.07 [0.60, 0.89] | 0.51 ± 0.07 [0.37, 0.65] | 0.99 ± 0.08 [0.83, 1.16] | 0.87 a | md=3.13 CI: [-88.11, 94.37]
SWI, QSM and T2 | 0.76 ± 0.07 [0.62, 0.90] | 0.56 ± 0.08 [0.41, 0.71] | 1.06 ± 0.07 [0.92, 1.20] | 0.85 a | md=-3.35 CI: [-111.66, 104.96]
a Significant differences indicated: p < 0.05. md = mean difference

Supplementary Table 5.2: Experimental result using the multiclass model for the number of predicted CMB and iron deposit lesions evaluated against the reference annotation when the outlier is excluded.

Experiments | Avg Sensitivity | Avg Precision | Avg Magnitude Accuracy | Correlation Coeff (Pearson) | Bland-Altman Plot (md, [lower, upper])
Multi-class CMB
SWI | 0.72 ± 0.08 CI: [0.56, 0.88] | 0.46 ± 0.06 CI: [0.34, 0.59] | 1.03 ± 0.04 CI: [0.96, 1.11] | 0.56 a | md=-0.96 CI: [-4.90, 2.98]
SWI and QSM | 0.85 ± 0.07 CI: [0.71, 0.98] | 0.55 ± 0.08 CI: [0.39, 0.71] | 1.11 ± 0.08 CI: [0.96, 1.26] | 0.69 a | md=-1.26 CI: [-4.97, 2.45]
SWI and T2 | 0.86 ± 0.06 CI: [0.74, 0.98] | 0.62 ± 0.07 CI: [0.48, 0.77] | 1.18* ± 0.04 CI: [1.10, 1.27] | 0.69 a | md=-0.83 CI: [-3.97, 2.32]
SWI, QSM and T2 | 0.89 ± 0.06 CI: [0.78, 1.00] | 0.48 ± 0.07 CI: [0.34, 0.61] | 1.07 ± 0.07 CI: [0.93, 1.21] | 0.60 a | md=-1.96 CI: [-7.66, 3.75]
Multi-class Iron Deposits
SWI | 0.84 ± 0.05 CI: [0.74, 0.93] | 0.65 ± 0.07 CI: [0.51, 0.79] | 1.16 ± 0.05 CI: [1.07, 1.26] | 0.95 a | md=14.52 CI: [-52.73, 81.77]
SWI and QSM | 0.80 ± 0.07 CI: [0.67, 0.94] | 0.66 ± 0.07 CI: [0.52, 0.80] | 1.17 ± 0.05 CI: [1.07, 1.27] | 0.97 a | md=6.70 CI: [-45.57, 58.96]
SWI and T2 | 0.87 ± 0.04 CI: [0.78, 0.95] | 0.66 ± 0.07 CI: [0.52, 0.79] | 1.17 ± 0.05 CI: [1.08, 1.26] | 0.93 a | md=18.78 CI: [-57.96, 95.53]
SWI, QSM and T2 | 0.77 ± 0.08 CI: [0.62, 0.93] | 0.56 ± 0.07 CI: [0.42, 0.71] | 1.06 ± 0.09 CI: [0.89, 1.23] | 0.91 a | md=13.78 CI: [-63.72, 91.28]
a Significant differences indicated: p < 0.05. md = mean difference

Supplementary Figure 7: Joint scatterplots of the sensitivity vs precision of all single class experiments predicting CMBs and non-hemorrhage iron deposits when excluding the outlier participant. (Left) all CMB only experiments and (Right) all iron deposits only experiments. In each subplot, the round points indicate the individual participants’ sensitivity and precision evaluated with leave-one-out cross-validation, and the X indicates the mean sensitivity and precision. The legend at the upper left corner of each subplot shows the coordinates of X. In each subplot, histograms of the sensitivity and precision are displayed along the upper and right axes.

[Supplementary Figure 7 panel titles: Single Class CMB Only (left), Single Class Iron Deposits Only (right)]

Supplementary Figure 8: Joint scatterplots of the sensitivity vs precision of all multiclass experiments predicting CMBs and non-hemorrhage iron deposits when excluding the outlier. (Left) all evaluations for CMBs and (Right) all evaluations for iron deposits. In each subplot, the round points indicate the individual participants’ sensitivity and precision evaluated with leave-one-out cross-validation, and the X indicates the mean sensitivity and precision. The legend at the upper left corner of each subplot shows the coordinates of X. In each subplot, histograms of the sensitivity and precision are displayed along the upper and right axes.

[Supplementary Figure 8 panel titles: Multiclass CMB (left), Multiclass Iron Deposits (right)]

Section 6. Experimental Results with Original UNet

We implemented the original UNet model as described in (4) and used this model to perform a multiclass cross-validated evaluation of the 24 participants. The main difference between our implementation and the original UNet is in the size of the input and output images. The original UNet used input and output image size of 572x572 and 388x388, respectively, whereas our implementation used input and output image size of 412x412 and 228x228, respectively. All other network architecture parameters were kept the same as the original UNet. The results of these multiclass experiments are reported in Supplementary Table 6.1. We note that for both CMB and iron deposits, the model trained with SWI, QSM and T2 had the best performance in terms of average magnitude accuracy.
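The relation between input and output sizes follows from the unpadded 3x3 convolutions of the original UNet, each of which removes two pixels per spatial dimension. The short calculation below reproduces both size pairs quoted above; the function name is hypothetical, and the depth of four pooling stages is taken from the original UNet design rather than from this supplement.

```python
def unet_valid_output_size(input_size, depth=4):
    """Output side length of a UNet built from unpadded 3x3 convolutions."""
    size = input_size
    for _ in range(depth):
        size -= 4                      # two unpadded 3x3 convolutions
        if size % 2:
            raise ValueError("feature map side must be even before 2x2 pooling")
        size //= 2                     # 2x2 max pooling
    size -= 4                          # two convolutions in the bottleneck
    for _ in range(depth):
        size *= 2                      # 2x2 up-convolution
        size -= 4                      # two unpadded 3x3 convolutions
    return size


print(unet_valid_output_size(572))  # 388
print(unet_valid_output_size(412))  # 228
```

In both cases the output is 184 pixels smaller than the input along each side, which is why a 412x412 input yields a 228x228 prediction.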

Supplementary Table 6.1: Experimental result for the number of predicted CMB and iron deposit lesions evaluated against the reference annotation using the original UNet (4).

Experiments | Avg Sensitivity | Avg Precision | Avg Magnitude Accuracy | Correlation Coeff (Pearson) | Bland-Altman Plot (md, [lower, upper])
Multiclass CMB
SWI | 0.79 ± 0.07 [0.65, 0.93] | 0.51 ± 0.07 [0.38, 0.65] | 1.08 ± 0.03 [1.03, 1.13] | 0.97 a | md=0.08 CI: [-20.49, 20.65]
SWI and QSM | 0.83 ± 0.07 [0.7, 0.97] | 0.48 ± 0.07 [0.35, 0.6] | 0.99 ± 0.08 [0.83, 1.15] | 0.98 a | md=0.33 CI: [-20.89, 21.55]
SWI and T2 | 0.80 ± 0.07 [0.66, 0.95] | 0.49 ± 0.08 [0.34, 0.64] | 1.1 ± 0.05 [1, 1.2] | 0.83 a | md=0.38 CI: [-32.62, 33.37]
SWI, QSM and T2 | 0.82 ± 0.07 [0.68, 0.96] | 0.52 ± 0.07 [0.38, 0.66] | 1.11 ± 0.04 [1.03, 1.2] | -0.21 | md=3.21 CI: [-44.91, 51.33]

Multiclass Iron Deposits