Robust Automatic Whole Brain Extraction on Magnetic Resonance Imaging of Brain Tumor Patients using Dense-Vnet
Sara Ranjbar, Kyle W. Singleton, Lee Curtin, Cassandra R. Rickertsen, Lisa E. Paulson, Leland S. Hu, J. Ross Mitchell, Kristin R. Swanson
Mathematical NeuroOncology Lab, Precision Neurotherapeutics Innovation Program, Department of Neurological Surgery, Mayo Clinic, Phoenix, AZ, USA
Department of Diagnostic Imaging and Interventional Radiology, Mayo Clinic, Phoenix, AZ, USA
Department of Biostatistics and Bioinformatics, Moffitt Cancer Center and Research Institute, Tampa, Florida, USA
*Corresponding Author: [email protected]
§ Authors contributed equally.
Abstract
Whole brain extraction, also known as skull stripping, is a process in neuroimaging in which non-brain tissue such as the skull, eyeballs, and skin is removed from neuroimages. Skull stripping is a preliminary step in presurgical planning, cortical reconstruction, and automatic tumor segmentation. Despite a plethora of skull stripping approaches in the literature, few are sufficiently accurate for processing pathology-presenting MRIs, especially MRIs with brain tumors. In this work we propose a deep learning approach for skull stripping of common MRI sequences in oncology, such as T1-weighted MRI with gadolinium contrast (T1Gd) and T2-weighted fluid-attenuated inversion recovery (FLAIR), in patients with brain tumors. We automatically created gray matter, white matter, and CSF probability masks using the SPM12 software and merged the masks into a final whole-brain mask for model training. Dice agreement, sensitivity, and specificity of the model (referred to herein as DeepBrain) were tested against manual brain masks. To assess data efficiency, we retrained our models using progressively fewer training examples and calculated average Dice scores on the test set for the models trained in each round. Further, we tested our model against MRI of healthy brains from the LBPA40 dataset. Overall, DeepBrain yielded an average Dice score of 94.5%, sensitivity of 96.4%, and specificity of 98.5% on brain tumor data. For healthy brains, model performance improved to a Dice score of 96.2%, sensitivity of 96.6%, and specificity of 99.2%. The data efficiency experiment showed that, for this specific task, comparable levels of accuracy could have been achieved with as few as 50 training samples. In conclusion, this study demonstrated that a deep learning model trained on minimally processed, automatically-generated labels can generate brain masks on MRI of brain tumor patients that are more accurate than those of existing non-learning methods, within seconds.
Keywords
Whole brain extraction, brain tumors, MRI, deep learning

Acknowledgement
This publication would not have been possible without the support of the James S. McDonnell Foundation, the Ivy Foundation, the Mayo Clinic, the Zicarelli Foundation and the NIH (R01 NS060752, R01 CA164371, U54 CA210180, U54 CA143970, U54 CA193489, U01 CA220378).
Introduction

Magnetic resonance imaging (MRI) has a pivotal role in the non-invasive diagnosis and monitoring of many neurological diseases such as Alzheimer's disease and dementia, brain aneurysm, stroke, and primary and metastatic brain tumors. The large amount of data produced in routine patient care has prompted many studies aiming to automate image analysis tasks relevant to patient care, including surgical planning, volumetric analyses, study of anatomical structures, tissue classification, disease staging, and localization of pathology. To succeed in characterizing both normal baseline and pathological deviation, non-brain tissue (fat, skull, eyeballs, teeth, etc.) needs to be removed from anatomical MRI. As manual annotation of brain tissue on every slice of a 3D volumetric MRI is excruciatingly labor intensive, many automatic 'whole brain extraction' or 'skull stripping' techniques have been introduced in the literature to tackle this need.

Over the years, many approaches to automatic whole brain extraction have been proposed. Edge-based skull stripping approaches such as BSE and BEA use predetermined sets of parameters to separate brain and non-brain tissue through morphological or region-growing operations. Intensity-based methods such as SPM2 and WAT rely on intensity variations to find the edge of the brain. BET, BET2, MLS, and SMHASS are examples of deformable surface-based methods that use the image gradient to fit an active contour/curve to the brain. Atlas-based methods such as MAPS and ANTs define the boundaries of the brain by registering images to one or many atlases for improved accuracy. Patch-based methods such as BEaST and SPECTRE are an extension of atlas-based methods in which image-to-atlas registration is performed on non-local image patches. Hybrid methods integrate several of the above approaches to achieve enhanced results; examples are HWA, McStrip, and ROBEX.
The accuracy and robustness of skull stripping methods are key to their adoption, and these two measures are often counter-balanced. Several comparative studies have found hybrid methods to be superior in accuracy at the cost of time-efficiency. Intensity-based and edge-based methods tend to be fast because of their simplicity, but their accuracies tend to fluctuate across heterogeneous datasets with varying levels of image resolution, noise, and artifacts. Atlas-based methods are designed for healthy subjects and thus fail in the presence of large pathological tissue on the image, such as diffusely invasive glioblastoma (GBM) tumors. Moreover, GBMs are often localized close to the border of the brain and thus can throw off most skull stripping approaches. Among existing methods, OptiBET, a modified version of BET, has shown robustness with brain pathology. In addition, MONSTR, a patch-based multi-atlas skull stripping method, has demonstrated robustness with images of schizophrenia, traumatic brain injury, and brain tumors.

The recent success of deep learning methods in the ImageNet challenge has made a lasting impact on computer vision and, by extension, on biomedical image analysis. Deep convolutional neural networks have shown success in a number of neuroimaging applications such as MR sequence classification, prediction of genetic mutation using MRI, and tumor segmentation. Naturally, several works have explored the utility of deep learning approaches in MRI skull stripping, including the works of Salehi et al and Kleesiek et al, which reported high performance on publicly available datasets of normal brains. The input-agnostic fully convolutional network of Kleesiek et al outperformed BET, BEaST, BSE, ROBEX, and HWA. Few have fully explored the performance of deep learning approaches on brain tumor data.
Given the level of variability that we routinely observe in oncology data, namely in image quality and the varied presentation of brain tumors on MRI, we adopted a learning-based approach to tackle this task. The contributions of this work are as follows: 1) assessing the performance of the Dense-Vnet architecture in MRI skull stripping of brain tumor data, 2) comparing performance across Dense-Vnet MRI input types, 3) conducting a data efficiency experiment to assess the effect of training set size on model performance, and 4) assessing the performance of a model trained on brain tumor data on a publicly available dataset of healthy subjects.

Imaging Data
Brain Tumor Data for Training and Testing.
The data source in this work was our in-house IRB-approved repository (described in our previous work), which contains over 70,000 serial structural MR studies of 2,500+ unique brain tumor patients acquired across 20+ institutions. The data pertaining to this study included paired pretreatment T1-weighted post-gadolinium-contrast (T1Gd) and T2-weighted fluid-attenuated inversion recovery (FLAIR) series of 721 adult brain tumor patients. These series were randomly assigned to 586 training, 52 validation, and 96 test sets. Selection of T1Gd and FLAIR sequences was a practical decision due to their higher prevalence in our repository. We also restricted inclusion criteria to pre-treatment images only, as treatment often significantly alters brain appearance on MRI. No additional restriction was imposed on data selection criteria. The imaging data were acquired from 1990 to 2016. Due to the retrospective nature of this dataset, the quality and resolution of images varied with the year and institution of image acquisition. Thus, we employed a number of preprocessing steps to harmonize the data, including noise reduction using nonlinear curvature flow, radiofrequency non-uniformity correction using the N4 algorithm, and resizing to a common matrix of 240x240x64 voxels. The SimpleElastix framework was used to rigidly co-register the FLAIR volume to the T1Gd volume within each study. Our imaging repository contains patient information and therefore is subject to HIPAA regulations. Due to the proprietary nature of patient data and the patient information that is visible in the network's input images (pre skull stripping), we are not at liberty to freely share data with readers. However, data may be available for sharing upon the request of qualified parties as long as patient privacy and the intellectual property interests of our institution are not compromised.
Typically, data access will occur through a collaboration and may require interested parties to obtain an affiliate appointment with our institution.

Healthy Patient Data for Testing.
The LONI Probabilistic Brain Atlas Project (LBPA40) consists of 40 T1-weighted MRI scans of healthy subjects (20 males, 20 females) and their corresponding manually labeled brain masks. This dataset was used only for model evaluation and was not used during model training.

Brain Masks
Several whole brain segmentation approaches were implemented to create brain masks for model training, model testing, and for comparison with previously successful skull stripping methods.
SPM12-p Masks for Model Training and Validation.
Given the large size of our cohort, it was impractical and time-consuming to manually delineate brains on this dataset. Consequently, we used an automatic method to create brain masks for training our network. We relied on the work of Malone et al. to choose an appropriate method that could serve as a substitute for manual delineation. Malone et al. compared the performance of several methods for total intracranial volume segmentation on T1-weighted MRI of 288 patients with Alzheimer's disease using manual labels and suggested the total intracranial volume of SPM12 to be an acceptable substitute for labour-intensive manual brain masks in multi-centric datasets, even in the presence of neurodegenerative pathology. Statistical Parametric Mapping, or SPM, is an image analysis software package developed at University College London that contains tools for processing positron emission tomography (PET), voxel-based morphometry (VBM), electroencephalography (EEG), functional MRI, and MRI data. Given an input MRI, the segmentation procedure in SPM12 (the most recent version of SPM) outputs probability density maps of specific structures within the brain, including white matter, gray matter, and cerebrospinal fluid (CSF). We used this component within SPM12 to automatically segment gray matter, white matter, and CSF maps from our T1Gd MRIs. These three resulting maps were combined into a single brain probability map and thresholded at 0.7 (empirically chosen) to generate a brain mask. Since the presence of tumors (e.g., tumor necrosis) resulted in occasional missing regions inside the combined mask, we performed minimal morphological operations to fill in the holes in the combined brain mask. The final post-processed result (referred to as SPM12-p) was stored as a binary mask and used as labels for training and validation. SPM12 was run in Matlab version 2018a, and the postprocessing steps resulting in the SPM12-p mask were executed in Python version 3.6.6 with SciPy library version 1.0.0.
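The merge-threshold-fill step described above can be sketched with NumPy and SciPy. This is an illustrative reconstruction, not the authors' exact code: the function name is ours, and the use of `scipy.ndimage.binary_fill_holes` is an assumption about which morphological operation was applied.

```python
import numpy as np
from scipy import ndimage

def combine_spm_masks(gm, wm, csf, threshold=0.7):
    """Merge SPM12 tissue probability maps (gray matter, white matter, CSF)
    into a single binary brain mask, as described in the text."""
    brain_prob = gm + wm + csf              # combined brain probability map
    mask = brain_prob > threshold           # empirically chosen cutoff (0.7)
    # minimal morphological post-processing: fill interior holes left by
    # tumor necrosis or other regions SPM12 assigned low brain probability
    mask = ndimage.binary_fill_holes(mask)
    return mask.astype(np.uint8)
```

A usage sketch: after running the SPM12 segmentation, load the three probability volumes (e.g., with nibabel) and pass them to this function to obtain the SPM12-p-style binary label.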
Figure 1 shows an example of this process.

Figure 1 - Gradual steps for creating the brain masks. Images reflect the MRI of a 29-year-old male brain tumor patient with a diagnosis of glioblastoma. FLAIR refers to fluid-attenuated inversion recovery MRI and T1Gd refers to T1-weighted MRI with gadolinium contrast enhancement. Gray matter, white matter, and CSF probability masks were generated using the SPM12 software. Yellow voxels in these masks reflect higher probability. The final brain mask (SPM12-p) was generated by combining the SPM12 masks, using a threshold of 0.7, and minimal post-processing.
Manual Masks for Model Testing.
To measure model accuracy, we performed manual brain tissue segmentation on 30 randomly-selected test cases. We defined intracranial volume as the combination of gray matter, white matter, subarachnoid CSF, ventricles (lateral, 3rd, 4th), and cerebellum, as suggested in the work of Roy et al. Manual segmentation was performed by two trained individuals with experience in segmentation of tumors on MR imaging data. The segmentation process was initiated with our in-house semi-automatic software used for glioma segmentation, and the results were then loaded into ITK-SNAP for further refinement.

Multi-cONtrast brain STRipping (MONSTR) Masks for Comparison.
In addition to the above brain masks, for the 30 test cases processed manually, we also used the Multi-cONtrast brain STRipping method (MONSTR) to compare against other methods in the literature. MONSTR is a patch-based multi-atlas skull stripping method that has previously demonstrated robustness with MRI of brain tumor patients. MONSTR brain masks were generated using a containerized version of the MONSTR method called from Python 3.6.6, using the T1Gd and FLAIR contrasts as inputs.

Figure 2 – An example of an SPM12-p mask compared to manually generated ground truth. Despite occasional under- and over-segmentation (arrows), automatically generated brain masks correctly identified brain boundaries even in the presence of a tumor in the brain.
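Calling a containerized tool from Python typically means assembling and running a `docker run` command with the image directories mounted into the container. The sketch below is hypothetical: the image name (`monstr:latest`), the flag names (`--t1`, `--t2`, `--output`), and the mount layout are all illustrative assumptions, not MONSTR's actual interface.

```python
import subprocess
from pathlib import Path

def build_monstr_cmd(t1gd, flair, out_dir, image="monstr:latest"):
    """Assemble a hypothetical docker invocation for a containerized
    skull-stripping tool, mounting input and output directories."""
    data = Path(t1gd).parent
    return [
        "docker", "run", "--rm",
        "-v", f"{data}:/data",            # mount input directory read side
        "-v", f"{out_dir}:/out",          # mount output directory
        image,
        "--t1", f"/data/{Path(t1gd).name}",
        "--t2", f"/data/{Path(flair).name}",
        "--output", "/out",
    ]

# actual call (requires docker and the container image):
# subprocess.run(build_monstr_cmd("t1gd.nii.gz", "flair.nii.gz", "out"), check=True)
```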
Network Architecture and Training Approach
Network Architecture. Training was conducted using the Tensorflow-based deep learning platform NiftyNet, version 0.5.0. NiftyNet is a modularly-structured deep learning platform tailored towards medical image analysis applications, with modules for pre-processing, network training, evaluation, and inference. For this semantic segmentation task, we used the dense V-network (Dense-Vnet) architecture, a fully convolutional neural network that has previously demonstrated success in establishing voxel-to-voxel connections between input and output images. Dense-Vnet consists of three layers of dense feature stacks whose outputs are concatenated after a single convolution in the skip connection and bilinear upsampling. Supplementary Table 1 presents the parameter settings in the configuration file used for training a network with the NiftyNet platform. Hereon, we refer to our deep learning model as 'DeepBrain'.

The Main Experiments.
Training was conducted over 500 epochs using 586 training and 52 validation samples. No augmentation was performed on our data. During training, model checkpoints were saved locally every 20 epochs. Optimization was implemented using the Dice loss and the adaptive moment estimation (Adam) optimizer. We repeated model training 3 times: using only T1Gd series, only FLAIR series, and both series as inputs. Details of the training procedure, network architecture, and parameters were identical between runs. All experiments were conducted in Tensorflow 1.12.0 on an Ubuntu 17.10 system with a single Nvidia TITAN V GPU.

Data Efficiency Experiments.
To contribute towards green and reproducible AI, we conducted a data efficiency experiment in which we estimated the effect of training set size on model performance. We repeated model training with progressively fewer training samples (500, 400, 300, 200, 100, and 50). Details of the training procedure, network architecture, and parameters were identical to the training experiments on the entire cohort. The final model in all experiments was identified among checkpoints by calculating the Dice loss on the validation set and selecting the model with the best performance.

Performance Evaluation
We used all test cases (N=96) to compare training time per iteration, inference time per case, and average Dice agreement between predictions and labels (SPM12-p masks). To evaluate performance against ground truth, we used the Dice overlap coefficient, sensitivity, and specificity to compare the predicted brain masks with manual masks (N=30 out of 96 test cases). Let G be the ground truth image and S the segmentation result. The Dice coefficient, sensitivity, and specificity were defined as follows:
Dice(G, S) = 2TP / (2TP + FP + FN)
Sensitivity(G, S) = TP / (TP + FN)
Specificity(G, S) = TN / (TN + FP)

where TP, FP, FN, and TN are the numbers of true positive, false positive, false negative, and true negative voxels, respectively. Sensitivity measures the detection rate of brain tissue while specificity measures how much non-brain tissue is correctly identified. Finally, the Dice score evaluates the trade-off between sensitivity and specificity. Paired t-tests were used to compare the results across runs. These performance measures are reported for brain tumor MRIs with available manual masks. Mean and standard deviation of performance measures were calculated to reflect the range of performance. A p-value lower than 0.05 was used to assess statistically significant differences in performance between experiments. All statistical comparisons were performed in Python 3.6.6 using the SciPy package version 1.0.0. To compare our work with other non-DL skull stripping methods in the literature, we also calculated the Dice score, sensitivity, and specificity of the MONSTR algorithm on our data. For the data efficiency experiment, we report Dice scores with training conducted on progressively smaller subsets of the training cohort. Finally, we compared the robustness of our model to other deep learning skull stripping methods in the literature using the LBPA40 dataset of healthy subjects. Here, we compared the Dice score, sensitivity, and specificity of our results with theirs; reported results for the deep learning methods of others were taken from the respective publications.

Results
Performance on Brain Tumor Data.
The three versions of DeepBrain (trained on T1Gd, FLAIR, and both) yielded similar levels of agreement between predictions and labels. Table 1 compares the performance of DeepBrain across input types on previously unseen test cases. On average, DeepBrain trained on FLAIR achieved the highest Dice score and sensitivity, while the model trained on both sequences was superior to the single-input models in specificity (98.84%).
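The per-case overlap scores behind these comparisons can be computed from binary masks with the standard definitions given in the Performance Evaluation section. This is a generic sketch of those definitions (plus the paired t-test used to compare runs), not the authors' evaluation code.

```python
import numpy as np
from scipy import stats

def overlap_metrics(g, s):
    """Dice, sensitivity, and specificity between a ground-truth mask g
    and a predicted segmentation s (boolean or 0/1 arrays)."""
    g = np.asarray(g).astype(bool)
    s = np.asarray(s).astype(bool)
    tp = np.sum(g & s)        # brain voxels correctly labeled brain
    fp = np.sum(~g & s)       # non-brain voxels labeled brain
    fn = np.sum(g & ~s)       # brain voxels missed
    tn = np.sum(~g & ~s)      # non-brain voxels correctly excluded
    dice = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return dice, sensitivity, specificity

# comparing two models across the same test cases, e.g. per-case Dice lists:
# t_stat, p_value = stats.ttest_rel(dice_model_a, dice_model_b)
```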
Performance compared to MONSTR and SPM12 on Brain Tumor Data.
Table 2 compares the performance of our model with the other non-DL brain masks created in this work. While MONSTR did not fail to include the regions occupied by tumors in the segmentation, its performance was much worse in identifying the boundaries of the brain in other regions. We observed over- and under-segmentation in MONSTR-generated brain masks, especially at the top and bottom of the brain. In comparison, SPM12-p showed much-improved sensitivity, with statistical significance over MONSTR. DeepBrain was superior in Dice score and showed significantly higher sensitivity than both non-DL methods. Figure 3 shows an example of predicted brain masks using DeepBrain for a test case in which our approach performed much better than the other methods. With respect to runtime, we created SPM12-p masks for our cohort within an average of 2-3 minutes per case, which was lower than the MONSTR runtime of 10-20 minutes. A longer runtime for MONSTR is expected, as atlas-based methods tend to take longer than other approaches. DeepBrain was faster still, at a mere 2 seconds per case.

Table 1 – Mean and standard deviation of performance for DeepBrain using 30 brain tumor cases with available manual brain masks. Values represent mean and standard deviation of scores.

CNN Input      Dice score μ(σ)    Sensitivity μ(σ)    Specificity μ(σ)
T1Gd           … (1.09)           … (2.34)            98.48 (1.05)
FLAIR          …                  …                   …
T1Gd+FLAIR     …                  …                   98.84 (0.79)

p values of paired t-tests: a p = 0.0003, b p = 0.003, c p = 0.04, and d p = 0.001.

Results of Data Efficiency Experiment.
Figure 4 reports the effect of training set size on the Dice scores of predicted masks on the independent test set. Compared to the results of the main experiment (N=586), we observed no drop in overall Dice scores when reducing the training set size, suggesting that similar results could be achieved using much smaller cohorts. We also did not observe any consistent gain or loss from using both FLAIR and T1Gd as inputs to the Dense-Vnet.
Table 2 – Comparison with other non-DL methods on the brain tumor test set. Values represent mean and standard deviation of scores.

Method      Dice score μ(σ)    Sensitivity μ(σ)    Specificity μ(σ)    Average runtime per case
MONSTR      91.34 (6.76)       88.22 (2.22) §d     …                   10-20 mins
SPM12-p     93.36 (3.75)       93.39 (1.09) d¶     … (2.34) §¶         2-3 mins
DeepBrain   …                  …                   98.48 (1.05)        ~2 secs

p values of paired t-tests: § p < 10^-6, ¶ p = 0.022, and d p = 0.006.

Performance on healthy cases.
Table 3 presents the performance of our model on normal brains from the LBPA40 dataset. On average, DeepBrain achieved a Dice score of 96.2%, sensitivity of 96.6%, and specificity of 99.2% on this dataset. As Table 3 shows, our results are comparable to the state-of-the-art CNN approaches in the literature; however, our Dice score and sensitivity were on the lower end among the DL approaches. We believe this is expected given that, unlike others, we trained our model using brain tumor patient data, which includes unexpected brain appearance due to edema and necrosis. In comparison, the average turnaround time of our model for new test cases is drastically shorter than the others.

Figure 3 - Masks overlaid on the brain tumor MRIs. Images on the left show the brain masks created using MONSTR, SPM12-p, and our DL model, DeepBrain, in different anatomical views. The right image shows the ground truth manual segmentation. Our method performed very well and much better than the other methods in this application. The Dice coefficient, sensitivity, and specificity, calculated based on the ground truth for this case, are shown to the left of each image.

Figure 4 - The effect of training size on accuracy. Comparable Dice scores were generated for the independent test set using models with various training set sizes between 50 and 586. We did not observe any consistent gain from using both FLAIR and T1Gd series as inputs.
Table 3 - Comparison with previous literature on manual segmentation of healthy brains in the LBPA40 dataset. Values for scores and runtime in others' work are from the literature.

Method                       Dice score μ(σ)    Sensitivity μ(σ)    Specificity μ(σ)    Average testing time (s)
CONSNet (Lucena et al)       97.3 …             97.2 …              … (0.001)           20
Auto-U-net (Salehi et al)    …                  …                   … (0.003)           …
DeepBrain                    96.2 …             96.6 …              99.2 …              ~2

Discussion

Despite the large body of existing literature on automatic skull stripping methods for MRI, few have reported robustness in cases with a pathological brain. Among the non-learning-based skull stripping approaches in the literature, the MONSTR algorithm outperformed BEaST, SPECTRE, OPTIBET, and ROBEX on a small cohort of 5 brain tumor cases with an average Dice agreement of 96.95% with ground truth. Unfortunately, our data did not support this level of performance for MONSTR, and we achieved a moderate Dice score of 91.34% and sensitivity of 88.22%. In comparison, the performance of SPM12-p was much better than MONSTR, particularly with respect to its much superior sensitivity (93.39%). This could be associated with the atlas-based nature of the MONSTR segmentation, which results in inaccuracies when images deviate from healthy brain MRIs. Discrepancies could also be related to our use of T1Gd and FLAIR inputs to MONSTR, as the original results were reported for T1 and T2. Given the large size of our cohort and the labor-intensive nature of manual segmentation, we needed an automatic method to create brain masks for training. We selected SPM12 due to its comparable performance with manual delineation of total intracranial volume in the presence of neurodegenerative pathology. With minimal post-processing to compensate for the unexpected effects of the presence of tumor on images, our model achieved a Dice score of 94.54% and sensitivity of 96.39% using only FLAIR images as input. The closest work to ours is the modality-agnostic 3D convolutional neural network created by Thakur et al. In that work, the authors trained their network with pretreatment images of glioma patients using the common MR sequences in oncology, including T1, T1Gd, T2, and FLAIR.
Their model achieved an average Dice coefficient of 97.8% on images from the training institution and 95.6%, 91.6%, and 96.9% on datasets from other institutions. Our result is within the range of Dice scores reported in their work for other institutions. Given the multi-centric nature of our brain tumor repository and the heterogeneity of its data, we believe the performance of our model is comparable to theirs. Moreover, one advantage of training on a heterogeneous dataset with samples from many institutions is that it closely approximates the range of data found in clinical practice. Kleesiek et al also reported an average Dice of 95.2% and a sensitivity of 96.25% when testing on brain tumor data. Our performance on healthy subjects was decidedly on the lower end of reported results for skull stripping deep learning models (Table 3). Salehi et al compared the performance of a voxel-wise approach using three convolutional pathways, one for each anatomical plane, with a fully convolutional U-net architecture, and achieved Dice coefficients of 97.7% and 96.8% on two publicly available datasets of normal brains. Kleesiek et al trained a 3D input-agnostic fully convolutional network and compared its performance to 6 skull stripping methods (BET, BEaST, BSE, ROBEX, HWA, 3dSkullStrip) on publicly available datasets. Though the authors reported their performance on a merged public dataset, others reported their performance on the LBPA40 dataset as a Dice of 97.0% and sensitivity of 97.4%. Lucena et al adopted a brain extraction method called CONSNet, which consists of three parallel fully convolutional networks using the U-Net architecture, and achieved a Dice score of 97.3% and sensitivity of 97.2%. In that work, the authors automatically generated silver-standard labels for training using the STAPLE approach, which combines 8 different segmentation approaches into a probabilistic consensus mask. Our approach did not yield the same level of accuracy on the LBPA40 dataset.
We believe this is expected given that, unlike others, we trained our network using only brain-tumor patient data. Also, we trained our model using labels generated automatically by a single method (SPM12-p). We also conducted a data efficiency experiment in which training was repeated using progressively smaller cohorts. Our results demonstrated that, for the task of MRI skull stripping, a training set size of 50 MRIs might be sufficient to successfully train a convolutional neural network. Although larger datasets are always desirable, they are often unavailable in medical imaging. We thus suggest that, rather than collecting large cohorts for training skull stripping CNN models, future efforts should focus on improving training labels and adopting an optimized learning approach. Our work has a number of limitations. Firstly, we did not train a modality-agnostic model. Given the heterogeneity of data types that we observe across institutions, a modality-agnostic approach is necessary for ensuring utility across sites. Secondly, our training labels were generated using only one automatic method. Consensus methods have been shown to be a more reliable alternative to any single automatic method in segmenting brain tissue. In future work we aim to address both of these shortcomings.

Conclusion
In this work we assessed the performance of a deep learning approach for extracting the brain on pretreatment MRI data of brain tumor patients acquired from over 20 institutions. We trained our network on a large cohort of patients using labels automatically generated with the SPM12 software. Overall, our approach reached the highest accuracy with FLAIR images as input, and our results on previously unseen brain tumor data were comparable to previous work in the literature. The data efficiency analysis showed that comparable levels of accuracy could have been achieved with 50 training samples. In conclusion, this study showed that whole brain extraction using a deep learning approach is more robust and accurate than the alternative approaches evaluated here, and that comparable performance can be achieved when training on relatively smaller cohorts.
References
1. Fox NC, Schott JM. Imaging cerebral atrophy: normal ageing to Alzheimer's disease. Lancet. 2004;363(9406):392-394.
2. Tong DC, Yenari MA, Albers GW, O'Brien M, Marks MP, Moseley ME. Correlation of perfusion- and diffusion-weighted MRI with NIHSS score in acute (<6.5 hour) ischemic stroke. Neurology. 1998;50(4):864-870.
3. Bauer S, Wiest R, Nolte L-P, Reyes M. A survey of MRI-based medical image analysis for brain tumor studies. Phys Med Biol. 2013;58(13):R97-R129.
4. Shattuck DW, Sandor-Leahy SR, Schaper KA, Rottenberg DA, Leahy RM. Magnetic resonance image tissue classification using a partial volume model. Neuroimage. 2001;13(5):856-876.
5. Filipek PA, Semrud-Clikeman M, Steingard RJ, Renshaw PF, Kennedy DN, Biederman J. Volumetric MRI analysis comparing subjects having attention-deficit hyperactivity disorder with normal controls. Neurology. 1997;48(3):589-601.
6. Hu LS, Ning S, Eschbacher JM, et al. Radiogenomics to characterize regional genetic heterogeneity in glioblastoma. Neuro Oncol. 2017;19(1):128-137.
7. Kickingereder P, Bonekamp D, Nowosielski M, et al. Radiogenomics of Glioblastoma: Machine Learning-based Classification of Molecular Characteristics by Using Multiparametric and Multiregional MR Imaging Features. Radiology. 2016;281(3):907-918.
8. Hu LS, Ning S, Eschbacher JM, et al. Multi-Parametric MRI and Texture Analysis to Visualize Spatial Histologic Heterogeneity and Tumor Extent in Glioblastoma. PLoS One. 2015;10(11):e0141506.
9. Ranjbar S. Texture Analysis Platform for Imaging Biomarker Research. 2017. https://pdfs.semanticscholar.org/9e2d/7fea4957d79b81e1d035e9507b60d4c8c4a6.pdf.
10. Ramkumar S, Ranjbar S, Ning S, et al. MRI-Based Texture Analysis to Differentiate Sinonasal Squamous Cell Carcinoma from Inverted Papilloma. AJNR Am J Neuroradiol. 2017;38(5):1019-1025.
11. Ranjbar S, Velgos SN, Dueck AC, Geda YE, Mitchell JR, Alzheimer's Disease Neuroimaging Initiative. Brain MR Radiomics to Differentiate Cognitive Disorders. J Neuropsychiatry Clin Neurosci. 2019;31(3):210-219.
12. Chaddad A, Desrosiers C, Niazi T. Deep radiomic analysis of MRI related to Alzheimer's disease. IEEE Access. 2018;6:58213-58221.
13. Kalavathi P, Prasath VBS. Methods on Skull Stripping of MRI Head Scan Images-a Review. J Digit Imaging. 2016;29(3):365-379.
14. Somasundaram K, Kalaiselvi T. Automatic brain extraction methods for T1 magnetic resonance images using region labeling and morphological operations. Comput Biol Med. 2011;41(8):716-725.
15. Ashburner J, Friston KJ. Voxel-based morphometry—the methods. Neuroimage. 2000;11(6):805-821.
16. Hahn HK, Peitgen H-O. The Skull Stripping Problem in MRI Solved by a Single 3D Watershed Transform. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2000. Springer Berlin Heidelberg; 2000:134-143.
17. Smith SM. Fast robust automated brain extraction. Hum Brain Mapp. 2002;17(3):143-155.
18. Jenkinson M, Pechaud M, Smith S, et al. BET2: MR-based estimation of brain, skull and scalp surfaces. In: Eleventh Annual Meeting of the Organization for Human Brain Mapping. Vol 17. Toronto; 2005:167.
19. Zhuang AH, Valentino DJ, Toga AW. Skull-stripping magnetic resonance brain images using a model-based level set. Neuroimage. 2006;32(1):79-92.
20. Galdames FJ, Jaillet F, Perez CA. An accurate skull stripping method based on simplex meshes and histogram analysis for magnetic resonance images. J Neurosci Methods. 2012;206(2):103-119.
21. Leung KK, Barnes J, Modat M, et al. Brain MAPS: an automated, accurate and robust brain extraction technique using a template library. Neuroimage. 2011;55(3):1091-1108.
22. Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage. 2011;54(3):2033-2044.
23. Eskildsen SF, Coupé P, Fonov V, et al. BEaST: brain extraction based on nonlocal segmentation technique. Neuroimage. 2012;59(3):2362-2373.
24. Carass A, Cuzzocreo J, Wheeler MB, Bazin P-L, Resnick SM, Prince JL. Simple paradigm for extra-cerebral tissue removal: algorithm and analysis. Neuroimage. 2011;56(4):1982-1992.
25. Segonne F, Dale AM, Busa E, et al. A hybrid approach to the Skull Stripping problem in MRI. NeuroImage. 2001;13(6):241. doi:10.1016/s1053-8119(01)91584-8
26. Rehm K, Schaper K, Anderson J, Woods R, Stoltzner S, Rottenberg D. Putting our heads together: a consensus approach to brain/non-brain segmentation in T1-weighted MR volumes. Neuroimage. 2004;22(3):1262-1270.
27. Iglesias JE, Liu C-Y, Thompson PM, Tu Z. Robust brain extraction across datasets and comparison with publicly available methods. IEEE Trans Med Imaging. 2011;30(9):1617-1634.
28. Boesen K, Rehm K, Schaper K, et al. Quantitative comparison of four brain extraction algorithms.
Neuroimage . 2004;22(3):1255-1261. 29. Shattuck DW, Prasad G, Mirza M, Narr KL, Toga AW. Online resource for validation of brain segmentation methods.
Neuroimage . 2009;45(2):431-439. 30. Lutkenhoff ES, Rosenberg M, Chiang J, et al. Optimized brain extraction for pathological rains (optiBET).
PLoS One . 2014;9(12):e115551. 31. Roy S, Butman JA, Pham DL, Alzheimers Disease Neuroimaging Initiative. Robust skull stripping using multiple MR image contrasts insensitive to pathology.
Neuroimage
Int J Comput Vis. 2015;115(3):211-252.
34. Ranjbar S, Singleton KW, Jackson PR, et al. A Deep Convolutional Neural Network for Annotation of Magnetic Resonance Imaging Sequence Type. J Digit Imaging. October 2019. doi:10.1007/s10278-019-00282-4
35. Yogananda CGB, Shah BR, Vejdani-Jahromi M, et al. A Novel Fully Automated MRI-Based Deep Learning Method for Classification of IDH Mutation Status in Brain Gliomas. Neuro Oncol. October 2019. doi:10.1093/neuonc/noz199
36. Chang P, Grinband J, Weinberg BD, et al. Deep-Learning Convolutional Neural Networks Accurately Classify Genetic Mutations in Gliomas. AJNR Am J Neuroradiol. 2018;39(7):1201-1207.
37. Pereira S, Pinto A, Alves V, Silva CA. Deep Convolutional Neural Networks for the Segmentation of Gliomas in Multi-sequence MRI. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Springer International Publishing; 2016:131-143.
38. Işın A, Direkoğlu C, Şah M. Review of MRI-based Brain Tumor Image Segmentation Using Deep Learning Methods. Procedia Comput Sci. 2016;102:317-324.
39. Kamnitsas K, Bai W, Ferrante E, et al. Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Springer International Publishing; 2018:450-462.
40. Salehi SSM, Erdogmus D, Gholipour A. Auto-Context Convolutional Neural Network (Auto-Net) for Brain Extraction in Magnetic Resonance Imaging. IEEE Trans Med Imaging. 2017;36(11):2319-2330. doi:10.1109/tmi.2017.2721362
41. Kleesiek J, Urban G, Hubert A, et al. Deep MRI brain extraction: A 3D convolutional neural network for skull stripping. Neuroimage. 2016;129:460-469.
42. Sethian JA. Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science. Cambridge University Press; 1999.
43. Tustison NJ, Avants BB, Cook PA, et al. N4ITK: improved N3 bias correction. IEEE Trans Med Imaging. 2010;29(6):1310-1320.
44. Marstal K, Berendsen F, Staring M, Klein S. SimpleElastix: A user-friendly, multi-lingual library for medical image registration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; 2016:134-142.
45. Malone IB, Leung KK, Clegg S, et al. Accurate automatic estimation of total intracranial volume: a nuisance variable with less nuisance. Neuroimage. 2015;104:366-372.
46. Penny WD, Friston KJ, Ashburner JT, Kiebel SJ, Nichols TE. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier; 2011.
47. Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006;31(3):1116-1128.
48. Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv [cs.DC]. March 2016. http://arxiv.org/abs/1603.04467.
49. Li W, Wang G, Fidon L, Ourselin S, Cardoso MJ, Vercauteren T. On the Compactness, Efficiency, and Representation of 3D Convolutional Networks: Brain Parcellation as a Pretext Task. In: Information Processing in Medical Imaging. Springer International Publishing; 2017:348-360.
50. Gibson E, Li W, Sudre C, et al. NiftyNet: a deep-learning platform for medical imaging. Comput Methods Programs Biomed. 2018;158:113-122.
51. Gibson E, Giganti F, Hu Y, et al. Automatic Multi-Organ Segmentation on Abdominal CT With Dense V-Networks. IEEE Trans Med Imaging. 2018;37(8):1822-1834.
52. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015:3431-3440.
53. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017:4700-4708.
54. Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv [cs.LG]. December 2014. http://arxiv.org/abs/1412.6980.
55. Schwartz R, Dodge J, Smith NA, Etzioni O. Green AI. arXiv [cs.CY]. July 2019. http://arxiv.org/abs/1907.10597.
56. Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. February 2020. doi:10.1038/s41592-019-0686-2
57. Lucena O, Souza R, Rittner L, Frayne R, Lotufo R. Convolutional neural networks for skull-stripping in brain MR imaging using silver standard masks. Artif Intell Med. 2019;98:48-58. doi:10.1016/j.artmed.2019.06.008
58. Mohseni Salehi SS, Erdogmus D, Gholipour A. Auto-Context Convolutional Neural Network (Auto-Net) for Brain Extraction in Magnetic Resonance Imaging. IEEE Trans Med Imaging. 2017;36(11):2319-2330.
59. Thakur S, Doshi J, Min Ha S, Shukla G. NIMG-40. Robust modality-agnostic skull-stripping in presence of diffuse glioma: a multi-institutional study. Neuro Oncol. 2019. https://academic.oup.com/neuro-oncology/article-abstract/21/Supplement_6/vi170/5620104.
60. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer International Publishing; 2015:234-241.
Supplementary Table 1 - Parameter settings in the NiftyNet configuration file, grouped by configuration section. Any parameter not listed here was left at its default value in the pipeline.

Input sections ([T1Gd], [Flair], [Label]):
path_to_search = /path_to_data
spatial_window_size = (144, 144, 144)

[System]:
num_threads = 6
num_gpus = 1
model_dir = /model_dir_path
data_split_file = /file_path

[Network]:
name = dense_vnet
batch_size = 6
whitening = True
window_sampling = resize
volume_padding_size = 0
queue_length = 36

[Segmentation]:
image = T1Gd + Flair
label = label
label_normalisation = False
num_classes = 2

[Training]:
optimiser = adam
sample_per_volume = 1
learning_rate = 0.001
loss_type = Dice
max_iter = 500

[Inference]: default values
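The configuration above trains Dense-Vnet with a Dice loss over two classes (brain vs. background), and the paper reports Dice agreement against manual masks. As a minimal illustration of that overlap metric, the sketch below computes the Dice coefficient on binary masks; the function name and the toy masks are hypothetical, not part of the published pipeline.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2*|A∩B| / (|A| + |B|) for binary masks given as flat 0/1 sequences."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / total

# Toy example: 6 voxels, predicted vs. manual brain mask
predicted = [1, 1, 1, 0, 0, 0]
manual = [1, 1, 0, 0, 0, 1]
print(dice_coefficient(predicted, manual))  # 2*2/(3+3) ≈ 0.667
```

In practice the same computation runs over flattened 3D volumes; the reported scores (e.g. 94.5% on tumor data) correspond to this quantity averaged over test subjects.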