Student Beats the Teacher: Deep Neural Networks for Lateral Ventricles Segmentation in Brain MR

Mohsen Ghafoorian a,c,*, Jonas Teuwen a,d,*, Rashindra Manniesing a, Frank-Erik de Leeuw b, Bram van Ginneken a, Nico Karssemeijer a, and Bram Platel a

a Radboud University Medical Center, Diagnostic Image Analysis Group, Department of Radiology and Nuclear Medicine, Nijmegen, the Netherlands
b Donders Institute for Brain, Cognition and Behaviour, Department of Neurology, Radboud University Medical Center, Nijmegen, the Netherlands
c TomTom, Amsterdam, the Netherlands
d Optics Research Group, Imaging Physics Department, Delft University of Technology, the Netherlands
ABSTRACT
Ventricular volume and its progression are known to be linked to several brain diseases such as dementia and schizophrenia. Accurate measurement of ventricular volume is therefore vital for longitudinal studies of these disorders, making automated ventricle segmentation algorithms desirable. In the past few years, deep neural networks have been shown to outperform classical models in many imaging domains. However, the success of deep networks depends on manually labeled data sets, which are expensive to acquire, especially for higher-dimensional data in the medical domain. In this work, we show that deep neural networks can be trained on much-cheaper-to-acquire pseudo-labels (e.g., generated by other, less accurate automated methods) and still produce segmentations that are more accurate than the labels they were trained on. To show this, we use noisy segmentation labels generated by a conventional region growing algorithm to train a deep network for lateral ventricle segmentation. On a large manually annotated test set, we then show that the network significantly outperforms the conventional region growing algorithm that was used to produce its training labels. Our experiments report a Dice Similarity Coefficient (DSC) of 0.874 for the trained network compared to 0.754 for the conventional region growing algorithm, a statistically significant difference.

Keywords: lateral ventricles, segmentation, deep neural network, fully convolutional neural networks, noisy labels, pseudo-label, large dataset

* Equal contribution
1. INTRODUCTION
Lateral ventricles are anatomical parts of the ventricular system in the brain, where the cerebrospinal fluid is produced. Ventricular volume and its progression are associated with several brain diseases. In certain forms of dementia, an increase in lateral ventricular volume has been associated with a decline in cognitive function. Some psychiatric illnesses such as schizophrenia have also been linked to enlargement of the ventricular volume. Additionally, asymmetry between the left and the right lateral ventricles, together with the size of the ventricles, can be indicative of abnormalities in the brain.

Even though a rough estimate of the ventricular volume, such as the number of slices in which the ventricles appear, might be sufficient for some applications, more accurate quantitative measurements are necessary to longitudinally study subtle differences. It has also been shown that leveraging spatial information using the ventricles as landmarks is beneficial for the detection of a number of pathologies in the brain, including white matter hyperintensities and lacunes. Though manual annotation of the lateral ventricles might be an option for smaller datasets and cross-sectional studies, it is not feasible otherwise, as the task is time-consuming, laborious and subjective. Therefore an accurate, objective and independent segmentation of the left and right ventricles is desirable in clinical practice.

Send correspondence to Jonas Teuwen: [email protected].

With the success of deep neural networks [6, 7] in visual pattern recognition, many studies have been successfully conducted in the medical image analysis domain during the past few years [8, 9], resulting in intelligent systems that reach or surpass the level of medical experts on different tasks and domains.
Since recent deep learning approaches follow a data-driven strategy to learn the optimal representations for the specific task at hand, these methods often require large sets of annotated data to train on. Several recent studies have shown a strong effect of training dataset size on the quality of trained networks. For instance, it has been shown that even with gigantic datasets, the performance of the trained network scales linearly with the logarithm of the size of the training data.

Given the reasoning above, the computer vision community has created enormous labeled datasets using crowd-sourcing methods, for instance Amazon Mechanical Turk. However, this solution is not feasible for medical datasets, as the labeling process requires specific expertise that only medical experts can provide. The high costs of gathering large medical datasets have therefore hindered the creation of gigantic datasets that fully leverage the high capacity of deep neural networks in various medical image analysis domains.

Another strategy to provide large labeled datasets is to use available (not necessarily very accurate) methods for the task to provide pseudo-labels. In this way, one can build arbitrarily large datasets as long as unlabeled data is available. This, however, raises a few interesting questions: 1) considering the imposed trade-off between dataset size and label accuracy, does it make sense to train neural networks on large but noisily labeled datasets rather than smaller ones with more accurate labels, and 2) if we opt for the former, is the low accuracy of the provided pseudo-labels necessarily an upper bound for the accuracy of the trained network?

In this study, we aim to answer these questions by presenting a deep neural network that achieves high accuracy in segmenting the left and right ventricles separately while being trained on noisy pseudo-labels. We also show that, though desirable, accurate manual labels are not mandatory to produce good results, given a large set of (unbiased) noisy-labeled images.
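As a toy illustration of question 2 (whether noisy labels upper-bound the student), the following sketch trains a simple logistic-regression "student" on labels produced by a "teacher" that is only about 75% accurate, with unbiased, symmetric label noise. Everything in this example (the 1-D task, the noise rate, the training settings) is illustrative and not taken from the paper; with enough data, the student's decision boundary approaches the true one and its accuracy exceeds the teacher's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D task: the true class is sign(x). The "teacher" labels are the
# true labels flipped with probability 0.25 (unbiased, symmetric noise),
# so the teacher itself is only ~75% accurate.
n = 20000
x = rng.normal(size=n)
y_true = (x > 0).astype(int)
flip = rng.random(n) < 0.25
y_noisy = np.where(flip, 1 - y_true, y_true)

# "Student": logistic regression fit by gradient descent on the noisy labels.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= lr * np.mean((p - y_noisy) * x)
    b -= lr * np.mean(p - y_noisy)

student_pred = ((w * x + b) > 0).astype(int)
teacher_acc = np.mean(y_noisy == y_true)          # ~0.75 by construction
student_acc = np.mean(student_pred == y_true)     # close to 1.0
print(f"teacher: {teacher_acc:.3f}, student: {student_acc:.3f}")
```

Because the label noise is symmetric, the minimizer of the cross-entropy against the noisy labels shares its decision boundary with the clean problem, which is why the student can beat its teacher.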
2. METHODS

2.1 Material
The data used in this work was obtained from the RUN DMC (Radboud University Nijmegen Diffusion Tensor and Magnetic Resonance Imaging Cohort) study, a longitudinal study of small vessel disease and its progression. The imaging protocol includes a 3D T1 magnetization-prepared rapid gradient-echo (MPRAGE) pulse sequence with a voxel size of 1.0 × 1.0 × 1.0 mm and a fluid-attenuated inversion recovery (FLAIR) pulse sequence with a voxel size of 0.5 × 0.5 × 5.0 mm and a slice gap of 1 mm, scanned on a 1.5T MR scanner (Magnetom Sonata, Siemens Medical Solutions, Erlangen, Germany). We selected a subset of 397 subjects, randomly split into sets of 246, 99 and 52 subjects for training, validation and testing purposes respectively.

For an accurate segmentation, we need to take into account the possible movement of the patient between the acquisition of the T1 and FLAIR modalities. To align the image coordinates of both modalities, we rigidly registered the T1 images to the FLAIR images using FSL-FLIRT. To simplify further processing, we exclude non-brain tissue such as the skull and eyes: we computed a brain mask with FSL-BET on the T1 images and transformed the resulting masks to the FLAIR images using the computed registration. To correct for spatial intensity variations in the MR images caused by inhomogeneities in the magnetic field, we performed a bias-field correction using FSL-FAST. As a final preprocessing step we normalized each image by dividing it by the 95th percentile of all intensities within the same image.

Figure 1: The fully convolutional network used for training our model. This is a U-net-like architecture [16, 18] with slight modifications as described in the text.
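The final normalization step above can be sketched as follows. This is a minimal illustration only: the registration, brain extraction and bias-field correction are performed with the FSL tools named in the text and are not reproduced here, and the function name and synthetic image are our own.

```python
import numpy as np

def normalize_95th_percentile(image, brain_mask=None):
    """Divide an MR image by its 95th intensity percentile.

    The optional `brain_mask` restricts the percentile computation;
    the paper computes it over all intensities within the image.
    """
    values = image[brain_mask] if brain_mask is not None else image
    p95 = np.percentile(values, 95)
    return image / p95

# Illustrative synthetic "image" with an MR-like skewed intensity histogram.
rng = np.random.default_rng(1)
img = rng.gamma(shape=2.0, scale=100.0, size=(32, 32, 16))
norm = normalize_95th_percentile(img)
```

After normalization the 95th percentile of the image equals 1.0, which puts all subjects' intensities on a comparable scale before they are fed to the network.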
To generate the pseudo-labels for the training set, we used an in-house developed application in which automatically selected seed points are used to perform a watershed-based segmentation algorithm on the T1 image, providing ventricle masks for the whole training, validation and test sets. The algorithm is available in the commercial version of MeVisLab (MeVis Medical Solutions AG and Fraunhofer MEVIS; Bremen, Germany). The masks generated by this watershed-based region growing algorithm are inaccurate in some cases and fail entirely in others. We therefore excluded 9 cases from the training set where the algorithm failed completely. In addition, for evaluation purposes, the test set was independently manually segmented by an experienced reader on the registered T1 images, with the FLAIR images used in cases of ambiguity.
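The in-house MeVisLab tool is not publicly available, but the general idea of seed-based watershed segmentation can be sketched with SciPy on a synthetic 2-D image. Everything below (the image, the seed-selection heuristic, the names) is illustrative only and not the paper's actual pseudo-label generator.

```python
import numpy as np
from scipy.ndimage import morphological_gradient, watershed_ift

# Synthetic 2-D "T1 slice": a dark, CSF-like blob (the ventricle) in a
# brighter background.
img = np.full((64, 64), 200, dtype=np.uint8)
yy, xx = np.mgrid[:64, :64]
blob = (yy - 32) ** 2 + (xx - 32) ** 2 < 10 ** 2
img[blob] = 40

# Watershed runs on the gradient magnitude, so region boundaries form
# where the intensity changes sharply (the ventricle wall).
gradient = morphological_gradient(img, size=(3, 3))

# Automatically chosen seed points: label 1 at the darkest voxel
# (inside the ventricle), label 2 in the background.
markers = np.zeros_like(img, dtype=np.int16)
markers[np.unravel_index(np.argmin(img), img.shape)] = 1
markers[0, 0] = 2

labels = watershed_ift(gradient, markers)
ventricle_mask = labels == 1
```

Masks produced this way inherit the weaknesses of the seeding and of the intensity model, which is exactly why the resulting pseudo-labels are noisy.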
To segment the left and the right ventricles separately, we formulated the problem as a three-class segmentation of the background, the left ventricle and the right ventricle. We utilized a fully convolutional network based on the U-net architecture [16], with a depth of 5, applied slice-by-slice on a two-channel image composed of the T1 and FLAIR modalities. As in the analysis path of the standard U-net, we used 3 × 3 convolutions and 2 × 2 max pooling with a (2, 2) stride. We slightly deviated from the original architecture by using leaky ReLUs with leakiness 0.01, followed by dropout after each layer. Additionally, we started the first convolutional layer with 16 filters and doubled the number of filters in each layer before the max pooling, to avoid bottlenecks. We employed a similar scheme in the synthesis path. Details of the network architecture are illustrated in Figure 1.

We used the categorical cross-entropy loss function with L2 regularization. To account for class imbalance, we weighted the loss function on the background by a factor of 0.01. The network weights were initialized from a zero-mean Gaussian distribution with variance scaled by the fan-in. To train the network, we used the Adam update rule [19] with parameters β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸. We trained our network for 200 epochs with an initial learning rate that was decreased in later epochs. The final model was selected as the model with the lowest validation loss. The network was trained using the imperfect segmentations made by the region growing method as described in section 2.3.

Figure 2: Receiver operating characteristic for the segmentation of both ventricles. Please note that the region growing algorithm is represented by a single point, as that method, in contrast to the deep network, is not probabilistic.

To evaluate the methods, the thresholded network outputs and the region growing labels are compared to the reference annotations using the Dice similarity coefficient,

DSC(R, X) = 2 Σᵢ |Rᵢ ∩ Xᵢ| / Σᵢ (|Rᵢ| + |Xᵢ|),

where Rᵢ are the reference annotations for subject i, Xᵢ are either the labels generated by the conventional approach or the thresholded network outputs, and |·| is the size of the set. We also report and compare receiver operating characteristic (ROC) curves, which represent the methods by their true and false positive rates (sensitivity and 1 − specificity) at various operating points. We use the area under the ROC curve (AUC) as a single metric to quantitatively compare ROC curves. Furthermore, we use bootstrapping (over 1000 randomly created bootstraps) of the test set samples to report statistical significance (p-values). More specifically, given "method A is no better than method B" as the null hypothesis to reject, the empirical p-value is reported as the proportion of bootstraps in which method B results in a higher DSC.
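The pooled DSC and the bootstrap significance test described above can be sketched as follows; the helper names and the toy masks are ours, not from the paper's code.

```python
import numpy as np

def dsc(reference_masks, predicted_masks):
    """Dice similarity coefficient pooled over subjects:
    DSC(R, X) = 2 * sum_i |R_i ∩ X_i| / sum_i (|R_i| + |X_i|)."""
    inter = sum(np.logical_and(r, x).sum()
                for r, x in zip(reference_masks, predicted_masks))
    sizes = sum(r.sum() + x.sum()
                for r, x in zip(reference_masks, predicted_masks))
    return 2.0 * inter / sizes

def bootstrap_p_value(ref, masks_a, masks_b, n_boot=1000, seed=0):
    """Empirical p-value for the null 'method A is no better than method B':
    the proportion of bootstrap resamples of the test subjects in which
    method B attains the higher DSC."""
    rng = np.random.default_rng(seed)
    n = len(ref)
    higher = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        a = dsc([ref[i] for i in idx], [masks_a[i] for i in idx])
        b = dsc([ref[i] for i in idx], [masks_b[i] for i in idx])
        higher += b > a
    return higher / n_boot

# Hypothetical toy example: method A matches the reference exactly,
# method B misses half of each ventricle.
reference = [np.array([1, 1, 0, 0], dtype=bool) for _ in range(5)]
method_a = [m.copy() for m in reference]
method_b = [np.array([1, 0, 0, 0], dtype=bool) for _ in range(5)]
print(dsc(reference, method_a))                                      # 1.0
print(round(dsc(reference, method_b), 3))                            # 0.667
print(bootstrap_p_value(reference, method_a, method_b, n_boot=100))  # 0.0
```

Note that pooling intersections and sizes over subjects before dividing (as in the formula) weights larger ventricles more heavily than averaging per-subject DSCs would.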
3. RESULTS
Evaluating the methods on the test set, we obtained a DSC of 0.874 for the deep network compared to 0.754 for the region growing based method, with respect to the manual annotations as the reference standard. The deep network significantly outperformed the region growing method. Per ventricle, the region growing method obtained DSCs of 0.750 and 0.723, while the deep network obtained DSCs of 0.881 and 0.867, with AUCs of 0.990 and 0.989 for the network.
4. DISCUSSION AND CONCLUSIONS
Interestingly, in our experiments we observed that the deep model trained on imperfect ground truth could still train well and significantly outperform the method that generated its ground truth. This is an important finding for the medical imaging domain, where the high costs of generating large manually labeled datasets might seem to preclude training deep neural networks, which require large training sets to achieve good performance. These results also show that a relatively low accuracy of the provided pseudo-labels is not necessarily an upper bound on the performance of the trained network.

For this to happen, two requirements need to be satisfied. Firstly, the label noise should be adequately randomly scattered over the feature space: if the ground-truth-providing method is biased and constantly repeats the same error patterns, the model would most likely learn those same error patterns. Secondly, the method should be regularized well enough to maintain its generalizability and not overfit the noise patterns.

In this work, we presented a fully automated algorithm for the segmentation of the lateral ventricles on brain MR images that is well capable of discriminating between the left and right ventricles. Despite the noisy training labels, the network achieves a DSC of 0.874.

REFERENCES

[1] J. Haxby, J. Gillette, D. Teichberg, et al., "Longitudinal changes in lateral ventricular volume in patients with dementia of the Alzheimer type,"
Neurology (10), 2029–2029 (1992).
[2] I. C. Wright, S. Rabe-Hesketh, P. W. Woodruff, et al., "Meta-analysis of regional brain volumes in schizophrenia," American Journal of Psychiatry (1), 16–25 (2000).
[3] A. M. McKinney, Enlargement or Asymmetry of the Lateral Ventricles Simulating Hydrocephalus, 349–369, Springer International Publishing (2017).
[4] M. Ghafoorian, N. Karssemeijer, I. W. van Uden, et al., "Automated detection of white matter hyperintensities of all sizes in cerebral small vessel disease," Medical Physics (12), 6246–6258 (2016).
[5] M. Ghafoorian, N. Karssemeijer, T. Heskes, et al., "Deep multi-scale location-aware 3D convolutional neural networks for automated detection of lacunes of presumed vascular origin," NeuroImage: Clinical, 391–399 (2017).
[6] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, 436–444 (2015).
[7] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, 85–117 (2015).
[8] G. Litjens, T. Kooi, B. E. Bejnordi, et al., "A survey on deep learning in medical image analysis," Medical Image Analysis, 60–88 (2017).
[9] A. Rodriguez-Ruiz, J. Teuwen, S. Vreemann, et al., "New reconstruction algorithm for digital breast tomosynthesis: better image quality for humans and computers," Acta Radiologica (2017).
[10] V. Gulshan, L. Peng, M. Coram, et al., "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," JAMA (22), 2402–2410 (2016).
[11] M. Ghafoorian, N. Karssemeijer, T. Heskes, et al., "Location sensitive deep convolutional neural networks for segmentation of white matter hyperintensities," Scientific Reports (2017).
[12] B. E. Bejnordi, M. Veta, P. J. van Diest, et al., "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer," JAMA (22), 2199–2210 (2017).
[13] C. Sun, A. Shrivastava, S. Singh, et al., "Revisiting unreasonable effectiveness of data in deep learning era," in IEEE International Conference on Computer Vision (ICCV) (2017).
[14] A. G. van Norden, K. F. de Laat, R. A. Gons, et al., "Causes and consequences of cerebral small vessel disease. The RUN DMC study: a prospective cohort study. Study rationale and protocol," BMC Neurology (1), 29 (2011).
[15] M. Jenkinson, C. F. Beckmann, T. E. Behrens, et al., "FSL," NeuroImage (2), 782–790 (2012).
[16] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241, Springer (2015).
[17] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, 448–456 (2015).
[18] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, et al., "3D U-net: learning dense volumetric segmentation from sparse annotation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 424–432, Springer (2016).
[19] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980 (2014).