Partial Volume Segmentation of Brain MRI Scans of any Resolution and Contrast
Benjamin Billot, Eleanor D. Robinson, Adrian V. Dalca, Juan Eugenio Iglesias
Centre for Medical Image Computing, University College London, United Kingdom
Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, USA
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA
Abstract.
Partial voluming (PV) is arguably the last crucial unsolved problem in Bayesian segmentation of brain MRI with probabilistic atlases. PV occurs when voxels contain multiple tissue classes, giving rise to image intensities that may not be representative of any one of the underlying classes. PV is particularly problematic for segmentation when there is a large resolution gap between the atlas and the test scan, e.g., when segmenting clinical scans with thick slices, or when using a high-resolution atlas. In this work, we present PV-SynthSeg, a convolutional neural network (CNN) that tackles this problem by directly learning a mapping between (possibly multi-modal) low resolution (LR) scans and underlying high resolution (HR) segmentations. PV-SynthSeg simulates LR images from HR label maps with a generative model of PV, and can be trained to segment scans of any desired target contrast and resolution, even for previously unseen modalities where neither images nor segmentations are available at training. PV-SynthSeg does not require any preprocessing, and runs in seconds. We demonstrate the accuracy and flexibility of the method with extensive experiments on three datasets and 2,680 scans. The code is available at https://github.com/BBillot/SynthSeg.
Keywords: Partial volume segmentation · brain MRI

Introduction

Segmentation of brain MRI scans is a key step in neuroimaging studies, as it is a prerequisite for an array of subsequent analyses, e.g., volumetry or connectivity studies. Although manual segmentation remains the gold standard, this tedious and expensive procedure can be replaced by automated tools, which enable reproducible segmentation of large datasets. However, a well-known problem of automated segmentation is the partial volume (PV) effect [8, 27]. PV arises when different tissues are mixed within the same voxel during acquisition, resulting in averaged intensities that may not be representative of any of the underlying tissues. For instance, in a T1 scan, the boundary between white matter and cerebrospinal fluid (CSF) will often appear the same color as gray matter, even though no gray matter is present. This problem particularly affects scans with low resolution in any orientation (e.g., clinical quality images with thick slices), as well as finely detailed brain regions, such as the hippocampus, in research quality scans.

Modern supervised segmentation approaches based on convolutional neural networks (CNNs) [21, 25, 32] can learn to segment volumes with PV, given appropriate training data. However, they do not generalize well to test scans with significantly different resolution or intensity distribution [3, 20, 22], despite recent advances in transfer learning and data augmentation [7, 14, 20, 24, 34, 40].

In contrast, Bayesian segmentation methods stand out for their generalization ability, which is why they are used by all major neuroimaging packages (e.g., FreeSurfer [15], SPM [5], and FSL [30]). Bayesian segmentation with probabilistic atlases builds on generative models that combine a prior describing neuroanatomy (an atlas) and a likelihood distribution that models the image formation process (often a Gaussian mixture model, or GMM, combined with a model of bias field). Bayesian inference is used to "invert" this generative model and compute the most likely segmentation given the observed intensities and the atlas. Unfortunately, these models can be greatly affected by PV.

A popular subclass of Bayesian methods uses an unsupervised likelihood term and estimates the GMM parameters from the test scan, which makes them adaptive to MRI contrast [35, 39, 5, 31, 13]. This is a highly desirable feature in neuroimaging, since differences in hardware and pulse sequences can have a large impact on the accuracy of supervised approaches, which are not robust to such variability. Unsupervised likelihood models also enable the segmentation of in vivo
MRI with high-resolution atlases built with ex vivo modalities (e.g., histology [18]).

PV can easily be incorporated into the generative model of Bayesian segmentation by considering a high resolution (HR) image generated with the conventional non-PV model, and by appending smoothing and subsampling operations to yield the observed low resolution (LR) image. Unfortunately, inferring the most likely HR segmentation from the LR voxels quickly becomes intractable, as estimating the model parameters requires marginalizing over the HR label configurations. Early methods attempted to circumvent this limitation by approximating the posterior of the HR label (tissue fraction) [23, 28], or by explicitly modeling the most common PV classes (e.g., white matter with CSF) with dedicated Gaussian intensity distributions [29, 33]. Van Leemput et al. [36] formalized the problem and proposed a principled statistical framework for PV segmentation. They were able to simplify the marginalization and solve it for simple cases, given specific assumptions on the number of mixing classes and the blurring kernel. Even with these simplifications, their method remains impractical for most real-world scans, particularly when multiple MRI contrasts with different resolutions are involved.

In this paper, we present PV-SynthSeg, a novel and fast method for PV-aware segmentation of (possibly multi-modal) brain MRI scans. Specifically, we propose to synthesize training scans based on the forward model of Bayesian segmentation, with a focus on PV effects. We train a CNN with these scans, which are generated on the fly with random model parameters [6]. The CNN can be trained to segment scans of any desired target resolution and contrast by adjusting the probability distribution of these parameters. As with classical Bayesian segmentation, the method only needs segmentations (no images) as training data. PV-SynthSeg leverages machine learning to achieve, for the first time, PV segmentation of MRI scans of unseen, arbitrary resolution and contrast without limiting simplifying assumptions. PV-SynthSeg is very flexible and can readily segment multi-modal and clinical images, which would be unfeasible with exact Bayesian inference.
Methods

Let A be a probabilistic atlas that provides, at each spatial location, a vector with the occurrence probabilities of K neuroanatomical classes. The atlas is spatially warped by a deformation field φ parameterized by θ_φ, which follows a distribution p(θ_φ). Further, let L = {L_j}, 1 ≤ j ≤ J, be a 3D label map (segmentation) of J voxels defined on a HR grid, where L_j ∈ {1, ..., K}. We assume that each L_j is independently drawn from the categorical distribution given by the deformed atlas at each location:

$$ p(L, \theta_\phi \mid A) = p(\theta_\phi)\, p(L \mid \theta_\phi, A) = p(\theta_\phi) \prod_{j=1}^{J} p(L_j \mid \theta_\phi, A). \qquad (1) $$

Given a segmentation L, image intensities I = {I_j}, 1 ≤ j ≤ J, at HR are assumed to be independent samples of a (possibly multivariate) GMM conditioned on the anatomical labels:

$$ p(I, \theta_G, \theta_B \mid L) = p(\theta_G)\, p(\theta_B) \prod_{j=1}^{J} \mathcal{N}\big(I_j - B_j(\theta_B);\, \mu_{L_j}, \Sigma_{L_j}\big), \qquad (2) $$

where θ_G is a vector grouping the means and covariances associated with each of the K classes, and B_j(θ_B) is the bias field at voxel j in the logarithmic domain, parameterized by θ_B. Both θ_G and θ_B have associated prior distributions p(θ_G) and p(θ_B), which complete the classical non-PV model.

We model PV by assuming that, instead of the HR image I, we observe D(I) = {D(I)_{j'}}, 1 ≤ j' ≤ J', defined over a coarser LR grid with J' < J voxels, where D is a blurring and subsampling operator. If the blurring is linear, the likelihood p(D(I) | L, θ_B, θ_G) is still Gaussian (since every LR voxel is a linear combination of Gaussian HR voxels) but, in general, does not factorize over voxels j'.
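To make the non-PV generative model of Eqs. (1)-(2) concrete, the following minimal NumPy sketch samples an HR label map from (deformed) atlas probabilities and then draws intensities from a label-conditioned Gaussian model. It is an illustration written for this text, not the authors' released implementation; the function names, array shapes, and parameter values are arbitrary assumptions, and the bias field is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_labels(atlas_probs):
    """Draw a label map from per-voxel categorical probabilities (Eq. 1).
    atlas_probs: array of shape (X, Y, Z, K) with rows summing to 1."""
    X, Y, Z, K = atlas_probs.shape
    flat = atlas_probs.reshape(-1, K)
    cdf = np.cumsum(flat, axis=1)
    u = rng.random((flat.shape[0], 1))
    labels = (u > cdf).sum(axis=1)          # inverse-CDF sampling per voxel
    return labels.reshape(X, Y, Z)

def sample_image(labels, means, variances):
    """Draw intensities from a label-conditioned GMM (Eq. 2, univariate case)."""
    mu = means[labels]                       # per-voxel mean
    sigma = np.sqrt(variances[labels])       # per-voxel standard deviation
    return rng.normal(mu, sigma)

# Toy example: K = 3 classes on a small grid, Dirichlet noise as a stand-in atlas.
K = 3
atlas = rng.dirichlet(np.ones(K), size=(16, 16, 16))
L = sample_labels(atlas)
I = sample_image(L, means=np.array([30., 80., 150.]), variances=np.array([25., 36., 49.]))
print(I.shape, L.min(), L.max())
```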
Bayesian segmentation often uses point estimates for the model parameters to avoid intractable integrals. This requires finding the most likely model parameters given the atlas and the observed image, i.e., maximizing p(θ_φ, θ_B, θ_G | D(I), A). Applying Bayes' rule and marginalizing over the unknown segmentation, the optimization problem is:

$$ \operatorname*{arg\,max}_{\theta_\phi, \theta_B, \theta_G}\; p(\theta_\phi)\, p(\theta_B)\, p(\theta_G) \sum_{L} p\big(\mathcal{D}(I) \mid L, \theta_B, \theta_G\big)\, p(L \mid \theta_\phi, A). $$

Without PV (i.e., D(I) = I), the sum over segmentations L is tractable because both the prior p(L | θ_φ, A) and the likelihood p(I | L, θ_B, θ_G) factorize over voxels. However, in the PV case, blurring introduces dependencies between the underlying HR voxels, and the sum is intractable, as it requires evaluating K^J terms. Even with simplifying assumptions, such as limiting the maximum number of classes mixing in a LR voxel to two, using a rectangular blurring kernel, and exploiting redundancy in likelihood computations [36], computing the sum remains prohibitively expensive: the number of required evaluations of the prior and of the likelihood grows combinatorially with K and with the voxel size ratio M between LR and HR, so the approach only remains tractable for very low values of M.

Fig. 1. Generation of a synthetic multi-modal MRI scan: (a) deformed labels; (b) non-PV scan; (c) downsampling; (d) upsampling; (e) training inputs.

Rather than explicitly inverting the PV model of Bayesian segmentation, we employ a CNN that directly learns the mapping between LR intensities D(I) and HR labels L. We train this network with synthetic images sampled from the generative model (see example in Fig. 1). Specifically, every minibatch consists of a synthetic MRI scan and a corresponding segmentation, generated as follows.

(a) Starting from a training dataset {S_t} with T segmentations, we first use a public GPU implementation [6] to sample the non-PV joint distribution:

$$ p(I, L, \theta_\phi, \theta_G, \theta_B \mid \{S_t\}) = p(I \mid L, \theta_G, \theta_B)\, p(L \mid \theta_\phi, \{S_t\})\, p(\theta_\phi)\, p(\theta_G)\, p(\theta_B), \qquad (3) $$

where the standard probabilistic atlas prior is replaced by a model in which a label map is randomly drawn from {S_t} and deformed with a field φ, i.e., p(L | θ_φ, {S_t}) = (1/T) Σ_t δ[L = (S_t ∘ φ)], where δ is the Kronecker delta. This model yields label maps that are more spatially regular than atlas samples (Fig. 1.a). The deformation field φ is obtained by sampling a stationary velocity field as a coarse grid of Gaussian noise (standard deviation σ_v, Table 1), integrating it to obtain a diffeomorphic deformation, and composing it with a random affine transform whose parameters are drawn from the ranges in Table 1. The GMM parameters θ_G are sampled independently for each MRI contrast, using Gaussian distributions for the means and the logarithm of the variances. The bias field B(θ_B) is obtained by sampling a small grid of Gaussian noise (standard deviation σ_b, Table 1) and upsampling it to the HR grid. This yields a synthetic HR, non-PV image I from a HR label map L (Fig. 1.b).
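As an illustration of step (a), the sketch below deforms a training label map with a random smooth displacement field, using nearest-neighbour interpolation so that the result remains a valid segmentation. This is a simplified stand-in written for this text (dense Gaussian-smoothed noise instead of an integrated stationary velocity field, no affine component, no GPU); the helper names, grid size, and smoothing values are arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

rng = np.random.default_rng(1)

def random_smooth_deformation(label_map, max_disp=4.0, smoothing=8.0):
    """Warp a 3D integer label map with a random smooth displacement field.
    Nearest-neighbour interpolation (order=0) keeps the labels discrete."""
    shape = label_map.shape
    grid = np.indices(shape).astype(float)            # identity coordinates, shape (3, X, Y, Z)
    disp = np.stack([gaussian_filter(rng.standard_normal(shape), smoothing)
                     for _ in range(3)])
    disp *= max_disp / (np.abs(disp).max() + 1e-8)    # scale to a maximum displacement (in voxels)
    coords = grid + disp
    return map_coordinates(label_map, coords, order=0, mode='nearest')

# Toy usage: warp a random 3-class label volume.
S_t = rng.integers(0, 3, size=(32, 32, 32))
L = random_smooth_deformation(S_t)
print(L.shape, np.unique(L))
```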
Table 1.
Ranges of the uniform distributions for the parameters of the generative model: rotation (θ_rot); scaling (θ_sc); shearing (θ_sh); translation (θ_tr); standard deviation for the generation of the stationary velocity field (σ_v) and of the bias field (σ_b); and factor for the blurring kernel that simulates voxel thickness (α).

θ_rot (°)    θ_sc        θ_sh            θ_tr        σ_v     σ_b       α
[-15, 15]    [0.8, 1.2]  [-0.01, 0.01]   [-20, 20]   [0, 4]  [0, 0.5]  [0.75, 1.25]

(b) We simulate slice thickness independently for each channel of I by blurring it with an anisotropic Gaussian kernel that mimics the target resolution of the LR images. Specifically, we design the standard deviation of the kernel such that the power of the HR signal is divided by 10 at the cut-off frequency. Since the standard deviations in the spatial and discrete frequency domains are related by σ_f σ_s = (2π)^{-1}, the standard deviation of the blurring kernel is

$$ \sigma_s = \frac{2\log(10)}{2\pi}\,\frac{r_n}{r_a} \approx \frac{3}{4}\,\frac{r_n}{r_a}, $$

where r_n is the (possibly anisotropic) voxel size of the test scan in channel n, and r_a is the isotropic voxel size of the atlas. We further multiply σ_s by a factor α (σ_s = 0.75 α r_n / r_a), sampled from a uniform distribution of predefined range, to introduce small resolution variations and increase the robustness of the method.

(c) Because slice thickness and slice spacing are not necessarily equal in real data, we simulate slice spacing by subsampling the blurred version of I (still defined on the HR grid) to obtain D(I), defined on the LR grid (Fig. 1.c).

(d) Finally, we upsample D(I) back into the original HR space with linear interpolation (Fig. 1.d). This step mimics the processing at test time, when we upscale the input to the target isotropic HR resolution, so that the CNN can produce a label map on the HR grid that represents the anatomy within the LR voxels. A sketch of steps (b)-(d) is given at the end of this section.

We train a 3D U-net [32] with synthetic pairs generated on the fly with the PV model. The U-net has 5 levels with 2 layers each (3 × 3 × 3 convolution kernels). The deformation and bias field parameters θ_φ and θ_B are drawn from uniform distributions with relatively wide ranges (Table 1), which increases the robustness of the CNN [6]. The hyperparameters of θ_G are modality specific. In practice, we estimate them from unlabeled scans as follows. First, we run a publicly available Bayesian segmentation method (SAMSEG [31]). Second, we compute estimates of the means and variances of each class using robust statistics (median and median absolute deviation). Importantly, the estimated variances are multiplied by the ratio of the voxel volumes at HR and LR, such that the blurring decreases the variances to the expected levels at LR. Third, we fit a Gaussian distribution to these parameters. Finally, we artificially increase the estimated standard deviations by a factor of 5, with two purposes: making the CNN resilient to changes in acquisition parameters, and mitigating segmentation errors made by SAMSEG (the results below demonstrate that PV-SynthSeg is highly resilient to such errors).
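The following SciPy sketch illustrates steps (b)-(d): it blurs an HR volume with the anisotropic kernel σ_s = 0.75 α r_n / r_a, subsamples it to the LR slice spacing, and linearly upsamples it back to the HR grid. It is a simplified, single-channel illustration written for this text (axis-aligned resampling only), not the authors' released code; the function name and the resolutions in the example are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

rng = np.random.default_rng(2)

def simulate_pv(hr_image, hr_res, thickness, spacing, alpha=1.0):
    """Simulate a thick-slice LR acquisition from an HR volume and map it back
    to the HR grid (steps b-d). hr_res, thickness, spacing: per-axis sizes in mm."""
    hr_res = np.asarray(hr_res, float)
    thickness = np.asarray(thickness, float)
    spacing = np.asarray(spacing, float)

    # (b) anisotropic blur: sigma_s = 0.75 * alpha * r_n / r_a per axis (in HR voxels)
    sigma = 0.75 * alpha * thickness / hr_res
    blurred = gaussian_filter(hr_image, sigma)

    # (c) subsample to the LR grid defined by the slice spacing
    lr = zoom(blurred, hr_res / spacing, order=1)

    # (d) upsample back to the exact HR grid with linear interpolation
    upsampled = zoom(lr, np.array(hr_image.shape) / np.array(lr.shape), order=1)
    return upsampled

# Toy usage: 1 mm isotropic HR volume, 3 mm thick axial slices with 6 mm spacing.
hr = rng.random((64, 64, 64))
out = simulate_pv(hr, hr_res=(1, 1, 1), thickness=(1, 1, 3), spacing=(1, 1, 6))
print(out.shape)
```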
Experiments and Results

T1-39: 39 1 mm isotropic T1 brain scans with segmentations for 39 regions of interest (ROIs) [15]: 36 cerebral (manual) and 3 extra-cerebral (semi-automated).
FLAIR: 2413 T2-FLAIR scans from ADNI [1], acquired with thick slices.

CobraLab: scans with manual segmentations of the hippocampal subregions, which provide the training label maps and the atlas for the hippocampus experiment below.
ADNI-HP:
134 Alzheimer's disease (AD) cases and 134 controls from ADNI [1], with T1 (1 mm isotropic) and T2 (0.4 × 0.4 mm in-plane) scans.

We evaluate PV-SynthSeg with three sets of experiments:
T1-spacing:
We assess performance at different PV levels with the T1-39 dataset. We simulate sparse clinical scans in coronal, sagittal, and axial orientations, at 3, 6, and 9 mm slice spacing, with 3 mm slice thickness. We use our method to train a network that provides segmentations on the 1 mm isotropic grid. We use segmentations from 20 cases for training, and the remaining subjects for testing.
FLAIR:
To evaluate our method on scans representative of clinical quality data, with real thick-slice images and a contrast other than T1, we use the same 20 label maps from the T1-39 dataset to train our method to segment the FLAIR scans on a 1 mm isotropic grid. The Gaussian hyperparameters are estimated from a subset of 20 FLAIR scans, and the remaining 2393 are used for testing. For each FLAIR scan, we use the FreeSurfer [15] segmentation of the corresponding T1 ADNI scan as ground truth. We emphasize that such T1 scans are often not available in clinical protocols; here we use them for evaluation purposes only.
Hippocampus:
We also evaluate our method on a multi-modal MRI dataset with different resolutions for each channel, in the context of a neuroimaging group study. We use the segmentations from the CobraLab dataset to train the proposed model to segment the hippocampal subregions of the ADNI-HP dataset, on a high resolution isotropic grid.

Baselines:
We compare the proposed approach with two competing methods. First, Bayesian segmentation without PV; this is a natural alternative to our approach, as it only requires label maps for supervision and adapts to MRI contrast (including multi-modal data). In the first two experiments, we use SAMSEG [31] (trained on the same 20 scans from T1-39) to segment the upsampled HR inputs (we also tried segmenting the LR scans directly, with inferior results). In the third experiment, we use a publicly available hippocampal segmentation algorithm [17], with a probabilistic atlas created from the CobraLab data.

As an upper bound, we use a supervised CNN trained on LR images from the target modality, which requires paired imaging and segmentation data. We test this approach on the first and third experiments, which represent the settings in which manual labels may be available. Specifically, we train the same 3D U-net architecture with real scans blurred to the target resolution, using the same augmentation strategy as for our method. We emphasize that such methods are only applicable in rarer, supervised settings, but the performance of these networks provides an informative upper bound for the accuracy of PV-SynthSeg.

We evaluate all methods on both the HR ("dense") and the LR ("sparse") grid, the latter obtained by downsampling the HR labels.
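As a concrete illustration of the dense/sparse evaluation, the sketch below computes Dice scores on the full HR grid and on a sparse grid obtained by keeping every k-th slice. This is a simplified stand-in written for this text (the helper names, spacing factor, and toy data are ours, not from the paper).

```python
import numpy as np

def dice(a, b, label):
    """Dice overlap for one label between two integer label maps."""
    a, b = (a == label), (b == label)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else np.nan

def dense_and_sparse_dice(pred_hr, gt_hr, label, spacing=6, axis=2):
    """Dice on the HR ('dense') grid and on the LR ('sparse') grid obtained by
    keeping every `spacing`-th slice along `axis`."""
    sl = [slice(None)] * pred_hr.ndim
    sl[axis] = slice(None, None, spacing)
    sl = tuple(sl)
    return dice(pred_hr, gt_hr, label), dice(pred_hr[sl], gt_hr[sl], label)

# Toy usage with random 3-class label maps, corrupting a few slices of the prediction.
rng = np.random.default_rng(3)
gt = rng.integers(0, 3, size=(64, 64, 64))
pred = gt.copy()
pred[::9] = rng.integers(0, 3, size=pred[::9].shape)
print(dense_and_sparse_dice(pred, gt, label=1))
```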
Figure 2.a shows the mean Dice scores for the T1-spacing experiment. PV-SynthSeg consistently outperforms SAMSEG by up to 6 Dice points, and is robust to large slice spacings: even at 9 mm, it yields competitive Dice scores (0.83 mean), both when evaluated densely and on the sparse slices. Comprehensive structure-wise results are shown in Fig. 3; they reveal that, with increasing slice spacing, accuracy decreases the most for the thin and convoluted cerebral cortex. This is also apparent from the example in Fig. 4 (red box).
Fig. 2. (a) Box plots of Dice scores in the T1-spacing experiment with 3, 6, and 9 mm spacing in coronal (co), axial (ax), and sagittal (sa) orientations, averaged over 12 representative ROIs: cerebral white matter (WM) and cortex (CT); lateral ventricle (LV); cerebellar white matter (CW) and cortex (CC); thalamus (TH); caudate (CA); putamen (PU); pallidum (PA); brainstem (BS); hippocampus (HP); and amygdala (AM). (b) Box plots of Dice scores for the 12 ROIs in the FLAIR experiment and their average (av).
Table 2.
Effect sizes (Cohen's d) and p values of non-parametric Wilcoxon tests, comparing the volumes of the hippocampal substructures (CA1, CA23, CA4, subiculum, molecular layer, and whole hippocampus) in AD subjects vs. controls, for the Supervised, PV-SynthSeg, and Bayesian methods. [Numerical values omitted.]

Fig. 4 (green box) shows another example, where the pallidum is indicated by the yellow arrow. Despite the fact that PV-SynthSeg uses hyperparameters computed with SAMSEG, it successfully recovers the pallidum.

Discussion

We have presented PV-SynthSeg, a novel learning-based segmentation method for brain MRI scans with PV effects. PV-SynthSeg can accurately segment most brain ROIs in scans with very large slice thickness, regardless of their contrast (even when previously unseen), and manages to replicate differential atrophy patterns in the hippocampus in an AD study. One general limitation of PV segmentation is the low accuracy for the cortex at larger spacings, which precludes application to cortical thickness and parcellation analyses. In future work, we will tackle this problem by combining our approach with image imputation. PV-SynthSeg enables morphometric analyses of very large clinical datasets of any modality, which has enormous potential for the discovery of imaging biomarkers in a wide array of neurodegenerative disorders.
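For reference, the sketch below shows one way to compute the kind of group statistics reported in Table 2: Cohen's d with a pooled standard deviation, and a non-parametric two-sample test. The data are hypothetical, and SciPy's Wilcoxon rank-sum test is used here as a stand-in for the paper's non-parametric Wilcoxon tests; this is an illustration written for this text, not the authors' analysis code.

```python
import numpy as np
from scipy.stats import ranksums

def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Hypothetical hippocampal volumes (mm^3) for 134 controls and 134 AD subjects.
rng = np.random.default_rng(4)
controls = rng.normal(3400, 300, size=134)
ad = rng.normal(3000, 320, size=134)

d = cohens_d(controls, ad)
stat, p = ranksums(controls, ad)   # rank-sum test for two independent groups
print(f"Cohen's d = {d:.2f}, p = {p:.1e}")
```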
Fig. 3.
Structure-wise box plots of Dice scores in the T1-spacing experiment for each spacing and orientation, including 12 ROIs and their average (av). Abbreviations: cerebral white matter (WM) and cortex (CT); lateral ventricle (LV); cerebellar white matter (CW) and cortex (CC); thalamus (TH); caudate (CA); putamen (PU); pallidum (PA); brainstem (BS); hippocampus (HP); and amygdala (AM).
Acknowledgement
Work supported by the European Research Council (Starting Grant 677697, project "BUNGEE-TOOLS", awarded to JEI), as well as by the EPSRC-funded UCL Centre for Doctoral Training in Medical Imaging (EP/L016478/1) and the Department of Health's NIHR-funded Biomedical Research Centre at UCLH.
Fig. 4.
Examples of dense segmentations for two cases (red and green boxes), comparing the input with the ground truth and the PV-SynthSeg, SAMSEG, and supervised segmentations (the supervised baseline is not available for every case).

Fig. 5.
Two more examples from the T1-spacing experiment (axial and coronal cases; image, ground truth, PV-SynthSeg, SAMSEG, and supervised segmentations).
Fig. 6.
Two more examples of FLAIR segmentations (sagittal and coronal views; image, ground truth, PV-SynthSeg, and SAMSEG).
Fig. 7.
Close-up of a coronal view of co-registered T1 (1 mm isotropic) and T2 (0.4 × 0.4 mm in-plane) scans, with the corresponding PV-SynthSeg, Bayesian, and supervised segmentations.

References
1. Alzheimer’s Disease Neuroimaging Initiative, http://adni.loni.usc.edu/
2. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., et al.: TensorFlow: A System for Large-Scale Machine Learning. In: OSDI 16. pp. 265–283 (2016)
3. Akkus, Z., Galimzianova, A., Hoogi, A., Rubin, D., Erickson, B.: Deep Learning for Brain MRI Segmentation: State of the Art and Future Directions. Journal of Digital Imaging (4), 449–459 (2017)
4. Arsigny, V., Commowick, O., Pennec, X., Ayache, N.: A Log-Euclidean Framework for Statistics on Diffeomorphisms. In: MICCAI 2006. pp. 924–931 (2006)
5. Ashburner, J., Friston, K.: Unified segmentation. NeuroImage, 839–851 (2005)
6. Billot, B., Greve, D., Van Leemput, K., Fischl, B., Iglesias, J., Dalca, A.: A Learning Strategy for Contrast-agnostic MRI Segmentation. arXiv:2003.01995 (2020)
7. Chaitanya, K., Karani, N., Baumgartner, C., Becker, A., Donati, O., Konukoglu, E.: Semi-supervised and Task-Driven Data Augmentation. In: Information Processing in Medical Imaging. pp. 29–41 (2019)
8. Choi, H., Haynor, D., Kim, Y.: Partial volume tissue classification of multichannel magnetic resonance images - a mixel model. IEEE Transactions on Medical Imaging (3), 395–407 (1991)
9. Chollet, F.: Keras
10. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv:1511.07289 [cs] (2016)
11. Dalca, A.V., Balakrishnan, G., Guttag, J., Sabuncu, M.R.: Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Medical Image Analysis, 226–236 (2019)
12. Dalca, A.V., Guttag, J., Sabuncu, M.R.: Anatomical priors in convolutional networks for unsupervised biomedical segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9290–9299 (2018)
13. Dalca, A.V., Yu, E.M., Golland, P., Fischl, B., Sabuncu, M., Iglesias, J.E.: Unsupervised deep learning for Bayesian brain MRI segmentation. In: MICCAI (2019)
14. Eaton-Rosen, Z., Bragman, F., Ourselin, S., Cardoso, M.J.: Improving Data Augmentation for Medical Image Segmentation. In: International Conference on Medical Imaging with Deep Learning (2018)
15. Fischl, B.: FreeSurfer. NeuroImage (2), 774–781 (2012)
16. Fox, N., Warrington, E., Freeborough, P., Hartikainen, P., Kennedy, A., Stevens, J., Rossor, M.N.: Presymptomatic hippocampal atrophy in Alzheimer's disease. A longitudinal MRI study. Brain: A Journal of Neurology, 2001–2007 (1996)
17. Iglesias, J.E., Augustinack, J.C., Nguyen, K., Player, C.M., Player, A., Wright, M., Roy, N., Frosch, M.P., McKee, A.C., Wald, L.L., et al.: A computational atlas of the hippocampal formation using ex vivo, ultra-high resolution MRI: application to adaptive segmentation of in vivo MRI. NeuroImage, 117–137 (2015)
18. Iglesias, J.E., Insausti, R., Lerma-Usabiaga, G., Bocchetta, M., Van Leemput, K., Greve, D., van der Kouwe, A., Fischl, B., Caballero-Gaudes, C., Paz-Alonso, P.: A probabilistic atlas of the human thalamic nuclei combining ex vivo MRI and histology. NeuroImage, 314–326 (2018)
19. Jack, C., Petersen, R., Xu, Y., O'Brien, P.C., Smith, G., Ivnik, R., Boeve, B., Waring, S., Tangalos, E., Kokmen, E.: Prediction of AD with MRI-based hippocampal volume in mild cognitive impairment. Neurology (7), 1397–1403 (1999)
20. Jog, A., Hoopes, A., Greve, D., Van Leemput, K., Fischl, B.: PSACNN: Pulse sequence adaptive fast whole brain segmentation. NeuroImage, 553–569 (2019)
21. Kamnitsas, K., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis, 61–78 (2017)
22. Karani, N., Chaitanya, K., Baumgartner, C., Konukoglu, E.: A Lifelong Learning Approach to Brain MR Segmentation Across Scanners and Protocols. In: MICCAI. pp. 476–484 (2018)
23. Laidlaw, D., Fleischer, K., Barr, A.: Partial-volume Bayesian classification of material mixtures in MR volume data using voxel histograms. IEEE Transactions on Medical Imaging (1), 74–86 (1998)
24. Long, M., Zhu, H., Wang, J., Jordan, M.I.: Deep transfer learning with joint adaptation networks. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70. pp. 2208–2217. JMLR.org (2017)
25. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). pp. 565–571 (2016)
26. Mueller, S.G., Yushkevich, P.A., Das, S., Wang, L., Van Leemput, K., Iglesias, J.E., Alpert, K., Mezher, A., Ng, P., Paz, K., et al.: Systematic comparison of different techniques to measure hippocampal subfield volumes in ADNI2. NeuroImage: Clinical, 1006–1018 (2018)
27. Niessen, W., Vincken, K., Weickert, J., Romeny, B., Viergever, M.: Multiscale Segmentation of Three-Dimensional MR Brain Images. International Journal of Computer Vision (2), 185–202 (1999)
28. Nocera, L., Gee, J.C.: Robust partial-volume tissue classification of cerebral MRI scans. In: Medical Imaging 1997: Image Processing. vol. 3034, pp. 312–322. International Society for Optics and Photonics (1997)
29. Noe, A., Gee, J.C.: Partial Volume Segmentation of Cerebral MRI Scans with Mixture Model Clustering. In: Information Processing in Medical Imaging. pp. 423–430 (2001)
30. Patenaude, B., Smith, S., Kennedy, D., Jenkinson, M.: A Bayesian model of shape and appearance for subcortical brain segmentation. NeuroImage, 907–922 (2011)
31. Puonti, O., Iglesias, J.E., Van Leemput, K.: Fast and sequence-adaptive whole-brain segmentation using parametric Bayesian modeling. NeuroImage, 235–249 (2016)
32. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. In: MICCAI. pp. 234–241 (2015)
33. Shattuck, D., Sandor-Leahy, S., Schaper, K., Rottenberg, D., Leahy, R.M.: Magnetic Resonance Image Tissue Classification Using a Partial Volume Model. NeuroImage (5), 856–876 (2001)
34. Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging (5), 1285–1298 (2016)
35. Van Leemput, K., Maes, F., Vandermeulen, D., Suetens, P.: Automated model-based tissue classification of MR images of the brain. IEEE Transactions on Medical Imaging (10), 897–908 (1999)
36. Van Leemput, K., Maes, F., Vandermeulen, D., Suetens, P.: A unifying framework for partial volume segmentation of brain MR images. IEEE Transactions on Medical Imaging (1), 105–119 (2003)
37. Winterburn, J., Pruessner, J., Chavez, S., Schira, M., Lobaugh, N., Voineskos, A., Chakravarty, M.: A novel in vivo atlas of human hippocampal subfields using high-resolution 3 T magnetic resonance imaging. NeuroImage, 254–265 (2013)
38. Yushkevich, P.A., Wang, H., Pluta, J., Das, S.R., Craige, C., Avants, B.B., Weiner, M.W., Mueller, S.: Nearly automatic segmentation of hippocampal subfields in in vivo focal T2-weighted MRI. NeuroImage (4), 1208–1224 (2010)
39. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging (2001)