Direct White Matter Bundle Segmentation using Stacked U-Nets
Jakob Wasserthal, Peter F. Neher, Fabian Isensee, Klaus H. Maier-Hein
Medical Image Computing Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
[email protected]
Abstract.
The state-of-the-art method for automatically segmenting white matter bundles in diffusion-weighted MRI is tractography in conjunction with streamline cluster selection. This process involves long chains of processing steps which are not only computationally expensive but also complex to set up and tedious with respect to quality control. Direct bundle segmentation methods treat the task as a traditional image segmentation problem. While they have so far not delivered competitive results, they can potentially mitigate many of the mentioned issues. We present a novel supervised approach for direct tract segmentation that shows major performance gains. It builds upon a stacked U-Net architecture which is trained on manual bundle segmentations from Human Connectome Project subjects. We evaluate our approach in vivo as well as in silico using the ISMRM 2015 Tractography Challenge phantom dataset. We achieve human segmentation performance and a major performance gain over previous pipelines. We show how the learned spatial priors efficiently guide the segmentation even at lower image qualities with little quality loss.
Keywords:
Diffusion MRI, Fiber Bundle Segmentation, Deep Learning
Under review as a conference paper at MICCAI 2017.

Introduction

Diffusion MRI is an important tool in the study of the brain's white matter, with tractography being the state-of-the-art method for reconstructing white matter pathways. Tractography typically results in thousands of streamlines that require filtering for false-positive removal and generation of anatomically meaningful bundle segmentations [14, 12]. In this context, streamline selection can be performed using different approaches. Interactively drawn regions of interest (ROIs) are very commonly used but are quite time consuming and require experts. Small changes alter the resulting bundles significantly and limit reproducibility across subjects and human experts [21, 18]. Automation can be achieved using atlas-guided approaches [17, 7] or selection schemes based on gray matter parcellations [22]. Atlas information can also be integrated directly into the tractography process in the form of prior information [23, 10, 2]. Such automated approaches involve quite complex registration and/or segmentation procedures, i.e. registration of T1-weighted and diffusion-weighted images, registration across individuals and segmentation of T1-weighted images. In combination with the tractography itself, they add up to long chains of processing steps that are difficult to oversee and configure and that involve computing-intensive operations in streamline space.

Direct bundle segmentation methods potentially solve many of these issues by circumventing tractography and treating the task as a traditional image segmentation problem. Previous studies applied level sets, active contours or related Markov random field-based methods to this problem [11, 5, 8, 9, 4]. These methods require the specification of initial ROIs corresponding to specific tracts of interest. Prior information about the tracts can be incorporated using atlas information [1, 13] or deformable tract templates [3].
Supervised learning has shown great potential in the derivation of spatial priors from training data, while avoiding the issues that atlas-based methods encompass. However, this potential has barely been explored when it comes to direct bundle segmentation. Ratnarajah et al. published, to our knowledge, the only study in this direction, using k-NN based classification in Riemannian diffusion-tensor spaces to label white matter fiber bundles in neonatal brain images [15].

We propose a novel approach for direct white matter bundle segmentation using supervised learning on the basis of a stacked U-Net architecture that receives the peaks of the fiber orientation distribution functions (fODF) as input. This results in a very detailed 3D model for each bundle and does not rely on long chains of processing steps involving atlas-based priors or fiber tractography. To train our model we interactively [18] labeled 30 subjects from the Human Connectome Project (HCP), yielding a high-quality dataset of the subject-specific morphology of these bundles. We evaluate our approach qualitatively and quantitatively on high- and low-quality in vivo data and show that it is capable of segmenting larger and even very thin bundles with high quality in a couple of seconds. Moreover, we compare our approach to the results of the ISMRM 2015 Tractography Challenge [12] and to segmentation results previously reported in the literature. On in vivo data our method reaches human performance, thus providing an efficient and precise solution for automatic segmentation of white matter fiber bundles.
Model
Our convolutional neural network is based on the U-Net architecture [16]. Because of the high resolution of our data it would not be memory-efficient to use the entire 3D image as input. Thus we propose using a set of 2D slices as input. We train three individual networks, one for each spatial axis x, y and z. Training slices for the three networks are thus sampled from the x-y, y-z and z-x plane, respectively. This results in three predictions per voxel. We concatenate these predictions to create a new image with three channels, on which we train a fourth U-Net that yields the final prediction.

Fig. 1. Overview of our pipeline: (1) generation of training data by manual filtering of tractograms and generation of features, (2) training the stacked U-Net, (3) final model for segmentation of bundles.

We considered several different types of input features for our network. Since using the raw image values or parametric representations of the signal, such as spherical harmonics coefficients, would result in a very large number of features and a correspondingly very high memory demand, we decided to use the elements of the voxel-wise principal fiber direction vectors as features. This reduces the number of features while preserving important information about the local white matter structure, e.g. about crossing fibers. To obtain the principal fiber directions, we used the constrained spherical deconvolution and peak extraction available in MRtrix [6, 19] with a maximum of three peaks per voxel. This results in a total of nine features per voxel. The feature images are processed slice by slice, where each slice has nine channels corresponding to the nine features. A binary segmentation of the bundle of interest serves as the training target. The output of our network is the probability for each voxel to belong to a certain bundle.
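The stacking step described above can be sketched as follows. This is a minimal NumPy sketch of the prediction-fusion interface only, with illustrative function and argument names, not the authors' implementation:

```python
import numpy as np

def fuse_axis_predictions(pred_x, pred_y, pred_z):
    """Stack the three axis-specific U-Net outputs into one 3-channel volume.

    pred_x, pred_y, pred_z: float arrays of shape (X, Y, Z) holding each
    network's voxel-wise bundle probabilities. The stacked result is the
    input that the fourth (fusion) U-Net receives.
    """
    assert pred_x.shape == pred_y.shape == pred_z.shape
    return np.stack([pred_x, pred_y, pred_z], axis=-1)  # shape (X, Y, Z, 3)
```

Feeding these stacked probabilities into a learned fusion network, rather than simply averaging them, lets the model weigh the three views differently per location.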
Training
The following hyperparameters were used to train the network: a learning rate of 0.002, a batch size of 8, 70 training epochs and a dropout probability of 0.4. The learning rate is decreased by 3% per epoch. As loss function we chose the categorical cross-entropy. Because of the great class imbalance (the fiber bundle makes up only a small part of the image) we weight the loss of bundle voxels higher than the loss of non-bundle voxels. The weighting corresponds to the inverse class frequency. This guides the learning process to focus on the fewer bundle voxels. The weighting is linearly decreased over the epochs. The input images were normalized to zero mean and unit standard deviation. All hyperparameters were optimized on an independent validation dataset. The network weights of the epoch with the highest Dice score were used for testing.
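The class-imbalance weighting can be sketched as follows. This is an illustrative NumPy sketch: the text only specifies inverse-class-frequency weights decreased linearly over the epochs, so the exact schedule here is an assumption:

```python
import numpy as np

def voxel_loss_weights(target, epoch, total_epochs=70):
    """Per-voxel weights for the weighted cross-entropy loss.

    Bundle voxels (target == 1) start at the inverse class frequency and
    the extra weight is linearly annealed to 1.0 over the training epochs.
    Function and argument names are illustrative.
    """
    bundle_fraction = target.mean()                      # class frequency
    w_bundle = 1.0 / max(bundle_fraction, 1e-6)          # inverse class frequency
    decay = max(0.0, 1.0 - epoch / float(total_epochs))  # 1 -> 0 linearly
    w = 1.0 + (w_bundle - 1.0) * decay                   # annealed bundle weight
    return np.where(target == 1, w, 1.0)
```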
Datasets
Based on the results of the ISMRM tractography challenge, the organizers assigned each of the 25 tracts used during the challenge to one of three difficulty groups (“medium”, “hard” and “very hard”) [12]. To evaluate our approach we trained it separately for one tract out of each category: the left superior longitudinal fasciculus (SLF) (“medium”), the right corticospinal tract (CST) (“hard”) and the anterior commissure (CA) (“very hard”).
HCP highRes:
In our in vivo experiments we used 30 subjects of the Human Connectome Project. The HCP diffusion-weighted images were acquired with 1.25 mm isotropic resolution and 270 gradient directions with three b-values (b = 1000 s/mm², b = 2000 s/mm², b = 3000 s/mm²). The reference segmentations of the three tracts needed for training and testing of our approach were created manually using the following pipeline: (1) performing standard whole-brain tractography of all subjects using MRtrix [20], (2) manual extraction of the desired tracts using ROIs following a white matter atlas [18], (3) conversion of the resulting fiber bundles into binary segmentation images. Fiber tractography was performed using MRtrix multi-tissue constrained spherical deconvolution (CSD) of multi-shell data [6] and anatomically-constrained probabilistic streamline tractography [20], yielding 1 million fibers with a minimum fiber length of 80 mm per subject. The other parameters were kept at their default values. From the fODFs we extract three peaks as input features for our model using MRtrix sh2peaks.

HCP lowRes:
To test our method in a clinically more plausible setting and to be comparable to the ISMRM Tractography Challenge dataset, we resampled the high-resolution dataset to 2 mm isotropic resolution and removed all b = 2000 s/mm² and b = 3000 s/mm² gradient directions. From the remaining gradient directions we sampled 32 gradients evenly distributed over the entire sphere. For estimating the fODFs we use MRtrix constrained spherical deconvolution [19] and also extract three peaks as input features.

ISMRM Challenge:
We also test our model on the ISMRM Tractography Challenge phantom. The phantom was denoised and corrected for distortions using MRtrix. fODFs and peaks were extracted in the same way as for HCP lowRes.
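The even subsampling of 32 gradient directions used for HCP lowRes can be approximated with a greedy farthest-point selection on the sphere. This is a hedged stand-in, since the paper does not state which selection scheme was used:

```python
import numpy as np

def subsample_directions(bvecs, n=32):
    """Pick n roughly evenly spread directions from a set of unit b-vectors.

    Greedy farthest-point selection using |cos| as similarity, so antipodal
    directions count as identical (appropriate for diffusion gradients).
    A simple illustrative scheme, not necessarily the one used in the paper.
    """
    bvecs = np.asarray(bvecs, dtype=float)
    chosen = [0]                        # start from an arbitrary direction
    sim = np.abs(bvecs @ bvecs[0])      # max |cosine| to the chosen set
    while len(chosen) < n:
        nxt = int(np.argmin(sim))       # farthest remaining direction
        chosen.append(nxt)
        sim = np.maximum(sim, np.abs(bvecs @ bvecs[nxt]))
    return bvecs[np.array(chosen)]
```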
Experiments
We performed three experiments: (1) training and testing on HCP highRes, (2) training and testing on HCP lowRes and (3) training on HCP lowRes and testing on the ISMRM Tractography Challenge phantom. We split our dataset into 20 training subjects, 5 validation subjects for hyperparameter optimization and 5 test subjects for final evaluation. As metric we use the Dice score. To compare our model to human performance, a second expert rater segmented the test subjects. The inter-rater variations of the two experts are called “Human” performance in Section 3. We show results for our stacked U-Net architecture and compare to a plain U-Net that is only trained on 2D slices from one plane.
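The Dice score used as evaluation metric is the standard overlap measure between a predicted and a reference binary mask; a minimal sketch:

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap of two binary masks: 2|A∩B| / (|A| + |B|); 1.0 = perfect."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:              # both masks empty: define as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```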
In vivo
On the HCP high-resolution dataset (HCP highRes) our method achieves human performance on all bundles. On the CST and SLF it even outperforms the human results by a small margin (Figure 2). Our stacked U-Net architecture shows a clear performance improvement compared to a plain U-Net, especially on the low-resolution data. On low-resolution data the performance of our model decreases only slightly. This makes it very promising for clinical settings, where no high-resolution data is available. The qualitative evaluation confirms that our model manages very well to segment the ground-truth shapes (Figure 3).
Fig. 2.
Results of a plain U-Net and our proposed stacked U-Net on the HCP high- and low-resolution datasets in comparison to human performance (second rater).
Fig. 3.
Ground truth (green) and prediction of our model (red) on a randomly chosen subject from the test set of the HCP high-resolution dataset.
Phantom
To evaluate our model against the phantom of the ISMRM Tractography Challenge we trained the model on the HCP lowRes dataset and tested it on the phantom. We also tested it on a version of the phantom that contains no simulated artifacts and noise (“no artifacts”). Figure 4 shows the results. On the CST our model is among the top three submissions and is able to find the lateral projections of the CST, which other methods often miss. On the SLF our model shows competitive results. On the CA our model is not able to properly reconstruct the entire bundle. However, it finds more of it than most of the challenge submissions. As expected, we achieve slightly better results on the “no artifacts” phantom.
Fig. 4.
Results on the ISMRM 2015 Tractography Challenge phantom.
Proposed (“no artifacts”): Our model on the phantom without artifacts.
Proposed: Our model on the original phantom.
Challenge submissions: Submissions of all the challenge participants. For better visualization we use a small random displacement in x-direction in the plot.
Discussion

We presented a new stacked U-Net architecture for direct segmentation of anatomically meaningful white matter bundles. With this direct method we avoid the accumulation of errors along long processing chains involving atlas registrations and fiber tractography, where each step is an additional source of inaccuracy. Furthermore, this enables us to process a whole brain in less than five seconds using a recent GPU, which is very fast compared to many other commonly used methods for fiber bundle segmentation or tractography.
In vivo we show that our method achieves near-human or even better-than-human performance and that the segmentation quality decreases only slightly even when using data of much lower quality. Furthermore, we show that our approach of stacked U-Nets with different spatial specializations improves the overall segmentation results significantly compared to a single U-Net.

A comparison to segmentation results previously reported in the literature is difficult due to the diversity of employed datasets, analyzed anatomical structures and employed evaluation metrics. In general it is worth noting that other works focused on quite large and prominent bundles, such as the arcuate fasciculus, corticospinal tract (CST), inferior fronto-occipital fasciculus or the fornix [10, 15], while our approach also yields very good results for thin anatomical structures such as the anterior commissure. When comparing our Dice scores for singular anatomical structures we can see that our method outperforms previously reported results: right CST (0.84 [proposed] vs. ∼0.72 [15], <0.65 [10], <0.51 [10]) and left SLF (0.