An Auxiliary Task for Learning Nuclei Segmentation in 3D Microscopy Images
Peter Hirsch [email protected]
Dagmar Kainmueller [email protected]
Berlin Institute of Health / Max-Delbrueck-Center for Molecular Medicine in the Helmholtz Association
Abstract
Segmentation of cell nuclei in microscopy images is a prevalent necessity in cell biology. Especially for three-dimensional datasets, manual segmentation is prohibitively time-consuming, motivating the need for automated methods. Learning-based methods trained on pixel-wise ground-truth segmentations have been shown to yield state-of-the-art results on 2d benchmark image data of nuclei, yet a respective benchmark is missing for 3d image data. In this work, we perform a comparative evaluation of nuclei segmentation algorithms on a database of manually segmented 3d light microscopy volumes. We propose a novel learning strategy that boosts segmentation accuracy by means of a simple auxiliary task, thereby robustly outperforming each of our baselines. Furthermore, we show that one of our baselines, the popular three-label model, when trained with our proposed auxiliary task, outperforms the recent StarDist-3D.
As an additional, practical contribution, we benchmark nuclei segmentation against nuclei detection, i.e. the task of merely pinpointing individual nuclei without generating respective pixel-accurate segmentations. For learning nuclei detection, large 3d training datasets of manually annotated nuclei center points are available. However, the impact on detection accuracy caused by training on such sparse ground truth as opposed to dense pixel-wise ground truth has not yet been quantified. To this end, we compare nuclei detection accuracy yielded by training on dense vs. sparse ground truth. Our results suggest that training on sparse ground truth yields competitive nuclei detection rates.
Keywords: machine learning, image analysis, instance segmentation, instance detection, nuclei segmentation, auxiliary training task
1. Introduction
Manual segmentation of nuclei in three-dimensional microscopy images is tedious. Hence 3d benchmark image data with pixel-wise ground truth labelings are barely available. Instead, 3d benchmark image data predominantly come with sparse ground truth annotations in the form of nuclei center point locations (see e.g. (Ulman et al., 2017)). Consequently, methods for learning 3d nuclei segmentation based on pixel-wise, dense ground truth labelings remain understudied to date. Related work is focused on learning nuclei detection based on the given sparse ground truth (Ulman et al., 2017; Höfener et al., 2018), or, for 3d segmentation, on leveraging very small sets of 2d slices with dense labelings (Çiçek et al., 2016) or bounding boxes together with a low number of pixel-wise annotated instances (Zhao et al., 2018).
1. This is opposed to large fully annotated 2d benchmark datasets, see e.g. image set BBBC038v1 available from the Broad Bioimage Benchmark Collection (Ljosa et al., 2012).

Figure 1: Top: Exemplary slice of an image in the dataset. Densely packed nuclei in the nervous system of the C. elegans L1 larva (towards the left) are particularly hard to separate, in some cases even by eye. Bottom: Close-up on said nervous system.

In this work, we employ a densely annotated dataset of 28 three-dimensional microscopy images described in (Long et al., 2009), each containing hundreds of nuclei, to benchmark 3d nuclei segmentation methods trained on dense pixel-wise ground truth. Figure 1 shows an exemplary image slice. We focus on methods that yield state-of-the-art results for nuclei segmentation in 2d data, and have been established for generic 3d segmentation applications. In particular, we focus on U-Net based architectures (Ronneberger et al., 2015; Çiçek et al., 2016) for pixel-wise training and prediction. Furthermore, we compare our results to (Weigert et al., 2019) who recently reported results on the same dataset.

Nuclei segmentation is an instance segmentation problem, where the challenge is to separate object instances that appear in dense clusters. Existing U-Net based instance segmentation approaches that are directly applicable to 3d nuclei segmentation rely on classification of boundary pixels (Chen et al., 2016; Guerrero-Pena et al., 2018; Caicedo et al., 2019; Falk et al., 2019), classification of boundary edges between pixels (Funke et al., 2018), or regression of signed distances to boundary pixels (Heinrich et al., 2018). These methods have in common the property that small variations in the predictions at individual pixels can lead to drastic changes in the resulting instance segmentation; e.g., misclassifying a single boundary pixel as foreground can lead to a false merge of two nuclei.

In this work, we propose a novel auxiliary task that is designed to address this issue during training. In particular, we propose to train for predicting vectors pointing to the center of mass of the containing nucleus as an auxiliary task, in addition to training for predicting the usual output variables. Note that (Li et al., 2018) train similar vectors, however not as an auxiliary task but in a split network branch, the output of which is used for additional post-processing. We show in an evaluation on 28 3d microscopy images that training for the proposed auxiliary task boosts the accuracy of each underlying baseline model.
2. We thank the authors of (Long et al., 2009) for providing the data.

Furthermore, our baseline three-label model (Caicedo et al., 2019), when trained with the proposed auxiliary task, outperforms the proposal-based approach StarDist-3D (Weigert et al., 2019) in terms of average AP.

In addition to our methodological contribution, our work also provides a practical contribution. Due to the lack of densely annotated benchmark 3d nuclei images, to our knowledge, the impact of training on dense vs. sparse ground truth for 3d nuclei detection has not yet been quantified. In this work we fill this gap by contributing a quantitative comparison of densely vs. sparsely trained 3d nuclei detection. In particular, we compare a state-of-the-art sparsely trained method that regresses Gaussian blobs of fixed radius around center points (Höfener et al., 2018) to the densely trained U-Net based architectures considered in the course of our methodological contribution. Our evaluation shows that training on sparse annotations in the form of nuclei center points yields detection rates on par with baseline segmentation methods, but is outperformed by models trained with our auxiliary task.

In summary, we contribute (1) a novel auxiliary training task that boosts the accuracy of underlying baseline models for 3d nuclei segmentation, and (2) a quantitative comparison of densely vs. sparsely trained methods for nuclei detection. Our code for all compared methods is available at https://github.com/Kainmueller-Lab/aux_cpv_loss/tree/arxiv.
2. Method
Methods for pixel- or edge-wise boundary prediction share the property that they rely on a small number of training pixels for learning to distinguish densely packed clusters of objects from larger, solitary objects, as illustrated in Figure 2B. Existing methods approach this issue by up-weighting the boundary class with a fixed factor (Caicedo et al., 2019), by adaptive up-weighting of boundary misclassifications that cause large segmentation errors (Funke et al., 2018), or by phrasing the boundary prediction task as a regression problem on the (signed) Euclidean distance transform to the boundary pixels (filtered by some smooth capping function) (Bai and Urtasun, 2016; Heinrich et al., 2018).

Instead, we propose a simple auxiliary training task that does not require any weighting scheme nor any architectural changes and is nevertheless able to focus the training on reducing misclassifications that induce large segmentation errors. Specifically, we propose to regress vectors that point from each foreground pixel to the center of mass of the respective nucleus (see Figure 2C). We employ a sum of squared differences loss on these center point vectors, and add that to the main pixel classification or regression loss. Thus, in case a cluster of objects is mistaken for a single object or vice-versa, all foreground pixels of the involved object(s) contribute to form a large loss, instead of just a small subset of (up-weighted) boundary pixels as in (Caicedo et al., 2019; Funke et al., 2018), or pixels close to these as in (Heinrich et al., 2018).

We found our auxiliary task to be most beneficial when training for absolute vectors to center points instead of unit vectors pointing towards center points. Note that unit vectors pointing away from object boundaries have been considered with a similar motivation for instance segmentation in 2d natural images (Bai and Urtasun, 2016), yet not as an auxiliary task but to pre-train part of a model that regresses the distance transform to boundary pixels. Training for absolute vectors is facilitated in our case of nuclei segmentation by the relatively small size and roughly ellipsoidal shape of nuclei.

Figure 2: Illustration of the benefit of center point vectors as auxiliary training task. 1st row: Image detail focused on a single nucleus. 2nd row: Two densely packed nuclei, hard to distinguish by eye. (A) Raw image details. (B) Sketch of boundary pixels. (C) Sketch of a few exemplary center point vectors. While boundary labels distinguish between the presence of two vs. one nucleus at only the few pixels where two nuclei touch, center point vectors exhibit large differences at all foreground pixels. (D) Center point vectors per pixel predicted as auxiliary variables by our proposed model (3d, only the x-component is shown here).

Altogether, the proposed auxiliary task is easily integrated into standard end-to-end trainable models, with the only required architectural change being three additional output channels (one for each component of the 3d center point vectors). The one other training component that is affected by integrating our auxiliary task is elastic augmentation, where we generate training center point vectors on the fly. This can be done efficiently, so that training performance is not affected.
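To make the auxiliary task concrete, the following NumPy sketch generates the training targets from an instance label volume and evaluates the sum of squared differences loss. It is a minimal illustration with our own naming, not the authors' implementation; in particular, setting background vectors to zero and restricting the loss to foreground voxels are assumptions on our part.

```python
import numpy as np
from scipy import ndimage


def center_point_vectors(labels: np.ndarray) -> np.ndarray:
    """Per-voxel offsets to the center of mass of the containing nucleus.

    labels: (D, H, W) integer instance labels, 0 = background.
    Returns a (3, D, H, W) float array; background voxels get zero vectors.
    """
    vectors = np.zeros((3,) + labels.shape, dtype=np.float32)
    ids = np.unique(labels)
    ids = ids[ids != 0]
    # unweighted centroid per instance, computed on the binary mask
    centers = ndimage.center_of_mass(labels > 0, labels, ids)
    coords = np.stack(np.meshgrid(*map(np.arange, labels.shape), indexing="ij"))
    for idx, center in zip(ids, centers):
        mask = labels == idx
        for d in range(3):
            vectors[d][mask] = center[d] - coords[d][mask]
    return vectors


def auxiliary_loss(pred: np.ndarray, target: np.ndarray,
                   foreground: np.ndarray) -> float:
    """Sum of squared differences on center point vectors, foreground only."""
    diff = (pred - target) * foreground[np.newaxis]
    return float(np.sum(diff ** 2))
```

In a training pipeline these targets would be regenerated on the fly after elastic augmentation, as noted above, since the augmentation moves the centers of mass.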
3. Experiments
We evaluate our methods and baselines on a set of 28 3d confocal microscopy images of wild-type C. elegans at the L1 larval stage, as described in (Long et al., 2009). The nuclei of all cells were stained with DAPI, and all nuclei that can be distinguished by eye were segmented manually. Each 3d image captures a single C. elegans larva. Due to the stereotypical nature of C. elegans, this amounts to about 558 nuclei per image. All images have a near-isotropic voxel size of 0.… × 0.… × 0.… µm and an average size of 140 × … × … voxels. We benchmark the following baseline models:
1. A model with a single scalar output regressing the signed Euclidean distance transform to the boundary pixels, filtered by a tanh function, with sum of squared differences loss (sdt, see (Heinrich et al., 2018)).

2. A 3-label model trained to classify background, foreground, and boundary pixels with a softmax cross entropy loss (3-label, see (Caicedo et al., 2019)).

3. An edge affinity model for pairwise, direct neighbor affinities with binary cross entropy loss (affinities, see (Fowlkes et al., 2003; Funke et al., 2018)).

We denote training with our auxiliary task of regressing nuclei center point vectors as +cpv. We use a 3 layer 3d U-Net backbone architecture in all experiments, with 10 filters at the first layer, and a filter increase factor of 4. All scenarios are trained with the identical network configuration and hyper-parameters. We use standard elastic, intensity, flipping/transposing and rotation augmentations and the Adam optimizer with default parameters (Kingma and Ba, 2014). For more details see Appendix B. To obtain segmentations, we form a topographic map from the outputs of our respective models and employ a seed threshold to locate basins. These are used as seeds in an off-the-shelf watershed transform (Coelho, 2013) and grown until a foreground threshold is reached. See Appendix C for more details on post-processing, and the sketch after this paragraph for an illustration.

For the nuclei detection task we evaluate the above models with respect to their detection performance, and add a purely detection-based method, namely a model with a single scalar output for regressing Gaussian blobs around center point annotations with sum of squared differences loss (gauss, see (Höfener et al., 2018)). The Gaussian regression scenario requires post-processing of the predicted maps to extract center point locations. We use non-maximum suppression (nms) to locate local maxima above a threshold, with an additional hyper-parameter for the window size.

We use the validation set to determine the number of training epochs and the post-processing hyper-parameters for each scenario individually. For details on the resulting hyper-parameters see Appendix D. All results are averaged over the test set and over three independently trained and validated models.

As error metric we use the precision metric of the kaggle 2018 data science bowl: AP = TP / (TP + FP + FN). For pixel-wise segmentation of nuclei, we evaluate AP at a range of Intersection over Union (IoU) thresholds. For nuclei center point detection, we define TP, FP and FN as follows: A true positive detection is a center point that lies within a ground truth label, and only one such center point counts per label. All other center point detections count as false positives, and ground truth labels that do not contain any center point detection count as false negatives.

Table 1 and Figure 3 list our quantitative evaluation of segmentation results on the test set. An extended table can be found in Appendix C. Table 2 and Figure 4 list our quantitative evaluation of nuclei detection results. Figure 5 shows exemplary results. With regards to nuclei segmentation trained on dense ground truth, our proposed auxiliary task boosts segmentation accuracy by up to more than 4% in terms of AP_0.5. Detection accuracy improves by around 1% – 2% in terms of AP when compared to the respective baselines. As for nuclei detection trained on sparse ground truth, the Gaussian regression model keeps up surprisingly well and shows comparable performance to the baseline models. It is outperformed, though, by around 1% – 3% by the models trained with our auxiliary task. These gains are achieved without actually using the predicted vectors to separate clusters of nuclei in the post-processing; just training with the auxiliary task improves the quality of the main prediction.
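Concretely, the watershed post-processing described above can be sketched as follows. This is a minimal sketch using scikit-image's watershed as a stand-in for the mahotas implementation (Coelho, 2013) used in the paper; the two thresholds are placeholders for the validated hyper-parameters of Appendix D, and the sign convention (basins below the thresholds, matching the sdt setup of Appendix C) is an assumption.

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed


def instances_from_topographic_map(topo: np.ndarray,
                                   seed_threshold: float,
                                   fg_threshold: float) -> np.ndarray:
    """Seed basins below `seed_threshold`, grow them over the foreground."""
    # connected components of the deepest basins serve as seeds
    seeds, _ = ndimage.label(topo < seed_threshold)
    # foreground mask limits how far the seeds may grow
    foreground = topo < fg_threshold
    # flood the topographic map from the seeds up to the foreground boundary
    return watershed(topo, markers=seeds, mask=foreground)


# e.g., for an sdt output (negative inside instances, zero at the boundary):
# instances = instances_from_topographic_map(net_output, -2.0, 0.0)
```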
metric        avAP   AP_0.1  AP_0.2  AP_0.3  AP_0.4  AP_0.5  AP_0.6  AP_0.7  AP_0.8  AP_0.9
StarDist-3D   0.628  0.936   0.926   0.905

Table 1: Quantitative evaluation of nuclei segmentation results: average precision (AP = TP / (TP + FP + FN)) for multiple intersection over union (IoU) thresholds. StarDist-3D results from (Weigert et al., 2019). Numbers at all IoU thresholds corroborate the finding that the models trained with our proposed auxiliary task outperform their respective baseline. The best model (3-label +cpv) outperforms the more sophisticated StarDist-3D model in terms of avAP and in terms of some, but not all, IoU thresholds.
Figure 3: Quantitative evaluation of nuclei segmentation results: Each plot shows average precision (AP) over IoU threshold for a baseline model (sdt, 3-label, affinities) and the respective model trained with the auxiliary task (+cpv). The shaded area indicates the performance of the respective best and worst performing model of three independent runs. The +cpv models consistently outperform the base models. Best viewed on screen with zoom. See Table 1 for numbers.

Table 2: Quantitative evaluation of nuclei detection results for sdt, sdt+cpv, 3-label, 3-label+cpv, aff., aff.+cpv and gauss. Rows: detection AP and segmentation AP at low IoU thresholds, the latter of which approximately coincides with detection AP.

Figure 4: Quantitative evaluation of nuclei detection results. The height of each box shows the average detection performance of three independently trained models (see Table 2 for numbers), the black bars indicate the performance of the respective best and worst performing model. The purely detection-based Gaussian regression model is on par with the baseline models, but is outperformed by models trained with the auxiliary task.

Figure 5: Exemplary results for the head, including the nervous system, of one worm of the test set using our model. Grayscale: Maximum intensity projection of raw 3d data. True positives: Blue: Ground truth. Green: Prediction. Cyan: Overlay. Others: Red: False positives. Yellow: False negatives. (a) Detection results: Most nuclei are detected and localized precisely. (b) True positive instances: Only a few pixels at the borders are blue or green, the rest cyan, indicating the match of prediction and ground truth. (c) False positive and false negative instances: Most cells are detected, indicated by the high overlap of yellow and red and the missing red and yellow dots in (a); however, the IoU overlap is too small for some to count as correct segmentations. (See Figure 6 in the appendix for whole worms; visualized with napari (napari contributors, 2019), best viewed on screen with zoom and in color.)
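For reference, the detection evaluation defined in Section 3 can be written out in a few lines. The following is an illustrative sketch with hypothetical names: non-maximum suppression extracts center points from a predicted blob map (as used for the gauss model), and each ground truth label counts at most one true positive.

```python
import numpy as np
from scipy import ndimage


def detect_centers(blob_map: np.ndarray, threshold: float,
                   window: int = 5) -> np.ndarray:
    """Local maxima above `threshold`, via maximum-filter NMS."""
    is_max = blob_map == ndimage.maximum_filter(blob_map, size=window)
    return np.argwhere(is_max & (blob_map > threshold))  # (N, 3) coordinates


def detection_ap(centers: np.ndarray, gt_labels: np.ndarray) -> float:
    """AP = TP / (TP + FP + FN), one detection allowed per ground truth label."""
    matched = set()
    tp = fp = 0
    for z, y, x in centers:
        lbl = gt_labels[z, y, x]
        if lbl != 0 and lbl not in matched:
            matched.add(lbl)
            tp += 1
        else:
            fp += 1  # background hit or duplicate detection of the same label
    num_gt = np.count_nonzero(np.unique(gt_labels))
    fn = num_gt - tp  # labels without any matching detection
    return tp / max(tp + fp + fn, 1)
```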
4. Conclusion
We have proposed an auxiliary task to be used for training deep ConvNets for cell nuclei segmentation in microscopy images. On a database of 28 3d microscopy images, we show that training with this auxiliary task consistently improves performance in terms of nuclei detection and segmentation accuracy. Our proposed auxiliary task is simple and easy to integrate into existing deep learning based 3d segmentation frameworks.

Furthermore, we have shown in a quantitative comparison that nuclei center point detection shows competitive performance when trained on sparse center point ground truth annotations, as compared to training on dense ground truth labels.
Acknowledgments
We thank the authors of (Long et al., 2009) for providing the 3d nuclei data. P.H. and D.K. were funded by the Berlin Institute of Health and the Max-Delbrueck-Center for Molecular Medicine in the Helmholtz Association. P.H. was funded by HFSP grant RGP0021/2018-102. P.H. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.
References
Min Bai and Raquel Urtasun. Deep watershed transform for instance segmentation. CoRR, abs/1611.08303, 2016.

Juan C Caicedo, Jonathan Roth, Allen Goodman, Tim Becker, Kyle W Karhohs, Matthieu Broisin, Csaba Molnar, Claire McQuin, Shantanu Singh, Fabian Theis, and Anne Carpenter. Evaluation of deep learning strategies for nucleus segmentation in fluorescence images. BioRxiv, page 335216, 2019.

Hao Chen, Xiaojuan Qi, Lequan Yu, and Pheng-Ann Heng. DCAN: deep contour-aware networks for accurate gland segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2487–2496, 2016.

Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Sebastien Ourselin, Leo Joskowicz, Mert R. Sabuncu, Gozde Unal, and William Wells, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, pages 424–432, Cham, 2016. Springer International Publishing. ISBN 978-3-319-46723-8.

Luis Pedro Coelho. Mahotas: Open source software for scriptable computer vision. Journal of Open Research Software, 1, July 2013. doi: 10.5334/jors.ac.

Thorsten Falk, Dominic Mai, Robert Bensch, Özgün Çiçek, Ahmed Abdulkadir, Yassine Marrakchi, Anton Böhm, Jan Deubner, Zoe Jäckel, Katharina Seiwald, et al. U-Net: deep learning for cell counting, detection, and morphometry. Nature Methods, 16(1):67, 2019.

C. Fowlkes, D. Martin, and J. Malik. Learning affinity functions for image segmentation: combining patch-based and gradient-based approaches. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages II–54, June 2003. doi: 10.1109/CVPR.2003.1211452.

Jan Funke, Fabian David Tschopp, William Grisaitis, Arlo Sheridan, Chandan Singh, Stephan Saalfeld, and Srinivas C Turaga. Large scale image segmentation with structured loss based deep learning for connectome reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.

Fidel A Guerrero-Pena, Pedro D Marrero Fernandez, Tsang Ing Ren, Mary Yui, Ellen Rothenberg, and Alexandre Cunha. Multiclass weighted loss for instance segmentation of cluttered cells. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 2451–2455. IEEE, 2018.

Larissa Heinrich, Jan Funke, Constantin Pape, Juan Nunez-Iglesias, and Stephan Saalfeld. Synaptic cleft segmentation in non-isotropic volume electron microscopy of the complete drosophila brain. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 317–325. Springer, 2018.

Henning Höfener, André Homeyer, Nick Weiss, Jesper Molin, Claes F Lundström, and Horst K Hahn. Deep learning nuclei detection: A simple approach can deliver state-of-the-art results. Computerized Medical Imaging and Graphics, 70:43–52, 2018.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Y. Li, X. Bian, M. Chang, L. Wen, and S. Lyu. Pixel offset regression (POR) for single-shot instance segmentation. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–6, Nov 2018. doi: 10.1109/AVSS.2018.8639428.

Vebjorn Ljosa, Katherine L Sokolnicki, and Anne E Carpenter. Annotated high-throughput microscopy image sets for validation. Nature Methods, 9(7):637–637, 2012.

Fuhui Long, Hanchuan Peng, Xiao Liu, Stuart K Kim, and Eugene Myers. A 3D digital atlas of C. elegans and its application to single-cell analyses. Nature Methods, 6(9):667, 2009.

napari contributors. napari: a multi-dimensional image viewer for python, 2019. URL https://github.com/napari/napari.

O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 9351 of LNCS, pages 234–241. Springer, 2015. URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a. (Available on arXiv:1505.04597 [cs.CV].)

Vladimír Ulman, Martin Maška, Klas EG Magnusson, Olaf Ronneberger, Carsten Haubold, Nathalie Harder, Pavel Matula, Petr Matula, David Svoboda, Miroslav Radojevic, et al. An objective comparison of cell-tracking algorithms. Nature Methods, 14(12):1141, 2017.

Martin Weigert, Uwe Schmidt, Robert Haase, Ko Sugawara, and Gene Myers. Star-convex polyhedra for 3D object detection and segmentation in microscopy. arXiv:1908.03636, 2019.

Zhuo Zhao, Lin Yang, Hao Zheng, Ian H Guldner, Siyuan Zhang, and Danny Z Chen. Deep learning based instance segmentation in 3D biomedical images using weak annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 352–360. Springer, 2018.
Appendix A. Visualization

Figure 6: Exemplary results for the worst and the best test images using our model. Grayscale: Maximum intensity projection of raw 3d data. True positives: Blue: Ground truth. Green: Prediction. Cyan: Overlay. Others: Red: False positives. Yellow: False negatives. (a) Detection results: Most false positive or false negative detections are in the front part of the worm, especially in the densely packed nervous system. (b) True positive instances: Only a few pixels at the borders are blue or green, the rest cyan, indicating the match of prediction and ground truth. (c) False positive and false negative instances: Most cells are detected; however, the IoU overlap is too small for some to count as correct segmentations. (Visualized with napari (napari contributors, 2019), best viewed on screen with zoom and in color.)

Appendix B. Network details
We use the identical 3d U-Net architecture for all models. The network has 10 feature maps after the first convolution and has three layers of 2 convolutions each. After each 2× down-/upsampling step, the number of feature maps is increased/decreased four-fold. We use 3 × 3 × 3 convolution kernels and a learning rate of 10^-4. We do not weight the different loss components (main loss and auxiliary task loss), with the exception of the sdt model trained with the auxiliary task. To bring the components to a reasonably similar magnitude we used a factor of 100 for the sdt loss; however, we did not optimize this factor.

The sdt model has a single output and uses the sum of squared differences loss function. The affinity model has four outputs, three for the three direct neighbors (affinities) and one for a separate foreground/background segmentation, and uses the binary cross entropy loss with a sigmoid activation per channel. The 3-label model has three outputs, one for the background, one for the boundary and one for the interior, and uses the categorical cross entropy loss with a softmax activation. The extended cpv models each have three additional channels for the three-dimensional center point vector and use the sum of squared differences loss for the auxiliary task.
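As an illustration of this output-head layout, the following PyTorch sketch shows the combined loss for the 3-label +cpv model. The authors' repository may organize this differently; the tensor layout (three class channels followed by three cpv channels) and the unmasked cpv term are our assumptions, while the unit weighting of the two terms follows the text above.

```python
import torch
import torch.nn.functional as F


def combined_loss(logits: torch.Tensor, target_class: torch.Tensor,
                  target_cpv: torch.Tensor) -> torch.Tensor:
    """logits: (B, 6, D, H, W), i.e. 3 class channels + 3 cpv channels.

    target_class: (B, D, H, W) long labels (background/boundary/interior).
    target_cpv:   (B, 3, D, H, W) center point vector targets.
    """
    class_logits, cpv_pred = logits[:, :3], logits[:, 3:]
    # main task: softmax cross entropy over the three pixel classes
    ce = F.cross_entropy(class_logits, target_class)
    # auxiliary task: sum of squared differences on center point vectors;
    # per the text above, the two terms are added without extra weighting
    # (only the sdt variant rescales its main loss by 100)
    ssd = ((cpv_pred - target_cpv) ** 2).sum()
    return ce + ssd
```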
Appendix C. Post-processing details
For the sdt models the network output can directly be used in the watershed transform: all regions smaller than the seed threshold are seeds and are extended until they reach the foreground boundary as determined by the foreground threshold. The Euclidean distance transform is computed from the boundary pixels within the instances. Thresholding the foreground at zero leaves the instances slightly small. Validation performance improves by still thresholding at zero but additionally dilating all instances by one.3 For the 3-label models we use one minus the softmax value of the interior class for the watershed map. One minus the softmax value of the background class is used to define the foreground. For the affinity models we use one minus the average affinity value per pixel for the map. To determine the seeds, the affinity values are thresholded individually; if two out of three are above the threshold, the pixel belongs to a seed region. The prediction performance of affinities improves if the boundary between neighboring instances is more than a single pixel wide. We erode all instances once before training to achieve this effect. To reverse the effect of this on the prediction, the resulting instances are dilated once.

We additionally experimented with using the center point vector predictions not only as an auxiliary task during training but also for the post-processing. To this end we transform the three-dimensional vector map into a one-dimensional height map by accumulating, for each pixel, a counter of how many vectors point to it. Thresholding this height map results in regions which can be used, as before, as seeds for the watershed. The performance was comparable to using the main prediction. For detection we also list models for which the hyper-parameters were validated and selected with respect to their segmentation performance instead of their detection performance (val. column). (See Table 3 for the extended results.)

3. https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.binary_dilation.html

metric   val.  seeds  sdt    sdt+cpv  3-label  3-label+cpv  aff.   aff.+cpv  gauss
segmentation:
AP_0.5   seg   main   0.701  0.745    0.734
AP_0.5   seg   cpv    -      0.732    -                     -      0.725     -
detection:
AP       det   main   0.878  0.901    0.900
AP       seg   main   0.879  0.898    0.896    0.909        0.871  0.892     -
AP       det   cpv    -      0.889    -        0.878        -      0.896     -
AP       seg   cpv    -      0.887    -        0.881        -      0.890     -

Table 3: Quantitative evaluation results comparing the segmentation results at an IoU threshold of 0.5 and the detection results: We additionally compare seeds computed based on the main prediction (depending on the base model) and based on the cpv prediction, as well as detection performance with the hyper-parameters selected based on the validation performance on the detection task versus on the segmentation task. The models trained with the auxiliary task consistently perform better than their respective baseline model. The detection performance of the Gaussian regression model is on par with the baseline models.

Appendix D. Hyper-parameter details