CryoNuSeg: A Dataset for Nuclei Instance Segmentation of Cryosectioned H&E-Stained Histological Images
Amirreza Mahbod ⋆, Gerald Schaefer †, Benjamin Bancher ‡, Christine Löw ⋆, Georg Dorffner ‡, Rupert Ecker §, Isabella Ellinger ⋆
⋆ Institute for Pathophysiology and Allergy Research, Medical University of Vienna, Austria
† Department of Computer Science, Loughborough University, U.K.
‡ Section for Artificial Intelligence and Decision Support, Medical University of Vienna, Austria
§ Department of Research and Development, TissueGnostics GmbH, Austria
ABSTRACT
Nuclei instance segmentation plays an important role in the analysis of Hematoxylin and Eosin (H&E)-stained images. While supervised deep learning (DL)-based approaches represent the state-of-the-art in automatic nuclei instance segmentation, annotated datasets are required to train these models. There are two main types of tissue processing protocols, namely formalin-fixed paraffin-embedded samples (FFPE) and frozen tissue samples (FS). Although FFPE-derived H&E-stained tissue sections are the most widely used samples, H&E staining on frozen sections derived from FS samples is a relevant method in intra-operative surgical sessions as it can be performed fast. Due to differences in the protocols of these two types of samples, the derived images, and in particular the nuclei appearance, may differ in the acquired whole slide images. Analysis of FS-derived H&E-stained images can be more challenging as rapid preparation, staining, and scanning of FS sections may lead to deterioration in image quality.

In this paper, we introduce CryoNuSeg, the first fully annotated FS-derived cryosectioned and H&E-stained nuclei instance segmentation dataset. The dataset contains images from 10 human organs that were not exploited in other publicly available datasets, and is provided with three manual mark-ups to allow measuring intra-observer and inter-observer variability. Moreover, we investigate the effects of the tissue fixation/embedding protocol (i.e., FS or FFPE) on the automatic nuclei instance segmentation performance of one of the state-of-the-art DL approaches. We also create a baseline segmentation benchmark for the dataset that can be used in future research. A step-by-step guide to generate the dataset as well as the full dataset and other detailed information are made available to fellow researchers at https://github.com/masih4/CryoNuSeg.
Index Terms — Medical Image Analysis, Computational Pathology, Nuclei Segmentation, H&E Staining, Frozen Tissue Samples, Deep Learning, Intra-Observer Variability, Inter-Observer Variability
1. INTRODUCTION
Digital pathology enables the acquisition, management and sharing of information retrieved from stained digitised tissue sections from patient-derived biopsies in a digital environment. This offers many benefits including image interpretation by remotely located specialists or further use of the samples for scientific purposes [1]. An additional advantage of digitised samples is their utilisation for computer-mediated quantitative image analysis [2]. In combination, digital pathology and computational image analysis are expected to significantly improve clinical practice, for instance through second opinion systems supporting the work of pathologists [3].

Examination of Hematoxylin and Eosin (H&E)-stained tissue sections can reveal important information about individual cells and their functional status [4]. Consequently, judgement of these histopathological images remains the "gold standard" in diagnosing a variety of diseases including almost all types of cancer. Nuclei morphology, shape, type, count, and density are the key components in the evaluation process of H&E-stained tissue images. To extract these features automatically with a computer-based method, nuclei instance segmentation is required [5].

Nuclei instance segmentation masks should provide exact boundaries around each individual nucleus and also distinguish overlapping or touching objects. While many computer-based nuclei instance segmentation methods have been proposed in the literature, supervised deep learning (DL)-based approaches are the most promising [5, 6, 7]. However, they require fully annotated datasets to be able to train the neural networks. Annotated data is also required to quantitatively evaluate any model's performance. While manual labelling of nuclei by biomedical experts is considered as the gold standard method to create nuclei instance segmentation ground truth, this is very time demanding and may suffer from intra-observer and inter-observer variability.

A number of fully manually annotated nuclei instance segmentation datasets of H&E-stained images have been formerly introduced as shown in Table 1. In addition, PanNuke [8] is a recent semi-automatically created dataset of formalin-fixed and paraffin-embedded samples with nearly 200,000 nuclei from 19 different organs where the nuclei masks were first created automatically and then revised by humans.

While these datasets contain H&E-stained images and corresponding instance segmentation ground truths, important information regarding the utilised tissue processing approach is missing. There are two main types of tissue processing methods that can be applied before H&E-staining of sections, namely formalin (chemical)-fixation and paraffin-embedding (FFPE) of tissue samples, and preparation of frozen samples (FS). The general workflows to obtain H&E-stained sections using these two methods are shown in Fig. 1.

H&E staining of FFPE samples is the more widely used approach. Here, samples are fixed with formalin and embedded in blocks of paraffin wax. Thin slices of these blocks are then stained and used for diagnostic purposes. While the samples derived from this workflow are very durable, the entire procedure is rather time-consuming, taking hours to days [17]. On the other hand, preparation of sections from FS is very fast and takes between 5 and 20 minutes [17]. The samples are rapidly frozen (for structure preservation) and then a cryo-microtome is used to section them. Following H&E-staining, the samples are investigated under a microscope. FS staining is usually required in oncological surgeries, where rapid and online diagnosis and decisions are important for further processes.
Distinguishing a benign lesion from a malignant lesion, online cancer grading, determining the borders around the abnormal tissues, and assessing the invasion status of lesions are some applications where FS are used [18]. A diagnostic accuracy of 92%–98% can be achieved by experienced pathologists with images derived from FS. However, in some challenging cases, different diagnostic outcomes are reported for FFPE- and FS-derived H&E-stained images. This is mainly due to technical problems of rapid sample preparation in FS-derived samples, which may lead to poor quality of the derived whole slide images (WSIs). Due to the same technical issue, the nuclei appearance may be different in FS-derived samples compared to FFPE-derived samples. More condensed nuclear chromatin or nuclear ice crystals are some of the artefacts of FS-derived stained sections [18]. These artefacts can not only affect the interpretation by medical experts but may also have a significant impact on the performance of DL-based algorithms for nuclei instance segmentation. In general, due to insufficient numbers of available FS-derived H&E-stained WSIs, only few studies have analysed FS-derived H&E-stained tissue sections [19, 20, 21, 22] and, to our knowledge, no research has been conducted to explicitly analyse nuclei instance segmentation of FS-derived H&E-stained images.

In this paper, we present CryoNuSeg, the first fully annotated nuclei instance segmentation dataset based solely on FS-derived H&E-stained images. Nuclei are labelled manually by two experts, which thus allows the measurement of inter-observer variability. Moreover, one of the annotators re-labelled the entire dataset so that intra-observer variability can also be investigated.
The dataset, which is available on the Kaggle platform, comprises images from 10 human organs that have not been used in formerly released datasets, and we provide step-by-step descriptions of sample selection, sample preparation, and generation of segmentation masks throughout this paper and in the publicly available repository at https://github.com/masih4/CryoNuSeg. We further exploit a state-of-the-art DL-based nuclei segmentation algorithm [23] and investigate the effect of tissue fixation/embedding protocol (i.e., FS or FFPE) on instance segmentation performance. We also create a baseline segmentation benchmark for the dataset that can be used in future studies.
2. METHOD

2.1. Dataset
The Cancer Genome Atlas (TCGA) contains more than 30,000 WSIs from more than 50 human organs and tissues (https://portal.gdc.cancer.gov/repository). Using the filtering options provided in TCGA, we selected FS-derived H&E-stained images acquired at 40x magnification and chose images from organs not present in other publicly available datasets. With the help of a senior cell biologist from the Medical University of Vienna, we selected 30 WSIs from 10 different human organs (three WSIs per organ), namely the adrenal gland, larynx, lymph node, mediastinum, pancreas, pleura, skin, testis, thymus, and thyroid gland. To maximise data variability, we also aimed to include samples from different scanning centres, different disease types and different sexes. Using QuPath (https://qupath.github.io/), we extracted image patches of a fixed size of 512 x 512 pixels (one image patch per WSI) while aiming to extract the patches from the most representative parts of the WSIs. To perform manual segmentation, we used ImageJ (https://imagej.nih.gov/ij/) and its pre-built region of interest (ROI) manager tool. Further technical step-by-step descriptions of WSI selection, WSI patch extraction by QuPath, and manual annotation with ImageJ are detailed in the released GitHub repository. Manual nuclei instance segmentation was performed by two trained annotators, a biologist (Annotator 1) and a bioinformatician (Annotator 2). Both annotators were instructed in the same way to segment the nuclei. Annotator 1 also performed a second round of manual mark-ups with a gap of about three months between the two annotations.

Table 1. Publicly available datasets for nuclei instance segmentation. Note that the Kaggle Data Science dataset contains both H&E-stained bright-field and fluorescence microscopic images while all others contain only H&E-stained images. MoNuSeg = Multi-Organ Nuclei Segmentation; MoNuSAC = Multi-Organ Nuclei Segmentation and Classification; CoNSeP = Colorectal Nuclear Segmentation and Phenotypes; CPM = Computational Precision Medicine; TNBC = Triple Negative Breast Cancer; CRCHisto = Colorectal Adenocarcinomas; TCGA = The Cancer Genome Atlas; UHCW = University Hospitals Coventry and Warwickshire.

Dataset                    # Images   # Nuclei   Magnification   Source
Kumar et al. [5]           30         21,623     40x             TCGA
MoNuSeg [6]                44         28,846     40x             TCGA
MoNuSAC [9]                209        31,411     40x             TCGA
CoNSeP [10]                41         24,319     40x             UHCW
CPM-15 [11]                15         2,905      40x, 20x        TCGA
CPM-17 [11]                32         7,570      40x, 20x        TCGA
TNBC [12]                  50         4,022      40x             Curie Inst.
CRCHisto [13]              100        29,756     20x             UHCW
Kaggle Data Science [14]   670        29,464     n/a             n/a
Janowczyk [15]             143        12,000     40x             n/a
Crowdsource [16]           64         2,532      40x             TCGA

Fig. 1. Preparation workflow of frozen sections (top) and formalin-fixed paraffin-embedded (FFPE) sections (bottom). The fixation and embedding procedure of FFPE samples is much more time-consuming compared to the freezing step required to prepare frozen tissue samples.

In the entire dataset, Annotator 1 identified and segmented 7,596 and 8,044 nuclei in the first and second round of manual mark-ups, respectively, while Annotator 2 segmented 8,251 nuclei. Besides creating labelled and binary masks, we also created additional auxiliary segmentation masks that can be useful in training supervised DL-based approaches. These auxiliary masks include binary segmentation masks generated by removing touching borders, distance maps, and weighted maps that give more weight to the pixels between close nuclei. Such masks have been shown useful for nuclei instance segmentation in former studies [24, 23, 12, 11], and the related code to create both conventional and auxiliary segmentation masks is made available in our GitHub repository.

Fig. 2 gives some examples of raw image patches together with the resulting conventional and auxiliary segmentation masks. In Fig. 3, we show some inconsistent cases between the segmentation masks of Annotator 1 and Annotator 2 (inter-observer variability) and some mismatched cases between the first and second segmentations of Annotator 1 (intra-observer variability).

To investigate the effects of the tissue fixation/embedding protocol on DL-based nuclei segmentation, we use the MoNuSeg dataset [6] mentioned in Table 1. Information regarding the tissue processing type is not provided in former datasets; however, since TCGA is also the data source of the MoNuSeg dataset, by tracking down the image codes it is possible to retrieve the tissue fixation/embedding protocol (FS or FFPE) of the MoNuSeg images.
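The auxiliary masks described above (touching borders removed, distance maps, and weighted maps) can be derived from a labelled instance mask with standard image-processing operations. The following is a minimal Python/SciPy sketch of the idea, not the authors' released Matlab implementation; the function name and parameter defaults are illustrative, and the weight-map term follows the U-Net formulation of [24]:

```python
import numpy as np
from scipy import ndimage

def auxiliary_masks(labels, w0=10.0, sigma=5.0):
    """Derive auxiliary training targets from a labelled nuclei mask.

    labels: 2-D int array, 0 = background, 1..N = nucleus instances.
    Returns (border_removed, distance_map, weight_map).
    """
    binary = labels > 0
    n = int(labels.max())

    # 1) Binary mask with touching borders removed: erode each
    #    instance separately so adjacent nuclei become separable.
    border_removed = np.zeros_like(binary)
    for lab in range(1, n + 1):
        border_removed |= ndimage.binary_erosion(labels == lab)

    # 2) Distance map: per-pixel distance to the nucleus boundary,
    #    normalised to [0, 1] within each instance.
    dist = ndimage.distance_transform_edt(binary).astype(float)
    distance_map = np.zeros_like(dist)
    for lab in range(1, n + 1):
        inst = labels == lab
        m = dist[inst].max()
        if m > 0:
            distance_map[inst] = dist[inst] / m

    # 3) Weight map emphasising background pixels between close nuclei,
    #    w = 1 + w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)), where d1 and d2
    #    are the distances to the two nearest instances (cf. [24]).
    if n >= 2:
        dists = np.stack([ndimage.distance_transform_edt(labels != lab)
                          for lab in range(1, n + 1)])
        dists.sort(axis=0)
        d1, d2 = dists[0], dists[1]
        weight_map = 1.0 + w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2)) * (~binary)
    else:
        weight_map = np.ones_like(dist)
    return border_removed, distance_map, weight_map
```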
Of the 44 image patches in the MoNuSeg dataset, 27 images are from FS-derived sections and 17 from FFPE-derived sections. Based on this, we separate the images to form two subsets, namely MoNuSeg-FS and MoNuSeg-FFPE. We use the same number of images per organ to be able to compare the nuclei segmentation results. For instance, of the nine kidney images in MoNuSeg, seven are FS and two are FFPE samples. Thus, to have balanced datasets, we randomly choose two FS kidney images for the MoNuSeg-FS subset. We exclude colon and lung images since their samples are all of the same tissue fixation/embedding protocol. The numbers of annotated nuclei in the derived MoNuSeg-FS and MoNuSeg-FFPE datasets are 7,639 and 7,503, respectively, and are split amongst the organs as shown in Table 2.
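For readers wishing to reproduce this split: in TCGA slide barcodes, the trailing slide code distinguishes diagnostic FFPE slides ("DX") from frozen tissue slides ("TS", "BS", or "MS"). The paper itself only states that the protocol can be retrieved from the image codes; the sketch below assumes this barcode convention, and the example IDs are illustrative:

```python
import re

# TCGA slide barcodes end in a slide code: "DX" marks an FFPE
# diagnostic slide, while "TS"/"BS"/"MS" (top/bottom/middle slide)
# come from frozen tissue. This mapping is an assumption based on
# the TCGA barcode convention, not spelled out in the paper.
SLIDE_RE = re.compile(r"-(DX|TS|BS|MS)\d*$")

def tissue_protocol(slide_id: str) -> str:
    """Return 'FS', 'FFPE', or 'unknown' for a TCGA slide ID."""
    m = SLIDE_RE.search(slide_id)
    if m is None:
        return "unknown"
    return "FFPE" if m.group(1) == "DX" else "FS"
```

For example, `tissue_protocol("TCGA-2Z-A9J9-01A-01-TS1")` yields "FS", while an ID ending in "-DX1" yields "FFPE".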
2.2. Segmentation method

We use our recently published state-of-the-art DL-based instance segmentation algorithm [23] as the baseline segmentation model. The general workflow of this method is illustrated in Fig. 4. The method proceeds in two stages: a U-Net [24]-based model (segmentation U-Net) is used to separate foreground and background, while a regression encoder-decoder-based model predicts distance maps of the nuclei.
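The two network outputs (a foreground map and a distance map) still have to be fused into instance labels. The exact fusion step is described in [23]; one common way to realise such a fusion, sketched here as an assumption rather than the authors' exact pipeline, is marker-controlled watershed on the predicted distance map (cf. [29]):

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def instances_from_predictions(fg_prob, dist_pred,
                               fg_thresh=0.5, marker_thresh=0.5):
    """Fuse a foreground probability map and a predicted distance map
    into an instance label image via marker-controlled watershed.
    Threshold defaults are illustrative, not tuned values from [23]."""
    foreground = fg_prob > fg_thresh
    # Peaks of the distance map serve as one marker per nucleus.
    markers, _ = ndimage.label(dist_pred > marker_thresh)
    # Flood from the markers over the inverted distance map,
    # restricted to the predicted foreground region.
    return watershed(-dist_pred, markers, mask=foreground)
```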
Table 2. Number of images and nuclei per organ in the MoNuSeg-FS and MoNuSeg-FFPE datasets.
All images are resized to 512 x 512 pixels [25, 26]. In the inference phase, we apply two post-processing steps on the final segmentation results, which remove very small instances from the segmentation masks (objects with areas of less than 20 pixels) and fill holes inside detected objects (using morphological operations).
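These two post-processing steps can be reproduced with SciPy morphology. A minimal sketch assuming a labelled prediction mask (the function name is ours; the 20-pixel area threshold is from the paper):

```python
import numpy as np
from scipy import ndimage

def postprocess(labels, min_area=20):
    """Remove instances smaller than `min_area` pixels and fill
    holes inside each remaining instance, relabelling consecutively."""
    out = np.zeros_like(labels)
    new_lab = 0
    for lab in range(1, int(labels.max()) + 1):
        inst = labels == lab
        if inst.sum() < min_area:
            continue  # drop very small instances
        new_lab += 1
        out[ndimage.binary_fill_holes(inst)] = new_lab
    return out
```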
2.3. Evaluation

To evaluate the segmentation performance, we employ three evaluation indexes [27, 5, 10], namely the Dice score, aggregated Jaccard index (AJI), and panoptic quality (PQ) score. The Dice score evaluates the general performance of semantic segmentation, while both AJI and PQ measure instance segmentation performance. Compared to AJI, PQ is a more robust score and does not suffer from over-penalisation. To detect statistically significant differences in the obtained results, we use a two-sided Wilcoxon signed-rank test [28].
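As an illustration, the Dice score and the Wilcoxon signed-rank test can be computed as follows (a simplified sketch; AJI and PQ require instance matching and are omitted here, and the empty-mask convention is our choice):

```python
import numpy as np
from scipy.stats import wilcoxon

def dice_score(pred, gt):
    """Dice coefficient on binarised masks: 2|A n B| / (|A| + |B|)."""
    pred, gt = pred > 0, gt > 0
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom

def compare(scores_a, scores_b, alpha=0.05):
    """Two-sided Wilcoxon signed-rank test over paired per-image
    scores, as used in the paper to compare two configurations."""
    _, p = wilcoxon(scores_a, scores_b, alternative="two-sided")
    return p, p < alpha
```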
The code to generate the CryoNuSeg segmentation masks from the ImageJ ROI files is implemented in Matlab 2018a, while for the segmentation model we use the Keras DL framework (version 2.3.1). All experiments are performed on a single workstation with an Intel Core i7-8700 3.20 GHz CPU, 32 GB of RAM, and a TITAN V NVIDIA GPU card with 12 GB of installed memory.

Fig. 2. Sample CryoNuSeg images from three human organs and their corresponding segmentation masks (from the first segmentations of Annotator 1). For each sample, we show (from left to right) the raw image patch, the manually labelled nuclei, the binary segmentation mask, the binary mask with touching borders removed, the distance map, and the weighted map.

Fig. 3. Examples of deviations in the annotations. Some mismatching annotations exemplifying inter-observer variability (comparison between second/third column and fourth column) and intra-observer variability (comparison between second and third column) are shown in yellow boxes. While there are more mismatched pairs in the masks, we limit the number of mismatched pairs to four for better visualisation.
3. RESULTS
We first use the MoNuSeg-FS dataset as well as the MoNuSeg-FFPE dataset as training sets and employ the entire CryoNuSeg dataset as the test set. We perform an identical training scheme for both models as described in Section 2.2. The main aim of this set of experiments is to investigate the effect of the tissue fixation/embedding protocol on the performance of the nuclei segmentation model. The obtained results are given in Table 3.

We perform an additional experiment using the combined MoNuSeg-FS and MoNuSeg-FFPE datasets as training data and the entire CryoNuSeg dataset as test set. This yields an average Dice score of 79.1, with p-values of 0.0196, 0.0001, and 0.0001 for Dice score, AJI, and PQ score, respectively, when comparing to the results from MoNuSeg-FS, while p-values of 0.0068 (Dice), 0.0005 (AJI), and 0.0001 (PQ) are obtained when comparing to the results derived from the MoNuSeg-FFPE dataset.

Fig. 4. Flowchart of the employed instance segmentation algorithm.

Table 3. Segmentation results on CryoNuSeg when trained on MoNuSeg-FS and MoNuSeg-FFPE, respectively, per test organ and on average. The results, in terms of Dice score, AJI, and PQ score, are based on the manual segmentation masks from Annotator 1 (first round of mark-ups) as ground truth. The reported p-values in the bottom row show the results of the statistical significance test when comparing the two training sets.

Table 4. 10CV segmentation results on the CryoNuSeg dataset in terms of Dice score, AJI, and PQ score (based on the segmentation masks of Annotator 1 in the first round of mark-ups).
4. DISCUSSION
In this paper, we introduce the first fully manually annotated nuclei segmentation dataset based on FS-derived H&E-stained sections. Further, we investigate the impact of the tissue fixation/embedding procedure on the segmentation performance of a state-of-the-art DL-based nuclei segmentation algorithm.

Table 5. Segmentation results, in terms of Dice score, AJI, and PQ score, from comparing the manual segmentation masks from Annotator 2 to those from Annotator 1 (first round) to show inter-observer variability.

Table 6. Segmentation results, in terms of Dice score, AJI, and PQ score, from comparing Annotator 1's manual segmentation masks of the first round to those of the second round to show intra-observer variability.

The results in Table 3 show the impact of the tissue fixation/embedding procedure on instance segmentation performance based on training models derived from the MoNuSeg-FS and MoNuSeg-FFPE datasets, which contain only FS-derived and only FFPE-derived H&E-stained images, respectively. While FS- and FFPE-derived images can never be obtained from the exact same sample, we have tried to minimise all other data variability differences by using the same number of images per organ and samples of the same organs (breast, bladder, prostate, brain, stomach, liver, and kidney), which also results in roughly the same number of segmented nuclei (7,639 for MoNuSeg-FS and 7,503 for MoNuSeg-FFPE). As Table 3 shows, the overall and organ-wise segmentation results are very competitive (except for AJI and PQ score for the adrenal gland) in both cases (i.e., when trained on MoNuSeg-FS and MoNuSeg-FFPE, respectively). The Wilcoxon signed-rank test yields p-values larger than 0.05 for all three evaluation measures (Dice score, AJI, and PQ score). Thus, we find nuclei instance segmentation performance not to be significantly affected by the employed tissue fixation/embedding procedure.
As there are rather large segmentation performance differences based on AJI and PQ score for the adrenal gland between the MoNuSeg-FS and MoNuSeg-FFPE datasets, further research with a larger training sample of adrenal gland images is required to investigate the impact of the tissue fixation protocol on nuclei instance segmentation for this type of tissue.

The results from the combined MoNuSeg-FS/MoNuSeg-FFPE experiment show a slight but statistically significant improvement in the segmentation performance, specifically for the PQ score, with the boost in segmentation performance likely related to the increased number of training images (30 instead of 15 images) in this experiment. Further research is required to investigate the impact of the tissue fixation/embedding protocol on other histological image analysis tasks such as gland/tumour segmentation or WSI classification/grading.

As a segmentation benchmark for our CryoNuSeg dataset, we perform 10-fold cross-validation whereby the images of nine organs are used for training while testing on the tenth, repeating the process so that each organ is used once for testing. This separation method has two advantages over a random division. First, the segmentation results reported in Table 4 better show the generalisation ability of the model, since in each fold an unseen test organ is used for the evaluation. Second, for future studies that use this dataset for developing instance segmentation models, identical folds can easily be created to fairly compare segmentation results. The results from the experiment, where MoNuSeg-FS and MoNuSeg-FFPE are also used for training, show, as expected, a slight improvement in the segmentation performance.

Comparing the results from Table 3 and Table 4, we can see that segmentation performance improves when training on CryoNuSeg images compared to training on MoNuSeg images (either MoNuSeg-FS or MoNuSeg-FFPE).
The difference is statistically significant as confirmed by a pairwise statistical test for each of the utilised evaluation indexes. The obtained p-values are well below 0.05 for all six cases (three evaluation measures each for MoNuSeg-FS vs. CryoNuSeg and MoNuSeg-FFPE vs. CryoNuSeg). This could be related to the larger number of training images in the CryoNuSeg dataset compared to the MoNuSeg-FS/MoNuSeg-FFPE images, although all three datasets have roughly the same number of annotated nuclei. Another reason could be related to inter-observer variability, since for CryoNuSeg a single person annotated the training and test images (Annotator 1 in our experiments), while for the MoNuSeg-FS and MoNuSeg-FFPE images there are different annotators for the training and test images.

Our CryoNuSeg dataset comes with segmentations from two annotators (a biologist and a bioinformatician); both were instructed identically on how the segmentation task should be performed (from a technical viewpoint). To measure the inter-observer variability, Table 5 compares the segmentation masks of the two annotators. While in an ideal scenario a perfect match would be achieved, the results show that there is a significant difference. The difference is relatively small in terms of Dice score but more evident for AJI and PQ score. This indicates that the overall agreement of the two annotators in distinguishing background from foreground is much better than in distinguishing touching or overlapping nuclei. By visual inspection of the manual segmentation masks from the two annotators, it can be noticed that Annotator 2 has a tendency to over-segment the nuclei, as shown by some examples in Fig. 3 and as demonstrated by the higher number of identified nuclei compared to Annotator 1.
The issue of inter-observer variability was also observed in a subset of the MoNuSeg dataset, where the agreement between two annotators (based on AJI) was reported to be only 65% [6]. Since two segmentations are available from Annotator 1, we can also measure intra-observer variability, as reported in Table 6. While a perfect match would represent the ideal situation, as the results in Table 6 show, there is a significant difference for all three evaluation indexes (much more evident for AJI and PQ score). By comparing the average and organ-wise results in Table 5 and Table 6, it can be observed that inter-observer variability has a larger impact on the segmentation performance in comparison to intra-observer variability.

Fuzzy or unclear borders between touching nuclei, folded tissues and other acquisition artefacts, manual annotation errors in the nuclei borders, and sensitivity loss of the annotators due to fatigue are some factors that can cause inter-observer and intra-observer variability issues. These problems can be partially resolved by removing vague areas in the manual segmentation masks [9], but this requires extra supervision and time.

We can also compare the manual segmentation results of the second annotator from Table 5 with the 10CV results from Table 4 to fairly compare an automated segmentation algorithm with a human annotator. From the tables, we can see that the results are relatively close. Statistical tests for the three evaluation indexes yield p-values of 0.055 for Dice score, 0.052 for AJI, and 0.003 for PQ score, which suggests that the baseline segmentation method we provide is within reach of the performance of a manual annotator (although the PQ score results are statistically significantly different).
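The leave-one-organ-out 10-fold protocol used for the CryoNuSeg benchmark can be reproduced with scikit-learn's LeaveOneGroupOut, using the organ as the group label. A sketch (organ names are from the paper; file names are illustrative):

```python
from sklearn.model_selection import LeaveOneGroupOut

# One entry per image patch; each CryoNuSeg organ contributes
# three patches.
organs = ["adrenal gland", "larynx", "lymph node", "mediastinum",
          "pancreas", "pleura", "skin", "testis", "thymus",
          "thyroid gland"]
images = [f"{organ}_{i}.png" for organ in organs for i in range(3)]
groups = [organ for organ in organs for _ in range(3)]

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(images, groups=groups):
    test_organ = groups[test_idx[0]]
    # train on the 27 images of the nine remaining organs and
    # evaluate on the three images of `test_organ`
    ...
```

Because the folds are fully determined by the organ labels, any future study can recreate identical splits for a fair comparison.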
5. CONCLUSIONS
In this paper, we have introduced the first fully manually annotated FS-derived H&E-stained nuclei segmentation dataset. Our CryoNuSeg dataset is publicly available to fellow researchers together with extensive documentation and sample implementations. It can thus be used for developing and comparing nuclei instance segmentation algorithms. We have also provided a baseline segmentation benchmark founded on a state-of-the-art DL-based approach for this purpose. In addition, we have investigated the impact of the tissue fixation/embedding method on the segmentation performance, while further work is planned to investigate and compare the impact of the tissue fixation and embedding technique on different histological image analysis tasks. We hope that the dataset will prove useful for the community and are looking forward to its use in future studies.
Acknowledgements
This work was supported by the Austrian Research Promotion Agency (FFG), No. 872636, and a Kaggle open data research grant. We would also like to thank NVIDIA for their generous GPU donation.
6. REFERENCES

[1] Alexi Baidoshvili, Anca Bucur, Jasper van Leeuwen, Jeroen van der Laak, Philip Kluin, and Paul J van Diest, "Evaluating the benefits of digital pathology implementation: time savings in laboratory logistics," Histopathology, vol. 73, no. 5, pp. 784–794, 2018.

[2] Metin N Gurcan, Laura Boucheron, Ali Can, Anant Madabhushi, Nasir Rajpoot, and Bulent Yener, "Histopathological image analysis: A review," IEEE Reviews in Biomedical Engineering, vol. 2, pp. 147–171, 2009.

[3] Laura Barisoni, Kyle J Lafata, Stephen M Hewitt, Anant Madabhushi, and Ulysses GJ Balis, "Digital pathology and computational image analysis in nephropathology," Nature Reviews Nephrology, vol. 16, no. 11, pp. 669–685, 2020.

[4] John K. C. Chan, "The wonderful colors of the hematoxylin-eosin stain in diagnostic surgical pathology," International Journal of Surgical Pathology, vol. 22, no. 1, pp. 12–32, 2014.

[5] N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, and A. Sethi, "A dataset and a technique for generalized nuclear segmentation for computational pathology," IEEE Transactions on Medical Imaging, vol. 36, no. 7, pp. 1550–1560, 2017.

[6] N. Kumar, R. Verma, D. Anand, Y. Zhou, O. F. Onder, E. Tsougenis, H. Chen, P. A. Heng, J. Li, Z. Hu, Y. Wang, N. A. Koohbanani, M. Jahanifar, N. Z. Tajeddin, A. Gooya, N. Rajpoot, X. Ren, S. Zhou, Q. Wang, D. Shen, C. K. Yang, C. H. Weng, W. H. Yu, C. Y. Yeh, S. Yang, S. Xu, P. H. Yeung, P. Sun, A. Mahbod, G. Schaefer, I. Ellinger, R. Ecker, O. Smedby, C. Wang, B. Chidester, T. V. Ton, M. Tran, J. Ma, M. N. Do, S. Graham, Q. D. Vu, J. T. Kwak, A. Gunda, R. Chunduri, C. Hu, X. Zhou, D. Lotfi, R. Safdari, A. Kascenas, A. O'Neil, D. Eschweiler, J. Stegmaier, Y. Cui, B. Yin, K. Chen, X. Tian, P. Gruening, E. Barth, E. Arbel, I. Remer, A. Ben-Dor, E. Sirazitdinova, M. Kohl, S. Braunewell, Y. Li, X. Xie, L. Shen, J. Ma, K. D. Baksi, M. A. Khan, J. Choo, A. Colomer, V. Naranjo, L. Pei, K. M. Iftekharuddin, K. Roy, D. Bhattacharjee, A. Pedraza, M. G. Bueno, S. Devanathan, S. Radhakrishnan, P. Koduganty, Z. Wu, G. Cai, X. Liu, Y. Wang, and A. Sethi, "A multi-organ nucleus segmentation challenge," IEEE Transactions on Medical Imaging, vol. 39, no. 5, pp. 1380–1391, 2019.

[7] Bingchao Zhao, Xin Chen, Zhi Li, Zhiwen Yu, Su Yao, Lixu Yan, Yuqian Wang, Zaiyi Liu, Changhong Liang, and Chu Han, "Triple U-net: Hematoxylin-aware nuclei segmentation with progressive dense feature aggregation," Medical Image Analysis, vol. 65, pp. 101786, 2020.

[8] Jevgenij Gamper, Navid Alemi Koohbanani, Simon Graham, Mostafa Jahanifar, Syed Ali Khurram, Ayesha Azam, Katherine Hewitt, and Nasir Rajpoot, "PanNuke dataset extension, insights and baselines," arXiv preprint arXiv:2003.10778, 2020.

[9] Ruchika Verma, Neeraj Kumar, Abhijeet Patil, Nikhil Cherian Kurian, Swapnil Rane, and Amit Sethi, "Multi-organ nuclei segmentation and classification challenge 2020."

[10] Simon Graham, Quoc Dang Vu, Shan E Ahmed Raza, Ayesha Azam, Yee Wah Tsang, Jin Tae Kwak, and Nasir Rajpoot, "Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images," Medical Image Analysis, vol. 58, pp. 101563, 2019.

[11] Quoc Dang Vu, Simon Graham, Tahsin Kurc, Minh Nguyen Nhat To, Muhammad Shaban, Talha Qaiser, Navid Alemi Koohbanani, Syed Ali Khurram, Jayashree Kalpathy-Cramer, Tianhao Zhao, et al., "Methods for segmentation and classification of digital microscopy tissue images," Frontiers in Bioengineering and Biotechnology, vol. 7, pp. 53, 2019.

[12] P. Naylor, M. Laé, F. Reyal, and T. Walter, "Segmentation of nuclei in histopathology images by deep regression of the distance map," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 448–459, Feb 2019.

[13] Korsuk Sirinukunwattana, Shan E Ahmed Raza, Yee-Wah Tsang, David RJ Snead, Ian A Cree, and Nasir M Rajpoot, "Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1196–1206, 2016.

[14] Juan C Caicedo, Allen Goodman, Kyle W Karhohs, Beth A Cimini, Jeanelle Ackerman, Marzieh Haghighi, CherKeng Heng, Tim Becker, Minh Doan, Claire McQuin, et al., "Nucleus segmentation across imaging experiments: the 2018 data science bowl," Nature Methods, vol. 16, no. 12, pp. 1247–1253, 2019.

[15] Andrew Janowczyk, "Andrew Janowczyk - tidbits from along the way, use case 1: Nuclei segmentation," 2019, Accessed: 2019-09-01.

[16] Humayun Irshad, Laleh Montaser-Kouhsari, Gail Waltz, Octavian Bucur, JA Nowak, Fei Dong, Nicholas W Knoblauch, and Andrew H Beck, "Crowdsourcing image annotation for nucleus detection and segmentation in computational pathology: evaluating experts, automated methods, and the crowd," in Pacific Symposium on Biocomputing, 2014, pp. 294–305.

[17] DA Novis and RJ Zarbo, "Interinstitutional comparison of frozen section turnaround time. A College of American Pathologists Q-Probes study of 32868 frozen sections in 700 hospitals," Archives of Pathology & Laboratory Medicine, vol. 121, no. 6, pp. 559–567, June 1997.

[18] Hasnan Jaafar, "Intra-operative frozen section consultation: concepts, applications and limitations," The Malaysian Journal of Medical Sciences, vol. 13, no. 1, pp. 4–12, 2006.

[19] Daisuke Komura and Shumpei Ishikawa, "Machine learning methods for histopathological image analysis," Computational and Structural Biotechnology Journal, vol. 16, pp. 34–42, 2018.

[20] Dario Sitnik, Gorana Aralica, Arijana Pačić, Marijana Popović Hadžija, Mirko Hadžija, and Ivica Kopriva, "Deep learning approaches for intraoperative pixel-based diagnosis of colon cancer metastasis in a liver from phase-contrast images of unstained specimens," in Medical Imaging 2020: Digital Pathology, 2020, vol. 11320, pp. 47–56, SPIE.

[21] Dario Sitnik, Ivica Kopriva, Gorana Aralica, Arijana Pačić, Marijana Popović Hadžija, and Mirko Hadžija, "Transfer learning approach for intraoperative pixel-based diagnosis of colon cancer metastasis in a liver from hematoxylin-eosin stained specimens," in Medical Imaging 2020: Digital Pathology, John E. Tomaszewski and Aaron D. Ward, Eds. International Society for Optics and Photonics, 2020, vol. 11320, pp. 57–67, SPIE.

[22] Fazly Salleh Abas, Hamza Numan Gokozan, Behiye Goksel, Jose J. Otero, and Metin N. Gurcan, "Intra-operative neuropathology of glioma recurrence: cell detection and classification," in Medical Imaging 2016: Digital Pathology. International Society for Optics and Photonics, 2016, vol. 9791, pp. 59–68, SPIE.

[23] Amirreza Mahbod, Gerald Schaefer, Isabella Ellinger, Rupert Ecker, Örjan Smedby, and Chunliang Wang, "A two-stage U-Net algorithm for segmentation of nuclei in H&E-stained tissues," in Digital Pathology, 2019, pp. 75–82, Springer International Publishing.

[24] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.

[25] Amirreza Mahbod, Gerald Schaefer, Chunliang Wang, Georg Dorffner, Rupert Ecker, and Isabella Ellinger, "Transfer learning using a multi-scale and multi-network ensemble for skin lesion classification," Computer Methods and Programs in Biomedicine, vol. 193, pp. 105475, 2020.

[26] Amirreza Mahbod, Philipp Tschandl, Georg Langs, Rupert Ecker, and Isabella Ellinger, "The effects of skin lesion segmentation on the performance of dermatoscopic image classification," Computer Methods and Programs in Biomedicine, vol. 197, pp. 105725, 2020.

[27] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár, "Panoptic segmentation," in Conference on Computer Vision and Pattern Recognition, June 2019, pp. 9404–9413.

[28] Jean Dickinson Gibbons and Subhabrata Chakraborti, Nonparametric Statistical Inference: Revised and Expanded, CRC Press, 2014.

[29] X. Yang, H. Li, and X. Zhou, "Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and Kalman filter in time-lapse microscopy,"