CPP-Net: Context-aware Polygon Proposal Network for Nucleus Segmentation
Shengcong Chen, Changxing Ding, Minfeng Liu, and Dacheng Tao
Abstract — Nucleus segmentation is a challenging task due to the crowded distribution and blurry boundaries of nuclei. Recent approaches represent nuclei by means of polygons to differentiate between touching and overlapping nuclei, and have accordingly achieved promising performance. Each polygon is represented by a set of centroid-to-boundary distances, which are in turn predicted from the features of the centroid pixel of a single nucleus. However, using the centroid pixel alone does not provide sufficient contextual information for robust prediction. To handle this problem, we propose a Context-aware Polygon Proposal Network (CPP-Net) for nucleus segmentation. First, we sample a point set, rather than one single pixel, within each cell for distance prediction. This strategy substantially enhances contextual information and thereby improves the robustness of the prediction. Second, we propose a Confidence-based Weighting Module, which adaptively fuses the predictions from the sampled point set. Third, we introduce a novel Shape-Aware Perceptual (SAP) loss that constrains the shape of the predicted polygons. Here, the SAP loss is based on an additional network that is pre-trained by mapping the centroid probability map and the pixel-to-boundary distance maps to a different nucleus representation. Extensive experiments justify the effectiveness of each component in the proposed CPP-Net. Finally, CPP-Net is found to achieve state-of-the-art performance on three publicly available databases, namely DSB2018, BBBC006, and PanNuke. The code for this paper will be released.
Index Terms — Nucleus segmentation, Instance segmentation, Contextual information, Perceptual loss.
I. INTRODUCTION

Nucleus segmentation is a process aimed at detecting and delineating each nucleus in microscopy images. This process is capable of providing rich spatial and morphological information about nuclei; therefore, it plays an important role in many cell analysis applications, such as cell counting, cell tracking, phenotype classification, and treatment planning [1]. Manual nucleus segmentation is time-consuming, meaning that automatic nucleus segmentation methods have become increasingly necessary.
Shengcong Chen and Changxing Ding are with the School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China (e-mail: [email protected]; [email protected]). Minfeng Liu is with Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong 510515, China (e-mail: [email protected]). Dacheng Tao is with the School of Computer Science, Faculty of Engineering, University of Sydney, Darlington, NSW 2008, Australia (e-mail: [email protected]).
Fig. 1: (a) Exemplar patches that contain touching nuclei. (b) As nuclei tend to overlap with each other, the bounding box for one instance may also cover other nuclei. (c) The boundaries between touching nuclei tend to be blurry, which increases the difficulty of the nucleus segmentation task.

However, automatic nucleus segmentation still remains a challenging task in terms of robustness, due to the crowded distribution of nuclei and their blurry boundaries, as illustrated in Fig. 1. Unlike objects in natural images, nuclei tend to overlap with each other. As a result, the bounding box for one instance often covers other nuclei, which negatively impacts the robustness of traditional bounding box-based detection methods, such as Mask R-CNN [2]. Another major challenge lies in the blurry boundary between touching nuclei, which increases the difficulty of inferring their boundaries.

A large number of approaches have been proposed to handle the above challenges [3]–[5], [7]–[9], [11], [12], [17], [20], [21]. For example, Chen et al. [3] differentiate instances of nuclei according to their boundaries. Graham et al. [4] represent nucleus instances using pixel-to-centroid distance maps in both the horizontal and vertical directions. Koohbanani et al. [5] infer nucleus instances by clustering bounding boxes predicted on each pixel within nuclei. When attempting to finally obtain nucleus instances, the above approaches typically resort to complex post-processing operations, such as morphological operations [3], watershed algorithms [4], [11], [12], and clustering [5]. Several recent works [13], [14], [29] represent each instance using a polygon, which is realized by predicting a set of centroid-to-boundary distances.
They require only light-weight post-processing operations, i.e., non-maximum suppression, to remove redundant proposals; therefore, their pipelines are more straightforward and efficient. However, these approaches predict polygons using features of the centroid pixel of each instance only, whereas the centroid alone lacks contextual information [30], [31]. In particular, the centroid is located far away from boundary pixels for large-sized nuclei, which degrades the distance prediction accuracy. Moreover, supervision is imposed on each respective distance value, and there is a lack of a global constraint on the shape of each nucleus.

In this paper, we propose a Context-aware Polygon Proposal Network (CPP-Net) to improve the robustness of polygon-based methods [13] for nucleus segmentation. The contributions of this paper are made from three perspectives. First, CPP-Net explores more contextual information to improve the prediction accuracy for the centroid-to-boundary distances; specifically, it adopts the StarDist [13] model to conduct initial distance prediction along a set of pre-defined directions. It then samples a set of points between the centroid and the initially predicted boundary along each direction. As these points are closer to the boundary than the centroid pixel, their distance to the ground-truth boundary can be predicted much more accurately. Correspondingly, the initially predicted centroid-to-boundary distance value can be refined with reference to the predictions for those sampled points.

Second, the prediction confidence of these sampled points typically varies according to their feature quality. For example, the errors contained in the distances initially predicted by StarDist [13] can be amplified in cases where some sampled points actually fall outside the nucleus. Accordingly, the weights of the sampled points should change depending on their prediction confidence.
We therefore propose a Confidence-based Weighting Module (CWM) that adaptively fuses the predicted distances for these points. With the assistance of CWM, CPP-Net can more robustly utilize contextual information from the sampled points.

Third, we introduce a novel Shape-Aware Perceptual (SAP) loss, which constrains CPP-Net's predictions regarding the nucleus shape. The original perceptual loss [32] penalizes the differences in the hidden feature maps of a pre-trained classification network between two input images. To encode the shape information of the nucleus into the perceptual loss, we train an encoder-decoder model that maps the representation of nucleus shape in CPP-Net, i.e., the pixel-to-boundary distance maps and the centroid probability map, to other shape representations, such as nucleus bounding boxes. By being trained in this way, this model is capable of extracting rich shape information related to nuclei. We then adopt the encoder part to extract feature maps for the predictions and the ground-truth output of CPP-Net, respectively. The SAP loss penalizes the differences between these extracted feature maps. In this way, the shapes of nuclei are constrained during training.

In this paper, we conduct an ablation study on the proposed components of CPP-Net on the DSB2018 [1] and BBBC006 [33] databases. Our experimental results justify the effectiveness of these components. Finally, we compare the performance of CPP-Net with state-of-the-art methods on DSB2018, BBBC006, and PanNuke [34], [35]; under these circumstances, CPP-Net consistently achieves state-of-the-art performance.

The remainder of this paper is organized as follows. Related works on nucleus segmentation are reviewed briefly in Section II. The proposed methods are described in Section III, while implementation details are presented in Section IV. Experimental results are presented in Section V, along with their analysis. Finally, we conclude this paper in Section VI.

II. RELATED WORKS
A number of effective approaches for nucleus segmentation have been proposed. In this section, we divide recent research into two categories, namely traditional methods and deep-learning based methods.

Many traditional methods are based on the watershed algorithm [22]–[24]. For example, Malpica et al. [22] proposed a morphological watershed-based algorithm, which is assisted by empirically designed image processing operations. This approach utilizes both intensity and morphology information for nucleus segmentation. However, it is likely to cause over-segmentation, and also has limitations in the processing of overlapping nuclei [23], [24]. Yang et al. [23] proposed a new marker extraction method based on condition erosion to alleviate the over-segmentation problem. Tareef et al. [24] proposed a Multi-Pass Fast Watershed method that adaptively and efficiently segments overlapping cervical cells. Moreover, the active contour model (ACM) has also been widely adopted for nucleus segmentation [25], [26]. For example, Molnar et al. [26] proposed to promote the performance of ACM by exploring prior knowledge, specifically the understanding that nuclei usually have ellipse-shaped boundaries. Other traditional methods, such as level-set [27] and template-matching [28], have also been adopted for nucleus segmentation. The common downside of traditional methods is that they typically require hand-crafted features, which depend on human expertise and have limitations in terms of their representation power.

In recent years, deep-learning based approaches have achieved notable success on nucleus segmentation tasks [3]–[21]. These works can be further categorized into two-stage and one-stage methods.

Two-stage methods consist of a detection stage, which locates nucleus instances, and a segmentation stage, which predicts a foreground mask for each instance. One representative method of this kind is Mask R-CNN [2], [18], which detects nucleus instances using bounding boxes.
However, the shape of nuclei tends to be elliptical, and severe occlusion typically exists between instances; this means that each bounding box may contain pixels representing two or more instances, indicating that bounding boxes may ultimately be sub-optimal for nucleus segmentation [13], [16]. To handle this problem, SpaNet [5] detects instance centroids and performs semantic segmentation in its first stage. In its second stage, it predicts the bounding box of the associated instance according to the feature of each foreground pixel. Finally, it
separates overlapping nuclei by clustering the above pixel-wise predictions, using the centroids as clustering centers. Moreover, BRP-Net [16] is also a two-stage network. It includes a detection stage, which generates region proposals based on instance boundaries, and a refinement stage, which refines the foreground area of each instance. Notably, neither SpaNet [5] nor BRP-Net [16] is designed in an end-to-end manner, which increases the complexity of the entire system.

By contrast, one-stage methods adopt a single network. Based on the network prediction, they utilize post-processing operations to obtain nucleus instances. Depending on the network prediction property being utilized, one-stage methods can be further subdivided into classification-based models and regression-based models.

As the name suggests, classification-based models output classification probability maps. Existing works in this sub-category include boundary-based [3], [6]–[10] and connectivity-based [17] methods. Boundary-based methods typically include a boundary detection branch and a semantic segmentation branch [3], [7], [8]; for example, DCAN [3] constructs two separate decoders for boundary detection and semantic segmentation, respectively. Because these two tasks are related, BES-Net [7] and CIA-Net [8] respectively introduce uni- and bi-directional connections between the two branches. These methods process images in the RGB color space. In comparison, Zhao et al. [9] leveraged the optical characteristics of Haematoxylin and Eosin (H&E) staining, and proposed a Hematoxylin-aware Triplet U-Net, which makes predictions with reference to the extracted Hematoxylin component in the image. By subtracting instance boundaries from the segmentation maps, overlapping nuclei can be separated; the downside is that such a subtraction operation may result in a loss of segmentation accuracy [16].
Moreover, we term PatchPerPix [17] a connectivity-based method, since the prediction it makes indicates whether a pixel is located in the same instance as each of its neighbors. Due to the advantages it offers in describing the local shape of instances in small patches, PatchPerPix is capable of segmenting instances with sophisticated shapes.

In comparison, regression-based models output regression maps, e.g., distances or coordinate offsets for each pixel of the input image. For example, HoVer-Net [4] predicts the distances from each foreground pixel to its corresponding nucleus centroid in both the horizontal and vertical directions. It then employs the marker-controlled watershed algorithm as post-processing to obtain nucleus instances. The performance of these approaches is affected by their empirically designed post-processing strategies. Recently, Schmidt et al. [13] proposed the StarDist approach, which predicts both the centroid probability maps and the distances from each foreground pixel to its associated instance boundary along a set of pre-defined directions. In the post-processing step, StarDist generates polygon proposals based on the set of predicted distances for each centroid pixel. Each polygon represents one nucleus instance. In this method, polygons are predicted using the features of the centroid pixel only; as a result, contextual information for large-sized nucleus instances is lacking, which affects the prediction accuracy.

Our proposed CPP-Net is a one-stage method and relates closely to StarDist [13]. CPP-Net improves the robustness of StarDist by integrating rich contextual information from a sampled point set for each centroid pixel. Moreover, CPP-Net adopts a novel Shape-Aware Perceptual loss that constrains CPP-Net's predictions according to the shape prior of nuclei.
III. METHODS
A. Overview
Fig. 2 presents the structure of CPP-Net for nucleus segmentation. The backbone of CPP-Net is a simple U-Net. Three parallel 1×1 convolutional (Conv) layers are attached to the backbone. These layers predict the pixel-to-boundary distance maps D ∈ R^{H×W×K}, the confidence maps C ∈ R^{H×W×K}, and the centroid probability map P_c ∈ R^{H×W}, respectively. H and W represent the height and width of the image, respectively. For clarity, we denote the coordinate space of the input image as Ω and the total number of elements in Ω as |Ω|. As in [13], each element in the k-th channel of D refers to the distance between a foreground pixel and the boundary of its associated instance along the k-th pre-defined direction. K denotes the total number of directions. Elements in P_c indicate the probability of each foreground pixel being the instance centroid.

In what follows, we first propose a Context Enhancement Module (CEM), which samples a point set to explore more contextual information for pixel-to-boundary distance prediction. We then design a Confidence-based Weighting Module (CWM) that adaptively combines the predictions from the sampled points. Finally, we introduce the Shape-Aware Perceptual (SAP) loss, which further promotes the segmentation accuracy.

B. Context Enhancement Module
The nucleus segmentation task comprises two subtasks: instance detection and instance-wise segmentation. The recently developed StarDist approach [13] performs these two subtasks in parallel. The first detects the centroid of each nucleus, whereas the second segments each instance using a polygon, which is represented using the distances from the centroid pixel to the instance boundary along K pre-defined directions. In [13], the distances are predicted using only the features of the centroid. However, the size of nuclei may vary dramatically, meaning that the centroid pixel alone may lack contextual information for precise distance predictions.

To handle the above problem, we propose CEM, which utilizes pixels that are closer to the boundaries to refine the distance prediction. To achieve this goal, CEM first samples N points between each pixel and its predicted boundary position along each direction. It then merges the predicted pixel-to-boundary distances of the N points, and adaptively updates the pixel-to-boundary distance of the initial pixel. Formally speaking, the refined pixel-to-boundary distance along the k-th direction for one pixel (x, y) can be obtained as follows:

D_k^r(x, y) = \sum_{n=0}^{N} w_n \left( D_k(x_k^n, y_k^n) + \frac{n}{N} D_k(x, y) \right),   (1)
Fig. 2: The architecture of CPP-Net. This model adopts U-Net as its backbone, which makes three types of predictions for each input image: the pixel-to-boundary distance maps D, the prediction confidence maps C, and the centroid probability map P_c. In this figure, we take the k-th direction as an illustrative example. The Context Enhancement Module (CEM) conducts sampling on D according to Eq. (1). Coordinates of the sampled points are computed according to Eq. (2) and Eq. (3). The Confidence-based Weighting Module (CWM) performs sampling on C at the same locations as above. It then produces weights that are used to fuse the distance predictions of the sampled points. In this way, CPP-Net predicts the refined pixel-to-boundary distance maps, i.e., D^r, more robustly through the use of rich contextual information. Best viewed in color.

where D_k(x, y) denotes the initially predicted pixel-to-boundary distance in D along the k-th direction for (x, y), and 0 ≤ k ≤ K − 1, where k indexes the sampling directions. D_k(x_k^0, y_k^0) is equal to D_k(x, y). In this paper, we uniformly sample the N points between the initial pixel and its predicted boundary along each specified direction. The coordinates (x_k^n, y_k^n) of the n-th sampled point are accordingly computed as follows:

x_k^n = x + \frac{n}{N} D_k(x, y) \cos\left(\frac{2k\pi}{K}\right),   (2)

y_k^n = y + \frac{n}{N} D_k(x, y) \sin\left(\frac{2k\pi}{K}\right).   (3)

Finally, w_n in Eq. (1) denotes the weight of the n-th sampled point. One simple weighting strategy is averaging, i.e., setting all w_n to 1/(N + 1).

C. Confidence-based Weighting Module
Although the averaging strategy is effective in Eq. (1), it is also sub-optimal, as it neglects the impact of prediction quality at the sampled points. Prediction quality is affected by both image quality and the position of the sampled points. In particular, sampled points near the boundary may actually lie outside the nucleus, as D_k(x, y) in Eq. (1) contains errors. Therefore, the prediction accuracy at the sampled points is variable. Accordingly, we propose a Confidence-based Weighting Module (CWM) that adaptively fuses the predictions at these sampled points.

As Fig. 2 illustrates, we attach an extra 1×1 Conv layer to the backbone model in order to produce confidence maps C, the size of which is the same as that of D. Each element in C measures the prediction confidence for the corresponding element in D. We then perform sampling on both D and C, using coordinates computed according to Eq. (2) and Eq. (3) along each sampling direction. The sizes of the resulting tensors are therefore H × W × (N + 1) for each direction. The tensor sampled from C is fed into a 1×1 Conv layer and a Softmax layer. The output dimension of the 1×1 Conv layer is also N + 1. The Softmax layer outputs the normalized weights; these normalized weights are used as w_n in Eq. (1). It is worth noting that the K sampling directions share the parameters of the 1×1 Conv layer.
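To make the sampling and fusion steps concrete, the following minimal NumPy sketch refines the distance at one pixel following Eqs. (1)–(3). It is a simplification rather than the paper's implementation: it uses nearest-neighbour sampling, and it applies the softmax directly to the sampled confidence values, omitting CWM's learned 1×1 Conv layer; the function name and argument layout are our own.

```python
import numpy as np

def refine_distance(D, C, x, y, k, K, N):
    """Refine the centroid-to-boundary distance at pixel (x, y) along the
    k-th direction, following Eqs. (1)-(3).

    D: (H, W, K) pixel-to-boundary distance maps.
    C: (H, W, K) confidence maps (pre-softmax scores).
    """
    H, W, _ = D.shape
    d0 = D[y, x, k]                        # initial prediction D_k(x, y)
    angle = 2.0 * np.pi * k / K            # the k-th pre-defined direction
    dists, confs = [], []
    for n in range(N + 1):                 # n = 0 is the pixel itself
        off = d0 * n / N if N > 0 else 0.0
        xn = int(round(float(np.clip(x + off * np.cos(angle), 0, W - 1))))
        yn = int(round(float(np.clip(y + off * np.sin(angle), 0, H - 1))))
        # Eq. (1): the sampled point's own prediction plus its offset
        dists.append(D[yn, xn, k] + off)
        confs.append(C[yn, xn, k])
    confs = np.asarray(confs)
    w = np.exp(confs - confs.max())        # softmax over the N + 1 points
    w /= w.sum()
    return float(np.dot(w, dists))
```

Note that when the initial prediction D_k(x, y) is accurate, every sampled point contributes the same refined value (its shorter residual distance plus its offset), so the fusion is insensitive to the exact weights; the weights matter precisely when some sampled points fall outside the nucleus.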
D. Loss Functions
The StarDist model [13] utilizes two loss terms: the binary cross-entropy loss for centroid probability prediction, and a weighted L1 loss for pixel-to-boundary distance regression. These two loss terms are formulated as follows:

L_{prob} = -\frac{1}{|\Omega|} \sum_{(x,y)\in\Omega} \left[ P_c^{gt}(x, y) \log(P_c(x, y)) + (1 - P_c^{gt}(x, y)) \log(1 - P_c(x, y)) \right],   (4)
Fig. 3: Illustration of the SAP loss. The transformation model in the left sub-figure converts the instance representations utilized in CPP-Net to other forms of instance representation. After the training of the transformation model is completed, the parameters of its encoder are fixed. The encoder can extract high-level shape features of the nuclei, and is therefore used as a shape-aware feature extractor in the SAP loss, as shown in the right sub-figure.

L_{dist} = \frac{1}{K|\Omega|} \sum_{(x,y)\in\Omega} \sum_{k=0}^{K-1} P_c^{gt}(x, y) \, |D_k^{gt}(x, y) - D_k(x, y)|,   (5)

L_{SD} = L_{prob} + L_{dist},   (6)

where P_c^{gt}(x, y) and P_c(x, y) represent elements in the ground-truth and predicted centroid probability maps, respectively. We follow the same process as that outlined in [13] to obtain the ground-truth centroid probability map, i.e., we utilize the normalized pixel-to-boundary distance map as the centroid probability map. D_k^{gt}(x, y) and D_k(x, y) denote elements of the ground-truth and predicted pixel-to-boundary distance maps, respectively, along the k-th direction.

For CPP-Net, there are two predicted distance maps, namely D and D^r. D is predicted by the backbone model, while D^r represents the final pixel-to-boundary distance prediction of CPP-Net. Accordingly, we modify Eq. (5) for CPP-Net as follows:

L'_{dist} = \frac{1}{K|\Omega|} \sum_{(x,y)\in\Omega} \sum_{k=0}^{K-1} P_c^{gt}(x, y) \left( |D_k^{gt}(x, y) - D_k(x, y)| + |D_k^{gt}(x, y) - D_k^r(x, y)| \right),   (7)

where D_k^r(x, y) denotes the refined pixel-to-boundary distance in D^r along the k-th direction for (x, y).

Eq. (5) and Eq. (7) penalize the prediction error in each respective pixel-to-boundary distance value, while the overall shapes of nucleus instances are ignored. In fact, nucleus instances typically have similar shapes; this can be utilized as prior knowledge to facilitate accurate nucleus segmentation. However, it is challenging to explicitly represent the overall shape of a single nucleus instance.
To deal with this problem, we adopt an implicit approach inspired by the perceptual loss [32], which was proposed for style transformation and super-resolution tasks. In [32], a network pre-trained for image classification on ImageNet [37] is used as a feature extractor, with the differences between the extracted features of one image pair being penalized. This approach encourages the high-level information of the two images to be similar. Inspired by the original perceptual loss, we propose a Shape-Aware Perceptual (SAP) loss for nucleus segmentation.

The aim of the SAP loss is to penalize the differences in shape features between the predicted and ground-truth nucleus representations. To encode the shape information in a deep model, we propose transforming the nucleus representations in CPP-Net, i.e., the pixel-to-boundary distance maps D and the centroid probability map P_c, to other representation forms [4], [5], [8], [13]. This transformation is accomplished using an encoder-decoder structure, as illustrated in Fig. 3.

This paper mainly considers two nucleus representation strategies: first, the semantic segmentation and boundary detection maps in boundary-based approaches [8]; second, the location and size of the associated bounding box for each nucleus. During training of the transformation model, we concatenate the ground-truth D^{gt} and P_c^{gt} for each image to create the inputs. The binary cross-entropy loss and the L1 loss are adopted for the two target representation strategies, respectively.

After training is completed, we adopt the encoder of the transformation model for the SAP loss to train CPP-Net.
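Once the encoder f_e is frozen, computing the SAP loss amounts to comparing its features of the predicted and ground-truth representations. The NumPy sketch that follows illustrates this computation under stated assumptions: f_e is passed in as an arbitrary callable standing in for the pre-trained transformation model's encoder, and the per-location L2 norm is our choice where the paper's norm is not recoverable from the extraction.

```python
import numpy as np

def sap_loss(pred, pred_refined, gt, f_e):
    """Shape-Aware Perceptual loss (illustrative sketch).

    pred / pred_refined / gt: (H, W, K + 1) concatenations of the
    pixel-to-boundary distance maps and the centroid probability map,
    built from (D, P_c), (D^r, P_c), and (D^gt, P_c^gt), respectively.
    f_e: frozen feature extractor returning an (H', W', C) feature map.
    """
    F_gt = f_e(gt)
    S = F_gt - f_e(pred)              # feature difference for D
    S_r = F_gt - f_e(pred_refined)    # feature difference for D^r
    # average the per-location feature-difference norms
    norms = np.linalg.norm(S, axis=-1) + np.linalg.norm(S_r, axis=-1)
    return float(norms.mean())
```

Because f_e is fixed, the loss only shapes CPP-Net's outputs toward the feature statistics of the ground-truth representation, mirroring how the original perceptual loss uses a frozen classification network.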
The SAP loss can be formulated as follows:

S = f_e(D^{gt}, P_c^{gt}) - f_e(D, P_c),   (8)

S^r = f_e(D^{gt}, P_c^{gt}) - f_e(D^r, P_c),   (9)

L_{SAP} = \frac{1}{|\Omega'|} \sum_{(x',y')\in\Omega'} \|s(x', y')\| + \|s^r(x', y')\|,   (10)

where Ω' denotes the 2D coordinate space of the extracted shape-aware feature maps, while s(x', y') and s^r(x', y') are the vectors in S and S^r at the location (x', y'), respectively. Moreover, f_e denotes the encoder of the pre-trained transformation model. The parameters of f_e are fixed during the training of CPP-Net. Finally, the entire loss of CPP-Net is summarized as follows:

L_{CPP} = L_{prob} + L'_{dist} + L_{SAP}.   (11)

In the interests of simplicity, we adopt equal weights for the three terms in L_{CPP}.

IV. EXPERIMENTAL SETUP
To justify the effectiveness of CPP-Net, we conduct extensive experiments on three publicly available datasets, i.e., DSB2018 [1], BBBC006 [33], and PanNuke [34].
A. Datasets
1) DSB2018:
Data Science Bowl 2018 (DSB2018) [1] is a nucleus detection and segmentation competition, in which a dataset of 670 images with manual annotations is available. To facilitate fair comparisons with existing approaches, we follow the evaluation protocol outlined in [13]. In this protocol, the training, validation, and testing sets include 380, 67, and 50 images, respectively.
2) BBBC006:
Images in BBBC006 [33] were captured from one 384-well microplate containing stained U2OS cells. Two fields of view were selected for each well to obtain images. There are two images for each field of view: one Hoechst image and one phalloidin image. Accordingly, BBBC006 contains 1,536 images from 768 fields of view. In our experiments, we randomly divide the dataset into training, validation, and testing sets, which contain 924, 306, and 306 images, respectively.
3) PanNuke:
PanNuke [34], [35] is an H&E stained image set containing 7,904 256×256 patches from a total of 19 different tissue types. The nuclei are classified into neoplastic, inflammatory, connective/soft tissue, dead, and epithelial cells. We follow the evaluation protocol outlined in [35], which divides the patches into three folds containing 2,657, 2,524, and 2,723 images, respectively. Three different dataset splits are then made based on these three folds. In each split, one fold of data is used for training, with the remaining two folds used as validation and testing sets, respectively.

B. Implementation Details
On DSB2018 and BBBC006, we adopt a U-Net backbone for CPP-Net that is very similar to the one used in [13], to facilitate fair comparison. This backbone includes three down-sampling blocks in its encoder and three up-sampling blocks in its decoder. The only change is that we replace all Batch Normalization (BN) layers [38] with Group Normalization (GN) layers [39], since we use a small batch size of 1 for training. On PanNuke, we make two changes to this backbone. First, to ensure fair comparison with existing approaches [4], we replace the encoder of this backbone with ResNet-50 [40], and initialize its weights with those pre-trained on ImageNet [37]. Second, we attach another decoder to classify nucleus types for each input image pixel. The loss function for this decoder is the summation of the Cross Entropy loss and the Dice loss [41].

We adopt a deeper structure for the encoder-decoder model in the SAP loss. This model includes four down-sampling and four up-sampling blocks, which are used to extract more high-level information. The other architectural details are the same as for the U-Net backbone in CPP-Net, except that the encoder-decoder model does not utilize shortcuts.

The Adam algorithm [42] is employed for optimization. The initial learning rate is set to × −, and is reduced by multiplying it by 0.5 if the validation loss no longer decreases. The training process halts if the learning rate is reduced to less than × −. We adopt online data augmentation consisting of random rotation and horizontal flipping during training. As for the encoder-decoder model, we use the same training settings as outlined above, except that data augmentation is not employed.

C. Evaluation Metrics
For DSB2018 and BBBC006, we adopt the same evaluation metric as in [1] and [13]. Under this metric, the average precision (AP) is computed at IoU thresholds ranging from 0.5 to 0.9 with a step size of 0.05. For the PanNuke database, we adopt the Panoptic Quality (PQ) presented in [34] as the evaluation metric. PQ has been widely adopted in panoptic segmentation tasks, and was introduced into nucleus segmentation in [4]. We report the PQ for all 19 tissues. In addition, both the multi-class PQ (mPQ) and the binary PQ (bPQ) are computed for evaluation. The mPQ averages the PQ performance over each of the five nucleus categories, while the bPQ directly computes the overall performance on images of all five nucleus categories.
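As a reference for how the AP metric behaves, the sketch below computes AP at a single IoU threshold from a pairwise IoU matrix, using AP_t = TP / (TP + FP + FN) as in the DSB2018 evaluation. The greedy highest-IoU-first matching used here is a common simplification of the one-to-one matching, and the function name is our own.

```python
import numpy as np

def average_precision(iou, thresh):
    """DSB2018-style AP at a single IoU threshold:
    AP_t = TP / (TP + FP + FN).

    iou: (n_pred, n_gt) pairwise IoU matrix between predicted and
    ground-truth instances.  Pairs are matched greedily from the
    highest IoU downwards, one-to-one.
    """
    n_pred, n_gt = iou.shape
    matched_pred, matched_gt, tp = set(), set(), 0
    for idx in np.argsort(-iou, axis=None):          # best pairs first
        p, g = np.unravel_index(idx, iou.shape)
        if iou[p, g] < thresh:
            break                                    # all remaining pairs are worse
        if p in matched_pred or g in matched_gt:
            continue                                 # enforce one-to-one matching
        matched_pred.add(p)
        matched_gt.add(g)
        tp += 1
    fp = n_pred - tp
    fn = n_gt - tp
    return tp / (tp + fp + fn)
```

The mean AP reported in the tables would then be the average of this value over the nine thresholds 0.5, 0.55, …, 0.9.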
V. EXPERIMENTAL RESULTS
In what follows, we first conduct experiments on two publicly available databases, DSB2018 [1] and BBBC006 [33], to determine the optimal number of sampling points N and demonstrate the effectiveness of the CEM module. We then justify the effectiveness of the CWM module and the SAP loss. Finally, we compare the performance of CPP-Net with that of other methods on all three databases.

A. Evaluation of CEM
In this experiment, we evaluate the optimal number of sampling points in CEM. To facilitate a clean comparison, we remove the SAP loss from CPP-Net, and consistently adopt CWM as the weighting strategy in Eq. (1). We further change the number of sampling points, i.e., N, from 0 to 7, and report the experimental results in Table I. When N is equal to 0, CPP-Net reduces to the StarDist model [13]. As Table I shows, the performance of CPP-Net continues to improve as N increases from 0 to 6; however, its performance saturates when N exceeds 6. Therefore, we consistently set N to 6 in the following experiments.

It is clear that a single sampling point alone is able to significantly boost the APs on both databases, especially for
TABLE I: Ablation study on the number of sampling points in CEM.

Dataset   N   AP0.5   AP0.55  AP0.6   AP0.65  AP0.7   AP0.75  AP0.8   AP0.85  AP0.9   Mean
DSB2018   0   0.8731  0.8481  0.8220  0.7849  0.7368  0.6591  0.5709  0.4401  0.2566  0.6657
          1   0.8762  0.8568  0.8332  0.8042  0.7608  0.6968  0.6057  0.4805  0.3264  0.6934
          2   0.8758  0.8538  0.8310  0.8037  0.7585  0.6947  0.6128  0.4918  0.3407  0.6959
          3   0.8784  0.8555  0.8357  0.8027  0.7681  0.6955  0.6076  0.4872  0.3309  0.6957
          4   0.8753  0.8508  0.8317  0.7995  0.7606  0.6950  0.6128  0.4887  0.3530  0.6964
          5   0.8742  0.8566  0.8359  0.8024  0.7618  0.6983  0.6198  0.4921  0.3461  0.6986
          6   0.8801  0.8576  0.8352  0.8021  0.7631  0.7024  0.6185  0.4974  0.3445  0.7001
TABLE II: Ablation study investigating different weighting strategies in CEM.

Dataset   Method           AP0.5   AP0.55  AP0.6   AP0.65  AP0.7   AP0.75  AP0.8   AP0.85  AP0.9   Mean
DSB2018   baseline         0.8731  0.8481  0.8220  0.7849  0.7368  0.6591  0.5709  0.4401  0.2566  0.6657
          equal weights    0.8758  0.8589  0.8305  0.8023  0.7597  0.6934  0.6102  0.4848  0.3255  0.6935
          naïve attention  0.8758  0.8585  0.8364  0.8042  0.7612  0.6999  0.6170  0.4923  0.3330  0.6976
          CWM              0.8801  0.8576  0.8352  0.8021  0.7631  0.7024  0.6185  0.4974  0.3445  0.7001
BBBC006   baseline         0.8405  0.8167  0.7895  0.7517  0.7025  0.6396  0.5637  0.4834  0.4038  0.6657
          equal weights    0.8462  0.8217  0.7928  0.7556  0.7070  0.6437  0.5678  0.4912  0.4268  0.6725
          naïve attention  0.8443  0.8205  0.7925  0.7567  0.7076  0.6467  0.5720  0.4957  0.4298  0.6740
          CWM              0.8411  0.8173  0.7899  0.7558  0.7094  0.6491  0.5772  0.5022  0.4389  0.6757
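The three weighting strategies compared in Table II can be sketched as follows. This is an illustrative numpy sketch under assumed shapes (a batch of pixels, each with N + 1 candidate predictions); the function names are ours, not the paper's.

```python
import numpy as np

def fuse_equal(preds):
    """'Equal weights': plain average of the N + 1 sampled predictions."""
    return preds.mean(axis=-1)

def fuse_fixed(preds, w):
    """'Naive attention': one trainable weight per sampled point, shared
    across all pixels (here just a softmax-normalised vector)."""
    w = np.exp(w) / np.exp(w).sum()
    return preds @ w

def fuse_confidence(preds, conf_logits):
    """CWM-style fusion: per-pixel confidence logits, one per sampled
    point, are softmax-normalised so unreliable points are down-weighted."""
    e = np.exp(conf_logits - conf_logits.max(axis=-1, keepdims=True))
    w = e / e.sum(axis=-1, keepdims=True)
    return (preds * w).sum(axis=-1)
```

The first two strategies apply the same weights everywhere; only the confidence-based variant can suppress an outlier prediction at one pixel while trusting it at another, which is the flexibility the text attributes to CWM.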
APs under high IoU thresholds. Moreover, when N is equal to 6, CEM improves the mean AP by 0.0344 on the DSB2018 database and 0.0100 on the BBBC006 database. The above experiments justify the effectiveness of CEM.

B. Evaluation of CWM
The results of the ablation study on the CWM module are summarized in Table II. In this table, ‘baseline’ refers to the StarDist model [13], i.e., setting N in CPP-Net to 0. In addition to CWM, two other weighting strategies are evaluated. ‘Equal weights’ denotes the averaging strategy for Eq. (1), while ‘naïve attention’ represents learning fixed weights for the N + 1 points in Eq. (1), using a trainable vector with N + 1 elements.

It is shown that CEM consistently outperforms the baseline model by large margins, regardless of the specific weighting strategy in Eq. (1). Moreover, compared with the other two weighting strategies, CWM achieves the best mean AP performance. CWM's advantage lies mainly in its APs under high IoU thresholds, which indicates that the instance segmentation accuracy is increased. This performance improvement can be ascribed to the superior flexibility of CWM. In short, unlike the two weighting strategies that adopt fixed weights, CWM can adaptively weigh each sampled point according to the quality of its features. The above experimental results justify the effectiveness of CWM.

C. Evaluation of the SAP Loss
In this experiment, we justify the effectiveness of the SAP loss. Utilizing the SAP loss requires pre-training an encoder-decoder model that transforms the instance representations in CPP-Net to other types of representations (as described in Section III-D). Accordingly, we evaluate the following three types of representation strategies for the SAP loss. The first strategy is boundary-based, in that it predicts both semantic segmentation masks and instance boundaries [3], [7], [8]; the second strategy is bounding box-based, in that it regresses both the coordinates of nucleus centroids and bounding box positions for each pixel inside one instance [5]. The third strategy predicts both of the above-mentioned representations. For simplicity, these three strategies are denoted as ‘seg & bnd’, ‘bbox’, and ‘both’ in Table III.

In Table III, we first show the performance of CPP-Net without using the SAP loss. On both datasets, the SAP loss promotes performance in terms of mean AP. Specifically, the SAP loss improves the mean AP by 0.0085 on DSB2018 and 0.0047 on BBBC006. Furthermore, it is also clear that the improvement comes mainly from APs under high IoU thresholds: for example, 0.0139, 0.0123, 0.0141, 0.0168, and 0.0105 improvements on AP0.7 through AP0.9 on DSB2018. For APs at lower IoU thresholds, the SAP loss does not introduce significant performance promotion. From this phenomenon, we can conclude that the SAP loss primarily penalizes prediction errors in nucleus shape, rather than localization or detection errors.

We also train CPP-Net with another variant of the SAP loss, in which the encoder-decoder model is trained to reconstruct its input representations, i.e., the ground-truth centroid probability and pixel-to-boundary distance maps. The results of CPP-Net trained with this variant are denoted as ‘recons.’ in Table III. The results show that the proposed SAP loss achieves better performance than this variant.
The advantage achieved by our proposed SAP loss can be attributed to the transformation between different representation strategies. Through the use of this transformation task, the encoder-decoder model is forced to extract essential information related to the nucleus shape. By contrast, the ‘recons.’ variant is likely to only memorize the input information. Accordingly, our proposed SAP loss achieves better overall performance than all three other variants. In the following, we adopt our proposed SAP loss to train CPP-Net.

TABLE III: Ablation study investigating the Shape-Aware Perceptual (SAP) loss.

Dataset   SAP loss    AP0.5   AP0.55  AP0.6   AP0.65  AP0.7   AP0.75  AP0.8   AP0.85  AP0.9   Mean
DSB2018   -           0.8801  0.8576  0.8352  0.8021  0.7631  0.7024  0.6185  0.4974  0.3445  0.7001
          seg & bnd   0.8770  0.8598  0.8382  0.8103  0.7691  0.7067  0.6239  0.5040  0.3494  0.7043
          bbox        0.8791  0.8587  0.8356  0.8087  0.7686  0.7066  0.6188  0.4994  0.3440  0.7022
          both        0.8760  0.8554  0.8385  0.8141  0.7770  0.7147  0.6326  0.5142  0.3550  0.7086
          recons.     0.8734  0.8525  0.8312  0.8045  0.7686  0.6996  0.6259  0.5074  0.3603  0.7026
BBBC006   -           0.8411  0.8173  0.7899  0.7558  0.7094  0.6491  0.5772  0.5022  0.4389  0.6757
          seg & bnd   0.8472  0.8215  0.7933  0.7571  0.7125  0.6495  0.5782  0.5051  0.4436  0.6787
          bbox        0.8459  0.8205  0.7934  0.7592  0.7100  0.6487  0.5770  0.5035  0.4383  0.6774
          both        0.8448  0.8207  0.7962  0.7619  0.7150  0.6560  0.5831  0.5060  0.4398  0.6804
          recons.     0.8447  0.8199  0.7926  0.7542  0.7078  0.6459  0.5749  0.5008  0.4384  0.6755

Fig. 4: Qualitative comparisons between StarDist, CPP-Net (w/o SAP loss), and CPP-Net trained with the SAP loss. The five columns from left to right are: (a) the original images in DSB2018, (b) the ground-truth segmentation results, and (c-e) predictions by each of the three methods. Best viewed with zoom-in.
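Functionally, the SAP loss compares predictions and ground truth in the feature space of a frozen, pre-trained network rather than on the raw maps. The sketch below is a minimal stand-in: a fixed random projection plays the role of the pre-trained encoder-decoder, whose features in the real method come from the representation-transformation network described above; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the frozen pre-trained transform network: a fixed random
# per-pixel projection of the stacked (centroid prob + distance) maps. In
# the paper this is an encoder-decoder trained to predict another nucleus
# representation (e.g. 'seg & bnd' or 'bbox').
W_frozen = rng.normal(size=(8, 3))

def features(maps):
    """Per-pixel features from the frozen network stand-in.
    maps : (H, W, 3) = 1 centroid-probability + 2 distance channels."""
    return maps @ W_frozen.T

def sap_loss(pred_maps, gt_maps):
    """Shape-aware perceptual loss: L2 distance in the feature space of a
    frozen network, rather than on the raw maps themselves."""
    return float(np.mean((features(pred_maps) - features(gt_maps)) ** 2))
```

Because the frozen network is never updated, gradients of this loss flow only into the segmentation model, steering it toward predictions whose derived shape features match the ground truth.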
D. Qualitative Comparisons
In this experiment, we conduct qualitative comparisons between StarDist [13], CPP-Net (w/o SAP loss), and CPP-Net trained with the SAP loss, the results of which are presented in Fig. 4. As shown in the first and second rows, StarDist may mistakenly segment a single nucleus instance into multiple nuclei; for its part, CPP-Net achieves more robust segmentation. Results in the third and fourth rows further indicate that the predictions of CPP-Net are more accurate regarding instance boundaries (e.g., the concave areas along nucleus boundaries). This can be attributed to CEM's ability to explore more contextual information for centroid-to-boundary distance prediction. Finally, the SAP loss further corrects nucleus shape prediction errors, e.g., the highlighted instances in the lower-left and upper-right corners of the first example image. The above qualitative comparisons justify the effectiveness of the CEM module and the SAP loss, respectively.
E. Comparisons with State-of-the-Art Methods
1) Comparisons on the DSB2018 database:
We compare the performance of CPP-Net with Mask-RCNN [2], [13], KeypointGraph [19], HoVer-Net [4], PatchPerPix [17], and StarDist [13]. The results of this comparison are tabulated in Table IV. It is notable here that some of the above-mentioned methods were evaluated using different training and testing data split protocols in their respective papers. In the interest of fair comparison, we evaluate the performance of HoVer-Net [4] and KeypointGraph [19] ourselves using code released by the authors, under the same evaluation protocol as [13], [17]. We also reimplement the StarDist approach on DSB2018 and replace its BN layers with GN layers. Accordingly, we achieve better performance than the results reported in [13].

As shown in Table IV, StarDist and PatchPerPix are two powerful approaches that have their own respective advantages. Specifically, StarDist achieves a higher AP0.5 than PatchPerPix, but much lower APs under high IoU thresholds. We conjecture that StarDist may be affected by its prediction accuracy regarding the shape of nucleus boundaries. This is because StarDist adopts the features of centroid pixels only for shape prediction; however, the centroid pixel alone lacks contextual information. In comparison, CPP-Net consistently achieves better performance than StarDist; in particular, it significantly improves the performance at high IoU thresholds. Finally, CPP-Net achieves the best mean AP performance among all methods. The above comparison experiments justify the effectiveness of CPP-Net.

We further summarize the inference time of different models in Table VI. Here, inference time includes the network prediction time and the associated post-processing time. We compare the inference time under the same hardware conditions: one NVIDIA TITAN Xp GPU, an Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz, and 128GB RAM. As shown in Table VI, StarDist [13] is the fastest among all compared approaches, while CPP-Net increases the cost by only around 8% relative to StarDist.
TABLE IV: Comparisons with SOTA methods on DSB2018 and BBBC006. * denotes methods evaluated by ourselves.

Dataset   Method                   AP0.5   AP0.55  AP0.6   AP0.65  AP0.7   AP0.75  AP0.8   AP0.85  AP0.9   Mean
DSB2018   Mask R-CNN [2]           0.8323  0.8051  0.7728  0.7299  0.6838  0.5974  0.4893  0.3525  0.1891  0.6058
          StarDist [13]            0.8641  0.8361  0.8043  0.7545  0.6850  0.5862  0.4495  0.2865  0.1191  0.5983
          KeypointGraph* [19]      0.8244  0.8142  0.7916  0.7557  0.7083  0.6600  0.5799  0.4721  0.2989  0.6561
          HoVer-Net* [4]           0.7838  0.7676  0.7547  0.7391  0.7165  0.6668  0.6135  0.5102  0.3978  0.6611
          PatchPerPix [17]         0.8680  0.8480  0.8270  0.7950  0.7550  0.7160  0.6350  0.5180  0.3790  0.7046
          StarDist* [13]           0.8731  0.8481  0.8220  0.7849  0.7368  0.6591  0.5709  0.4401  0.2566  0.6657
          CPP-Net*                 0.8760  0.8554  0.8385  0.8141  0.7770  0.7147  0.6326  0.5142  0.3550  0.7086
BBBC006   InstanceEmbedding* [20]  0.6277  0.5929  0.5572  0.5133  0.4670  0.4242  0.3815  0.2264  0.0130  0.4226
          KeypointGraph* [19]      0.6115  0.5787  0.5425  0.5080  0.4737  0.4335  0.3611  0.1778  0.0173  0.4116
          HoVer-Net* [4]           0.8146  0.7896  0.7627  0.7321  0.6870  0.6274  0.5561  0.4827  0.4284  0.6534
          StarDist* [13]           0.8405  0.8167  0.7895  0.7517  0.7025  0.6396  0.5637  0.4834  0.4038  0.6657
          CPP-Net*                 0.8448  0.8207  0.7962  0.7619  0.7150  0.6560  0.5831  0.5060  0.4398  0.6804
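The tables above report AP at IoU thresholds from 0.5 to 0.9 (step 0.05) plus their mean; under the DSB2018 protocol, AP at a threshold t is TP / (TP + FP + FN) after matching predicted and ground-truth instances at IoU > t. The sketch below assumes the pairwise IoU matrix is given and uses a simplified matching (adequate when each prediction overlaps at most one ground-truth instance above the threshold); it is not the exact evaluation code.

```python
import numpy as np

def average_precision(ious, thresholds=np.arange(0.5, 0.95, 0.05)):
    """DSB2018-style AP. ious: (n_pred, n_gt) pairwise IoU matrix.
    Returns per-threshold AP = TP / (TP + FP + FN) and their mean
    (the 'Mean' column in Tables I-IV)."""
    aps = []
    for t in thresholds:
        matched = ious > t
        # simplified matching: assumes at most one overlap per instance
        # survives the threshold, so counting matched rows/columns suffices
        tp = min(matched.any(axis=1).sum(), matched.any(axis=0).sum())
        fp = ious.shape[0] - tp    # unmatched predictions
        fn = ious.shape[1] - tp    # unmatched ground-truth instances
        aps.append(tp / (tp + fp + fn))
    return np.array(aps), float(np.mean(aps))
```

This also makes the trends in the tables easy to read: a loose match (IoU just above 0.5) counts at low thresholds but drops out at high ones, which is why shape-accuracy improvements show up mainly in AP0.8-AP0.9.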
TABLE V: Comparisons with SOTA methods on the PanNuke database. * denotes methods evaluated by ourselves. Each cell reports mPQ / bPQ.

Tissue          Mask R-CNN     Micro-Net      HoVer-Net      StarDist*      CPP-Net*       StarDist* (ResNet50)  CPP-Net* (ResNet50)
Adrenal Gland   0.3470/0.5546  0.4153/0.6440  0.4812/0.6962  0.4855/0.6764  0.4799/0.6913  0.4868/0.6972         0.4922/0.7031
Bile Duct       0.3536/0.5567  0.4124/0.6232  0.4714/0.6696  0.4492/0.6417  0.4518/0.6569  0.4651/0.6690         0.4650/0.6739
Bladder         0.5065/0.6049  0.5357/0.6488  0.5792/0.7031  0.5718/0.6798  0.5887/0.6847  0.5793/0.6986         0.5932/0.7057
Breast          0.3882/0.5574  0.4407/0.6029  0.4902/0.6470  0.4946/0.6507  0.5031/0.6610  0.5064/0.6666         0.5066/0.6718
Cervix          0.3402/0.5483  0.3795/0.6101  0.4438/0.6652  0.4544/0.6659  0.4580/0.6718  0.4628/0.6690         0.4779/0.6880
Colon           0.3122/0.4603  0.3414/0.4972  0.4095/0.5575  0.4009/0.5534  0.4102/0.5646  0.4205/0.5779         0.4296/0.5888
Esophagus       0.4311/0.5691  0.4668/0.6011  0.5085/0.6427  0.5206/0.6465  0.5266/0.6554  0.5331/0.6655         0.5410/0.6755
Head & Neck     0.3946/0.5457  0.3668/0.5242  0.4530/0.6331  0.4613/0.6331  0.4596/0.6244  0.4768/0.6433         0.4667/0.6468
Kidney          0.3553/0.5092  0.4165/0.6321  0.4424/0.6836  0.4902/0.6802  0.4736/0.6889  0.4880/0.6998         0.5092/0.7001
Liver           0.4103/0.6085  0.4365/0.6666  0.4974/0.7248  0.4891/0.7007  0.4941/0.7144  0.5145/0.7231         0.5099/0.7271
Lung            0.3182/0.5134  0.3370/0.5588  0.4004/0.6302  0.4032/0.6165  0.4061/0.6247  0.4128/0.6362         0.4234/0.6364
Ovarian         0.4337/0.5784  0.4387/0.6013  0.4863/0.6309  0.5170/0.6499  0.5197/0.6709  0.5205/0.6668         0.5276/0.6792
Pancreatic      0.3624/0.5460  0.4041/0.6074  0.4600/0.6491  0.4410/0.6331  0.4789/0.6540  0.4585/0.6601         0.4680/0.6742
Prostate        0.3959/0.5789  0.4341/0.6049  0.5101/0.6615  0.4998/0.6473  0.5098/0.6674  0.5067/0.6748         0.5261/0.6903
Skin            0.2665/0.5021  0.3223/0.5817  0.3429/0.6234  0.3537/0.6063  0.3399/0.6042  0.3610/0.6289         0.3547/0.6192
Stomach         0.3684/0.5976  0.3872/0.6293  0.4726/0.6886  0.4191/0.6636  0.4365/0.6939  0.4477/0.6944         0.4553/0.7043
Testis          0.3512/0.5420  0.4088/0.6300  0.4754/0.6890  0.4767/0.6661  0.4903/0.6787  0.4942/0.6869         0.4917/0.7006
Thyroid         0.3037/0.5712  0.3712/0.6555  0.4315/0.6983  0.4166/0.6807  0.4431/0.7054  0.4300/0.6962         0.4344/0.7094
Uterus          0.3683/0.5589  0.3965/0.5821  0.4393/0.6393  0.4428/0.6305  0.4610/0.6443  0.4480/0.6599         0.4790/0.6622
Average         0.3688/0.5528  0.4059/0.6053  0.4629/0.6596  0.4625/0.6485  0.4700/0.6609  0.4744/0.6692         0.4817/0.6767
STD (splits)    0.0047/0.0076  0.0082/0.0050  0.0076/0.0036  0.0078/0.0054  0.0082/0.0062  0.0037/0.0014         0.0057/0.0018
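The bPQ and mPQ scores in Table V follow the panoptic quality (PQ) formulation of [34], [35]: predictions and ground-truth instances are uniquely matched at IoU > 0.5, and PQ combines a detection term with the mean IoU of the matches. A minimal sketch of binary PQ over one image (mPQ averages the same quantity over nucleus classes):

```python
def panoptic_quality(matched_ious, n_pred, n_gt):
    """Binary panoptic quality (bPQ). matched_ious: IoUs (each > 0.5) of
    uniquely matched prediction/ground-truth pairs; n_pred, n_gt: total
    numbers of predicted and ground-truth instances.
    PQ = sum(matched IoUs) / (TP + 0.5*FP + 0.5*FN)."""
    tp = len(matched_ious)
    fp = n_pred - tp   # predictions with no match
    fn = n_gt - tp     # ground-truth instances with no match
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom > 0 else 0.0
```

PQ thus rewards both detecting every nucleus (the denominator) and segmenting each detected nucleus accurately (the summed IoUs), which is why it is a stricter summary than AP0.5.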
Compared with the other approaches presented in Table VI, CPP-Net and StarDist are more efficient, owing to their lightweight backbones and simple post-processing operations.
TABLE VI: Average inference time on the DSB2018 database.

Method              Average Inference Time (seconds per image)
KeypointGraph [19]  0.8556
HoVer-Net [4]       1.5556
PatchPerPix [17]    5.8767
StarDist [13]       0.2327
CPP-Net             0.2519
2) Comparisons on the BBBC006 database:
To facilitate a fair comparison, we train StarDist [13], HoVer-Net [4], KeypointGraph [19], and InstanceEmbedding [20] using the same data split protocol as ours. Experimental results are summarized in Table IV. As the table shows, similar to the results on DSB2018, the StarDist model achieves a promising AP0.5 score but an unsatisfactory AP0.9 score. By contrast, the proposed CPP-Net promotes the nucleus segmentation performance and maintains its advantages in terms of nucleus detection. It also continues to outperform all other state-of-the-art methods. Experimental results on this database justify the effectiveness of CPP-Net. Moreover, it is worth noting that BBBC006 consists of two types of images, specifically Hoechst images and phalloidin images. The latter image type contains a significant amount of noise, which affects the performance of KeypointGraph [19] and InstanceEmbedding [20]. In comparison, StarDist, CPP-Net, and HoVer-Net continue to achieve promising results, which demonstrates their robustness when processing noisy images.
3) Comparisons on the PanNuke database:
We provide the performance of StarDist and CPP-Net with two different backbones. The first backbone adopts the same encoder as that used on the DSB2018 database, while the second employs ResNet-50 as the encoder. Their performance is compared with that of Mask-RCNN [2], Micro-Net [15], and HoVer-Net [4] in Table V. We further adopt the same evaluation metrics as those in [34]. In Table V, both bPQ and mPQ are computed for each of the 19 tissues.

As the experimental results in Table V demonstrate, CPP-Net consistently outperforms StarDist with each of the two backbones. Moreover, when CPP-Net is equipped with the same ResNet-50 backbone as HoVer-Net, it achieves better average performance than all other methods: for example, it outperforms StarDist by 0.0073 and 0.0075 in mPQ and bPQ, respectively. The results of the above comparisons are consistent with those on the first two databases, which further justifies the effectiveness of CPP-Net.

VI. CONCLUSION
In this paper, we improve the performance of StarDist from two aspects. First, we propose a Context Enhancement Module that enables us to explore more contextual information and accordingly predict the centroid-to-boundary distances more robustly, especially for large-sized nuclei. We further propose a Confidence-based Weighting Module that adaptively fuses the predictions of the sampled points in the CEM module. Second, we propose a Shape-Aware Perceptual loss, which constrains the high-level shape information contained in the centroid probability and pixel-to-boundary distance maps. We conduct extensive ablation studies to justify the effectiveness of each proposed component. Finally, our proposed CPP-Net model is found to significantly outperform the StarDist model and achieve state-of-the-art performance on three popular datasets for nucleus segmentation.

REFERENCES

[1] J. C. Caicedo et al., “Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl,” Nat. Methods, vol. 16, no. 12, pp. 1247-1253, Oct. 2019.
[2] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386-397, 2020.
[3] H. Chen, X. Qi, L. Yu, and P. A. Heng, “DCAN: Deep contour-aware networks for accurate gland segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2487-2496.
[4] S. Graham et al., “HoVer-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images,” Med. Image Anal., vol. 58, p. 101563, Dec. 2019.
[5] N. A. Koohbanani, M. Jahanifar, A. Gooya, and N. Rajpoot, “Nuclear instance segmentation using a proposal-free spatially aware deep learning framework,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Oct. 2019, pp. 622-630.
[6] N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, and A. Sethi, “A dataset and a technique for generalized nuclear segmentation for computational pathology,” IEEE Trans. Med. Imag., vol. 36, no. 7, pp. 1550-1560, Jul. 2017.
[7] H. Oda et al., “BESNet: Boundary-enhanced segmentation of cells in histopathological images,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Sep. 2018, pp. 228-236.
[8] Y. Zhou, O. F. Onder, Q. Dou, E. Tsougenis, H. Chen, and P. A. Heng, “CIA-Net: Robust nuclei instance segmentation with contour-aware information aggregation,” in Proc. IPMI, 2019, pp. 682-693.
[9] B. Zhao et al., “Triple U-Net: Hematoxylin-aware nuclei segmentation with progressive dense feature aggregation,” Med. Image Anal., vol. 65, p. 101786, Oct. 2020.
[10] M. W. Lafarge, E. J. Bekkers, J. P. W. Pluim, R. Duits, and M. Veta, “Roto-translation equivariant convolutional networks: Application to histopathology image analysis,” Med. Image Anal., vol. 68, p. 101849, Feb. 2021.
[11] P. Naylor, M. Laé, F. Reyal, and T. Walter, “Segmentation of nuclei in histopathology images by deep regression of the distance map,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 448-459, Feb. 2019.
[12] S. Wolf et al., “The mutex watershed algorithm for efficient segmentation without seeds,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 546-562.
[13] U. Schmidt, M. Weigert, C. Broaddus, and G. Myers, “Cell detection with star-convex polygons,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Sep. 2018, pp. 265-273.
[14] F. C. Walter, S. Damrich, and F. A. Hamprecht, “MultiStar: Instance segmentation of overlapping objects with star-convex polygons,” arXiv:2011.13228, 2020.
[15] S. E. A. Raza et al., “Micro-Net: A unified model for segmentation of various objects in microscopy images,” Med. Image Anal., vol. 52, pp. 160-173, Feb. 2019.
[16] S. Chen, C. Ding, and D. Tao, “Boundary-assisted region proposal networks for nucleus segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Sep. 2020, pp. 279-288.
[17] P. Hirsch, L. Mais, and D. Kainmueller, “PatchPerPix for instance segmentation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020.
[18] A. O. Vuola, S. U. Akram, and J. Kannala, “Mask-RCNN and U-Net ensembled for nuclei segmentation,” in Proc. IEEE Int. Symp. Biomed. Imag. (ISBI), Apr. 2019, pp. 208-212.
[19] J. Yi et al., “Multi-scale cell instance segmentation with keypoint graph based bounding boxes,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Oct. 2019, pp. 369-377.
[20] C. Long, M. Strauch, and D. Merhof, “Instance segmentation of biomedical images with an object-aware embedding learned with local constraints,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Oct. 2019, pp. 451-459.
[21] N. Dietler et al., “A convolutional neural network segments yeast microscopy images with high accuracy,” Nat. Commun., vol. 11, no. 1, p. 5723, 2020.
[22] N. Malpica et al., “Applying watershed algorithms to the segmentation of clustered nuclei,” Cytometry, vol. 28, pp. 289-297, 1997.
[23] X. Yang, H. Li, and X. Zhou, “Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and Kalman filter in time-lapse microscopy,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 11, pp. 2405-2414, Nov. 2006.
[24] A. Tareef et al., “Multi-pass fast watershed for accurate segmentation of overlapping cervical cells,” IEEE Trans. Med. Imag., vol. 37, no. 9, pp. 2044-2059, Sep. 2018.
[25] P. Bamford and B. Lovell, “Unsupervised cell nucleus segmentation with active contours,” Signal Process., vol. 71, no. 2, pp. 203-213, 1998.
[26] C. Molnar et al., “Accurate morphology preserving segmentation of overlapping cells based on active contours,” Sci. Rep., vol. 6, p. 32412, 2016.
[27] Z. Lu, G. Carneiro, and A. P. Bradley, “An improved joint optimization of multiple level set functions for the segmentation of overlapping cervical cells,” IEEE Trans. Image Process., vol. 24, no. 4, pp. 1261-1272, Apr. 2015.
[28] C. Chen, W. Wang, J. A. Ozolek, and G. K. Rohde, “A flexible and robust approach for segmenting cell nuclei from 2D microscopy images using supervised learning and template matching,” Cytometry A, vol. 83A, no. 5, pp. 495-507, 2013.
[29] E. Xie et al., “PolarMask: Single shot instance segmentation with polar representation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 12193-12202.
[30] F. Wei, X. Sun, H. Li, J. Wang, and S. Lin, “Point-set anchors for object detection, instance segmentation and pose estimation,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Aug. 2020, pp. 527-544.
[31] Y. Meng et al., “CNN-GCN aggregation enabled boundary regression for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Sep. 2020, pp. 352-362.
[32] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2016, pp. 694-711.
[33] V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter, “Annotated high-throughput microscopy image sets for validation,” Nat. Methods, vol. 9, no. 7, p. 637, Jun. 2012.
[34] J. Gamper, N. A. Koohbanani, K. Benet, A. Khuram, and N. Rajpoot, “PanNuke: An open pan-cancer histology dataset for nuclei instance segmentation and classification,” in Proc. Eur. Congr. Digit. Pathol. (ECDP), 2019, pp. 11-19.
[35] J. Gamper et al., “PanNuke dataset extension, insights and baselines,” arXiv:2003.10778, 2020.
[36] A. Y. Ng et al., “On spectral clustering: Analysis and an algorithm,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2002, pp. 849-856.
[37] J. Deng et al., “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 248-255.
[38] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Mach. Learn. (ICML), Feb. 2015, pp. 448-456.
[39] Y. Wu and K. He, “Group normalization,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 3-19.
[40] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770-778.
[41] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proc. Int. Conf. 3D Vis., Oct. 2016, pp. 565-571.
[42] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in