AURA-NET: ROBUST SEGMENTATION OF PHASE-CONTRAST MICROSCOPY IMAGES WITH FEW ANNOTATIONS
Ethan Cohen⋆†   Virginie Uhlmann⋆
⋆European Bioinformatics Institute, European Molecular Biology Laboratory, Cambridge, UK
†École Normale Supérieure Paris-Saclay, Paris, France

ABSTRACT
We present AURA-net, a convolutional neural network (CNN) for the segmentation of phase-contrast microscopy images. AURA-net uses transfer learning to accelerate training and Attention mechanisms to help the network focus on relevant image features. In this way, it can be trained efficiently with a very limited amount of annotations. Our network can thus be used to automate the segmentation of datasets that are generally considered too small for deep learning techniques. AURA-net also uses a loss inspired by active contours that is well-adapted to the specificities of phase-contrast images, further improving performance. We show that AURA-net outperforms state-of-the-art alternatives on several small datasets.
Index Terms— Bioimage analysis, segmentation, machine learning, convolutional neural networks, phase-contrast microscopy.
1. INTRODUCTION
Label-free imaging denotes a family of non-invasive, non-toxic microscopy techniques that offer an alternative to fluorescence microscopy, particularly for live-cell imaging experiments [1]. Phase-contrast (PC) is one of the oldest label-free imaging modalities [2] and remains routinely used in biological experiments. PC uses an optical configuration that translates the phase shifts of light traversing transparent biological objects into amplitude modulations, resulting in visible differences in image intensity. Objects in PC images exhibit halos and shade-offs, uneven edge appearance, and generally less contrast than in fluorescence microscopy. Imaging biological samples in an undisturbed state therefore comes at the cost of images that are more challenging to segment automatically. Segmentation algorithms initially designed for fluorescence microscopy images are indeed observed to be of limited use on PC datasets [3].

Active contour (AC) algorithms have been extensively considered for the segmentation of PC images [4]. ACs consist of a curve model that deforms from an initial position in an image by minimizing an energy functional, which can be designed to capture various types of visual features such as regions or edges [5]. The major limitation of AC algorithms is their sensitivity to initial conditions and hyper-parameter settings, which restricts their ability to scale and generalize. In recent years, CNNs have established themselves as robust alternatives to ACs for bioimage segmentation tasks [6]. The ability of CNNs to learn relevant features directly from data makes them more robust and allows them to generalize better. Overall, CNNs thus tend to be preferred over handcrafted techniques. While many architectures were proposed in the first years of the deep learning era, U-net [7] emerged as an efficient, versatile backbone, as exemplified by the many custom versions derived from it [8]. Most U-net modifications are however designed to perform best on images featuring objects with strong edges, and few CNN solutions have been dedicated to the segmentation of PC images. Existing solutions either require thousands of ground-truth annotations [9], demanding significant manual labor and restricting their use to large datasets, or rely on fluorescence microscopy data for training [10], thus moving away from the label-free imaging paradigm. Others perform segmentation as an intermediate step towards another analysis goal, such as cell tracking, and therefore only generate rough approximations of cell outlines [11].

In this work, we present AURA-net, a CNN dedicated to the semantic segmentation of PC microscopy images that only requires a small number of training annotations. Our method integrates elements of AC algorithms with state-of-the-art deep learning strategies: AURA-net combines transfer learning and Attention mechanisms in a U-net architecture trained on a loss inspired by the AC without edges (ACWE) algorithm. The ACWE loss, recently introduced in [12], has so far been used in classical U-net and Dense-net [13] architectures. Here, we incorporate it into a more complex CNN and show that the resulting model outperforms related state-of-the-art CNN methods.

The paper is organized as follows. In Section 2, we introduce the building blocks of our network and describe AURA-net's architecture and loss. We then provide experimental results in Section 3, and concluding remarks in Section 4.

2. METHOD

2.1. Building blocks
AURA-net's architecture is based on a U-net integrating transfer learning and Attention mechanisms. In the following, we present these different building blocks in more detail.
U-net [7] is an encoder-decoder CNN composed of a contracting (downsampling) and an expansive (upsampling) path. In the downsampling path, pairs of 3×3 convolution layers are repeatedly applied, each followed by a rectified linear unit (ReLU), and then by a 2×2 max pooling operation with a stride of 2. The upsampling path then similarly applies blocks of 3×3 convolutions together with upsampling to expand the image back to its original size. U-net uses skip connections to preserve features at different resolutions and improve localization information. Classically, the skip connections apply a concatenation operator between feature maps from the encoder path and convolution layers in the decoder path, although variants have been proposed. As a result, a large number of feature maps are available in the decoder path, allowing information to be transferred efficiently.

A popular modification of U-net involves transfer learning [14]. Transfer learning uses the knowledge a model has acquired in previous training to learn a new, related task more efficiently. Using transfer learning in U-net consists of replacing the downsampling path with a pre-trained, deeper network such as ResNet [15]. Although available versions of ResNet are typically pre-trained on ImageNet [16], a database of natural images that significantly differ in appearance from microscopy data, the first layers of the network can detect shapes, colors, and textures at different levels of abstraction and therefore generalize successfully. This type of transfer learning strategy is essential when only a small amount of training data is available.
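As an illustration, the following is a minimal PyTorch sketch of how a pre-trained ResNet-18 can be exposed as a multi-scale U-net encoder. The stage grouping and class name are ours, not AURA-net's actual code; grayscale PC images would additionally need to be replicated to three channels to match the ImageNet-trained first layer.

```python
import torch
import torch.nn as nn
import torchvision


class ResNet18Encoder(nn.Module):
    """Expose intermediate ResNet-18 activations as multi-scale skip features."""

    def __init__(self):
        super().__init__()
        # ImageNet pre-trained weights; on torchvision < 0.13 use
        # torchvision.models.resnet18(pretrained=True) instead.
        resnet = torchvision.models.resnet18(
            weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1
        )
        self.stage0 = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)  # 64 ch, 1/2
        self.stage1 = nn.Sequential(resnet.maxpool, resnet.layer1)          # 64 ch, 1/4
        self.stage2 = resnet.layer2                                         # 128 ch, 1/8
        self.stage3 = resnet.layer3                                         # 256 ch, 1/16
        self.stage4 = resnet.layer4                                         # 512 ch, 1/32

    def forward(self, x):
        # x: (B, 3, H, W); single-channel PC images are replicated to 3 channels.
        feats = []
        for stage in (self.stage0, self.stage1, self.stage2, self.stage3, self.stage4):
            x = stage(x)
            feats.append(x)
        return feats  # shallow-to-deep features for the decoder's skip connections
```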
Attention [17] is a popular deep learning concept which aims at identifying discriminating characteristics in internal activation maps and using this knowledge to better represent task-specific data, ultimately improving the performance of a network. Attention mechanisms help suppress less relevant features and focus on the most important ones for a given learning task. Attention gates have been introduced in the U-net architecture and shown to improve its performance in various kinds of biomedical image segmentation tasks [18].
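The sketch below illustrates an additive attention gate in the style of [18], the kind of mechanism AURA-net builds on; the module and parameter names are ours, and the gating signal is assumed to have been resampled to the skip feature's spatial size beforehand.

```python
import torch
import torch.nn as nn


class AttentionGate(nn.Module):
    """Additive attention gate in the style of Attention U-net [18]."""

    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Sequential(nn.Conv2d(gate_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
        self.w_x = nn.Sequential(nn.Conv2d(skip_ch, inter_ch, 1), nn.BatchNorm2d(inter_ch))
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, 1), nn.BatchNorm2d(1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, g, x):
        # g: gating signal from the decoder; x: encoder skip feature.
        # Both are assumed to already share the same spatial size.
        alpha = self.psi(self.relu(self.w_g(g) + self.w_x(x)))  # (B, 1, H, W), in [0, 1]
        return x * alpha  # down-weight skip activations irrelevant to the task
```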
Putting together the concepts described in Section 2.1, we propose AURA-net, an Attention U-net ResNet with an AC loss.
The architecture of AURA-net consists of a deeper Attention U-net model, in which we replace the downsampling path with a ResNet-18 model pre-trained on ImageNet. The model is depicted in Figure 1. First, the pre-trained ResNet accelerates learning when few annotations are available during training. Second, Attention mechanisms help the network focus on relevant features in poorly contrasted images. Finally, the AC loss captures region information, making up for faint or absent edges. As a result, our model, combining these strategies in a U-net architecture, is well suited to the segmentation of PC microscopy datasets when only a small set of annotations is available.
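One decoder step combining these elements could look as follows. This is a sketch under our own naming assumptions (it reuses the hypothetical AttentionGate module above), not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoderBlock(nn.Module):
    """One decoder step: upsample, gate the skip feature, concatenate, refine."""

    def __init__(self, in_ch, skip_ch, out_ch, gate):
        super().__init__()
        self.gate = gate  # e.g. the AttentionGate sketched above
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        # Bring the decoder feature to the skip feature's resolution.
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=False)
        skip = self.gate(x, skip)        # attention-filtered skip connection
        x = torch.cat([x, skip], dim=1)  # channel-wise concatenation, as in U-net
        return self.conv(x)
```

With the modules above, the deepest step of the expansive path could for instance be `DecoderBlock(512, 256, 256, AttentionGate(512, 256, 128))` applied to the two deepest encoder features, with a final 1×1 convolution and sigmoid producing the segmentation map.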
We train AURA-net on a custom loss function composed of three different terms: the well-known pixel-wise binary cross-entropy (BCE) [19] and Dice [20] losses, denoted as $L_{BCE}$ and $L_{D}$ respectively, and an AC loss $L_{AC}$ adapted from [12]. The Dice and BCE losses reflect the accuracy of independent pixel-wise predictions but do not enforce region consistency. The addition of the AC term allows incorporating area information, resulting in a loss that combines pixel-wise and region-wise penalties.

The 2-dimensional AC loss is inspired by the classical ACWE algorithm [21] and is defined as

$$L_{AC} = L + \lambda A, \tag{1}$$

with $\lambda \in \mathbb{R}$. Considering $I^{pred}, I^{GT} : \Omega \to [0, 1]$ the predicted and ground truth (reference) image labelling, respectively, $L$ is defined over the image domain $\Omega$ as

$$L = \sum_{i,j \in \Omega} \sqrt{\left| \left(\nabla_x I^{pred}_{i,j}\right)^2 + \left(\nabla_y I^{pred}_{i,j}\right)^2 \right| + \epsilon}, \tag{2}$$

where $\nabla_n$ denotes the image gradient in direction $n$ and $\epsilon > 0$ is the machine epsilon to avoid numerical errors. This first term captures the length of the segmentation contour and acts as a regularizer. The second term, $A$, is expressed as

$$A = \left| \sum_{i,j \in \Omega} I^{pred}_{i,j} \left(1 - I^{GT}_{i,j}\right) \right| + \left| \sum_{i,j \in \Omega} \left(1 - I^{pred}_{i,j}\right) I^{GT}_{i,j} \right|. \tag{3}$$
Fig. 1. AURA-net's architecture. The network combines a pre-trained ResNet-18 with an Attention U-net and is trained on an AC loss. Building blocks include Attention gates, 1×1 convolution + sigmoid + dropout layers, and 2× (3×3 convolution + ReLU) blocks.

It characterizes how closely the predicted objects match ground truth ones in terms of the area they occupy in the image, thus encouraging shape preservation.

We performed line search in order to identify optimal $\lambda$ values. We experimentally observed that the value of $\lambda$ does not strongly influence results as long as it remains large enough, confirming the conclusions of the robustness analysis provided in [12]. In practice, we chose to rely on a value of $\lambda = 5$ to balance the length and area terms in (1).

Our final loss function is defined as

$$\mathcal{L} = \gamma L_{AC} + \beta \left( \alpha L_{BCE} + (1 - \alpha) L_{D} \right). \tag{4}$$

The hyper-parameters $\alpha$, $\beta$, and $\gamma$ were chosen as follows. First, $\alpha$ is set to 0.5 to give equal importance to the Dice and BCE losses. We then relied on line search to retrieve the values of $\beta$ and $\gamma$ resulting in best performance. Note that varying $\alpha \in [0, 1]$ while setting $\gamma = 0$, $\beta = 1$ results in classical pixel-wise segmentation losses.
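For concreteness, here is a PyTorch sketch of Eqs. (1)–(4) under our reading of the formulas. The function names are ours, and the `beta` and `gamma` defaults are illustrative placeholders rather than the line-search-tuned values.

```python
import torch
import torch.nn.functional as F


def active_contour_loss(pred, target, lam=5.0, eps=1e-8):
    """ACWE-inspired loss of Eqs. (1)-(3) for (B, 1, H, W) tensors in [0, 1]."""
    # Length term, Eq. (2): total variation of the predicted mask.
    dx = pred[:, :, 1:, :] - pred[:, :, :-1, :]  # vertical finite differences
    dy = pred[:, :, :, 1:] - pred[:, :, :, :-1]  # horizontal finite differences
    length = torch.sqrt(dx[:, :, :, :-1] ** 2 + dy[:, :, :-1, :] ** 2 + eps).sum()
    # Area term, Eq. (3): region-wise false positives plus false negatives.
    area = (pred * (1.0 - target)).sum().abs() + ((1.0 - pred) * target).sum().abs()
    return length + lam * area


def dice_loss(pred, target, smooth=1e-8):
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + smooth) / (pred.sum() + target.sum() + smooth)


def aura_loss(pred, target, alpha=0.5, beta=1.0, gamma=0.1):
    """Composite loss of Eq. (4); beta and gamma are placeholder values."""
    bce = F.binary_cross_entropy(pred, target)
    return gamma * active_contour_loss(pred, target) + beta * (
        alpha * bce + (1.0 - alpha) * dice_loss(pred, target)
    )
```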
3. EXPERIMENTAL RESULTS & DISCUSSION

3.1. Datasets
For our experiments, we rely on three publicly available PC microscopy image datasets from the Boston University Biomedical Image Library (BU-BIL) [22].

• Dataset 1 consists of raw images, each containing a single rat smooth muscle cell. In our experiments, we randomly split the images into a training set and a testing set.
• Dataset 2 is composed of raw images of varying sizes, each containing a single rabbit smooth muscle cell. We randomly split the dataset into training and testing images.
• Dataset 3 consists of raw images of varying sizes, each containing a single fibroblast. In our experiments, we again randomly split the images into training and testing sets.

Ground truth binary masks were available for each image in all considered datasets. For each dataset, we resized and cropped images to a fixed size for processing. The images were first resized without modifying their aspect ratio such that their smallest dimension matched the target size, and then cropped along their largest dimension. We conducted classical data augmentation, including random flips, rotations, shifts, and scaling, to enrich the training datasets. In addition, we carried out augmentations that are relevant to bioimages, such as shearing, CLAHE [23], and elastic deformations. We applied the same augmentation techniques for all baselines and for AURA-net.

AURA-net is implemented in PyTorch and is available at github.com/uhlmanngroup/AURA-Net.

We quantitatively compared the performance of AURA-net against those of U-net [7], CE-net [24], and Attention U-net [18]. Being the most popular all-around performer for bioimage segmentation, the original U-net and its modified version incorporating Attention mechanisms appeared as natural candidates for comparison. CE-net consists of a modified U-net with pre-trained ResNet blocks in the encoder path and was therefore also a relevant alternative to AURA-net.

All networks were trained on an NVIDIA Tesla K80 using the Adam optimizer. In all cases, we monitored the evolution of the training and validation losses over the epochs to ensure convergence and detect overfitting.

We evaluate the performance of each approach with five classical metrics: intersection over union (IoU), Dice score, precision, recall, and Hausdorff distance (HD); a sketch of how these metrics can be computed follows the tables below. We provide results on datasets 1, 2, and 3 in Tables 1, 2, and 3, respectively. In Figure 2, we also illustrate representative segmentation results for each method.

                  IoU      Dice     Precision  Recall   HD
U-net             36.47%   49.92%   62.66%     63.92%   9.71
CE-net            59.79%   72.87%   63.28%     —        8.28
Attention U-net   67.44%   79.77%   82.19%     81.57%   6.75
AURA-net          —        —        —          83.98%   —

Table 1. Comparison of AURA-net with relevant state-of-the-art alternatives on dataset 1.

                  IoU      Dice     Precision  Recall   HD
U-net             60.74%   70.00%   72.53%     69.78%   8.56
CE-net            67.67%   77.63%   70.82%     87.46%   7.98
Attention U-net   75.15%   85.14%   80.35%     —        5.51
AURA-net          —        —        —          89.87%   —

Table 2. Comparison of AURA-net with relevant state-of-the-art alternatives on dataset 2.

                  IoU      Dice     Precision  Recall   HD
U-net             34.19%   45.61%   43.95%     65.69%   5.33
CE-net            60.33%   74.93%   63.05%     94.44%   3.83
Attention U-net   69.62%   80.80%   —          85.42%   3.55
AURA-net          —        —        73.03%     —        —

Table 3. Comparison of AURA-net with relevant state-of-the-art alternatives on dataset 3.
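As referenced above, a minimal sketch of the five evaluation metrics for a pair of binary masks, assuming NumPy and SciPy; computing the Hausdorff distance on all foreground pixels, rather than boundary pixels only, is our simplifying assumption.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def segmentation_metrics(pred, gt):
    """IoU, Dice, precision, recall, and symmetric Hausdorff distance for two
    binary 2D masks (assumed non-empty), given as numpy arrays."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    # Symmetric Hausdorff distance between foreground pixel coordinates.
    p_pts, g_pts = np.argwhere(pred), np.argwhere(gt)
    hd = max(directed_hausdorff(p_pts, g_pts)[0],
             directed_hausdorff(g_pts, p_pts)[0])
    return {
        "iou": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "hd": hd,
    }
```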
The original U-net model produces poor segmentation results on datasets 1 and 3 while performing better on dataset 2. CE-net performs better than U-net and achieves a strong recall on all three datasets. Attention U-net outperforms both CE-net and U-net on most metrics. The Attention gates incorporated in AURA-net allow it to outperform CE-net, while its pre-trained layers help it improve over Attention U-net. On top of that, the AC loss provides the network with additional information on object regions. As a result, AURA-net generally outperforms competing approaches. It occasionally scores closely to Attention U-net and CE-net, and even concedes a lead on recall in datasets 1 and 2 and on precision in dataset 3. It however performs best overall, with a consistent, clear advantage on the IoU, Dice, and HD metrics.

Using dataset 1, we carried out ablation studies to assess the effect of each component of our model. The results (Table 4) illustrate that transfer learning and Attention mechanisms are the most critical elements contributing to the performance of AURA-net. The AC loss further allows the model to improve its performance on all metrics but recall.
Fig. 2. Qualitative comparison of AURA-net segmentation results with relevant state-of-the-art alternatives. From left to right: raw image, ground truth (GT), and the U-net, CE-net, Attention U-net, and AURA-net predictions.

                          IoU      Dice     Precision  Recall   HD
U-net                     36.47%   49.92%   62.66%     63.92%   9.71
U-net+ResNet              69.96%   81.39%   83.83%     83.71%   6.77
U-net+Attention           67.44%   79.77%   82.19%     81.57%   6.75
U-net+AC loss             38.29%   53.70%   70.50%     55.46%   11.25
U-net+ResNet+AC loss      70.31%   82.04%   83.18%     —        6.71
U-net+Attention+AC loss   68.42%   79.16%   81.89%     80.07%   6.65
U-net+ResNet+Attention    71.27%   82.36%   84.22%     83.90%   6.57
AURA-net                  —        —        —          83.98%   —

Table 4. Ablation studies on dataset 1.
Fig. 3. Failure cases. From left to right: raw image, ground truth (GT), and AURA-net prediction.

In Figure 3, we illustrate failure cases. In the first example, AURA-net fails to correctly segment the bottom part of the object. This outcome is unsurprising considering that the raw image exhibits a lower SNR than any of the training data. In the second example, the segmentation mask predicted by AURA-net contains several objects, yielding a poor overlap with the ground truth annotation, which features a single cell. However, the original image reveals the presence of a second, partially cropped, non-annotated cell. In this case, part of AURA-net's "false" detections are actually correct predictions that were omitted in the ground truth annotations.
4. CONCLUSIONS
To the best of our knowledge, AURA-net is the first model incorporating state-of-the-art deep learning strategies in a U-net variation targeted at the segmentation of PC microscopy datasets and requiring few training annotations. The qualitative and quantitative results we provide demonstrate that AURA-net is consistently able to accurately segment single cells in PC images while relying on only a small number of training examples. Future work includes extending the method to handle images featuring multiple cells in close proximity. Although we focused on PC, our approach is likely to be useful for segmenting images acquired with other label-free imaging modalities such as differential interference contrast (DIC) microscopy.

Acknowledgements
The authors thank Jamie Hackett and Matthieu Boulard for discussions that have inspired this project, as well as Johannes Huger and Julien Fageot for helpful comments on the manuscript. This work was supported by EMBL core funding and by the EMBL-EBI French Embassy Internship programme. The authors have no relevant financial or non-financial conflicts of interest to disclose.
Compliance with Ethical Standards
This is a computational study for which no ethical approval was required.
5. REFERENCES

[1] R. Kasprowicz et al., "Characterising live cell behaviour: Traditional label-free and quantitative phase imaging approaches," The International Journal of Biochemistry & Cell Biology, vol. 84, pp. 89–95, 2017.
[2] F. Zernike, "Phase contrast, a new method for the microscopic observation of transparent objects part II," Physica, vol. 9, no. 10, pp. 974–986, 1942.
[3] T. Vicar et al., "Cell segmentation methods for label-free contrast microscopy: review and comprehensive comparison," BMC Bioinformatics, vol. 20, no. 1, pp. 360, 2019.
[4] C. Zimmer et al., "Segmentation and tracking of migrating cells in videomicroscopy with parametric active contours: A tool for cell-based drug testing," IEEE Transactions on Medical Imaging, vol. 21, no. 10, pp. 1212–1221, 2002.
[5] R. Delgado-Gonzalo et al., "Snakes on a plane: A perfect snap for bioimage analysis," IEEE Signal Processing Magazine, vol. 32, no. 1, pp. 41–48, 2014.
[6] E. Meijering, "A bird's-eye view of deep learning in bioimage analysis," Computational and Structural Biotechnology Journal, vol. 18, pp. 2312–2325, 2020.
[7] T. Falk et al., "U-net: deep learning for cell counting, detection, and morphometry," Nature Methods, vol. 16, no. 1, pp. 67–70, 2019.
[8] Y.K.T. Xu et al., "Deep learning for high-throughput quantification of oligodendrocyte ensheathment at single-cell resolution," Communications Biology, vol. 2, no. 1, pp. 1–12, 2019.
[9] S.U. Akram et al., "Cell segmentation proposal network for microscopy image analysis," in Proceedings of DLMIA'16, Athens, Greece, October 21, 2016, pp. 21–29.
[10] C. Ling et al., "Analyzing u-net robustness for single cell nucleus segmentation from phase contrast images," in Proceedings of CVPR'20, Virtual, June 14–19, 2020, pp. 966–967.
[11] H.-F. Tsai et al., "Usiigaci: Instance-aware cell tracking in stain-free phase contrast microscopy enabled by machine learning," SoftwareX, vol. 9, pp. 230–237, 2019.
[12] X. Chen et al., "Learning active contour models for medical image segmentation," in Proceedings of CVPR'19, Long Beach, CA, USA, June 16–20, 2019, pp. 11624–11632.
[13] G. Huang et al., "Densely connected convolutional networks," in Proceedings of CVPR'17, Honolulu, HI, USA, July 21–26, 2017, pp. 2261–2269.
[14] J. Yosinski et al., "How transferable are features in deep neural networks?," in Proceedings of NIPS'14, Montreal, Canada, December 8–13, 2014, pp. 3320–3328.
[15] K. He et al., "Deep residual learning for image recognition," in Proceedings of CVPR'16, Las Vegas, NV, USA, June 26–July 1, 2016, pp. 770–778.
[16] J. Deng et al., "ImageNet: A large-scale hierarchical image database," in Proceedings of CVPR'09, Miami, FL, USA, June 20–25, 2009, pp. 248–255.
[17] A. Vaswani et al., "Attention is all you need," in Proceedings of NIPS'17, Long Beach, CA, USA, December 4–9, 2017, pp. 5998–6008.
[18] O. Oktay et al., "Attention u-net: Learning where to look for the pancreas," in Proceedings of MIDL'18, Amsterdam, The Netherlands, July 4–6, 2018.
[19] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[20] F. Milletari et al., "V-net: Fully convolutional neural networks for volumetric medical image segmentation," in Proceedings of 3DV'16, Stanford, CA, USA, October 25–28, 2016, pp. 565–571.
[21] T.F. Chan and L.A. Vese, "Active contours without edges," IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, 2001.
[22] D. Gurari et al., "How to collect segmentations for biomedical images? A benchmark evaluating the performance of experts, crowdsourced non-experts, and algorithms," in Proceedings of WACV'15, Waikoloa, HI, USA, January 6–8, 2015, pp. 1169–1176.
[23] S.M. Pizer et al., "Contrast-limited adaptive histogram equalization: speed and effectiveness," in Proceedings of VBC'90, Atlanta, GA, USA, May 22–25, 1990, pp. 337–345.
[24] Z. Gu et al., "Ce-net: Context encoder network for 2d medical image segmentation," IEEE Transactions on Medical Imaging, 2019.