Electrical Engineering and Systems Science > Image and Video Processing

Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation

Michael Yeung, Evis Sala, Carola-Bibiane Schönlieb, Leonardo Rundo

Abstract
Automatic segmentation methods are an important advancement in medical image analysis. Machine learning techniques, and deep neural networks in particular, are the state-of-the-art for most medical image segmentation tasks. Issues with class imbalance pose a significant challenge in medical datasets, with lesions often occupying a considerably smaller volume relative to the background. Loss functions used in the training of deep learning algorithms differ in their robustness to class imbalance, with direct consequences for model convergence. The most commonly used loss functions for segmentation are based on either the cross entropy loss, Dice loss or a combination of the two. We propose a Unified Focal loss, a new framework that generalises Dice and cross entropy-based losses for handling class imbalance. We evaluate our proposed loss function on three highly class imbalanced, publicly available medical imaging datasets: Breast Ultrasound 2017 (BUS2017), Brain Tumour Segmentation 2020 (BraTS20) and Kidney Tumour Segmentation 2019 (KiTS19). We compare our loss function performance against six Dice or cross entropy-based loss functions, and demonstrate that our proposed loss function is robust to class imbalance, outperforming the other loss functions across datasets. Finally, we use the Unified Focal loss together with deep supervision to achieve state-of-the-art results without modification of the original U-Net architecture, with a mean Dice similarity coefficient (DSC)=0.948 on BUS2017, enhancing tumour region DSC=0.800 on BraTS20 and kidney tumour DSC=0.758 on KiTS19. This highlights the importance of carefully selecting a suitable loss function prior to the use of more complex architectures.

M. Yeung et al.: A Mixed Focal Loss Function for Handling Class Imbalanced Medical Image Segmentation

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

A Mixed Focal Loss Function for Handling Class Imbalanced Medical Image Segmentation

MICHAEL YEUNG, EVIS SALA, CAROLA-BIBIANE SCHÖNLIEB, LEONARDO RUNDO
Department of Radiology, University of Cambridge, Cambridge CB2 0QQ, United Kingdom
School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SP, United Kingdom
Cancer Research UK Cambridge Institute, Cambridge CB2 0RE, United Kingdom
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, United Kingdom

Corresponding authors: Michael Yeung (e-mail: [email protected]), Leonardo Rundo (e-mail: [email protected]).

ABSTRACT

Automatic segmentation methods are an important advancement in medical imaging analysis. Machine learning techniques, and deep neural networks in particular, are the state-of-the-art for most automated medical image segmentation tasks, ranging from the subcellular to the level of organ systems. Issues with class imbalance pose a significant challenge irrespective of scale, with organs, and especially with tumours, often occupying a considerably smaller volume relative to the background. Loss functions used in the training of segmentation algorithms differ in their robustness to class imbalance, with cross entropy-based losses being more affected than Dice-based losses. In this work, we first experiment with seven different Dice-based and cross entropy-based loss functions on the publicly available Kidney Tumour Segmentation 2019 (KiTS19) Computed Tomography dataset, and then further evaluate the top three performing loss functions on the Brain Tumour Segmentation 2020 (BraTS20) Magnetic Resonance Imaging dataset. Motivated by the results of our study, we propose a Mixed Focal loss function, a new compound loss function derived from modified variants of the Focal loss and Focal Dice loss functions. We demonstrate that our proposed loss function is associated with a better recall-precision balance, significantly outperforming the other loss functions in both binary and multi-class image segmentation. Importantly, the proposed Mixed Focal loss function is robust to significant class imbalance. Furthermore, we show the benefit of using compound losses over their component losses, and the improvement provided by the focal variants over other variants.

INDEX TERMS

Class imbalance, Loss function, Machine learning, Medical image segmentation, Computed Tomography, Magnetic Resonance Imaging

I. INTRODUCTION

Image segmentation involves partitioning an image into meaningful regions, based on the regional pixel characteristics, thus aiming at identifying objects of interest [1]. This task is fundamental in computer vision and has been applied widely in face recognition, autonomous driving, as well as medical image processing. In particular, automatic segmentation methods are an important advancement in medical image analysis, capable of demarcating structures across a range of imaging modalities including computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET).

Classical approaches for image segmentation include direct region detection methods such as the split-and-merge and region growing algorithms [2], graph-based methods [3], active contour and level set models [4]. Alongside these developments, later approaches have focused on applying and adapting traditional machine learning techniques [5], such as support vector machines (SVMs) [6], unsupervised clustering [7] and atlas-based segmentation [8]. In recent years, however, significant progress has been achieved using deep learning [9], [10].

The most well-known architecture in image segmentation, the U-Net architecture [11], is a modification of the convolutional neural network (CNN) architecture into an encoder-decoder network, similar to SegNet [12], which enables end-to-end feature extraction and pixel classification. Since its inception, many variants based on the U-Net architecture have been proposed [13], [14], including the 3D U-Net [15], Attention U-Net [16] and V-Net [17].

Once a model architecture is selected, optimisation of model parameters is based on minimisation of the loss function during training. The cross entropy loss is perhaps the most widely used loss function in classification problems [18] and is applied in U-Net [11], 3D U-Net [15] and SegNet [12].
In contrast, Attention U-Net [16] and V-Net [17] leverage the Dice loss function, which is based on the most commonly used metric for evaluating segmentation performance, and therefore represents a form of direct loss minimisation. Broadly, loss functions used in image segmentation may be classified into distribution-based losses (such as the cross entropy loss), region-based losses (such as the Dice loss), boundary-based losses (such as the boundary loss) [19], and more recently compound losses. Compound losses refer to the simultaneous minimisation of multiple, independent loss functions, such as the Combo loss, which minimises the sum of the Dice and cross entropy loss [20].

A dominant issue in medical image segmentation is handling class imbalance, which refers to an unequal distribution of foreground and background elements. For example, automatic organ segmentation often involves organ sizes an order of magnitude smaller than the scan itself, resulting in a skewed distribution favouring background elements [21]. This issue is even more prevalent in oncology, where tumour sizes are themselves often significantly smaller than their organ of origin. In these class imbalanced circumstances, careful selection of the loss function is crucial, with the Dice loss generally better suited than the cross entropy loss function. Taghanaki et al. [20] distinguish between input and output imbalance, the former as aforementioned, and the latter referring to classification biases arising during inference. These include false positives and false negatives, which respectively describe background pixels incorrectly classified as foreground objects, and foreground objects incorrectly classified as background.
Both are particularly important in the context of medical image segmentation; in the case of image-guided interventions, false positives may result in a larger radiation field or excessive surgical margins, and conversely false negatives may lead to inadequate radiation delivery or incomplete surgical resection. Therefore, it is important to design a loss function that can be optimised to handle both input and output imbalances.

Due to the impracticality of experimenting with and testing numerous loss functions, it is often the case that only a handful of loss functions are tested, from which the best performing model is selected. Despite its significance, few studies have focused on comparing large numbers of loss functions. Mun et al. [22] compared the performance of six loss functions on the Prostate MR Image Segmentation 2012 (PROMISE12) dataset [23], with the cosine similarity outperforming Dice-based and cross entropy-based losses amongst others. More recently, a comparison of fifteen loss functions using the NBFS Skull-stripping dataset [24] (brain CT segmentation), which also introduces the log-cosh Dice loss, concluded that the Focal Tversky loss and Tversky loss are generally optimal [25].

Whilst these studies are based on organ segmentation, datasets involving tumour segmentation are associated with even greater degrees of class imbalance. Manual tumour delineation is both time-consuming and operator-dependent. Automatic methods of tumour delineation aim to address these issues, and public datasets, such as the Kidney Tumour Segmentation 2019 (KiTS19) dataset for kidney tumour CT [26] and Brain Tumour Segmentation 2020 (BraTS20) for brain tumour MRI [27], have accelerated progress towards this goal.
In fact, there have been recent developments for translating the BraTS20 dataset into clinical and scientific practice [28].

For the KiTS19 dataset, the current state-of-the-art is the "no-new-Net" (nnU-Net) [29], [30], an automatically configurable deep learning-based segmentation method involving the ensemble of 2D, 3D and cascaded 3D U-Nets. This framework was optimised using the Dice and cross entropy loss. Recently, an ensemble-based method obtained comparable results to nnU-Net, and involved initial independent processing of kidney organ and kidney tumour segmentation by 2D U-Nets trained using the Dice loss, followed by suppression of false positive predictions of the kidney tumour segmentation using the network trained for kidney organ segmentation [31]. When the dataset size is small, results from an active learning-based method using CNN-corrected labeling, also trained using the Dice loss, showed a higher segmentation accuracy over nnU-Net [32].

For the BraTS20 dataset, a popular approach is to use a multi-scale architecture where different receptive field sizes allow the independent processing of both local and global contextual information [33], [34]. Kamnitsas et al. used a two-phase training process involving initial upsampling of under-represented classes, followed by a second stage where the output layer is retrained on a more representative sample [33]. Similarly, Havaei et al. used a sampling rule to impose equal probability of foreground or background pixels at the centre of a patch, and used the cross entropy loss for optimisation [34].

It is apparent that for both the KiTS19 and BraTS20 datasets, class imbalance is largely handled by altering either the training or input data sampling process, and rarely by adapting the loss function. Even state-of-the-art solutions typically use either the Dice loss, cross entropy loss or a combination of the two.
However, popular methods, such as upsampling the under-represented class, are inherently associated with an increase in false positive predictions, and more complicated, often multi-stage training processes require more computational resources. In contrast, adapting the loss function provides a simpler, ubiquitous solution at no additional cost in terms of computation.

In this paper, we propose the following contributions:
(a) We summarise and extend the knowledge provided by previous studies that compare loss functions using 2D U-Nets for binary classification problems, and evaluate multiple loss functions using 3D U-Nets for both binary and multi-class, highly class imbalanced classification problems.
(b) We introduce a new compound loss function, the Mixed Focal loss, which enables tuning to optimise for both input and output imbalances.
(c) Our proposed loss function improves segmentation quality over six other related loss functions across multiple classes and datasets, is associated with a better recall-precision balance, and is robust to class imbalance.
(d) We provide evidence demonstrating the benefit of using compound losses over their component loss functions, and of using focal variants over other variants of Dice or cross entropy-based losses in dealing with class imbalanced problems.

The manuscript is organised as follows. Section II provides a summary of the loss functions used. Section III describes the chosen medical imaging datasets, introduces the proposed Mixed Focal loss function, and defines the segmentation evaluation metrics used. Section IV presents and discusses the experimental results. Finally, Section V provides conclusive remarks and future directions.

II. BACKGROUND

Minimisation of the loss function represents the optimisation problem that occurs during training to generate optimal model parameters. This paper focuses on semantic segmentation, a sub-field of image segmentation where pixel-level classification is performed directly, in contrast to instance segmentation where an additional object detection stage is required. We describe seven loss functions that belong to either distribution-based, region-based or compound losses. A graphical overview of loss functions in these categories is provided in Fig. 1. First, the distribution-based functions are introduced, followed by region-based loss functions, and finally concluding with compound loss functions.

A. CROSS ENTROPY LOSS

The cross entropy loss is one of the most widely used loss functions in deep learning. With origins in information theory, cross entropy measures the difference between two probability distributions for a given random variable or set of events. As a loss function, it is superficially equivalent to the negative log likelihood loss and, for binary classification, the binary cross entropy loss (L_BCE) is defined as the following:

L_BCE(y, ŷ) = −(y log(ŷ) + (1 − y) log(1 − ŷ)).   (1)

Here, y, ŷ ∈ {0, 1}^N, where ŷ refers to the predicted value and y refers to the ground truth label. This can be extended to multi-class problems, and the categorical cross entropy loss (L_CCE) is computed as:

L_CCE = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_{i,c} · log(p_{i,c}),   (2)

where y_{i,c} uses a one-hot encoding scheme of ground truth labels, p_{i,c} is a matrix of predicted values for each class, and where indices c and i iterate over all classes and pixels, respectively. The cross entropy loss is based on minimising pixel-wise error, leading to over-representation of larger objects in the loss, and consequently resulting in poorer quality segmentation of smaller objects.

B. FOCAL LOSS

The Focal loss is a variant of the binary cross entropy loss that addresses the issue of class imbalance faced by the standard cross entropy loss, by down-weighting the contribution of easy examples to enable learning of harder examples [35]. To derive the Focal loss function, we first simplify the loss in Eq. (1) as:

CE(p, y) = −log(p) if y = 1, and −log(1 − p) if y = 0.   (3)

Next, we define the probability of predicting the ground truth class, p_t, as:

p_t = p if y = 1, and 1 − p if y = 0.   (4)

The binary cross entropy loss (L_BCE) can therefore be rewritten as:

L_BCE(p, y) = CE(p_t) = −log(p_t).   (5)

The Focal loss (L_F) adds a modulating factor to the binary cross entropy loss:

L_F(p_t) = α(1 − p_t)^γ · L_BCE(p, y).   (6)

The Focal loss is parameterised by α and γ, which control the class weights and the degree of down-weighting of easy examples, respectively (Fig. 2a). When γ = 0, the Focal loss simplifies to the binary cross entropy loss.

To use the Focal loss for multi-class classification, we define the categorical Focal loss (L_CF):

L_CF = α(1 − p_{t,c})^γ · L_CCE,   (7)

where α is now a vector of class weights, p_{t,c} is a matrix of ground truth probabilities for each class, and L_CCE is the categorical cross entropy loss as defined in Eq. (2).
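As a concrete illustration of Eqs. (3)–(6), the per-pixel binary cross entropy and Focal loss can be sketched in plain Python. This is a minimal, per-pixel sketch for illustration only (not the authors' implementation; in practice the losses are computed over whole tensors and averaged across pixels):

```python
import math

def binary_cross_entropy(y, p, eps=1e-7):
    """Per-pixel binary cross entropy, Eq. (1): -(y*log(p) + (1-y)*log(1-p))."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def focal_loss(y, p, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-pixel Focal loss, Eq. (6): alpha * (1 - p_t)^gamma * (-log(p_t)),
    where p_t (Eq. (4)) is the predicted probability of the ground truth class."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p  # Eq. (4)
    return alpha * (1.0 - p_t) ** gamma * -math.log(p_t)

# An easy example (p_t = 0.9) is down-weighted far more heavily
# than a hard example (p_t = 0.1):
print(focal_loss(1, 0.9))  # ~0.00026
print(focal_loss(1, 0.1))  # ~0.466
```

Setting gamma = 0 and alpha = 1 recovers the plain binary cross entropy, matching the statement above that the Focal loss reduces to L_BCE when the focal parameter is removed.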

C. DICE LOSS

The Sørensen–Dice index, known as the Dice similarity coefficient (DSC) when applied to Boolean data, is the most commonly used metric for evaluating segmentation accuracy. We can define the DSC in terms of the per voxel classification of true positives (TP), false positives (FP) and false negatives (FN):

DSC = 2TP / (2TP + FP + FN).   (8)

For notational convenience and to highlight its similarity to the Tversky index (TI), from now on, we define a modified Dice similarity coefficient (mDSC) according to Eq. (9):

mDSC = Σ_{i=1}^{N} p_{0i} g_{0i} / (Σ_{i=1}^{N} p_{0i} g_{0i} + δ Σ_{i=1}^{N} p_{1i} g_{0i} + (1 − δ) Σ_{i=1}^{N} p_{0i} g_{1i}).   (9)

FIGURE 1: Overview of the various distribution-based, region-based and compound loss functions. The arrows connect related loss functions, with the direction of the arrows indicating the inheritance relationship.

FIGURE 2: Effect of altering the parameter γ for the (a) Focal loss, (b) Focal Tversky loss, and (c) Cosine Tversky loss.

This is equivalent to Eq. (8) when δ = 0.5:

DSC = Σ_{i=1}^{N} p_{0i} g_{0i} / (Σ_{i=1}^{N} p_{0i} g_{0i} + 0.5 Σ_{i=1}^{N} p_{1i} g_{0i} + 0.5 Σ_{i=1}^{N} p_{0i} g_{1i}),   (10)

where p_{0i} is the probability of pixel i belonging to the foreground class and p_{1i} is the probability of pixel i belonging to the background class. Similarly, g_{0i} takes the value 1 for foreground pixels and 0 for background, and conversely g_{1i} takes the value 1 for background pixels and 0 for foreground.

The Dice loss (L_DSC), for C classes, can therefore be defined as:

L_DSC = Σ_{c=1}^{C} (1 − DSC).   (11)

Other variants of the Dice loss include the Generalised Dice loss [36], [37], where the class weights are corrected by the inverse of their volume, and the Generalised Wasserstein Dice loss [38], which combines the Wasserstein metric with the Dice loss and is adapted for dealing with hierarchical data, such as the BraTS20 dataset [27]. Even in its most simple formulation, the Dice loss is partially robust to class imbalance, with equal weighting provided to each class.

D. TVERSKY LOSS

The Tversky index [39] is closely related to the Dice score, but enables optimisation for output imbalance by altering the weights assigned to false positives and false negatives. In its most general form it is equivalent to Eq. (9), but is most commonly used by setting δ = 0.7:

TI = Σ_{i=1}^{N} p_{0i} g_{0i} / (Σ_{i=1}^{N} p_{0i} g_{0i} + δ Σ_{i=1}^{N} p_{1i} g_{0i} + (1 − δ) Σ_{i=1}^{N} p_{0i} g_{1i}).   (12)

To use the TI as a loss function, we define the Tversky loss, L_T, for C classes as:

L_T = Σ_{c=1}^{C} (1 − TI).   (13)


When the Dice loss function is applied to highly class imbalanced problems, the resulting segmentation often exhibits high precision but a low recall rate [39]. By assigning a greater weight to false negatives, recall is improved, leading to a better balance of precision and recall.
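Under the conventions of Eqs. (9), (12) and (13), with p_{0i} the predicted foreground probability and g_{0i} the binary foreground label, a single-class soft Tversky loss can be sketched as follows. This is an illustrative sketch rather than the authors' code, and the smoothing constant is an added assumption to avoid division by zero on empty masks:

```python
def tversky_index(p_fg, g_fg, delta=0.7, smooth=1e-6):
    """Soft Tversky index, Eq. (12): delta weights false negatives and
    (1 - delta) weights false positives; delta = 0.5 recovers the DSC of Eq. (10)."""
    tp = sum(p * g for p, g in zip(p_fg, g_fg))        # soft true positives
    fn = sum((1 - p) * g for p, g in zip(p_fg, g_fg))  # soft false negatives
    fp = sum(p * (1 - g) for p, g in zip(p_fg, g_fg))  # soft false positives
    return (tp + smooth) / (tp + delta * fn + (1 - delta) * fp + smooth)

def tversky_loss(p_fg, g_fg, delta=0.7):
    """Single-class Tversky loss, Eq. (13); for C classes, sum one term per class."""
    return 1.0 - tversky_index(p_fg, g_fg, delta)

# Example with one false positive and no false negatives:
p = [1.0, 1.0, 1.0, 0.0]
g = [1.0, 1.0, 0.0, 0.0]
print(round(tversky_index(p, g, delta=0.5), 3))  # equals DSC = 2*2/(2*2+1+0) = 0.8
```

Raising delta above 0.5 shrinks the penalty on false positives relative to false negatives, which is exactly the recall-boosting behaviour described above.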

E. FOCAL TVERSKY LOSS

Analogous to the way the Focal loss adapts the cross entropy loss to focus on harder examples, the Focal Tversky loss [40] adapts the Tversky loss by down-weighting easy to classify regions in favour of more difficult regions. Using the definition of the TI from Eq. (12), we can define the Focal Tversky loss (L_FT) as:

L_FT = Σ_{c=1}^{C} (1 − TI)^γ,   (14)

where higher values of γ increase the degree of focusing on harder examples (Fig. 2b), and the loss simplifies to the Tversky loss when γ = 1.

F. COSINE TVERSKY LOSS

Inspired by results from [22], we test another variant of the Tversky loss, closely related to the Cosine Dice loss proposed in [41]. Here, we define the Cosine Tversky loss (L_cosT), again using the TI from Eq. (12):

L_cosT = Σ_{c=1}^{C} cos^γ(π · TI / 2),   (15)

where γ is analogous to the focal parameters in the Focal loss and Focal Tversky loss (Fig. 2c).

G. COMBO LOSS

The Combo loss [20] belongs to the class of compound losses, where multiple loss functions are minimised in unison. The Combo loss (L_combo) is defined as a weighted sum of the Dice similarity coefficient in Eq. (10) and a modified form of the cross entropy loss (L_mCE):

L_combo = α · L_mCE − (1 − α) · DSC,   (16)

where:

L_mCE = −(1/N) Σ_{i=1}^{N} [β(t_i · log(p_i)) + (1 − β)(1 − t_i) · log(1 − p_i)],   (17)

and α in the range [0, 1] controls the relative contribution of the Dice and cross entropy terms to the loss, and β controls the relative weights assigned to false positives and negatives. A value of β > 0.5 penalises false negative predictions more than false positives.

For our experiments, we use a simplified, multi-class variant of the Combo loss:

L_combo = 0.5 · L_CCE − 0.5 · DSC.   (18)

Firstly, we assign equal weights to the Dice and cross entropy loss, which is equivalent to the optimal value of α = 0.5 [20]. Secondly, we use the standard cross entropy loss, given that the optimal value of β is dependent on the dataset used.

Confusingly, the term "Dice and cross entropy loss" has been used to refer to both the sum of the cross entropy loss and DSC [20], [29], as well as the sum of the cross entropy loss and Dice loss, such as in the Dice Focal loss [42], [43]. Here, we decide to use the former implementation, which is consistent with both the Combo loss and the loss function used in the state-of-the-art for the KiTS19 dataset [29].

III. MATERIALS AND METHODS

A. DATASET DESCRIPTIONS

1) KiTS19 dataset

Kidney tumour segmentation is a challenging task due to the widespread presence of hypodense tissue, as well as the highly heterogeneous appearance of tumours on CT [44], [45]. To evaluate our loss functions, we select the Kidney Tumour Segmentation 2019 (KiTS19) dataset [26], a highly class imbalanced, multi-class problem. Briefly, this dataset consists of 300 arterial phase abdominal CT scans from patients who underwent partial removal of the tumour and surrounding kidney, or complete removal of the kidney including the tumour, at the University of Minnesota Medical Center, USA. Kidney and tumour boundaries were manually delineated by two students, with class labels of either kidney, tumour or background assigned to each voxel, resulting in a semantic segmentation task [26]. 210 scans and their associated segmentations are provided for training, with the segmentation masks for the other 90 scans withheld from public access for testing. We therefore exclude the 90 scans without segmentation masks, and further exclude another 6 scans (cases 15, 23, 37, 68, 125 and 133) due to concern over ground truth quality [46], leaving 204 scans for use.

2) BraTS20 dataset

To assess for generalisation, we further evaluate the top three performing loss functions on the Brain Tumour Segmentation 2020 (BraTS20) dataset [27], [47], [48]. This is currently the largest, publicly available and fully-annotated dataset for medical image segmentation, and comprises multi-modal scans of patients with either low-grade glioma or high-grade glioblastoma. Whilst kidney tumours are well visualised on CT scans, MRI is better suited for brain tumours. The BraTS20 dataset provides images for the following MRI sequences: T1-weighted (T1), T1-weighted contrast-enhanced using gadolinium contrast agents (T1-CE), T2-weighted (T2) and the fluid-attenuated inversion recovery (FLAIR) sequence. Images were manually annotated, with regions associated with the tumour labelled as: necrotic and non-enhancing tumour core, peritumoural oedema or gadolinium-enhancing tumour. From the 494 scans provided, 125 scans are used for validation with reference segmentation masks withheld from public access, and these are therefore excluded.

We further exclude the T1, T2 and FLAIR sequences to focus on gadolinium-enhancing tumour segmentation using the T1-CE sequence [49], [50], which not only appears to be the most difficult class to segment [51], but is also the most clinically relevant for radiation therapy [52]. We further exclude the scans without enhancing tumour regions.

B. THE PROPOSED MIXED FOCAL LOSS

The Combo loss [20] and Dice Focal loss [42] are two compound loss functions that inherit benefits from both Dice-based and cross entropy-based loss functions. The Combo loss is better adapted to handle output imbalance, with a modifiable β parameter in its cross entropy component loss. However, the Combo loss lacks an equivalent tunable parameter for its Dice component loss, and neither the Dice nor cross entropy loss is adapted to handle highly class imbalanced inputs. In contrast, the Dice Focal loss is better adapted to handle input imbalance, with its focal parameter in the Focal loss component. However, similar to the Combo loss, its Dice component is not adapted to handling highly class imbalanced data.

Here, we propose a novel compound loss function, namely the Mixed Focal loss function, which involves further modifications of Dice-based and cross entropy-based loss functions, incorporating tunable parameters to handle output imbalance, as well as focal parameters to handle input imbalance, for both the Dice and cross entropy-based component losses.

Firstly, to provide the Dice component of the loss with a parameter to optimise the weighting of false positive and false negative predictions, we define a modified Dice loss using Eq. (9):

L_mD = Σ_{c=1}^{C} (1 − mDSC),   (19)

where the parameter δ in Eq. (9) controls the relative contribution of false positive and false negative predictions to the loss.

Using this formulation, we can combine the modified Dice loss with the modified cross entropy loss function of Eq. (17) to define a modified Combo loss (L_mCombo):

L_mCombo = α · L_mCE + (1 − α) · L_mD,   (20)

where the parameters β in Eq. (17) and δ in Eq. (9) control the weights of the false positives and false negatives for the modified cross entropy and modified Dice loss, respectively.

Whilst this enables tuning for output imbalance, the standard Dice and cross entropy losses are maladapted for handling highly class imbalanced inputs, whereas loss functions using the focal parameter γ appear more suitable. Therefore, we next add separate focal parameters to both the modified cross entropy loss and modified Dice loss, to produce the modified Focal loss (L_mF) and modified Focal Dice loss (L_mFD), respectively:

L_mF = α(1 − p_t)^γ · L_mCE,   (21)

L_mFD = Σ_{c=1}^{C} (1 − mDSC)^γ.   (22)

Using these equations, we define the Mixed Focal loss (L_MF) as the weighted sum of the modified Focal loss and modified Focal Dice loss:

L_MF = λ · L_mF + (1 − λ) · L_mFD,   (23)

where λ ∈ [0, 1] determines the relative weighting of the two component loss functions.

To enable a fair comparison with the simplified Combo loss in Eq. (18), we implement a simplified, categorical variant of the Mixed Focal loss (L_CMF) where equal weights are assigned to the component losses, with parameters chosen to equate the modified Focal Dice loss to the Focal Tversky loss, and the modified Focal loss to the categorical Focal loss of Eq. (7):

L_CMF = 0.5 · L_FT + 0.5 · L_CF.   (24)

C. EXPERIMENTAL SETUP

For our experiments, we make use of the Medical Image Segmentation with Convolutional Neural Networks (MIScnn) open-source Python library [43].

For both the KiTS19 and BraTS20 datasets, images and ground truth segmentation masks are provided in an anonymised NIfTI file format. For the KiTS19 dataset, the original image resolution is 512 × 512 in the axial plane. Pixel values are normalised using the z-score, Hounsfield units (HU) are clipped to a fixed window, and the voxel spacing is resampled to a common resolution. We perform patch-wise analysis using random patches with patch-wise overlap. For our model architecture, we use the standard 3D U-Net as described in [15] with a final softmax activation layer.

For the BraTS20 dataset, the original image resolution is 240 × 240 × 155. The provided data is already pre-processed, with the skull stripped and images interpolated to the same resolution of 1 mm³. We further normalise the pixel values using the z-score. We again perform patch-wise analysis using random patches with patch-wise overlap, and use the same model architecture as for the KiTS19 dataset.

D. IMPLEMENTATION DETAILS

For both the KiTS19 and BraTS20 datasets, we perform five-fold cross validation on the remaining cases after exclusion. Since all scans belong to unique individuals, we perform a single random assignment of scans to each fold and use the resulting configuration to evaluate all loss functions.

We evaluate the following loss functions: Focal loss, Dice loss, Tversky loss, Focal Tversky loss, Cosine Tversky loss, Combo loss and Mixed Focal loss. We set α = 0.25 and γ = 2 for the Focal loss as in [35]. In contrast, we use γ = 4/3 for the Focal Tversky loss as in [40], and use γ = 1 for the Cosine Tversky loss. For the Mixed Focal loss, we use the same parameters as for the individual Focal loss and Focal Tversky loss.

Model parameters are initialised randomly, and we again make use of MIScnn, which leverages the 'batchgenerators' library to perform the following data augmentations: rotation, mirroring, brightness, contrast, gamma, elastic deformation and Gaussian noise.

For both datasets, we train each model using an Adam optimiser [53] with an initial learning rate decayed to a minimum learning rate, and use batch shuffling after each epoch; the number of epochs and iterations per epoch is adjusted to account for the larger size of the BraTS20 dataset. Validation loss is evaluated after each epoch, and the model with the lowest validation loss is selected as the final model. All experiments are programmed using Keras with a TensorFlow backend and trained using NVIDIA P100 GPUs. Source code is available at: https://github.com/mlyg/mixed-focal-loss.

E. EVALUATION METRICS

To assess segmentation accuracy, we use three commonly used metrics [54]: the Dice similarity coefficient (DSC), recall and precision. The DSC is defined as in Eq. (8), and recall and precision are defined similarly per voxel, according to Eqs. (25) and (26), respectively:

Recall = TP / (TP + FN),   (25)

Precision = TP / (TP + FP).   (26)

F. STATISTICS

To provide a statistical comparison of loss function performance, we perform pair-wise Wilcoxon rank sum tests comparing kidney and tumour DSC validation scores for the KiTS19 dataset, and enhancing tumour DSC validation scores for the BraTS20 dataset. To account for multiple comparisons, p-values are adjusted using the Holm-Bonferroni method [55]. Statistical tests were implemented using the SciPy package, and p-value adjustments with the 'statsmodels' package.

IV. EXPERIMENTAL RESULTS

In this section, we first describe the results for the KiTS19 dataset, and then for the BraTS20 dataset. The results for the KiTS19 dataset are shown in Table 1. Our proposed loss function, the Mixed Focal loss, outperformed all other loss functions on both kidney and tumour DSC. Furthermore, the Mixed Focal loss was associated with the highest recall score for both kidney and tumour, with similarly strong performance on precision. Despite a poor tumour DSC, the Focal loss was associated with the highest precision score for kidney segmentation. For tumour segmentation, the highest precision scores were seen with both the Dice loss and the Combo loss. Despite their high precision scores, the Dice loss, Combo loss and Focal loss were associated with poorer recall scores, and consequently lower DSC values. In contrast, higher recall scores were obtained by the Tversky loss and its variants across both kidney and tumour segmentations, although this was balanced by lower precision scores. Comparisons between the Tversky loss variants showed that the Focal Tversky loss performed best across all metrics, whilst the Cosine Tversky loss was generally the worst, only outperforming the Tversky loss on tumour recall. Comparing compound losses with their component losses, besides equivalent scores for tumour precision, the Combo loss outperformed the Dice loss across all other metrics. Similarly, the Mixed Focal loss outperformed both the Focal Tversky loss and the Focal loss, except for the kidney precision score.
Finally, comparisons between the two compound losses showed a better recall-precision balance with the Mixed Focal loss, which outperformed the Combo loss on both the DSC and recall metrics. Results from statistical comparisons using the Wilcoxon rank sum test for the KiTS19 dataset are shown in Table 2. For kidney DSC values, the Mixed Focal loss is the only loss function that performed significantly better than the Tversky loss. The Cosine Tversky loss was associated with the lowest kidney DSC, with significantly better performance seen for the Focal loss, Focal Tversky loss, Combo loss and Mixed Focal loss. For tumour DSC, the Focal Tversky loss, Combo loss and Mixed Focal loss performed significantly better than the Focal loss. Examples of image segmentations for the KiTS19 dataset are shown in Fig. 3. Whilst the kidney segmentations are generally similar, the Focal loss kidney segmentation is noticeably different from those of the other loss functions, with an apparent over-prediction of the kidney class. This is an expected consequence of the over-representation of the larger kidney class in the loss. On the other hand, tumour segmentation quality differs noticeably amongst all loss functions. The tumour appears under-segmented with the Focal loss, again resulting from under-representation of the smaller class, and reflecting its higher precision but lower recall score. In contrast, the tumour appears over-segmented with the Tversky variant loss functions, in agreement with the higher recall but lower precision scores observed.
The segmentations resulting from training with the compound loss functions show the most accurate tumour shape, with the highest quality segmentation seen with the Mixed Focal loss, followed by the Combo loss. Based on the results from the KiTS19 dataset, we select the Mixed Focal loss, Combo loss and Focal Tversky loss as the top three performing loss functions, and evaluate these on the BraTS20 dataset. The results are shown in Table 3. In agreement with the results from the KiTS19 dataset, the Mixed

TABLE 1: Performance on the KiTS19 dataset. Values are in the form mean ± confidence interval. Numbers in boldface denote the highest values for each metric.

Loss function | DSC kidney | Precision kidney | Recall kidney | DSC tumour | Precision tumour | Recall tumour

Focal loss 0.940±0.020

TABLE 2: Matrix of adjusted p-values from pairwise Wilcoxon rank sum scores for the KiTS19 dataset kidney Dice (top) and tumour Dice scores (bottom). Row variables are compared with column variables; positive statistic values are shaded in green, and negative statistic values in red. * p < ., ** p < ., ‡ p < ..

                     Focal      Dice      Tversky    Cosine Tversky  Focal Tversky  Combo  Mixed Focal
Focal loss           -
Dice loss            1.00       -
Tversky loss         0.209      1.00      -
Cosine Tversky loss  0.00468**  0.0962    1.00       -
Focal Tversky loss   1.00       1.00      0.0950     0.00145**       -
Combo loss           1.00       1.00      0.165      0.00249**       1.00           -
Mixed Focal loss     1.00       0.476     0.00541**  2.52 ×          ×              ×

FIGURE 3: Axial CT slices of image segmentations generated from the KiTS19 dataset using (a) ground truth, (b) Focal loss, (c) Dice loss, (d) Tversky loss, (e) Cosine Tversky loss, (f) Focal Tversky loss, (g) Combo loss, (h) Mixed Focal loss. The kidney is highlighted in red and the tumour in blue. A magnified contour of the segmentation is provided in the top right-hand corner of each image.

Focal loss was associated with the best recall-precision balance, and outperformed the Focal Tversky loss and Combo loss on the tumour DSC and tumour recall scores. The highest tumour precision score was seen with the Combo loss, although



FIGURE 4: Axial MRI slices of image segmentations generated from the BraTS20 dataset using (a) ground truth, (b) Focal Tversky loss, (c) Combo loss and (d) Mixed Focal loss. The tumour is highlighted in red. A magnified contour of the segmentation is provided in (e-h) below each respective image.

TABLE 3: Performance on the BraTS20 dataset. Values are in the form mean ± confidence interval. Numbers in boldface denote the highest values for each metric.

Loss function | DSC tumour | Precision tumour | Recall tumour

Focal Tversky loss  0.747±0.050  0.776±0.070  0.765±0.029
Combo loss          0.748±0.032

this was also associated with the poorest tumour recall score.

TABLE 4: Matrix of adjusted p-values from pairwise Wilcoxon rank sum scores for the BraTS20 dataset tumour DSC. Row variables are compared with column variables; positive statistic values are shaded in green, and negative statistic values in red. * p < ., ** p < ., ‡ p < ..

                    Focal Tversky loss  Combo loss  Mixed Focal loss
Focal Tversky loss  -
Combo loss          0.312               -
Mixed Focal loss    0.176               0.018*      -

Results from statistical comparisons using the Wilcoxon rank sum test for the BraTS20 dataset are shown in Table 4. Whilst the Combo loss is associated with a slightly better tumour DSC than the Focal Tversky loss, the Mixed Focal loss performed significantly better than the Combo loss (p = 0.018) but not than the Focal Tversky loss (p = 0.176). This reflects generally better but less consistent segmentation quality with the Focal Tversky loss than with the Combo loss. Examples of image segmentations for the BraTS20 dataset are shown in Fig. 4. The segmentations are of similarly high quality for all three loss functions. Whilst the differences between the Focal Tversky loss and the Combo loss are subtle, a higher segmentation quality with the Mixed Focal loss is apparent.

V. DISCUSSION AND CONCLUSIONS

In this study, we proposed a new compound loss function, the Mixed Focal loss, which is adapted to handle both input and output imbalance in semantic image segmentation tasks. The difference in model performance across the numerous loss functions compared highlights the importance of loss function choice in class imbalanced image segmentation tasks. Comparisons of compound losses with their respective component loss functions revealed consistent improvements across all metrics, with the Combo loss outperforming the Dice loss, and the Mixed Focal loss outperforming both the Focal Tversky loss and the Focal loss. Moreover, we showed that our proposed loss function outperformed the Combo loss, with higher DSC scores obtained across classes and datasets. These results were demonstrated for the KiTS19 dataset, a multi-class, class imbalanced dataset comprised of kidney tumour labelled CT scans, and further generalised to the BraTS20 dataset, which we adapted to a binary, highly class imbalanced brain tumour segmentation problem based on T1-CE MRI scans. Therefore, we evaluated our proposed loss function for both binary and multi-class classification, across two different modalities, sharing the common theme of class imbalance. The main metric we evaluate is the DSC, which is highest when both precision and recall scores are similarly high. Our results illustrate how loss functions tend to prioritise one of these component metrics over the other, resulting in output imbalance: higher precision scores were observed with cross entropy-based losses and the Dice loss, and higher recall scores with Tversky variant losses. This is further complicated by input imbalance, to which cross entropy-based losses such as the Focal loss are particularly susceptible, as shown by poorer tumour metrics than with Dice-based losses.
The improved performance using the Mixed Focal loss reflects a better recall-precision balance, which is also robust to significant class imbalance. There are several limitations associated with our study. Firstly, we focused our experiments on seven loss functions, only a small proportion of all the currently available loss functions. In particular, we did not include any boundary-based loss functions [19], [56], another class of loss functions that use distance-based metrics to optimise contours, rather than the distributions or regions used by cross entropy and Dice-based losses, respectively. In favour of simplicity and fairness, we also did not optimise hyperparameters, instead either simplifying or, where possible, relying on prior experiments to select hyperparameter values. We conclude by highlighting several areas for future research. In this paper, we focused on simplified variants of the Combo loss and Mixed Focal loss, and there is scope for further improvement with more careful hyperparameter selection. The additional hyperparameter introduced by combining loss functions provides another layer of complexity, controlling the contribution of each component loss function to the total loss. Furthermore, combining other classes of loss functions, such as boundary-based losses, may provide complementary benefit to optimisation using distribution and region-based loss functions. Finally, it will be useful to experiment with the Mixed Focal loss in more complex network architectures, to assess whether the performance gains generalise to state-of-the-art deep learning methods, and whether it can complement or even replace alternatives, such as training or sampling-based methods, for handling class imbalance.
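The compound construction discussed in this study can be made concrete with a short sketch. The NumPy code below is an illustration only, not the authors' released implementation (available at the GitHub repository cited earlier): the component hyperparameters (α and γ for the Focal loss; α, β and γ for the Focal Tversky loss) follow common defaults from [35] and [40], and the equal weighting λ = 0.5 between the two components is an assumption for demonstration.

```python
import numpy as np

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary Focal loss (Lin et al. [35]), averaged over voxels."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, y_pred, 1 - y_pred)       # prob. of the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)     # class weighting
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

def focal_tversky_loss(y_true, y_pred, alpha=0.7, beta=0.3, gamma=4/3, eps=1e-7):
    """Focal Tversky loss (Abraham & Khan [40]) for the foreground class.
    alpha weights false negatives, beta false positives."""
    tp = np.sum(y_true * y_pred)
    fn = np.sum(y_true * (1 - y_pred))
    fp = np.sum((1 - y_true) * y_pred)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** (1 / gamma)

def mixed_focal_loss(y_true, y_pred, lam=0.5):
    # Weighted sum of the two component losses; lam = 0.5 is illustrative.
    return lam * focal_loss(y_true, y_pred) + (1 - lam) * focal_tversky_loss(y_true, y_pred)
```

The distribution-based Focal term penalises every voxel individually, while the region-based Focal Tversky term operates on aggregate overlap, which is how the compound loss addresses input and output imbalance simultaneously.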

ACKNOWLEDGMENTS

This work was partially supported by The Mark Foundation for Cancer Research and Cancer Research UK Cambridge Centre [C9685/A25177] and the CRUK National Cancer Imaging Translational Accelerator (NCITA) [C42780/A27066]. Additional support was also provided by the National Institute of Health Research (NIHR) Cambridge Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care. CBS acknowledges support from the Leverhulme Trust project on 'Breaking the non-convexity barrier', the Philip Leverhulme Prize, the Royal Society Wolfson Fellowship, the EPSRC grants EP/S026045/1, EP/T003553/1, EP/N014588/1, EP/T017961/1, the Wellcome Innovator Award RG98755, European Union Horizon 2020 research and innovation programmes under the Marie Skłodowska-Curie grant agreement No. 777826 NoMADS and No. 691070 CHiPS, the Cantab Capital Institute for the Mathematics of Information and the Alan Turing Institute.

REFERENCES

[1] Pal, N.R., Pal, S.K.: A review on image segmentation techniques. Pattern Recognition (9) (1993) 1277–1294
[2] Rundo, L., Militello, C., Vitabile, S., Casarino, C., Russo, G., Midiri, M., Gilardi, M.C.: Combining split-and-merge and multi-seed region growing algorithms for uterine fibroid segmentation in MRgFUS treatments. Med. Biol. Eng. Comput. (7) (2016) 1071–1084
[3] Chen, X., Pan, L.: A survey of graph cuts/graph search based medical image segmentation. IEEE Rev. Biomed. Eng. (2018) 112–124
[4] Khadidos, A., Sanchez, V., Li, C.T.: Weighted level set evolution based on local edge features for medical image segmentation. IEEE Trans. Image Process. (4) (2017) 1979–1991
[5] Rundo, L., Militello, C., Vitabile, S., Russo, G., Sala, E., Gilardi, M.C.: A survey on nature-inspired medical image analysis: a step further in biomedical data integration. Fundam. Inform. (1-4) (2020) 345–365
[6] Wang, S., Summers, R.M.: Machine learning and radiology. Med. Image Anal. (5) (2012) 933–951
[7] Ren, T., Wang, H., Feng, H., Xu, C., Liu, G., Ding, P.: Study on the improved fuzzy clustering algorithm and its application in brain image segmentation. Appl. Soft Comput. (2019) 105503
[8] Wachinger, C., Golland, P.: Atlas-based under-segmentation. In: Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer (2014) 315–322
[9] Ker, J., Wang, L., Rao, J., Lim, T.: Deep learning applications in medical image analysis. IEEE Access (2018) 9375–9389
[10] Rueckert, D., Schnabel, J.A.: Model-based and data-driven strategies in medical image computing. Proc. IEEE (1) (2019) 110–124
[11] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer (2015) 234–241
[12] Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (12) (2017) 2481–2495
[13] Liu, L., Cheng, J., Quan, Q., Wu, F.X., Wang, Y.P., Wang, J.: A survey on U-shaped networks in medical image segmentations. Neurocomputing (2020) 244–258
[14] Rundo, L., Han, C., Nagano, Y., et al.: USE-Net: incorporating squeeze-and-excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets. Neurocomputing (2019) 31–43
[15] Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer (2016) 424–432
[16] Schlemper, J., Oktay, O., Schaap, M., Heinrich, M., Kainz, B., Glocker, B., Rueckert, D.: Attention gated networks: learning to leverage salient regions in medical images. Med. Image Anal. (2019) 197–207
[17] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: Proc. Fourth International Conference on 3D Vision (3DV), IEEE (2016) 565–571
[18] Liu, Y., Yang, G., Hosseiny, M., Azadikhah, A., Mirak, S.A., Miao, Q., Raman, S.S., Sung, K.: Exploring uncertainty measures in Bayesian deep attentive neural networks for prostate zonal segmentation. IEEE Access (2020) 151817–151828
[19] Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., Ayed, I.B.: Boundary loss for highly unbalanced segmentation. In: Proc. International Conference on Medical Imaging with Deep Learning (MIDL), PMLR (2019) 285–296
[20] Taghanaki, S.A., Zheng, Y., Zhou, S.K., Georgescu, B., Sharma, P., Xu, D., Comaniciu, D., Hamarneh, G.: Combo loss: Handling input and output imbalance in multi-organ segmentation. Comput. Med. Imaging Graph. (2019) 24–33
[21] Roth, H.R., Lu, L., Farag, A., Shin, H.C., Liu, J., Turkbey, E.B., Summers, R.M.: DeepOrgan: Multi-level deep convolutional networks for automated pancreas segmentation. In: Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer (2015) 556–564
[22] Mun, J., Jang, W., Sung, D.J., Kim, C.: Comparison of objective functions in CNN-based prostate magnetic resonance image segmentation. In: Proc. International Conference on Image Processing (ICIP), IEEE (2017) 3859–3863
[23] Litjens, G., Toth, R., van de Ven, W., Hoeks, C., Kerkstra, S., van Ginneken, B., et al.: Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge. Med. Image Anal. (2) (2014) 359–373
[24] Eskildsen, S.F., Coupé, P., Fonov, V., Manjón, J.V., Leung, K.K., Guizard, N., Wassef, S.N., Østergaard, L.R., Collins, D.L., Initiative, A.D.N., et al.: BEaST: brain extraction based on nonlocal segmentation technique. NeuroImage (3) (2012) 2362–2373
[25] Jadon, S.: A survey of loss functions for semantic segmentation. In: Proc. Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), IEEE (2020) 1–7
[26] Heller, N., Sathianathen, N., Kalapara, A., Walczak, E., Moore, K., Kaluzniak, H., Rosenberg, J., Blake, P., Rengel, Z., Oestreich, M., et al.: The KiTS19 challenge data: 300 kidney tumor cases with clinical context. arXiv preprint arXiv:1904.00445 (2019)
[27] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging (10) (2014) 1993–2024
[28] Kofler, F., Berger, C., Waldmannstetter, D., Lipkova, J., Ezhov, I., Tetteh, G., Kirschke, J., Zimmer, C., Wiestler, B., Menze, B.H.: BraTS Toolkit: Translating BraTS brain tumor segmentation algorithms into clinical and scientific practice. Front. Neurosci. (2020)
[29] Isensee, F., Petersen, J., Klein, A., Zimmerer, D., Jaeger, P.F., Kohl, S., Wasserthal, J., Koehler, G., Norajitra, T., Wirkert, S., et al.: nnU-Net: Self-adapting framework for U-Net-based medical image segmentation. arXiv preprint arXiv:1809.10486 (2018)
[30] Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods (2020)
[31] Fatemeh, Z., Nicola, S., Satheesh, K., Eranga, U.: Ensemble U-net-based method for fully automated detection and segmentation of renal masses on computed tomography images. Med. Phys. (9) (2020) 4032–4044
[32] Kim, T., Lee, K., Ham, S., Park, B., Lee, S., Hong, D., Kim, G.B., Kyung, Y.S., Kim, C.S., Kim, N.: Active learning for accuracy enhancement of semantic segmentation with CNN-corrected label curations: Evaluation on kidney segmentation in abdominal CT. Sci. Rep. (1) (2020) 1–7
[33] Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. (2017) 61–78
[34] Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H.: Brain tumor segmentation with deep neural networks. Med. Image Anal. (2017) 18–31
[35] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: Proc. International Conference on Computer Vision (ICCV), IEEE (Oct 2017)
[36] Crum, W.R., Camara, O., Hill, D.L.G.: Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Trans. Med. Imaging (11) (2006) 1451–1461
[37] Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer (2017) 240–248
[38] Fidon, L., Li, W., Garcia-Peraza-Herrera, L.C., Ekanayake, J., Kitchen, N., Ourselin, S., Vercauteren, T.: Generalised Wasserstein Dice score for imbalanced multi-class segmentation using holistic convolutional networks. In: Proc. International MICCAI Brainlesion Workshop, Springer (2017) 64–76
[39] Salehi, S.S.M., Erdogmus, D., Gholipour, A.: Tversky loss function for image segmentation using 3D fully convolutional deep networks. In: Proc. International Workshop on Machine Learning in Medical Imaging, Springer (2017) 379–387
[40] Abraham, N., Khan, N.M.: A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. In: Proc. 16th International Symposium on Biomedical Imaging (ISBI), IEEE (2019) 683–687
[41] Chen, W., Zhang, Y., He, J., Qiao, Y., Chen, Y., Shi, H., Wu, E.X., Tang, X.: Prostate segmentation using 2D bridged U-net. In: Proc. International Joint Conference on Neural Networks (IJCNN), IEEE (2019) 1–7
[42] Zhu, W., Huang, Y., Zeng, L., Chen, X., Liu, Y., Qian, Z., Du, N., Fan, W., Xie, X.: AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med. Phys. (2) (2019) 576–589
[43] Müller, D., Kramer, F.: MIScnn: A framework for medical image segmentation with convolutional neural networks and deep learning. arXiv preprint arXiv:1910.09308 (2019)
[44] Linguraru, M.G., Yao, J., Gautam, R., Peterson, J., Li, Z., Linehan, W.M., Summers, R.M.: Renal tumor quantification and classification in contrast-enhanced abdominal CT. Pattern Recognit. (6) (2009) 1149–1161
[45] Rundo, L., Beer, L., Ursprung, S., Martin-Gonzalez, P., Markowetz, F., Brenton, J.D., Crispin-Ortuzar, M., Sala, E., Woitek, R.: Tissue-specific and interpretable sub-segmentation of whole tumour burden on CT images by unsupervised fuzzy clustering. Comput. Biol. Med. (2020) 103751
[46] Heller, N., Isensee, F., Maier-Hein, K.H., Hou, X., Xie, C., Li, F., Nan, Y., Mu, G., Lin, Z., Han, M., et al.: The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge. Med. Image Anal. (2021) 101821
[47] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data (2017) 170117
[48] Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629 (2018)
[49] Rundo, L., Tangherloni, A., Cazzaniga, P., Nobile, M.S., Russo, G., Gilardi, M.C., et al.: A novel framework for MR image segmentation and quantification by using MedGA. Comput. Methods Programs Biomed. (2019) 159–172
[50] Han, C., Rundo, L., Araki, R., Nagano, Y., Furukawa, Y., et al.: Combining noise-to-image and image-to-image GANs: brain MR image augmentation for tumor detection. IEEE Access (1) (2019) 156966–156977
[51] Henry, T., Carre, A., Lerousseau, M., Estienne, T., Robert, C., Paragios, N., Deutsch, E.: Top 10 BraTS 2020 challenge solution: Brain tumor segmentation with self-ensembled, deeply-supervised 3D-Unet like neural networks. arXiv preprint arXiv:2011.01045 (2020)
[52] Rundo, L., Stefano, A., Militello, C., Russo, G., Sabini, M.G., D'Arrigo, C., Marletta, F., Ippolito, M., Mauri, G., Vitabile, S., Gilardi, M.C.: A fully automatic approach for multimodal PET and MR image segmentation in Gamma Knife treatment planning. Comput. Methods Programs Biomed. (2017) 77–96
[53] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
[54] Wang, Z., Wang, E., Zhu, Y.: Image segmentation evaluation: a survey of methods. Artif. Intell. Rev. (8) (2020) 5637–5674
[55] Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Statist. (2) (1979) 65–70
[56] Zhu, Q., Du, B., Yan, P.: Boundary-weighted domain adaptive neural network for prostate MR image segmentation. IEEE Trans. Med. Imaging (3) (2019) 753–763

MICHAEL YEUNG received his Bachelor's degree in Neuroscience in 2019 from the University of Cambridge, United Kingdom. He is currently a medical student at the School of Clinical Medicine, University of Cambridge, United Kingdom. He is a Senior Whitby Scholar at Downing College. His research interests include machine learning, radiology and computer vision.

EVIS SALA received her medical degree from the University of Tirana, Albania, in 1991, and her PhD degree in Epidemiology and Biostatistics from the University of Cambridge, UK, in 2000. Currently, she is Professor of Oncological Imaging at the University of Cambridge, UK, and co-leads the Advanced Cancer Imaging and the Integrated Cancer Medicine Programmes for the CRUK Cambridge Centre. Her research in the new field of radiogenomics has focused on understanding the molecular basis of cancer by demonstrating the phenotypic patterns that occur because of multiple genetic alterations that interact with the tumour microenvironment to drive the disease in several tumour types. She is also leading multiple research projects focusing on the applications of artificial intelligence methods for image reconstruction, segmentation and data integration.

CAROLA-BIBIANE SCHÖNLIEB graduated from the Institute for Mathematics, University of Salzburg, Austria, in 2004. She received her PhD degree from the University of Cambridge in 2009. Currently, she is Professor of Applied Mathematics at the Department of Applied Mathematics and Theoretical Physics (DAMTP), University of Cambridge, United Kingdom. There, she is head of the Cambridge Image Analysis group, Director of the Cantab Capital Institute for the Mathematics of Information, and co-Director of the EPSRC Centre for Mathematics of Information in Healthcare. Since 2011 she has been a fellow of Jesus College, Cambridge, and since 2016 a fellow of the Alan Turing Institute, London. Her current research interests focus on variational methods, partial differential equations and machine learning for image analysis, image processing and inverse imaging problems.

LEONARDO RUNDO received the Bachelor's and Master's degrees in Computer Science Engineering from the University of Palermo, Italy, in 2010 and 2013, respectively. In 2013, he was a Research Fellow at the Institute of Molecular Bioimaging and Physiology, National Research Council of Italy (IBFM-CNR). He obtained his PhD in Computer Science at the University of Milano-Bicocca, Italy, in 2019. Since November 2018, he has been a Research Associate at the Department of Radiology, University of Cambridge, United Kingdom, collaborating closely with Cancer Research UK. His main scientific interests include biomedical image analysis, machine learning, computational intelligence and high-performance computing.

Submitted on 8 Feb 2021 (v1), last revised 24 May 2021 (this version, v3)
