On Matthews Correlation Coefficient and Improved Distance Map Loss for Automatic Glacier Calving Front Segmentation in SAR Imagery
Amirabbas Davari, Saahil Islam, Thorsten Seehaus, Matthias Braun, Andreas Maier, Vincent Christlein
Abstract—The vast majority of the outlet glaciers and ice streams of the polar ice sheets end in the ocean. Ice mass loss via calving of the glaciers into the ocean has increased over the last few decades. Information on the temporal variability of the calving front position provides fundamental information on the state of the glacier and ice stream, and can be exploited as calibration and validation data to enhance ice dynamics modeling. To identify the calving front position automatically, deep neural network-based semantic segmentation pipelines can be used to delineate the acquired SAR imagery. However, the extreme class imbalance makes accurate calving front segmentation in these images highly challenging. Therefore, we propose the use of the Matthews correlation coefficient (MCC) as an early stopping criterion because of its symmetrical properties and its invariance towards class imbalance. Moreover, we propose an improvement to the distance map-based binary cross-entropy (BCE) loss function. The distance map adds context to the loss function about the regions that are important for segmentation and helps account for the imbalanced data. Using the Matthews correlation coefficient for early stopping demonstrates an average 15 % Dice coefficient improvement compared to the commonly used BCE criterion. The modified distance map loss further improves the segmentation performance by another 2 %. These results are encouraging, as they support the effectiveness of the proposed methods for segmentation problems suffering from extreme class imbalance.
Index Terms—semantic segmentation, class imbalance, Matthews correlation coefficient (MCC), improved distance map loss
I. INTRODUCTION
The mass loss of the polar ice sheets and other calving glaciers has accelerated in the last two decades. This is one of the main causes of global sea level rise. Currently, the sea level contribution from thermal expansion is exceeded by the contribution from ice sheets, glaciers, and ice caps [1]. The retreat of the glacier fronts can produce a reinforcing effect on ice loss. For example, the retreat from pinning points can reduce the lateral buttressing forces, which further destabilizes the glaciers' ice discharge into the ocean [2]. A significant frontal retreat can additionally cause a retreat of the grounding zone, leading to further destabilization of the glacier flow [3]. This makes the calving front, the region where the glacial ice breaks off into the ocean, a key feature to observe. Hence, the calving front position (CFP) is an important variable for monitoring, but also for ice dynamics modeling.

* Amirabbas Davari and Saahil Islam contributed equally to this work. A. Davari, S. Islam, A. Maier, and V. Christlein are with the Computer Science department at Friedrich-Alexander University Erlangen-Nürnberg, 91058 Erlangen, Germany (email: [email protected]). T. Seehaus and M. Braun are with the Geography & Geosciences department at Friedrich-Alexander University Erlangen-Nürnberg, 91058 Erlangen, Germany.

The detection of the calving front locations is commonly carried out using optical imagery or Synthetic Aperture Radar (SAR) imagery. Cloud cover and polar night affect optical imagery, so its data coverage is strongly curbed. SAR acquisitions, on the other hand, are unaffected by such conditions and can be carried out all year round, making SAR imagery more advantageous for monitoring purposes in polar regions. The common approach to detect CFPs has been visual analysis and manual delineation of the acquired SAR images. However, manual digitization is tedious, subjective, expensive, and time-consuming.
Moreover, sea ice and calved-off icebergs in the ocean often make it difficult to reliably differentiate between the glacier ice mass and these chunks of floating ice, which adds to the subjectivity of manual delineation. This leads to a lack of temporally updated data on the position of the calving front, which is considered an essential climate variable (ECV). Automatic detection of CFPs would close this monitoring gap. In the past, various algorithms for automatic and semi-automatic detection of calving front positions have been proposed. Edge detection algorithms such as the Roberts edge extractor have been applied to ERS-1 SAR images by Sohn et al. [4]. Seale et al. [5] exploited the Sobel filter and brightness profiling to detect CFPs in daily MODIS images. A detailed overview of various automatic and semi-automatic approaches for this application is provided by Baumhoer et al. [6]. Deep learning-based methods such as Convolutional Neural Networks (CNNs) [7] are a better alternative to the traditional automatic delineation of CFPs. Over the past two decades, CNNs have been shown to outperform the above-mentioned manually designed filter-based methods and many other traditional computer vision techniques in the field of image processing [8]. The CNN architecture (the type and number of layers and their order) is chosen depending on the application and the training data. The parameters of a CNN are fine-tuned by optimizing a predefined objective function, also referred to as cost function or loss function. Deep convolutional networks can be used on SAR images to automatically classify the image pixels, belonging to semantically different regions, into distinct classes. Such a task is referred to as semantic segmentation. Architectures such as SegNet [9] and U-Net [10] are commonly used for semantic segmentation applications [7]. The use of the U-Net architecture for SAR image segmentation was first exploited by Mohajerani et al.
[11] on Landsat-5, 7, and 8 images to automatically detect calving fronts. Zhang et al. [12] used a U-Net on higher-resolution TerraSAR-X images of the Jakobshavn Isbræ glacier to separate the whole ice mélange region from the non-ice-mélange region, which is subsequently post-processed to manually delineate the calving fronts. Despite being powerful learning tools, the high number of parameters in Deep Neural Networks (DNNs) comes with the risk of over-fitting to the training data [13], i. e., the network "memorizes" the non-predictive features of the training data instead of "learning" its underlying distribution. In this paper, one contribution focuses on the proper exploitation of early stopping to prevent the network from over-fitting [14]. Early stopping terminates training based on the error on independent validation data, so as to maintain high performance on new unseen data from the same distribution. Since training is terminated based on the validation error, it is crucial to have the right performance metric as a stopping criterion. The traditional method of early stopping comprises monitoring the loss [15]. However, the significance of the performance metric increases in cases of class-imbalanced data [16]. An imbalanced dataset comprises classes with an unequal number of class elements. The majority of performance metrics are challenged in the presence of severely imbalanced data. For example, accuracy is not an appropriate metric in applications with class imbalance, as incorrect classifications of the minority class can be overlooked. Other metrics, such as the F1-score, greatly diminish the effect of class imbalance. The Matthews Correlation Coefficient (MCC) is theoretically argued to have a higher tolerance to class imbalance than the F1-score and confusion entropy [17], [18], and was, for example, used to compare a vast range of classifiers for the evaluation of models on micro-array gene expression and genotyping data [19].
In this paper, we suggest (a) the use of early stopping to prevent over-fitting and (b) using MCC instead of the classical BCE loss as the stopping criterion. While early stopping handles over-fitting given the right performance metric, it is also crucial to objectively handle the learning of the network on the imbalanced dataset. An imbalanced dataset introduces a bias towards the majority class during the training of the network, while the minority class is the point of interest in many real-world problems. Common solutions are, for example, data re-sampling [20], cost-sensitive learning, one-class learning, or ensemble learning methods [21]. While these methods can be effective, they are not guaranteed to be so for all kinds of tasks. For example, over-sampling carries a risk of early over-fitting because the network sees many samples more than once, while under-sampling reduces the size of the training data. Artificially balancing the class distributions was not very successful in some tasks [22], and many ensemble-based methods are resource-demanding and time-consuming [21]. Another form of cost-sensitive learning is to weigh the minority class higher to punish its misclassification. In imbalanced segmentation tasks, the loss function can also be designed to penalize errors on the contour of the segmentation masks more than on the regions. Kervadec et al. [23] used such a boundary-based loss function for medical image segmentation. They argued that region integrals of the segmentation masks vary highly over the classes and instead trained on the space of contours. Another distance-based loss approach was proposed to segment knee bones in 3D MRI [24]. There, a distance map-based loss was employed because most segmentation errors are found at the edges of the segmented masks. A distance map-based loss was also used by Adhikari et al. [25] to detect forest trails in optical images of forests, which cover only small regions in the images.
Therefore, the trails were weighted higher in the loss function, while the other regions were weighted less based on their distance to the forest trails. In this paper, we show that using MCC as an early stopping criterion is superior to a binary cross-entropy (BCE) monitored network. The performance of the network is further enhanced by an improved distance map-based BCE loss, which is compared with (a) the traditional BCE loss and (b) a weighted BCE loss. The rest of this paper is organized as follows: Section II reviews the tools used in our contributions, i. e., the Matthews correlation coefficient and the distance map loss. Section III describes our proposed pipeline and explains the main contributions of this paper in detail. Next, the dataset description, experimental setup, and the quantitative and qualitative results are presented and discussed in Section IV. Finally, Section V concludes the paper.

II. THEORETICAL BACKGROUND
A. Matthews Correlation Coefficient
Binary cross-entropy is a popular loss function in binary classification scenarios. A good loss is usually supposed to produce neither very small nor extremely large gradients, which is one of the desirable properties of the BCE loss [26]. We could also use the validation BCE loss as an early stopping criterion. However, the BCE loss is very fragile in the presence of class imbalance [27]. Hence, it is an unfavorable early stopping criterion for class-imbalanced segmentation. The MCC, on the other hand, represents true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) in a balanced manner [28]. It is balanced in the sense that positive class predictions and negative class predictions are given equal importance. MCC is defined as:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).   (1)

It lies in the range [−1, 1], with 1 representing a perfect prediction, −1 a completely wrong one, and 0 a random prediction. MCC takes positives and negatives equally into account and provides very little "discrimination" to random acts [17].

B. Distance Map Loss
While BCE is the most commonly used loss function in deep learning-based methods, it is not favorable for class-imbalance problems. A common approach to tackle class imbalance with the BCE loss is to weigh each class by its inverse class frequency. However, this does not always guarantee better performance [22]. Moreover, minority classes with higher weights can lead to noisy gradients [25]. A distance map-based approach, on the other hand, adds context to the loss function based on the regions of importance in the segmentation task. Distance maps, also known as distance fields or distance transforms, are derived representations of an image based on a distance metric. In computer graphics, distance maps are often used in volume rendering to generate offset surfaces or to blend between surface models [29].

Fig. 1: Proposed glacier front segmentation pipeline.

A distance map-based loss trains the network by penalizing predictions with respect to their distance from the object of interest in the image. Such a loss is also beneficial under class imbalance: by forcing the network to pay more attention to the regions of interest (the minority class), it counters the bias of the network's training towards the majority class. The distance map loss works on the idea of weighting pixels far away from the semantic line higher than those in its proximity [25]. The loss is formulated as:

J_DW = (1 / (M · N)) Σ_{j=1}^{N} Σ_{i=1}^{M} ((1 − g_{i,j}) · d_{i,j} · p_{i,j} − g_{i,j} · d_max · log(p_{i,j})),   (2)

with g being the ground truth map and p the prediction at pixel location (i, j). The distance map is denoted as d, with d_max being the maximum distance, and M, N represent the spatial dimensions of the maps.

III. METHODOLOGY
A. Processing Pipeline
Figure 1 outlines our proposed segmentation workflow. We perform the segmentation of calving front images using a U-Net architecture. The SAR images are used as input to the network, which learns to predict either the zones of ice mass regions or the calving front lines, depending on the ground truth data presented to the model at training time. In the rest of this paper, we refer to the ice/non-ice regions as zones and to the calving front locations as front lines for convenience. In the variant where the network is trained to predict the ice zones, the predictions are post-processed to extract the boundaries and thus obtain the front lines. The images are first pre-processed to remove noise, and then patches of a predefined size are extracted from both the images and their corresponding ground truth before being fed to the network. Before extracting the patches, the images are padded with zeros to the next larger image size that is a multiple of the patch size. The predicted masks on the patches are later stitched together to reconstruct the image at its original size, and the padding is discarded.
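The padding and patch handling described above can be sketched as follows. This is a minimal NumPy sketch under the assumption of non-overlapping tiles; the function names are our own and not taken from the authors' code:

```python
import numpy as np

def pad_to_multiple(img, patch):
    """Zero-pad a 2-D image to the next multiple of the patch size."""
    H, W = img.shape
    H2 = int(np.ceil(H / patch)) * patch
    W2 = int(np.ceil(W / patch)) * patch
    out = np.zeros((H2, W2), dtype=img.dtype)
    out[:H, :W] = img
    return out

def extract_patches(img, patch):
    """Return non-overlapping patches (row-major) and the padded shape."""
    padded = pad_to_multiple(img, patch)
    H2, W2 = padded.shape
    patches = [padded[i:i + patch, j:j + patch]
               for i in range(0, H2, patch)
               for j in range(0, W2, patch)]
    return patches, padded.shape

def stitch_patches(patches, padded_shape, orig_shape, patch):
    """Reassemble patches and discard the zero padding."""
    H2, W2 = padded_shape
    out = np.zeros(padded_shape, dtype=patches[0].dtype)
    k = 0
    for i in range(0, H2, patch):
        for j in range(0, W2, patch):
            out[i:i + patch, j:j + patch] = patches[k]
            k += 1
    H, W = orig_shape
    return out[:H, :W]
```

Stitching the (predicted) patches back and cropping to the original shape inverts the extraction exactly, which is what allows the predicted masks to be evaluated against the full-size ground truth.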
B. MCC as the Early Stopping Metric
The data is split into three subsets: training, validation, and test set. We use early stopping to terminate training before the network overfits. However, the validation error is usually very noisy. Hence, to ensure that training terminates where the validation performance is optimal, we pre-define a patience number. This is the number of epochs the early stopper waits for an improvement of the validation error before terminating the training when no improvement is observed. The common practice is to monitor the BCE loss on the validation set. However, due to the severe class imbalance, we propose the use of the Matthews correlation coefficient (MCC) as the early stopping metric, as it is theoretically more robust towards class imbalance.
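A minimal sketch of MCC-based early stopping with a patience counter, computing the MCC according to Eq. (1); the class and function names are hypothetical:

```python
import numpy as np

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary masks, Eq. (1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # by convention, return 0 (random-level prediction) if undefined
    return 0.0 if denom == 0 else float(tp * tn - fp * fn) / denom

class EarlyStopper:
    """Stop when validation MCC has not improved for `patience` epochs."""
    def __init__(self, patience=30):
        self.patience, self.best, self.wait = patience, -np.inf, 0

    def step(self, val_mcc):
        """Call once per epoch; returns True when training should stop."""
        if val_mcc > self.best:
            self.best, self.wait = val_mcc, 0
            return False
        self.wait += 1
        return self.wait >= self.patience
```

Because the MCC is bounded in [−1, 1] regardless of the class ratio, the same patience value behaves comparably for the mildly imbalanced zones and the extremely imbalanced front lines.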
C. Modified Distance Map and Distance Map BCE Loss
We formulate the distance map in a generalized form so that it can be used in different loss functions. The computed distance map weighs the line locations higher, while the regions in the proximity of the front lines are weighed less based on their distance to the lines. As we move progressively away from the front line locations, the weights keep decreasing. In other words, true line locations are given high importance, and the proximity adds relaxation for the network. The points beyond which the weights level off are considered to be the background (majority class). In contrast to the distance map-based approach by Adhikari et al. [25], cf. Eq. (2), we weigh the background differently depending on the class imbalance. In particular, we calculate the distance map from the ground truth masks as follows: We first thicken the ground truth lines by morphological dilation (δ_w) [30]. Then, we apply the Euclidean distance transform (EDT), divided by a constant factor R. The strength of the relaxation is determined by R, i. e., how smoothly the weights fall off from the front lines to the background. The morphological dilation of the ground truth images thickens the front lines based on the structuring element used. We use a rectangular structuring element of size (w, w), where w determines the amount of thickening of the lines. The result is further passed through a sigmoid function σ to obtain values in the range [0, 1]:

y_dl = δ_w(y),   (3)

y_edt = σ(EDT(y_dl) / R),   (4)

where y denotes the ground truth. Finally, the distance map d is obtained by:

d = y_edt + k (1 − y_dl),   (5)

where k is a constant in the range (0, 1] determining the weight given to the background, with k = 1 signifying utmost importance. Since we deal with high class imbalance, a high background weight is not desirable. However, k is a hyper-parameter that depends on the dataset and its class imbalance ratio, and needs to be tuned.
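A sketch of the distance map of Eqs. (3)-(5) together with the distance map BCE loss of Eq. (7), using SciPy. The exact EDT convention (here: distances computed inside the dilated band, zero elsewhere) and the clipping for numerical stability are our assumptions, not details stated in the paper:

```python
import numpy as np
from scipy import ndimage

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distance_map(y, w=3, R=1.0, k=0.1):
    """Distance map of Eqs. (3)-(5) for a binary front-line mask y."""
    # Eq. (3): thicken the one-pixel line with a (w, w) structuring element
    y_dl = ndimage.binary_dilation(y > 0, structure=np.ones((w, w)))
    # Eq. (4): EDT scaled by R, squashed to [0, 1) by the sigmoid
    # (assumption: distances are taken inside the dilated band)
    y_edt = sigmoid(ndimage.distance_transform_edt(y_dl) / R)
    # Eq. (5): add the constant weight k outside the dilated band
    return y_edt + k * (1.0 - y_dl)

def distance_map_bce(y, y_hat, d, eps=1e-7):
    """Distance map BCE loss of Eq. (7); eps-clipping is our addition."""
    p = np.clip(y_hat * d, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

With this construction, larger w widens the highly weighted band around the line, larger R makes the fall-off smoother, and k controls how much the background still contributes to the loss.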
Various distance maps with different w, R, and k are depicted in Fig. 2.

Fig. 2: Variation of the distance maps with different parameters when applied to the ground truth: (a) the ground truth, and distance maps with (b) w = 5, R = 4, k < 1; (c) w = 15, R = 4, k < 1; (d) w = 15, R = 8, k < 1; (e) w = 5, R = 4, k < 1; and (f) w = 5, R = 4, k = 1.

Finally, the distance map computed on the ground truth is used in the BCE loss to create the distance map BCE loss. The BCE loss J_BCE is given by:

J_BCE(y, ŷ) = −(1/N) Σ_{i=1}^{N} (y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i)),   (6)

where ŷ is the prediction, y is the ground truth, and N is the batch size during training. Incorporating the distance map d into the BCE loss results in the distance map BCE loss J_DBCE:

J_DBCE(y, ŷ) = −(1/N) Σ_{i=1}^{N} (y_i log(ŷ_i · d_i) + (1 − y_i) log(1 − ŷ_i · d_i)).   (7)

IV. EXPERIMENTAL SETUP AND EVALUATION
A. Dataset
We perform our experiments on SAR images acquired over sites at the Antarctic Peninsula (AP) and in Greenland, depicted in Fig. 3. The AP has been strongly affected by changing climatic conditions [31]. The Dinsmoor-Bombardier-Edgworth (DBE) and the Sjögren-Inlet (SI) glaciers have undergone a strong frontal retreat over the years [32], [33]. Jakobshavn Isbræ is one of the largest tidewater glaciers in Greenland and went through a drastic retreat over the last decades [34]. The SAR images of the AP were calibrated and multi-looked to reduce speckle noise. Furthermore, the images were geocoded and ortho-rectified by means of the enhanced ASTER digital elevation model of the AP [35]. The ground truth of the front lines is taken from Seehaus et al. [32], [33]. The images vary in spatial resolution: high-resolution images cover an area of 5 m × 5 m per pixel, the coarsest up to 50 m per pixel. Out of 244 available SAR images of the AP, 50 randomly chosen images were kept for testing, another 50 were used for validation, and the rest served for training the model. We triple the AP training dataset by augmenting it with horizontally flipped as well as 90° rotated versions of the images. The Jakobshavn Isbræ calving front locations were provided by Zhang et al. [12]. The SAR images of Jakobshavn Isbræ were pre-processed with a median filter to remove speckle noise. All these images have a constant spatial resolution of 6 m per pixel. The AP images vary in size, whereas the Jakobshavn Isbræ images are all of approximately the same size. For both datasets, two kinds of ground truth images are available: the calving front lines and the ice/non-ice zones. Figure 4 depicts a sample SAR image of our dataset, along with its corresponding ice/non-ice zones ground truth and its glacier calving front ground truth. The zones have a training class imbalance of 1:3 in both datasets. The front line ground truth is severely imbalanced, with a ratio of 1:2097 in the AP training data and 1:1124 in the Jakobshavn Isbræ training data.

B. Experimental Setup

1) Architecture and Parameters:
We follow Zhang et al. [12] for our baseline architecture. Both the convolution and transposed-convolution layers have kernel sizes of (5, 5). We use leaky ReLUs with a small negative slope as the activation functions, and the pooling layers have a pooling size of (2, 2). For regularization, batch normalization layers are employed after every convolutional layer. Furthermore, we use the Adam optimizer [36] with a cyclic learning rate, which is beneficial to the learning of the network [37]. Due to the constant changes in the network's performance caused by the cyclic learning rate during training, we set a rather high patience value (30 epochs) for early stopping. Note that we also tried various other patience values on a subset of the experiments and observed the same behavior. Hence, we decided to use 30 epochs for all experiments, so that the BCE early stopping criterion is given enough time to bring the validation loss down, too. Patches of a fixed size are extracted for training. The prediction of zones differs from the prediction of lines in terms of class imbalance. Hence, we use slightly different parameters for these two ground truths, which are explained in the following experiments.

Fig. 3: Maps of the study regions and the locations of the studied glaciers at the northern Antarctic Peninsula on the left, and Jakobshavn, Greenland, on the right. Blue polygons indicate subsets used for the CNN analysis. Left panels: Landsat LIMA Mosaic © USGS, NASA, BAS, NFS; right panels: Bing Aerial Maps © Microsoft.
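The cyclic learning rate [37] can be sketched as a triangular schedule that oscillates between a minimum and a maximum rate; the specific triangular variant shown here is an assumption, as the paper does not state which cyclic policy was used:

```python
import numpy as np

def cyclic_lr(step, base_lr, max_lr, step_size):
    """Triangular cyclic learning rate: rises from base_lr to max_lr
    over `step_size` steps, then falls back, and repeats."""
    cycle = np.floor(1 + step / (2 * step_size))
    x = np.abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

Called once per training step (or per epoch), this yields the periodic rises and falls in validation performance that motivate the high patience value discussed above.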
2) Experiment 1: Early Stopping Criterion: MCC vs. BCE:
To demonstrate the importance of the metric used as the early stopping criterion, we perform the segmentation of (a) the zones and (b) the lines, and compare the performance of BCE and MCC as the early stopping criterion. For segmenting zones, a fixed batch size is used, and the learning rate is cycled between a fixed minimum and maximum value. To extract the calving front lines from the predicted zones, we post-process our predictions as follows: first, false alarms are removed by extracting the largest connected component in the image; then, we use Canny edge detection [38] to delineate the calving front lines from the rectified zones. In the second variant, where we train our model using the calving front lines directly, the front lines are one pixel wide. This is the major reason for the extreme class imbalance. To facilitate the network's training on such a highly imbalanced dataset, the lines are thickened using morphological dilation with a small rectangular structuring element, which considerably reduces the class imbalance of the front lines in both datasets. For the front lines, the batch size and the cyclic learning rate range are set analogously.
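The zone post-processing can be sketched as follows. For brevity, a simple morphological boundary (component minus its erosion) stands in for the Canny detector [38] used in the paper, and the function name is hypothetical:

```python
import numpy as np
from scipy import ndimage

def zones_to_front(zone_pred):
    """Post-process a binary ice/non-ice prediction into a front line."""
    # keep only the largest connected component to remove false alarms
    labels, n = ndimage.label(zone_pred)
    if n == 0:
        return np.zeros_like(zone_pred)
    sizes = ndimage.sum(zone_pred, labels, range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)
    # boundary of the component (stand-in for Canny edge detection)
    interior = ndimage.binary_erosion(largest)
    return largest & ~interior
```

Note that this boundary traces the entire ice/non-ice interface, which is why the text below observes that post-processed zones can include bedrock boundaries in addition to the calving front proper.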
3) Experiment 2: Improved Distance Map BCE Loss vs. BCE Loss:
Having established that MCC as an early stopping criterion yields better results, we use MCC as the early stopping criterion in our second experiment. Here, we compare the performance of the improved distance map BCE loss (i. e., with the optimum k) with the conventional BCE loss, the inverse class-frequency weighted BCE, and the conventional distance map BCE loss (i. e., with k = 1). This experiment is designed specifically to tackle high class imbalance. Hence, the models are trained only on the front line ground truth images of both datasets. We use the same parameters for predicting front line masks as in our first experiment. We search for the best parameters of the modified distance map formulation using grid search and thereby select w = 3 and R = 1 for both datasets. The best values of k, both below 1, were tuned separately for AP and Jakobshavn Isbræ.

C. Evaluation
We evaluate our experiments with the performance metrics commonly used for segmentation problems [39], [7], i. e., Intersection over Union (IOU) and the Dice coefficient; we also report the MCC. The IOU is the area of overlap between the predicted masks p and the ground truth masks g divided by the area of their union. The Dice coefficient is twice the area of overlap divided by the total number of pixels in the predictions and the ground truth:

IOU = |p ∩ g| / |p ∪ g|,   (8)

Dice = 2 Σ_{i=1}^{N} p_i g_i / (Σ_{i=1}^{N} p_i + Σ_{i=1}^{N} g_i).   (9)

These performance metrics solely represent how well the predictions overlap with the ground truth masks. This makes it hard to evaluate the calving front lines, because even predictions very close to the ground truth locations are counted as misses. Therefore, to make these metrics more comprehensible, we dilate both the predictions and the ground truth masks. The dilated pixels around the ground truth are considered as the relaxation or tolerance. Since our datasets also come with a spatial resolution, we express the tolerance in meters: the spatial resolution s denotes the actual distance one pixel covers on Earth, and for a dilation of p pixels the tolerance is t = s × p / 2. The tolerance scheme is depicted in Fig. 5. The metrics are reported in our tables along with the tolerance (in meters) for the predicted calving front lines.

Fig. 4: (a) Example of a SAR image with its corresponding ground truth images: (b) ice/non-ice zones, in which black represents the ice and white the non-ice zones, and (c) glacier calving front line ground truth.

Fig. 5: Tolerance scheme based on the dilation of the lines and the spatial resolution of the image.

TABLE I: Performance when using BCE and MCC as the early stopping criterion.

Dataset                  Metric       Tolerance   Score
AP zones                 IOU          -           0.8264
AP zones                 Dice Coeff   -           0.8984
AP zones                 MCC          -           0.8764
AP lines                 IOU          65 m        0.1524
AP lines                 IOU          110 m       0.2201
AP lines                 IOU          155 m       0.2712
AP lines                 Dice Coeff   65 m        0.2617
AP lines                 Dice Coeff   110 m       0.3549
AP lines                 Dice Coeff   155 m       0.4196
AP lines                 MCC          65 m        0.3328
AP lines                 MCC          110 m       0.4112
AP lines                 MCC          155 m       0.4657
Jakobshavn Isbræ zones   IOU          -           0.9613
Jakobshavn Isbræ zones   Dice Coeff   -           0.9801
Jakobshavn Isbræ zones   MCC          -           0.9725
Jakobshavn Isbræ lines   IOU          60 m        0.3755
Jakobshavn Isbræ lines   IOU          105 m       0.4424
Jakobshavn Isbræ lines   IOU          150 m       0.4801
Jakobshavn Isbræ lines   Dice Coeff   60 m        0.5437
Jakobshavn Isbræ lines   Dice Coeff   105 m       0.6108
Jakobshavn Isbræ lines   Dice Coeff   150 m       0.6459
Jakobshavn Isbræ lines   MCC          60 m        0.5712
Jakobshavn Isbræ lines   MCC          105 m       0.6273
Jakobshavn Isbræ lines   MCC          150 m       0.6553

Table I shows the results when using BCE and MCC as the early stopping criterion. Monitoring the MCC clearly outperforms monitoring the BCE in all cases. The prediction of zones in Jakobshavn Isbræ does not show much improvement, as BCE as the early stopping criterion already performed very well, with IOU and Dice scores of around 97 %. The reason is that the use of MCC continues training to roughly 150 epochs, while the BCE loss already stops at around 50 epochs: the validation BCE decides too early that the network is overfitting. The MCC, on the other hand, is more comprehensive about the validation error and correctly allows the network to be trained further. Table I also illustrates that monitoring the MCC is an improvement not only when dealing with high class imbalance, but is also quite reliable for the less imbalanced datasets (AP and Jakobshavn Isbræ zones). The fact that the BCE-monitored network is not trained enough can be clearly observed in the qualitative results depicted in Fig. 6.
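The tolerance-based variants of the metrics can be sketched as follows; the metre-to-pixel conversion and the symmetric dilation of both masks are our assumptions:

```python
import numpy as np
from scipy import ndimage

def tolerant_scores(pred, gt, spatial_res, tol_m):
    """IOU and Dice after dilating both line masks by a tolerance
    given in metres (converted to pixels via the spatial resolution)."""
    radius = max(1, int(round(tol_m / spatial_res)))  # assumed conversion
    pred_d = ndimage.binary_dilation(pred, iterations=radius)
    gt_d = ndimage.binary_dilation(gt, iterations=radius)
    inter = np.logical_and(pred_d, gt_d).sum()
    union = np.logical_or(pred_d, gt_d).sum()
    total = pred_d.sum() + gt_d.sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice
```

Widening the tolerance band rewards predictions that run parallel to, but slightly offset from, the true front, which is exactly the behavior the one-pixel-wide line ground truth would otherwise punish.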
The MCC-monitored network performs significantly better for both zones and lines. Apart from the calving front lines generated by the MCC-monitored network being more accurate, its predictions also contain fewer false positives (incorrect white lines on the black regions).

Fig. 6: Comparison of the segmentation results using BCE and MCC as the early stopping criterion on AP glaciers: (a) ground truth, (b) segmentation maps predicted using BCE as the early stopping criterion, (c) segmentation maps predicted using MCC as the early stopping criterion.

The results of the BCE loss and the improved distance map BCE loss are shown in Table II. The improved distance map BCE loss is also compared with the distance map loss using k = 1. Interestingly, the distance map BCE loss with k = 1 did not improve on the BCE loss in our task of glacier front line segmentation, although it does show a slight improvement at low tolerance (65 m) for the AP glaciers. In contrast, the optimum k helps to deal with the class imbalance better. The qualitative results of the predicted lines in the Jakobshavn Isbræ regions are depicted in Fig. 7. In this figure, green represents the ground truth, red shows incorrect predictions, and yellow depicts correct predictions, i. e., the overlap between the prediction and the ground truth. The line predictions for the Jakobshavn Isbræ glaciers are disconnected for both losses. However, the distance map loss manages to identify more regions of the calving fronts. The BCE loss also incorrectly predicts severe outliers (false positives) far away from the real locations, as depicted in Fig. 7. Such false positives are less numerous in the distance map BCE predictions. Referring to Fig. 8, which depicts the front line predictions for the AP glaciers, we see that the BCE loss predicts several images quite inaccurately. These images are well handled by the distance map loss. When working on the problem of automatic detection of CFPs in SAR images, the available ground truth data for training can be either the ice zones or the calving front lines.
The semantic segmentation of zones in the SAR images needs post-processing to find the calving front locations. Unlike the semantic segmentation of zones, the direct semantic segmentation of the calving front locations is a harder problem due to the more severe class imbalance. Figure 9 qualitatively compares the segmented front lines of the two approaches on the AP glaciers dataset: one approach predicts the front lines directly, and the other post-processes the predicted zones. Direct prediction of the calving front lines introduces minor disjoints in the predictions, whereas the post-processed zone predictions are continuous. However, the result of the post-processed zones does not consist entirely of glacier calving front lines: it shows the boundary between ice and non-ice zones, which may also contain some bedrock boundaries. This is also observable in the post-processed predictions for the Jakobshavn glacier in Fig. 10. Moreover, some regions are better predicted by the direct front line predictions.

Fig. 7: Comparison of the predictions of (a) the BCE loss and (d) the distance map BCE loss. (b) and (c) show magnified versions of the marked areas of the BCE loss prediction image, and (e) and (f) show magnified areas of the distance map BCE loss prediction image.

Fig. 8: Predicted lines achieved by (b) the BCE loss and (c) the distance map BCE loss. Image (a) depicts the ground truth. The two samples are from the AP glaciers dataset.

Fig. 9: Comparison of the predictions achieved by models trained on (a) front lines and (b) zones, then post-processed. The sample is from the AP glaciers dataset.

Fig. 10: Calving front prediction for Jakobshavn Isbræ computed by post-processing the predicted ice/non-ice zones.

TABLE II: Performance of the improved distance map BCE loss (i. e., with optimum k) vs. the BCE loss, the inverse class-frequency weighted BCE, and the conventional distance map BCE loss (k = 1).

Dataset                  Metric       Tolerance   Scores
AP lines                 IOU          65 m        0.2817 / 0.2798 / 0.3078
AP lines                 IOU          110 m       0.3957 / 0.3728 / 0.3991
AP lines                 IOU          155 m       0.4561 / 0.4355 / 0.4591
AP lines                 Dice Coeff   65 m        0.4421 / 0.4211 / 0.4456
AP lines                 Dice Coeff   110 m       0.5385 / 0.5242 / 0.5442
AP lines                 Dice Coeff   155 m       0.5997 / 0.5873 / 0.6046
AP lines                 MCC          65 m        0.4496 / 0.4302 / 0.4502
AP lines                 MCC          110 m       0.5456 / 0.5295 / 0.5467
AP lines                 MCC          155 m       0.6062 / 0.5913 / 0.6059
Jakobshavn Isbræ lines   IOU          60 m        0.5681 / 0.5535 / 0.5675
Jakobshavn Isbræ lines   IOU          105 m       0.6451 / 0.6272 / 0.6417
Jakobshavn Isbræ lines   IOU          150 m       0.6928 / 0.6717 / 0.6872
Jakobshavn Isbræ lines   Dice Coeff   60 m        0.7223 / 0.7101 / 0.7217
Jakobshavn Isbræ lines   Dice Coeff   105 m       0.7824 / 0.7684 / 0.7794
Jakobshavn Isbræ lines   Dice Coeff   150 m       0.8167 / 0.8012 / 0.8123
Jakobshavn Isbræ lines   MCC          60 m        0.7208 / 0.7070 / 0.7204
Jakobshavn Isbræ lines   MCC          105 m       0.7783 / 0.7031 / 0.7754
Jakobshavn Isbræ lines   MCC          150 m       0.8103 / 0.7937 / 0.8058

V. CONCLUSION
This paper proposes methods to improve deep learning-based semantic segmentation of glacier calving fronts. The detection of calving fronts is challenged by severe class imbalance, which we tackled with two approaches. First, using the Matthews correlation coefficient (MCC) as an early stopping criterion yields a considerable performance gain. Second, to help the binary cross-entropy (BCE) loss function handle the imbalanced data, we incorporated a novel distance map loss; the distance map can be used in conjunction with any other loss function. The MCC-monitored network outperforms the BCE-monitored network by about 15 % in dice coefficient and IOU. Using the modified distance map BCE loss results in a further improvement of approximately 2 % in dice coefficient. Finally, we compared two approaches to predicting front lines: either directly predicting the front lines, or predicting ice zones and then post-processing them to delineate the calving fronts. The presented results are encouraging, and we believe that other image segmentation applications suffering from severe class imbalance can benefit from the proposed approaches.
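The two ingredients summarized above, MCC-based monitoring and a distance-weighted BCE, can be sketched as follows. The MCC formula follows the standard binary confusion matrix definition; the distance-map weighting shown (weights decaying with distance to the front, sharpened by an exponent k) is an assumed illustrative form, not the paper's exact formulation:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def mcc(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Matthews correlation coefficient for binary masks; symmetric in
    both classes and robust to class imbalance, hence its use as an
    early stopping criterion."""
    p, t = pred.astype(bool).ravel(), target.astype(bool).ravel()
    tp, tn = np.sum(p & t), np.sum(~p & ~t)
    fp, fn = np.sum(p & ~t), np.sum(~p & t)
    num = float(tp * tn) - float(fp * fn)
    den = np.sqrt(float(tp + fp) * float(tp + fn) * float(tn + fp) * float(tn + fn))
    return num / (den + eps)

def distance_map_bce(prob: np.ndarray, target: np.ndarray,
                     k: float = 1.0, eps: float = 1e-8) -> float:
    """Pixel-wise BCE weighted by a map derived from the Euclidean
    distance to the front pixels (assumed weighting: pixels close to
    the front count more; k sharpens the emphasis)."""
    d = distance_transform_edt(target == 0)  # distance of each pixel to the front
    w = 1.0 / (1.0 + d) ** k                 # illustrative near-front emphasis
    bce = -(target * np.log(prob + eps) + (1 - target) * np.log(1.0 - prob + eps))
    return float(np.sum(w * bce) / np.sum(w))

# Toy check on a heavily imbalanced mask (2 front pixels out of 64).
target = np.zeros((8, 8))
target[0, :2] = 1.0
prob = np.where(target == 1, 0.9, 0.1)  # confident, correct predictions
score = mcc(prob > 0.5, target)         # close to 1 for a perfect prediction
loss = distance_map_bce(prob, target)
```

In a training loop, `mcc` would be evaluated on the validation set after each epoch and training stopped once it no longer improves.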