A Weakly Supervised Consistency-based Learning Method for COVID-19 Segmentation in CT Images
Issam Laradji, Pau Rodriguez, Oscar Mañas, Keegan Lensink, Marco Law, Lironne Kurzman, William Parker, David Vazquez, Derek Nowrouzezahrai
[email protected]
Element AI, Xtract AI, SapienML, University of British Columbia, McGill University, Universitat Politècnica de Catalunya
Abstract
Coronavirus Disease 2019 (COVID-19) has spread aggressively across the world, causing an existential health crisis. Thus, having a system that automatically detects COVID-19 in computed tomography (CT) images can assist in quantifying the severity of the illness. Unfortunately, labelling chest CT scans requires significant domain expertise, time, and effort. We address these labelling challenges by only requiring point annotations: a single pixel for each infected region on a CT image. This labeling scheme allows annotators to label a pixel in a likely infected region in only 1-3 seconds, as opposed to the 10-15 seconds needed to segment a region. Conventionally, segmentation models train on point-level annotations using the cross-entropy loss function on these labels. However, these models often suffer from low precision. Thus, we propose a consistency-based (CB) loss function that encourages the output predictions to be consistent with spatial transformations of the input images. Experiments on 3 open-source COVID-19 datasets show that this loss function yields significant improvement over conventional point-level loss functions and almost matches the performance of models trained with full supervision, with much less human effort. Code is available at: https://github.com/IssamLaradji/covid19_weak_supervision
1. Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has quickly become a global pandemic and resulted in over 400,469 COVID-19 related deaths as of June 8th, 2020 (source: World Health Organization). The virus comes from the same family as the SARS-CoV outbreak that originated in 2003 and the MERS-CoV outbreak of 2012, and is projected to join other coronavirus strains as a seasonal disease. The disease can present itself in a variety of ways, ranging from asymptomatic to acute respiratory distress syndrome (ARDS). However, the primary and most common presentation associated with morbidity and mortality is the presence of opacities and consolidation in a patient's lungs. As the disease spreads, healthcare centers around the world are becoming overwhelmed and facing shortages of the essential equipment necessary to manage the symptoms of the disease. Severe cases require admission to the intensive care unit (ICU) and mechanical ventilation, with some sources [11] citing a rate of 5% of all infected. Thus, the availability of ICU beds amid the overwhelming number of COVID-19 cases around the world is a major challenge. Rapid screening is necessary to diagnose the disease and slow its spread, making effective prognostication tools essential for efficiently allocating intensive care services to those who need them most.

Figure 1: Labeling Scheme. We illustrate the difference between labels obtained using full supervision and point-level supervision. One point is placed on each infected region, and several on the background region. (Panels: Original Image; Point-level Supervision (Ours); Full Supervision (Conventional).)

Upon inhalation, the virus attacks and inhibits the alveoli of the lung, which are responsible for oxygen exchange [44]. In response, and as part of the inflammatory repair process, the alveoli fill with fluid, causing various forms of opacification within the lung when viewed on Computed Tomography (CT) scans. Due to the increased density, these areas present on CT scans as increased attenuation with preserved bronchial and vascular markings, known as ground glass opacities (GGO).
In addition, the accumulation of fluid that progresses to obscure bronchial and vascular regions on CT scans is known as consolidation.

While reverse transcription polymerase chain reaction (RT-PCR) has been considered the gold standard for COVID-19 screening, the shortage of equipment and strict requirements for testing environments limit the utility of this test in all settings. Further, RT-PCR is also reported to suffer from high false negative rates due to its relatively low sensitivity yet high specificity [1]. CT scans are an important complement to RT-PCR tests and have been shown to enable effective diagnosis, including follow-up assessment and the evaluation of disease evolution [1, 53].

In addition to providing complementary diagnostic properties, the analysis of CT scans has great potential for the prognostication of patients with COVID-19. The percentage of well-aerated lung (WAL) has emerged as a predictive metric for determining the prognosis of patients confirmed with COVID-19, including admission to the ICU and death [7]. The percentage of WAL is often quantified by visually estimating the volume of opacification relative to healthy lung, and can be estimated automatically through attenuation values within the lung. In addition to the percentage of WAL, which does not account for the various forms of opacification, expert interpretation of CT scans can provide insight on the severity of the infection by identifying various patterns of opacification. The prevalence of these patterns, which are correlated with the severity of the infection, has been linked to different stages of the disease [22, 43]. Quantifying both the percentage of WAL and the opacification composition enables efficient estimation of the stage of the disease and the patient outcome.

Deep learning-based methods have been widely applied in medical image analysis to combat COVID-19 [13, 16, 42]. They have been proposed to detect patients infected with COVID-19 via radiological imaging.
For example, COVID-Net [40] was proposed to detect COVID-19 cases from chest radiography images. An anomaly detection model was designed to assist radiologists in analyzing the vast amounts of chest X-ray images [35]. For CT imaging, a location-attention oriented model was employed to calculate the infection probability of COVID-19 [5]. A weakly supervised deep learning-based software system was developed in [49] using 3D CT volumes to detect COVID-19. A list of papers on COVID-19 imaging-based AI work can be found in Wang et al. [41]. Although plenty of AI systems have been proposed to provide assistance in diagnosing COVID-19 in clinical practice, there are only a few related works [10], and no significant impact has been shown using AI to improve clinical outcomes as of yet.

According to Ma et al. [27], it takes around 400 minutes to delineate one CT scan with 250 slices, an average of 1.6 minutes per slice. On the other hand, it takes around 3 seconds to point to a single region at the pixel level (Papadopoulos et al. [31]). Thus, point-level annotations allow us to label many more slices quickly.

Point-level annotations are not as expressive as segmentation labels, making effective learning a challenge for segmentation models (Fig. 1). Conventionally, segmentation models train on point-level annotations using the cross-entropy loss on these labels. While this loss can yield good results on some real-life datasets [3], the resulting models usually suffer from low precision as they often predict big blobs. Such predictions are not suitable for imbalanced images where only a few small regions are labeled as foreground. Thus, we propose a consistency-based (CB) loss function that encourages the model's output predictions to be consistent with spatial transformations of the input images.
While consistency methods have been successfully deployed in semantic segmentation, the novel aspect of this work is the notion of consistency under weak supervision, which utilizes unlabeled pixels during training. We show that this regularization method yields significant improvement over conventional point-level loss functions on 3 open-source COVID-19 datasets. We also show that this loss function results in a segmentation performance that almost matches that of the fully supervised model. To the best of our knowledge, this is the first time that self-supervision has been applied in conjunction with point-level supervision on a medical segmentation dataset.

We summarize our contributions and results on 3 publicly available sets of CT scans as follows:

1. We propose a framework that trains using a consistency-based loss function on a medical segmentation dataset labeled with point-level supervision.

2. We present a simple, yet cost-efficient point-level supervision setup where the annotator is only required to label a single point on each infected region and several points on the background.

3. We show that our consistency-based loss function yields significant improvement over conventional point-level loss functions and almost matches the performance of models trained with full supervision.
2. Related Work
In this section, we start by reviewing semantic segmentation methods applied to CT scans for general medical problems, followed by semantic segmentation for COVID-19. We then go over weakly supervised semantic segmentation methods. (COVID-19 segmentation resources are available at: https://medicalsegmentation.com/covid19/)
Semantic segmentation for CT scans has been widely used for diagnosing lung diseases. Diagnosis is often based on segmenting different organs and lesions from chest CT slices, which can provide essential information for doctors to identify lung diseases. Many methods exist that perform nodule segmentation of lungs. Early algorithms are based on image processing and SVMs to segment nodules [16]. Then, algorithms based on deep learning emerged [13]. These methods include central focus CNNs [42] and GAN-synthesized data for nodule segmentation in CT scans [15]. A recent method uses multiple deep networks to segment lung tumors from CT slices with varying resolutions, along with multi-task learning of joint classification and segmentation [14]. In this work, we use an ImageNet-pretrained FCN8 [25] as our segmentation method.
Semantic segmentation for COVID-19
While COVID-19 is a recent phenomenon, several methods have been proposed to analyze infected regions of COVID-19 in lungs. Fan et al. [10] proposed a semi-supervised learning algorithm for automatic COVID-19 lung infection segmentation from CT scans. Their algorithm leverages attention to enhance representations. Similarly, Zhou et al. [50] proposed to use spatial and channel attention to enhance representations, and Chen et al. [6] augment U-Net [33] with ResNeXt [48] blocks and attention. Instead of focusing on the architecture, Amyar et al. [2] proposed to improve the segmentation performance with a multi-task learning approach which includes a reconstruction loss. Although previous methods are accurate, their computational cost can be prohibitive. Thus, Qiu et al. [32] proposed MiniSeg for efficient COVID-19 segmentation. Unfortunately, these methods require full supervision, which is costly to acquire compared to point-level supervision: our problem setup.
Weakly supervised semantic segmentation methods can vastly reduce the annotation cost of collecting a training set. According to Bearman et al. [3], manually collecting image-level and point-level labels for the PASCAL VOC dataset [9] takes only a few tens of seconds per image, an order of magnitude faster than acquiring full segmentation labels. Other forms of weaker labels have been explored as well, including bounding boxes [17] and image-level annotations [51]. Weak supervision has also been explored in instance segmentation, where the goal is to identify object instances as well as their class labels [20, 21, 52]. In this work, the labels are given as point-level annotations instead of the conventional per-pixel labels, and the task is to identify the class labels of the regions only.

Self-supervision for weakly supervised semantic segmentation is a relatively new research area with strong potential for improving segmentation performance. The basic idea is to generate two perturbed versions of the input and apply consistency training to encourage the predictions to be similar [47]. For example, FixMatch [37] combined consistency regularization with pseudo-labeling to produce artificial image-level labels. In the case of dense predictions, the outputs need to be further transformed in order to compare them with a consistency loss, making the model's output equivariant with respect to the transformations. Self-supervision was recently applied in a weakly supervised setup where annotations are image-level [45]. The idea was to make the output consistent across scales, which led to new state-of-the-art results on the PASCAL VOC dataset. Ouali et al.
[30] proposed to apply cross-consistency training, where the perturbations are applied to the outputs of the encoder and the dense predictions are enforced to be invariant. These perturbations can also be used for data augmentation, which can be learnt automatically using methods based on reinforcement learning and bilevel optimization [8, 29]. For medical segmentation, self-supervision has been used along with semi-supervised learning [4, 23]. Bortsova et al. [4] made the outputs consistent across elastic transforms, while Li et al. [23] added a teacher-student paradigm for consistency training. In this work, we apply a consistency loss in the novel setup of medical segmentation with point supervision.
3. Methodology
Problem Setup and Network Architecture.
We define the problem setup as follows. Let X be a set of N training images with corresponding ground-truth labels Y. Each Y_i is a W × H matrix whose non-zero entries indicate the locations of the object instances; the values of these entries indicate the class labels that the points correspond to.

We use a standard fully convolutional neural network that takes as input an image of size W × H and outputs a W × H × C per-pixel map, where C is the set of object classes of interest. The output map is converted to a per-pixel probability matrix S_i by applying the softmax function across classes. These probabilities indicate how likely each pixel is to belong to the infected region of a class c ∈ C.

Proposed Loss Function.
Our weakly supervised method uses a loss function that consists of a supervised point-level loss and an unsupervised consistency loss. Given a network f_θ that outputs a probability map S_i for an image X_i, we optimize its parameters θ using the following loss function.
Figure 2:
Model Training.
Our model has two branches with shared weights. The first branch encodes the original input x while the second branch encodes the transformed input t(x). The point-level loss compares the outputs f(x) and f(t(x)) with the corresponding weak labels y and t(y). In addition, an unsupervised consistency loss is used to make the outputs t(f(x)) and f(t(x)) consistent.

L(X, Y) = Σ_{i=1}^{N} [ L_P(X_i, Y_i) + λ · L_C(X_i) ],   (1)

where the first term is the point-level loss, the second term is the consistency loss, and λ is used to weigh between them.

Point-level loss.
We apply the standard cross-entropy function against the point annotations, defined as follows:

L_P(X_i, Y_i) = −Σ_{j ∈ I_i} log f_θ(X_i)_{j, Y_j},   (2)

where f_θ(X_i)_{j, Y_j} is the output corresponding to class Y_j for pixel j, and I_i is the set of labeled pixels for image X_i.

Consistency loss.
We first define a set of geometric transformations T = {t_1, t_2, ..., t_n}. An example of t_k is horizontal flipping, which can be used to transform an image X_i and its corresponding label Y_i collectively to their flipped versions. The goal of this loss function is to make the model's output consistent with respect to these transformations of the input image. The loss function is defined as follows:

L_C(X_i) = Σ_{j ∈ P_i} | t_k(f_θ(X_i))_j − f_θ(t_k(X_i))_j |,   (3)

where P_i is the set of pixels for image X_i. This unsupervised loss function helps the network learn equivariant semantic representations that go beyond the translation equivariance that underlies convolutional neural networks, serving as an additional form of supervision.

Model Training.
The overview of the model training is shown in Fig. 2 and Alg. 1. The model has two branches with shared weights θ. At each training step k, we sample an image X_i and a transform function t_k ∈ T. The model's first branch takes as input the original image X_i and the second branch takes as input the transformed image t_k(X_i). The transformed output of the first branch, t_k(f_θ(X_i)), is aligned with the prediction of the second branch, f_θ(t_k(X_i)), for pixel-wise comparison by the consistency loss (Eq. 3).

In addition to the consistency loss, the point-level loss L_P is applied to both the input X_i and t_k(X_i), i.e., L_P(t_k(X_i), t_k(Y_i)), where t_k(Y_i) is a pseudo ground-truth mask for t_k(X_i) generated by applying the same geometric transformation t_k to the ground-truth mask Y_i. In this case, the network is forced to update the prediction for t_k(X_i) to be more similar to t_k(Y_i).

In this work, we use geometric transformations which allow us to infer the true label of images that undergo these transformations. For instance, the segmentation mask of the flipped version of an image is the flipped version of the original segmentation mask. Thus, we include the following transformations: 0, 90, 180, and 270 degree rotations and a horizontal flip. At test time, the trained model can then be directly used to segment infected regions on unseen images with no additional human input.
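The loss computation inside each training step (Eqs. 1-3 combined with the transform set just described) can be sketched in NumPy. This is an illustrative, framework-agnostic sketch rather than the paper's implementation (which trains an FCN8 with backpropagation); the names `TRANSFORMS`, `point_loss`, and `total_loss` are hypothetical, and unlabeled pixels are marked with -1:

```python
import random
import numpy as np

# The transform set T from the paper: 0/90/180/270-degree rotations and a
# horizontal flip, each acting on the first two (spatial) axes of an array.
TRANSFORMS = [
    lambda a: a,
    lambda a: np.rot90(a, 1, axes=(0, 1)),
    lambda a: np.rot90(a, 2, axes=(0, 1)),
    lambda a: np.rot90(a, 3, axes=(0, 1)),
    lambda a: a[:, ::-1],
]

def point_loss(probs, labels):
    """Eq. 2: cross-entropy evaluated at annotated pixels only (-1 = unlabeled)."""
    ys, xs = np.where(labels >= 0)
    return -np.sum(np.log(probs[ys, xs, labels[ys, xs]] + 1e-12))

def total_loss(model, image, labels, lam=1.0):
    """One loss evaluation from Alg. 1: L_P(x) + L_P(t(x)) + lam * L_C."""
    t = random.choice(TRANSFORMS)          # uniformly sample t_k ~ T
    p = model(image)                       # f(x), shape (H, W, C)
    p_t = model(t(image))                  # f(t(x))
    # Point loss on both branches; the point labels transform with the image.
    l_p = point_loss(p, labels) + point_loss(p_t, t(labels))
    # Eq. 3: consistency between t(f(x)) and f(t(x)).
    l_c = np.sum(np.abs(t(p) - p_t))
    return l_p + lam * l_c
```

For a perfectly equivariant model the consistency term vanishes for every t in T, which is exactly the behavior the loss rewards.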
4. Experiments
Here we describe the details behind the datasets, methods, and evaluation metrics used in our experiments.

Algorithm 1:
Model Training
Input: X = {X_1, X_2, ..., X_n} images, Y = {Y_1, Y_2, ..., Y_n} point-level masks.
Output: Trained parameters θ*.
Parameters: a weight coefficient λ, a set of transformation functions T, a model forward function f_θ.

for each batch B do
    L ← 0
    for each (X_i, Y_i) ∈ B do
        Compute the point loss: L_P ← −Σ_{j ∈ I_i} log f_θ(X_i)_{j, Y_j}
        Uniformly sample a transform function t_k ∼ T
        Compute the consistency loss: L_C ← Σ_{j ∈ P_i} | t_k(f_θ(X_i))_j − f_θ(t_k(X_i))_j |
        L ← L + L_P + λ · L_C
    end
    Update θ by backpropagating with respect to L
end

4.1.1 Datasets

We evaluate our weakly supervised learning system on three separate open-source medical segmentation datasets (referred to as COVID-19-A/B/C). For each dataset, a point-level label is obtained from a segmentation mask by taking the pixel with the largest distance transform as the centroid. Thus, we generate a single supervised point for each disjoint infected region on the training images. For the background region, we randomly sample several pixels as the ground-truth points (Figure 1). We show the dataset statistics in Table 1 and describe the datasets in the next sections.
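The point-generation rule above (pick the pixel with the largest distance-transform value inside each infected region) can be sketched as follows. The paper does not specify its implementation; this self-contained version uses a 4-connected BFS from the background instead of a library distance transform, and `region_point` is a hypothetical name:

```python
from collections import deque
import numpy as np

def region_point(mask):
    """Pick the point annotation for one connected infected region:
    the pixel farthest from the region boundary.

    mask : (H, W) bool array, True inside the region.
    Returns (row, col) of the chosen point.
    """
    h, w = mask.shape
    dist = np.full((h, w), -1, dtype=int)
    q = deque()
    # Seed the BFS with every background pixel (distance 0).
    for r in range(h):
        for c in range(w):
            if not mask[r, c]:
                dist[r, c] = 0
                q.append((r, c))
    # 4-connected BFS: dist becomes the distance to the nearest background pixel.
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and dist[rr, cc] < 0:
                dist[rr, cc] = dist[r, c] + 1
                q.append((rr, cc))
    dist[~mask] = -1  # only consider pixels inside the region
    return np.unravel_index(np.argmax(dist), dist.shape)
```

In practice, `scipy.ndimage.distance_transform_edt` computes the same kind of distance map (with Euclidean rather than city-block distances) in a single call.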
COVID-19-A [10, 28] consists of 100 axial lung CT JPEG images obtained from 60 COVID-19 lung CTs provided by the Italian Society of Medical and Interventional Radiology. Each image was labeled for ground-glass opacity, consolidation, and pleural effusion by a radiologist. We discarded two images without areas of infection from this dataset due to their low resolution. Images were resized to a fixed resolution and normalized using ImageNet statistics [34]. The final dataset consisted of 98 images separated into a training set (n = 50), a validation set (n = 5), and a test set (n = 48).

COVID-19-B [28] consists of 9 volumetric COVID-19 chest CTs in DICOM format containing a total of 829 axial slices. Images were first converted from Hounsfield units to unsigned 8-bit integers, then resized and normalized using ImageNet statistics [34].

We use COVID-19-B to evaluate the consistency loss on two splits of the dataset: separate and mixed. In the separate split (COVID-19-B-Sep), the slices in the training, validation, and test sets come from different scans. The goal is to have a trained model that can generalize to scans of new patients. In this setup, the first 5 scans are defined as the training set, the sixth scan as the validation set, and the remaining scans as the test set.

For the mixed split (COVID-19-B-Mixed), the slices in the training, validation, and test sets come from the same scans. The idea is to have a trained model that can infer the masks for the remaining slices of a scan when the annotator labels only a few of the slices in that scan. In this setup, for each scan, the first 45% of the slices are defined as the training set, the next 5% as the validation set, and the remaining slices as the test set.

COVID-19-C [26] consists of 20 CT volumes.
Lungs and areas of infection were labeled by two radiologists and verified by an experienced radiologist. Each three-dimensional CT volume was converted from Hounsfield units to unsigned 8-bit integers and normalized using ImageNet statistics [34].

As with COVID-19-B, we also split the dataset into separate and mixed versions to evaluate our model's efficacy. For the separate split (COVID-19-C-Sep), we assign 15 scans to the training set, 1 scan to the validation set, and 4 scans to the test set. For the mixed split (COVID-19-C-Mixed), we separate the slices from each scan in the same manner as in COVID-19-B.

As is common practice [36], we evaluate our models against the following metrics for semantic segmentation:
Intersection over Union (IoU) measures the overlap between the prediction and the ground truth: IoU = TP / (TP + FP + FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels across all images in the test set.

Dice Coefficient (F1 Score) is similar to IoU but gives more weight to the intersection between the prediction and the ground truth: Dice = 2·TP / (2·TP + FP + FN).

PPV (Positive Predictive Value) measures the fraction of predicted positive samples that are correct, also known as precision:
PPV = TP / (TP + FP).

Table 1: Statistics of open-source COVID-19 datasets.
Sensitivity (recall) measures the fraction of real positive samples that were predicted correctly: Sensitivity = TP / (TP + FN).

Specificity (true negative rate) measures the fraction of real negative samples that were predicted correctly: Specificity = TN / (FP + TN).

We provide experiments with three weakly supervised loss functions based on point-level annotations and a fully supervised upper-bound method:

• Point loss (PL). It is defined in Eq. 2 in Bearman et al. [3]. The loss function encourages all pixel predictions to be background for background images and applies cross-entropy against the provided point-level annotations, ignoring the rest of the pixels.

• CB(Flip) + PL. It is defined in Eq. 1 in Section 3, which combines the point loss and the horizontal-flip transformation for the consistency loss.

• CB(Flip, Rot) + PL. It is the same as
CB(Flip) + PL, except that the transformations used for the consistency loss also include the 0, 90, 180, and 270 degree rotations, uniformly sampled for each image.

• Fully supervised. This loss function combines weighted cross-entropy and IoU loss as defined in Eqs. (3) and (5) of Wei et al. [46], respectively. It is an efficient method for ground-truth segmentation masks that are imbalanced. Since this loss function requires full supervision, it serves as an upper bound on performance in our experimental results.
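All five evaluation metrics defined above reduce to four pixel counts pooled over the test set. A minimal sketch, assuming binary foreground/background masks (`segmentation_metrics` is a hypothetical name):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute the five evaluation metrics from binary masks.

    pred, gt : boolean arrays of the same shape (foreground = True),
               pooled across all test images.
    """
    tp = np.sum(pred & gt)     # true positives
    fp = np.sum(pred & ~gt)    # false positives
    fn = np.sum(~pred & gt)    # false negatives
    tn = np.sum(~pred & ~gt)   # true negatives
    return {
        "iou":  tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "ppv":  tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
    }
```

Note that Dice and IoU are monotonically related: Dice = 2·IoU / (1 + IoU), so they always rank methods identically.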
Implementation Details
Our methods use an ImageNet-pretrained VGG16 FCN8 network [25]. Models are trained with a batch size of 8 for 100 epochs with ADAM [18] and a fixed learning rate. We also achieved similar results with optimizers that do not require a learning rate [24, 38, 39]. The reported scores are on the test set and were obtained with early stopping on the validation set. Point annotations were obtained by uniformly sampling one pixel from each annotated mask. The same number of points was uniformly sampled from the background.

Table 2: COVID-19-A Segmentation Results

Loss Function               Dice  IoU   PPV   Sens.  Spec.
Fully Supervised            0.65  0.48  0.52  0.85   0.85
Point Loss (PL)             0.54  0.37  0.39
Here we evaluate the loss functions on the three COVID-19 datasets and discuss the results.
Table 2 shows that with only point supervision, our method performs competitively compared to full supervision. In terms of sensitivity, the point loss outperformed the fully supervised baseline by 0.11 points. For the other metrics, we obtained competitive performance when using the consistency-based (CB) loss. The gap between the fully supervised and point-based losses is reduced when using flips and rotations (Flip, Rot) instead of simple horizontal flips (Flip). Moreover, with (Flip, Rot), our method surpasses the fully supervised sensitivity by 0.12 points. COVID-19-A is a small and easy dataset compared to COVID-19-B and COVID-19-C. Thus, in the next sections, we show that with bigger datasets, the CB point loss obtains even better performance on the rest of the metrics with weak supervision.
As seen in Tables 3 and 4, the CB method is more robust across different splits of the data. In both COVID-19-B-Sep and COVID-19-B-Mixed, the CB method achieves similar results, whereas there is more variance in the results with the Point Loss and the W-CE baseline. While the W-CE baseline has an average gap of 0.37 between sep and mixed over all metrics, the CB point loss only has a difference of 0.07 with (Flip) and 0.08 with (Flip, Rot). Remarkably, on sep, our weakly supervised method with (Flip, Rot) improved by 0.48, 0.42, and 0.56 on the Dice, IoU, and Sensitivity metrics, respectively, with respect to the W-CE baseline. On PPV and Specificity, our method was able to retain competitive performance.
Figure 3:
Qualitative results.
We show the predictions obtained from training the model with the point-level loss of Bearman et al. [3] and with our consistency-based (CB) loss. With the CB loss, the predictions are much closer to the ground-truth labels. (Columns: Original Image; Ground Truth; Point Loss (PL); Consistency Loss CB(Flip, Rot) + PL.)

Table 3: COVID-19-B-Mixed Segmentation Results
Loss Function               Dice  IoU   PPV   Sens.  Spec.
Fully Supervised            0.84  0.73  0.90  0.80   1.00
Point Loss (PL)             0.33  0.20  0.20  0.91   0.94
CB(Flip) + PL (Ours)        0.73  0.57
CB(Flip, Rot) + PL (Ours)
Table 4: COVID-19-B-Sep Segmentation Results
Loss Function               Dice  IoU   PPV   Sens.  Spec.
Fully Supervised            0.24  0.14  0.89  0.14   1.00
Point Loss (PL)             0.57  0.40  0.44
CB(Flip, Rot) + PL (Ours)
Table 5: COVID-19-C-Mixed Segmentation Results
Loss Function               Dice  IoU   PPV   Sens.  Spec.
Fully Supervised            0.78  0.64  0.79  0.77   1.00
Point Loss (PL)             0.12  0.07  0.07
CB(Flip, Rot) + PL (Ours)
Table 6: COVID-19-C-Sep Segmentation Results
Loss Function               Dice  IoU   PPV   Sens.  Spec.
Fully Supervised            0.71  0.55  0.78  0.65   0.99
Point Loss (PL)             0.37  0.23  0.23

The difference on PPV and Specificity was 0.16 and 0.02, respectively. Except for Sensitivity in COVID-19-B-Sep, the CB loss (Flip, Rot) yields better results than the point loss.
As seen in Tables 5 and 6, the fully supervised method performs better on COVID-19-C than on the other two datasets, and the performance gap between mixed and sep is smaller. This can be attributed to the larger size of COVID-19-C. The average gap in performance of the fully supervised baseline between the mixed and sep versions is 0.06 for COVID-19-C. The weakly supervised CB loss yields a gap of 0.05 in performance between mixed and sep. Similar to COVID-19-B, except for Sensitivity, the CB point loss yields substantially better results than the point loss. We also observed better results when adding rotations. In fact, with (Flip, Rot), our weakly supervised method improves over the fully supervised baseline by 0.04, 0.04, and 0.21 on Dice, IoU, and
Sensitivity on the sep split.

Table 7: COVID-19-B-Mixed Counting and Localization

Loss Function          MAE    GAME
Point Loss             5.97   7.24
LCFCN Loss             1.15   2.09
CB LCFCN Loss (Ours)
Table 8: COVID-19-C-Mixed Counting and Localization

Loss Function          MAE    GAME
Point Loss             9.63   11.76
LCFCN Loss             1.01   1.70
CB LCFCN Loss (Ours)
In this setup, we consider the task of counting and localizing COVID-19 infected regions in CT scan images. Radiologists strive to identify all regions that might have relevance to COVID-19, which is a very challenging task, especially for small infected regions. Thus, having a model that can localize these regions can help improve radiologist performance in the identification of infected regions.

We consider the COVID-19-B and COVID-19-C datasets to evaluate 3 types of loss functions: the point loss (Eq. 2 from Bearman et al. [3]), the LCFCN loss (Eq. 1 from Laradji et al. [19]), and the consistency-based LCFCN loss that we propose in this section. The consistency-based LCFCN loss (CB LCFCN loss) extends the LCFCN loss with the CB loss proposed in Eq. 1 using the horizontal flip transformation. To evaluate these 3 loss functions, we consider each connected infected region as a unique region. The goal is to identify whether these regions can be counted and localized. We use the mean absolute error (MAE) and grid average mean absolute error (GAME) [12] to measure how well the methods can count and localize infected regions. We provide results for
GAME(L = 4), which divides the image using a grid of L non-overlapping regions and computes the error as the sum of the MAE in each of these subregions.

Tables 7 and 8 show that the consistency loss helps LCFCN achieve superior results in counting and localizing infected regions in the CT image. It is expected that the Point Loss achieves poor performance, as it predicts big blobs that can encapsulate several regions together. On the other hand, the consistency loss helped LCFCN improve its results, suggesting that the model learns more informative semantic features for the task with such self-supervision.

5. Conclusion
Machine learning has the potential to solve a number of challenges associated with COVID-19. One example is the identification of high-risk patients by segmenting infected regions in CT scans. However, conventional annotation methods rely on per-pixel labels, which are costly to collect for CT scans. In this work, we have proposed an efficient method that can learn from point-level annotations, which are much cheaper to acquire than per-pixel labels. Our method uses a consistency-based loss that significantly improves segmentation performance compared to a conventional point-level loss on 3 open-source COVID-19 datasets. Further, our method obtained results that almost match the performance of fully supervised methods while being more robust across different splits of the data.
References

[1] T. Ai, Z. Yang, H. Hou, C. Zhan, C. Chen, W. Lv, Q. Tao, Z. Sun, and L. Xia. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology, page 200642, 2020.
[2] A. Amyar, R. Modzelewski, and S. Ruan. Multi-task deep learning based CT imaging analysis for COVID-19: Classification and segmentation. medRxiv, 2020.
[3] A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei. What's the point: Semantic segmentation with point supervision. ECCV, 2016.
[4] G. Bortsova, F. Dubost, L. Hogeweg, I. Katramados, and M. de Bruijne. Semi-supervised medical image segmentation via learning consistency under transformations. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 810–818. Springer, 2019.
[5] C. Butt, J. Gill, D. Chun, and B. A. Babu. Deep learning system to screen coronavirus disease 2019 pneumonia. Applied Intelligence, page 1, 2020.
[6] X. Chen, L. Yao, and Y. Zhang. Residual attention U-Net for automated multi-class segmentation of COVID-19 chest CT images. arXiv preprint arXiv:2004.05645, 2020.
[7] D. Colombi, F. C. Bodini, M. Petrini, G. Maffi, N. Morelli, G. Milanese, M. J. Silva, N. Sverzellati, and E. Michieletti. Well-aerated lung on admitting chest CT to predict adverse outcome in COVID-19 pneumonia. Radiology, 2020.
[8] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. AutoAugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018.
[9] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) challenge. IJCV, 2010.
[10] D.-P. Fan, T. Zhou, G.-P. Ji, Y. Zhou, G. Chen, H. Fu, J. Shen, and L. Shao. Inf-Net: Automatic COVID-19 lung infection segmentation from CT images. IEEE Transactions on Medical Imaging, 2020.
[11] W.-j. Guan, Z.-y. Ni, Y. Hu, W.-h. Liang, C.-q. Ou, J.-x. He, L. Liu, H. Shan, C.-l. Lei, D. S. Hui, B. Du, L.-j. Li, G. Zeng, K.-Y. Yuen, R.-c. Chen, C.-l. Tang, T. Wang, P.-y. Chen, J. Xiang, S.-y. Li, J.-l. Wang, Z.-j. Liang, Y.-x. Peng, L. Wei, Y. Liu, Y.-h. Hu, P. Peng, J.-m. Wang, J.-y. Liu, Z. Chen, G. Li, Z.-j. Zheng, S.-q. Qiu, J. Luo, C.-j. Ye, S.-y. Zhu, and N.-s. Zhong. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine, 382(18):1708–1720, 2020.
[12] R. Guerrero, B. Torre, R. Lopez, S. Maldonado, and D. Onoro. Extremely overlapping vehicle counting. IbPRIA, 2015.
[13] M. H. Hesamian, W. Jia, X. He, and P. Kennedy. Deep learning techniques for medical image segmentation: Achievements and challenges. Journal of Digital Imaging, 32(4):582–596, 2019.
[14] J. Jiang, Y.-C. Hu, C.-J. Liu, D. Halpenny, M. D. Hellmann, J. O. Deasy, G. Mageras, and H. Veeraraghavan. Multiple resolution residually connected feature streams for automatic lung tumor segmentation from CT images.
IEEE transactionson medical imaging , 38(1):134–144, 2018.[15] D. Jin, Z. Xu, Y. Tang, A. P. Harrison, and D. J. Mollura.Ct-realistic lung nodule simulation from 3d conditional gen-erative adversarial networks for robust lung segmentation. In
International Conference on Medical Image Computing andComputer-Assisted Intervention , pages 732–740. Springer,2018.[16] M. Keshani, Z. Azimifar, F. Tajeripour, and R. Boostani.Lung nodule segmentation and recognition using svm clas-sifier and active contour modeling: A complete intelligentsystem.
Computers in biology and medicine , 43(4):287–300,2013.[17] A. Khoreva, R. Benenson, J. H. Hosang, M. Hein, andB. Schiele. Simple does it: Weakly supervised instance andsemantic segmentation.
CVPR , 2017.[18] D. P. Kingma and J. Ba. Adam: A method for stochasticoptimization. arXiv preprint arXiv:1412.6980 , 2014.[19] I. H. Laradji, N. Rostamzadeh, P. O. Pinheiro, D. Vazquez,and M. Schmidt. Where are the blobs: Counting by localiza-tion with point supervision.
ECCV , 2018.[20] I. H. Laradji, N. Rostamzadeh, P. O. Pinheiro, D. Vazquez,and M. Schmidt. Instance segmentation with point supervi-sion. arXiv preprint arXiv:1906.06392 , 2019.[21] I. H. Laradji, D. Vazquez, and M. Schmidt. Where are themasks: Instance segmentation with image-level supervision.In
BMVC , 2019.[22] M. Li, P. Lei, B. Zeng, Z. Li, P. Yu, B. Fan, C. Wang, Z. Li,J. Zhou, S. Hu, and H. Liu. Coronavirus disease (COVID-19): Spectrum of CT findings and temporal progressionof the disease.
Academic Radiology , 27(5):603–608, May2020. doi: 10.1016/j.acra.2020.03.003. URL https://doi.org/10.1016/j.acra.2020.03.003 .[23] X. Li, L. Yu, H. Chen, C.-W. Fu, L. Xing, and P.-A.Heng. Transformation-consistent self-ensembling modelfor semisupervised medical image segmentation.
IEEETransactions on Neural Networks and Learning Systems ,page 112, 2020. ISSN 2162-2388. doi: 10.1109/tnnls.2020.2995319. URL http://dx.doi.org/10.1109/TNNLS.2020.2995319 .[24] N. Loizou, S. Vaswani, I. Laradji, and S. Lacoste-Julien. tochastic polyak step-size for sgd: An adaptive learningrate for fast convergence. arXiv preprint arXiv:2002.10542 ,2020.[25] J. Long, E. Shelhamer, and T. Darrell. Fully convolutionalnetworks for semantic segmentation. CVPR , 2015.[26] J. Ma, C. Ge, Y. Wang, X. An, J. Gao, Z. Yu, and J. He.Covid-19 ct lung and infection segmentation dataset (versionverson 1.0), 2020. URL http://doi.org/10.5281/zenodo.375747 .[27] J. Ma, Y. Wang, X. An, C. Ge, Z. Yu, J. Chen, Q. Zhu,G. Dong, J. He, Z. He, et al. Towards efficient covid-19 ct an-notation: A benchmark for lung and infection segmentation. arXiv preprint arXiv:2004.12537 , 2020.[28] MedSeg. Covid-19 ct segmentation dataset, 2020.URL https://medicalsegmentation.com/covid19/ .[29] S. Mounsaveng, I. Laradji, I. B. Ayed, D. Vazquez, andM. Pedersoli. Learning data augmentation with onlinebilevel optimization for image classification. arXiv preprintarXiv:2006.14699 , 2020.[30] Y. Ouali, C. Hudelot, and M. Tami. Semi-supervised seman-tic segmentation with cross-consistency training, 2020.[31] D. P. Papadopoulos, J. R. Uijlings, F. Keller, and V. Fer-rari. Training object class detectors with click supervision.In
Proceedings of the IEEE Conference on Computer Visionand Pattern Recognition , pages 6374–6383, 2017.[32] Y. Qiu, Y. Liu, and J. Xu. Miniseg: An extremely minimumnetwork for efficient covid-19 segmentation. arXiv preprintarXiv:2004.09750 , 2020.[33] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolu-tional networks for biomedical image segmentation.
MIC-CAI , 2015.[34] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh,S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein,A. C. Berg, and L. Fei-Fei. ImageNet Large Scale VisualRecognition Challenge.
International Journal of ComputerVision (IJCV) , 115(3), 2015.[35] T. Schlegl, P. Seeb¨ock, S. M. Waldstein, U. Schmidt-Erfurth,and G. Langs. Unsupervised anomaly detection with gen-erative adversarial networks to guide marker discovery. In
International conference on information processing in medi-cal imaging , pages 146–157. Springer, 2017.[36] F. Shan, Y. Gao, J. Wang, W. Shi, N. Shi, M. Han,Z. Xue, and Y. Shi. Lung infection quantification ofcovid-19 in ct images with deep learning. arXiv preprintarXiv:2003.04655 , 2020.[37] K. Sohn, D. Berthelot, C.-L. Li, Z. Zhang, N. Carlini, E. D.Cubuk, A. Kurakin, H. Zhang, and C. Raffel. Fixmatch:Simplifying semi-supervised learning with consistency andconfidence, 2020.[38] S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel,and S. Lacoste-Julien. Painless stochastic gradient: Interpo-lation, line-search, and convergence rates. In
Advances inNeural Information Processing Systems , pages 3732–3745,2019.[39] S. Vaswani, F. Kunstner, I. Laradji, S. Y. Meng, M. Schmidt,and S. Lacoste-Julien. Adaptive gradient methods convergefaster with over-parameterization (and you can do a line- search). arXiv preprint arXiv:2006.06835 , 2020.[40] L. Wang and A. Wong. Covid-net: A tailored deep convolu-tional neural network design for detection of covid-19 casesfrom chest x-ray images. arXiv preprint arXiv:2003.09871 ,2020.[41] L. L. Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang,D. Eide, K. Funk, R. Kinney, Z. Liu, W. Merrill, et al. Cord-19: The covid-19 open research dataset. arXiv preprintarXiv:2004.10706 , 2020.[42] S. Wang, M. Zhou, Z. Liu, Z. Liu, D. Gu, Y. Zang, D. Dong,O. Gevaert, and J. Tian. Central focused convolutional neuralnetworks: Developing a data-driven model for lung nodulesegmentation.
Medical image analysis , 40:172–183, 2017.[43] Y. Wang, C. Dong, Y. Hu, C. Li, Q. Ren, X. Zhang,H. Shi, and M. Zhou. Temporal changes of CT find-ings in 90 patients with COVID-19 pneumonia: A longi-tudinal study.
Radiology , page 200843, Mar. 2020. doi:10.1148/radiol.2020200843. URL https://doi.org/10.1148/radiol.2020200843 .[44] Y. Wang, C. Dong, Y. Hu, C. Li, Q. Ren, X. Zhang, H. Shi,and M. Zhou. Temporal changes of ct findings in 90 patientswith covid-19 pneumonia: A longitudinal study.
Radiology ,2020.[45] Y. Wang, J. Zhang, M. Kan, S. Shan, and X. Chen. Self-supervised equivariant attention mechanism for weakly su-pervised semantic segmentation, 2020.[46] J. Wei, S. Wang, and Q. Huang. F3net: Fusion, feed-back and focus for salient object detection. arXiv preprintarXiv:1911.11445 , 2019.[47] Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, and Q. V. Le. Unsu-pervised data augmentation for consistency training, 2019.[48] S. Xie, R. Girshick, P. Doll´ar, Z. Tu, and K. He. Aggre-gated residual transformations for deep neural networks. In
Proceedings of the IEEE conference on computer vision andpattern recognition , pages 1492–1500, 2017.[49] C. Zheng, X. Deng, Q. Fu, Q. Zhou, J. Feng, H. Ma, W. Liu,and X. Wang. Deep learning-based detection for covid-19from chest ct using weak label. medRxiv , 2020.[50] T. Zhou, S. Canu, and S. Ruan. An automatic covid-19ct segmentation based on u-net with attention mechanism. arXiv preprint arXiv:2004.06673 , 2020.[51] Y. Zhou, Y. Zhu, Q. Ye, Q. Qiu, and J. Jiao. Weakly su-pervised instance segmentation using class peak response.
CVPR , 2018.[52] Y. Zhou, Y. Zhu, Q. Ye, Q. Qiu, and J. Jiao. Weakly su-pervised instance segmentation using class peak response.In
Proceedings of the IEEE Conference on Computer Visionand Pattern Recognition , pages 3791–3800, 2018.[53] Z. Y. Zu, M. D. Jiang, P. P. Xu, W. Chen, Q. Q. Ni, G. M.Lu, and L. J. Zhang. Coronavirus disease 2019 (covid-19): aperspective from china.
Radiology , page 200490, 2020., page 200490, 2020.