Unsupervised Task Design to Meta-Train Medical Image Classifiers⋆

Gabriel Maicas†, Cuong Nguyen†, Farbod Motlagh†, Jacinto C. Nascimento‡, Gustavo Carneiro†

† Australian Institute for Machine Learning, The University of Adelaide
‡ Institute for Systems and Robotics, Instituto Superior Tecnico, Portugal

⋆ Supported by the Australian Research Council through grant DP180103232.
Abstract.
Meta-training has been empirically demonstrated to be the most effective pre-training method for few-shot learning of medical image classifiers (i.e., classifiers modelled with small training sets). However, the effectiveness of meta-training relies on the availability of a reasonable number of hand-designed classification tasks, which are costly to obtain, and consequently rarely available. In this paper, we propose a new method to unsupervisedly design a large number of classification tasks to meta-train medical image classifiers. We evaluate our method on a breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data set that has been used to benchmark few-shot training methods of medical image classifiers. Our results show that the proposed unsupervised task design to meta-train medical image classifiers builds a pre-trained model that, after fine-tuning, produces better classification results than other unsupervised and supervised pre-training methods, and competitive results with respect to meta-training that relies on hand-designed classification tasks.
Keywords: meta-training, unsupervised learning, unsupervised task design, breast image analysis, magnetic resonance imaging, few-shot, pre-training, clustering.
1 Introduction

The accuracy and robustness of deep learning based medical image classifiers is generally positively correlated with the size of the annotated training set used during the modelling process [1]. However, large annotated training sets are expensive and not readily available for some medical image analysis applications, such as breast screening from DCE-MRI [2]. Therefore, training medical image classifiers with small annotated training sets has become a highly investigated topic, particularly after the advent of deep learning [1].

The most competitive medical image classifiers are currently based on convolutional neural networks (CNNs) [1], which need large training sets to be properly modelled. To reduce the need for such large annotated sets, pre-training approaches have been explored in medical image analysis, where the most relevant for our paper are: 1) supervised pre-training using independent data sets [5], where the model is pre-trained by solving a classification problem on a different data set; 2) unsupervised pre-training using clustering [3], where the model is pre-trained by performing clustering without any knowledge about the ground-truth labels; and 3) unsupervised pre-training using input reconstruction [6], where the model is pre-trained by reconstructing the input images of the training set. Arguably, the main issue with these pre-training methods is that their objective functions are irrelevant to the medical image classifier being developed downstream. Alternatively, the need for pre-training methods can be alleviated with other types of training methods, such as multiple instance learning (MIL) [7] or multi-task learning [8], but both still need large training sets. More recently, a pre-trained model produced by supervised meta-training (i.e., a meta-training process that depends on hand-designed classification tasks) showed superior performance compared to the previously described pre-training methods [4]. Nevertheless, these promising meta-training results are counterbalanced by the unappealing need for an expensive hand-designing process to produce the classification tasks [4]. Given the high cost of this process, the availability of a large number of hand-designed classification tasks is rare, which hampers the exploration of meta-training for medical image classifiers.

In this paper, we propose a new method to unsupervisedly produce a large number of classification tasks to meta-train medical image classifiers. To this end, we use deep clustering [3] to automatically build image clusters that can be grouped in different ways to enable the design of multiple classification tasks employed in the meta-training process – see Fig. 1. We evaluate our method on the breast screening classification task from a breast DCE-MRI data set that has been used to benchmark few-shot training algorithms of medical image classifiers [4]. Results show that our proposed approach produces classification results that are significantly better than other unsupervised and supervised pre-training methods, and competitive with supervised meta-training.

Fig. 1: Unsupervised task design to meta-train medical image classifiers. Deep clustering [3] produces a set of clusters that are used in the unsupervised design of classification tasks. These tasks are used in a meta-training process to produce a pre-trained model that can be fine-tuned to new classification tasks using small labelled training sets, represented in this paper by the breast screening problem from DCE-MRI [4].
2 Related Work

DCE-MRI is a recommended imaging modality in breast screening programs for patients at high risk [9]. However, DCE-MRI interpretation is time-consuming and prone to high inter-observer variability [10]. Thus, computer-aided diagnosis (CAD) systems are being developed to assist radiologists in increasing their diagnosis sensitivity [11] and specificity [12], and in reducing analysis time. However, the development of CAD systems for breast DCE-MRI is challenging due in part to the small size of the annotated data sets available for training.

Meta-training has been shown to be an effective strategy to improve the learning of classifiers using relatively small training sets [13]. For instance, Maicas et al. [4] proposed the use of hand-designed breast classification tasks to meta-train a model that was then fine-tuned to solve the breast screening task. Results showed that this method improves over other strategies for training classifiers from small data sets, such as MIL [7] and multi-task learning [8]. However, the method proposed in [4] relies on costly hand-designed classification tasks.

Similarly to our paper, Hsu et al. [14] proposed an unsupervised method to design computer vision classification tasks for meta-training. Results showed that this approach produced worse classification performance than meta-training modelled with hand-designed tasks (i.e., supervised meta-training). We believe that the reason behind this drop in performance lies in the large number of hand-designed tasks already available for supervised meta-training in computer vision applications [14], enabling a good classification performance baseline. The difficulty of obtaining a large number of hand-designed tasks for medical image classification problems means that the number of these hand-designed tasks will be small, which may result in a relatively low classification performance baseline. We hypothesize that our proposed method, which unsupervisedly designs a large number of classification tasks to meta-train a medical image classifier, can achieve a classification performance that is at least comparable to supervised meta-training [4] trained with a small number of hand-designed tasks. Our proposed method has the advantage that it does not rely on costly hand-designed tasks.
3 Methods

3.1 Data Set

The data set is represented by $\mathcal{D} = \{(v_i, t_i, b_i, y_i)\}_{i=1}^{|\mathcal{D}|}$, where $v : \Omega \to \mathbb{R}$ corresponds to the first DCE-MRI subtraction volume ($\Omega$ denotes the volume lattice) [15], $t : \Omega \to \mathbb{R}$ represents the T1-weighted MRI volume, used only to separate the left and right breast regions of the volume, $b \in \{\text{left}, \text{right}\}$ indicates the left or right breast, and $y \in \mathcal{Y} = \{0, 1\}$ denotes the classification label: no malignant findings, or malignant findings, respectively.

3.2 Unsupervised Task Design

The proposed unsupervised task design method builds several binary classification tasks from image groups formed by deep clustering [3]. The training of deep clustering alternates the optimisation of two objective functions [3]. We denote by $f_\theta(v) \in \mathbb{R}^D$ the $\theta$-parameterised model that produces the unsupervised learning features, and by $g_\omega(f_\theta(v)) \in \{0,1\}^K$ the $\omega$-parameterised classifier, placed on top of $f_\theta(.)$, that produces a pseudo-label representing one of the $K$ unknown classes. The first objective function is the cross-entropy loss $\ell(.)$ with respect to the pseudo-labels $\{\tilde{y}_i\}_{i=1}^{|\mathcal{D}|}$, with $\tilde{y} \in \tilde{\mathcal{Y}} = \{0,1\}^K$:

$$\min_{\theta, \omega} \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \ell\big(g_\omega(f_\theta(v_i)), \tilde{y}_i\big), \qquad (1)$$

which is used to estimate the optimal $\theta^*$ and $\omega^*$. The second objective function finds the $K$ centroids, denoted by $C \in \mathbb{R}^{D \times K}$, and the pseudo-labels $\tilde{y}$ with

$$\min_{C} \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \min_{\tilde{y}_i} \big\| f_\theta(v_i) - C \tilde{y}_i \big\|_2^2, \qquad (2)$$

where $\tilde{y}_i$ is a $K$-dimensional one-hot vector.

Each step of the optimisation above generates new values for the model parameters, centroids and pseudo-labels. We extend deep clustering [3] with a model selection process based on maximising the Silhouette coefficient, which measures clustering quality [16], with

$$\kappa = \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \frac{b(i) - a(i)}{\max(a(i), b(i))}, \qquad (3)$$

where $a(i)$ represents the average $\ell_2$ distance between $f_\theta(v_i)$ and all points $f_\theta(v_j)$ with $i \neq j$ and $\tilde{y}_i = \tilde{y}_j$; and $b(i)$ denotes the smallest average $\ell_2$ distance between $f_\theta(v_i)$ and the points $f_\theta(v_j)$ with $i \neq j$ and $\tilde{y}_i \neq \tilde{y}_j$.

The unsupervised design of classification tasks is based on the formation of $L$ binary classification problems derived from the pseudo-labels obtained from (2). Each of these $L$ binary classification problems is built by randomly selecting two nonempty and disjoint subsets $\mathcal{K}_l^{(0)}$ and $\mathcal{K}_l^{(1)}$ from the pseudo-label set $\{1, 2, \ldots, K\}$ and labelling their corresponding data points as class 0 and class 1, respectively. Note that the number of classification tasks for a given $K$ is

$$L = \sum_{i=1}^{K-1} \sum_{k=1}^{\min(i, K-i)} \binom{K}{i} \binom{K-i}{k} \left(\frac{1}{2}\right)^{\delta(i-k)},$$

where $\binom{A}{B}$ denotes the binomial coefficient and $\delta(.)$ represents the Dirac delta function (the factor $(1/2)^{\delta(i-k)}$ avoids double-counting label-symmetric pairs of equal-sized subsets).
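To make the task-design procedure concrete, the sketch below clusters pre-computed deep features, selects $K$ with the Silhouette coefficient in (3), samples one binary task by drawing two nonempty, disjoint subsets of the pseudo-label set, and counts the available tasks $L$. It is a minimal illustration under stated assumptions: scikit-learn's KMeans stands in for the deep-clustering assignment step of (2), the features are synthetic, and all function names are ours rather than from any released code.

```python
# Minimal sketch of the unsupervised task design (Sec. 3.2), assuming the deep
# features f_theta(v) are already computed; KMeans stands in for Eq. (2).
import random
from math import comb

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_and_select_k(features, candidate_ks=(3, 4, 5)):
    """Cluster the features for each candidate K and keep the clustering with
    the highest Silhouette coefficient (Eq. 3)."""
    best = None
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
        kappa = silhouette_score(features, labels)
        if best is None or kappa > best[2]:
            best = (k, labels, kappa)
    return best  # (K, pseudo-labels, silhouette)


def sample_binary_task(pseudo_labels, k, rng):
    """Build one binary task: draw two nonempty, disjoint subsets of the
    pseudo-label set and relabel the corresponding images as class 0 / 1."""
    clusters = list(range(k))
    rng.shuffle(clusters)
    i = rng.randint(1, k - 1)             # |K_l^(0)| >= 1
    j = rng.randint(1, k - i)             # |K_l^(1)| >= 1, disjoint from K_l^(0)
    class0, class1 = set(clusters[:i]), set(clusters[i:i + j])
    keep = [n for n, c in enumerate(pseudo_labels) if c in class0 | class1]
    y = [int(pseudo_labels[n] in class1) for n in keep]
    return np.array(keep), np.array(y)    # image indices, binary task labels


def count_tasks(k):
    """Closed-form task count L from Sec. 3.2: unordered pairs of nonempty,
    disjoint subsets of the K cluster labels."""
    total = 0
    for i in range(1, k):
        for j in range(1, min(i, k - i) + 1):
            pairs = comb(k, i) * comb(k - i, j)
            total += pairs // 2 if i == j else pairs  # halve equal-size pairs
    return total


rng = random.Random(0)
feats = np.random.RandomState(0).randn(200, 64)       # toy stand-in features
k, pseudo, kappa = cluster_and_select_k(feats)
idx, labels = sample_binary_task(pseudo, k, rng)
print(k, round(kappa, 3), count_tasks(k), labels[:10])
```

Under this counting, $K = 3, 4, 5$ yield $L = 6$, $25$ and $90$ candidate tasks, respectively, illustrating how quickly the task pool grows with the number of clusters.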
3.3 Meta-Training

Meta-training estimates the parameters of a meta-learner so that it can be used as a pre-trained model that is efficiently fine-tuned to previously unseen classification tasks using small annotated training sets [13]. The algorithm assumes that there exists a task distribution $\mathcal{T}$ from which each classification task $\mathcal{T}_l$ is drawn, where each task comprises a training set $\{v_i^{(l,t)}, \tilde{y}_i^{(l,t)}\}_{i=1}^{M}$ and a testing set $\{v_i^{(l,v)}, \tilde{y}_i^{(l,v)}\}_{i=1}^{N}$, with $M \ll N$ and $M + N = |\mathcal{T}_l|$. Meta-training iteratively samples $T$ tasks from $\mathcal{T}$, and re-trains a multi-target classifier for those tasks using the training and testing sets defined above.

We use MAML meta-training [17], which consists of a Bayesian hierarchical model, where $\psi$ denotes the classifier meta-parameter, and $\phi_l$ represents the parameter for task $\mathcal{T}_l$. The meta-training objective function is defined by

$$\max_{\psi} \log p\big(\mathcal{Y}^{(v)}_{l=1..T} \,\big|\, \mathcal{Y}^{(t)}_{l=1..T}, \mathcal{V}^{(v)}_{l=1..T}, \mathcal{V}^{(t)}_{l=1..T}, \psi\big), \qquad (4)$$

where $T$ is the number of tasks per meta-training iteration, $\mathcal{Y}^{(v)}_l = \{\tilde{y}_i^{(l,v)}\}_{i=1}^{N}$, $\mathcal{Y}^{(t)}_l = \{\tilde{y}_i^{(l,t)}\}_{i=1}^{M}$, $\mathcal{V}^{(v)}_l = \{v_i^{(l,v)}\}_{i=1}^{N}$, and $\mathcal{V}^{(t)}_l = \{v_i^{(l,t)}\}_{i=1}^{M}$. In (4), we have

$$\log p\big(\mathcal{Y}^{(v)}_{l=1..T} \,\big|\, \mathcal{Y}^{(t)}_{l=1..T}, \mathcal{V}^{(v)}_{l=1..T}, \mathcal{V}^{(t)}_{l=1..T}, \psi\big) \geq \sum_{l=1}^{T} \mathbb{E}_{p(\phi_l | \mathcal{Y}^{(t)}_l, \mathcal{V}^{(t)}_l, \psi)}\Big[\log p\big(\mathcal{Y}^{(v)}_l \,\big|\, \mathcal{V}^{(v)}_l, \phi_l\big)\Big], \qquad (5)$$

where the lower bound is derived from Jensen's inequality [18]. Therefore, the maximisation in (4) is approximated by the lower-bound maximisation in (5), where the posterior $p(\phi_l | \mathcal{Y}^{(t)}_l, \mathcal{V}^{(t)}_l, \psi)$ is approximated with a Dirac delta function at a locally optimal task-specific model parameter $\phi^*_l$, i.e., $p(\phi_l | \mathcal{Y}^{(t)}_l, \mathcal{V}^{(t)}_l, \psi) = \delta(\phi_l - \phi^*_l)$. The locally optimal model parameter $\phi^*_l$ is obtained with truncated gradient descent initialised at the meta-parameters $\psi$:

$$\phi^*_l = \psi - \alpha \nabla_{\phi_l} \Big[ -\log p\big(\mathcal{Y}^{(t)}_l \,\big|\, \mathcal{V}^{(t)}_l, \phi_l\big) \Big], \qquad (6)$$

where $\alpha$ is the learning rate, and the truncated gradient descent consists of a single step of (6). Maximising the lower bound of the log-likelihood in (5) corresponds to the MAML algorithm in [13], which produces a pre-trained model that can quickly learn new tasks drawn from $\mathcal{T}$.
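The single-step truncated adaptation in (6) followed by the meta-update of $\psi$ can be written compactly in modern autodiff frameworks. The PyTorch sketch below illustrates only this inner/outer loop: the linear model, the synthetic task sampler and the hyper-parameter values are placeholder assumptions, not the paper's 3D DenseNet, task pool or settings.

```python
# Minimal MAML sketch for Eqs. (4)-(6): one inner gradient step per task
# (Eq. 6), then a meta-update of psi over T sampled tasks (Eqs. 4-5).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# psi: meta-parameters of a tiny linear classifier (placeholder model).
meta_params = [(0.1 * torch.randn(2, 16)).requires_grad_(),
               torch.zeros(2, requires_grad=True)]

def forward(params, x):
    w, b = params
    return x @ w.t() + b                   # logits for a binary task

def sample_task(m=4, n=12):
    # Placeholder for a task T_l drawn from the unsupervised task pool:
    # random features with a synthetic binary labelling.
    x = torch.randn(m + n, 16)
    y = (x[:, 0] > 0).long()
    return (x[:m], y[:m]), (x[m:], y[m:])  # training (M) / testing (N) sets

alpha, T = 0.1, 4                          # inner LR, tasks per meta-iteration
meta_opt = torch.optim.Adam(meta_params, lr=1e-2)

for meta_iter in range(100):
    meta_opt.zero_grad()
    for _ in range(T):
        (xt, yt), (xv, yv) = sample_task()
        # Eq. (6): phi*_l = psi - alpha * grad of the task training loss.
        inner_loss = F.cross_entropy(forward(meta_params, xt), yt)
        grads = torch.autograd.grad(inner_loss, meta_params, create_graph=True)
        phi = [p - alpha * g for p, g in zip(meta_params, grads)]
        # Eq. (5): maximise the testing-set log-likelihood under phi*_l.
        outer_loss = F.cross_entropy(forward(phi, xv), yv) / T
        outer_loss.backward()              # accumulates gradients w.r.t. psi
    meta_opt.step()
```

`create_graph=True` keeps the inner step differentiable, so the meta-update also captures the second-order term; dropping it recovers first-order MAML.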
4 Experiments

We evaluate our proposed method on a breast DCE-MRI data set [2] (formally defined in Sec. 3.1), which has previously been used to evaluate few-shot training methods [4]. To allow a fair comparison with previous papers, we split the data set in a patient-wise manner into the same training, validation and testing sets, containing 45, 13, and 59 patients, respectively. We use the T1-weighted MRI to automatically extract the left and right breast regions from the first DCE-MRI subtraction volume [4]. Each breast region is resized to a fixed-size volume [4]. For the breast screening problem, only breasts that contain malignant finding(s) are considered positive, while breasts with only benign findings or no findings are considered negative. There are 30, 9, and 38 positive and 60, 17, and 80 negative breasts in the training, validation and testing sets, respectively.

The model $f_\theta(v)$ that unsupervisedly produces the volume features is a 3D DenseNet [19] composed of five dense blocks, each containing two dense layers. The features are the input to the deep clustering algorithm, explained in Sec. 3.2, with the number of clusters $K \in \{3, 4, 5\}$. The model that is meta-trained, and then fine-tuned, has the same architecture as $f_\theta(.)$. During meta-training, we use a fixed meta learning rate $\alpha$ in (6). At each meta-iteration, a meta-batch of $T = 4$ classification tasks is sampled according to a random or a curriculum learning strategy [4]. The meta-trained model is fine-tuned to the breast screening task using the entire training set, where model selection is performed using the validation set and results are reported on the test set.

The evaluation of the breast screening problem is based on the area under the ROC curve (AUC). We also measure the standard error, using an estimate based on the Wilcoxon test [20], computed on the testing set. In this evaluation, we study the type of task sampling for meta-training, i.e., random or curriculum learning [4], and the influence of the number of clusters $K$ in (1) used to build the tasks. We compare our method (U-MT) with the previously proposed supervised meta-training for the cases where the breast screening task is included (S-MT (S)) and not included (S-MT (NS)) in the meta-training process. We also compare our method with: a) a DenseNet trained from scratch on the breast screening task; b) the DenseNet from (a) fine-tuned with MIL [7]; c) a DenseNet trained with multi-tasking (using hand-designed tasks) [4]; d) a DenseNet pre-trained as a variational autoencoder (i.e., unsupervised training) and fine-tuned for the breast screening task; and e) a DenseNet pre-trained with deep clustering (i.e., unsupervised training) and fine-tuned for the breast screening task. All DenseNet models of these competing methods have the same architecture as the meta-trained model described above. The rationale for baselines (d) and (e) is to evaluate the effect of pre-training based on a reconstruction or a clustering scheme; for this purpose, we present results based on nearest neighbour classification and on the fine-tuned classification model.
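For reference, the sketch below computes the AUC and a Wilcoxon-statistic-based standard error on a held-out test set. We assume the estimator cited as [20] takes the familiar Hanley–McNeil form; the toy scores are illustrative, with only the class balance (38 positive / 80 negative, as in our test set) taken from the paper.

```python
# AUC and a Wilcoxon/Hanley-McNeil standard error on a held-out test set.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_se(y_true, y_score):
    a = roc_auc_score(y_true, y_score)
    n_pos = int(np.sum(y_true == 1))
    n_neg = int(np.sum(y_true == 0))
    q1 = a / (2.0 - a)             # P(two random positives both outrank a negative)
    q2 = 2.0 * a ** 2 / (1.0 + a)  # P(a positive outranks two random negatives)
    var = (a * (1.0 - a) + (n_pos - 1) * (q1 - a ** 2)
           + (n_neg - 1) * (q2 - a ** 2)) / (n_pos * n_neg)
    return a, float(np.sqrt(var))

rng = np.random.default_rng(0)
y = np.concatenate([np.ones(38), np.zeros(80)])           # test-set class balance
s = np.concatenate([rng.normal(1.0, 1.0, 38), rng.normal(0.0, 1.0, 80)])
print("AUC = %.2f +/- %.2f" % auc_with_se(y, s))
```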
We show the AUC results (± standard error) for the breast screening baselines in Tab. 1. Table 2 presents the results of meta-training, as a function of $K \in \{3, 4, 5\}$, with supervised and unsupervised task design, using random and curriculum learning task sampling. Figure 2 presents examples of breast screening classification.

Table 1: AUC results (± standard error) for the breast screening baselines.

Training Method                                          | AUC
From Scratch [19]                                        | . ± .
MIL-based fine-tuning [7]                                | . ± .
Multi-Task [8]                                           | . ± .
Variational Autoencoder + Nearest Neighbour              | . ± .
Variational Autoencoder + Fine-Tune on breast screening  | . ± .
Deep Clustering + Nearest Neighbour                      | . ± .
Deep Clustering + Fine-Tune on breast screening          | . ± .

We measure the statistical significance of the difference in performance between our best performing approaches (Random with K = 5 and Curriculum with K = 5) and all baseline methods, obtaining a p-value p ≤ 0.05 in all cases (unpaired two-tailed t-test). Also, comparing our newly proposed U-MT (Random with K = 5) with S-MT (S) (Curriculum with K = 3) [4], we obtain a p-value p > 0.05.

Table 2: AUC for the breast screening task for our proposed method (U-MT) as a function of the number of image clusters K and the task sampling method (random and curriculum). We also present the results of supervised meta-training [4] (S-MT) for the cases where the breast screening task is included (labelled S) and not included (labelled NS) in the meta-training tasks. N/A indicates that the experiment is not feasible due to the lack of extra ground-truth labels.

              | Random                           | Curriculum
              | K = 3    | K = 4    | K = 5      | K = 3    | K = 4    | K = 5
S-MT [4] (S)  | . ± .    | N/A      | N/A        | . ± .    | N/A      | N/A
S-MT [4] (NS) | . ± .    | N/A      | N/A        | . ± .    | N/A      | N/A
U-MT (Ours)   | . ± .05  | 0. ± .04 | 0. ± .04   | 0. ± .04 | 0. ± .04 | 0. ± .

Fig. 2: Examples of breast screening diagnosis produced by our approach. Image (2a) shows the correct positive diagnosis of a breast containing a malignant tumour. Image (2b) shows the correct negative diagnosis of a breast with a benign tumour. Image (2c) shows the incorrect positive classification of a breast containing no tumours. Image (2d) shows the correct negative diagnosis of a breast with a benign tumour.
5 Conclusion

We have presented a new method that unsupervisedly designs classification tasks to meta-train medical image classifiers. Our method significantly outperforms several baselines consisting of traditional pre-training methods based on a variational autoencoder, deep clustering, MIL, and multi-task learning (see Tab. 1). Our method also produces results comparable to the state of the art set by meta-training using hand-designed tasks [4] (see Tab. 2). However, instead of using manually defined labels during meta-training, we unsupervisedly build classification tasks, which allows us to build a larger set of tasks than the hand-designed ones. Also from Tab. 2, we notice that a larger number of tasks, which increases with the number of clusters (Sec. 3.2), generally implies better AUC results. This confirms our initial hypothesis that, differently from computer vision problems, automatically building tasks is of great importance for medical image classification problems, where image labels that allow a large number of tasks are costly to obtain. We also observe that sampling tasks according to curriculum learning provides a good improvement in accuracy compared to random task sampling for a small number of clusters (K = 3), but not for a larger number of tasks (K = 5). We hypothesize that meta-training with curriculum learning sampling needs a larger number of meta-iterations to learn a curriculum that is better than random task sampling. Given the large number of tasks for K ∈ {4, 5}, the meta-training process converged before the curriculum learning algorithm did, a point that deserves further research.

References
1. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Medical Image Analysis (2017)
2. McClymont, D., Mehnert, A., Trakic, A., Kennedy, D., Crozier, S.: Fully automatic lesion segmentation in breast MRI using mean-shift and graph-cuts on a region adjacency graph. JMRI (2014)
3. Caron, M., Bojanowski, P., Joulin, A., Douze, M.: Deep clustering for unsupervised learning of visual features. In: ECCV (2018)
4. Maicas, G., Bradley, A.P., Nascimento, J.C., Reid, I., Carneiro, G.: Training medical image analysis systems like radiologists. In: MICCAI (2018)
5. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest pathology detection using deep learning with non-medical training. In: ISBI (2015)
6. Dong, L.F., Gan, Y.Z., Mao, X.L., Yang, Y.B., Shen, C.: Learning deep representations using convolutional auto-encoders with symmetric skip connections. In: ICASSP (2018)
7. Zhu, W., Lou, Q., Vang, Y.S., Xie, X.: Deep multi-instance networks with sparse label assignment for whole mammogram classification. In: MICCAI (2017)
8. Xue, W., Brahm, G., et al.: Full left ventricle quantification via deep multitask relationships learning. Medical Image Analysis (2018)
9. Mainiero, M.B., Moy, L., Baron, P., Didwania, A.D., Green, E.D., Heller, S.L., Holbrook, A.I., Lee, S.J., Lewin, A.A., Lourenco, A.P., et al.: ACR Appropriateness Criteria® breast cancer screening. JACR (2017)