Offline versus Online Triplet Mining based on Extreme Distances of Histopathology Patches
Milad Sikaroudi, Benyamin Ghojogh, Amir Safarpoor, Fakhri Karray, Mark Crowley, H.R. Tizhoosh
Accepted for presentation at the 15th International Symposium on Visual Computing (ISVC) 2020, Springer.
Milad Sikaroudi†, Benyamin Ghojogh‡*, Amir Safarpoor†, Fakhri Karray‡, Mark Crowley‡, Hamid R. Tizhoosh†

† KIMIA Lab, University of Waterloo, Waterloo, ON, Canada
‡ Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
{msikaroudi, bghojogh, asafarpo, karray, mcrowley, tizhoosh}@uwaterloo.ca

Abstract.
We analyze the effect of offline and online triplet mining for a colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider the extreme cases, i.e., the farthest and nearest patches to a given anchor, in both online and offline mining. While many works focus solely on selecting the triplets online (batch-wise), we also study the effect of extreme distances and neighbor patches before training, in an offline fashion. We analyze the impact of extreme cases in terms of embedding distance for offline versus online mining, including easy positive, batch semi-hard, and batch hard triplet mining, the neighborhood component analysis loss, its proxy version, and distance weighted sampling. We also investigate online approaches based on extreme distances and comprehensively compare offline and online mining performance based on the data patterns, explaining offline mining as a tractable generalization of online mining with a large mini-batch size. Furthermore, we discuss the relations of the different colorectal tissue types in terms of extreme distances. We found that offline and online mining approaches have comparable performance for a specific architecture, such as ResNet-18 in this study. Moreover, we found the assorted case, which mixes the different extreme distances, to be promising, especially in the online approach.
Keywords:
Histopathology, Triplet mining, Extreme distances, Online mining, Offline mining, Triplet network
1 Introduction

With the advent of deep learning methods, image analysis algorithms have leveled with and, in some cases, surpassed human expert performance. However, due to the lack of interpretability, the decisions of deep models are not transparent enough. Additionally, these models need a massive amount of labeled data, which can be expensive and time consuming to collect for medical applications [1]. To address the interpretability issue, one may evaluate the performance by enabling consensus, for example, by retrieving similar cases. An embedding framework, such as the triplet loss, can be applied for training models to overcome the expensive label requirement, where either soft or hard similarities can be used [2]. In the triplet loss, triplets of anchor-positive-negative instances are considered, where the anchor and positive instances belong to the same class or are similar, while the negative instance belongs to another class or is dissimilar to them. The triplet loss aims to decrease the intra-class variance and increase the inter-class variance of the embeddings by pulling the anchor and positive closer together and pushing the negative away [3].

* The first two authors contributed equally to this work.

Fig. 1. Block diagram for the offline triplet mining approach.

Since the introduction of the triplet loss, many updated versions have been proposed to increase efficiency and improve generalization. Furthermore, considering the beneficial aspects of these algorithms, such as unsupervised feature learning, data efficiency, and better generalization, triplet techniques have been applied to many other applications, such as representation learning in pathology images [4,5,6,2] and other medical applications [7]. Schroff et al. [8] proposed a method to encode images into a space whose distances reflect the dissimilarity between instances. They trained a deep neural network using triplets including similar and dissimilar cases.

Later, a new family of algorithms emerged to address shortcomings of the triplet loss by selecting more decent triplets, with a notion of similarity, for the network while training. These efforts, such as Batch All (BA) [9], Batch Semi-Hard (BSH) [8], Batch Hard (BH) [10,11], Neighborhood Components Analysis (NCA) [12], Proxy-NCA (PNCA) [13,14], Easy Positive (EP) [15], and Distance Weighted Sampling (DWS) [16], fall into the online triplet mining category, where triplets are created and altered during training within each batch. As the online methods rely on mini-batches of data, they may not reflect the data neighborhood correctly; thus, they can result in a sub-optimal solution.
In offline triplet mining, a triplet dataset is created before the training session, while all training samples are taken into account. As a result, in this study, we investigate offline and online approaches based on four different extreme cases imposed on the positive and negative samples for triplet generation. Our contributions in this work are two-fold. First, we investigate four new online methods, in addition to the existing approaches, and five offline methods, all based on extreme cases. Second, we compare the different triplet mining methods for histopathology data and analyze them based on their patterns.

The remainder of this paper is organized as follows. Section 2 introduces the proposed offline triplet mining methods. In Section 3, we review the online triplet mining methods and propose new online strategies based on extreme distances. The experiments and comparisons are reported in Section 4. Finally, Section 5 concludes the paper and discusses possible future work.
Notations:
Consider a training dataset $\mathcal{X}$, where $\mathbf{x}^i$ denotes an instance in the $i$-th class. Let $b$ and $c$ denote the mini-batch size and the number of classes, respectively, and let $\mathcal{D}$ be a distance metric, e.g., the squared $\ell_2$ norm. The number of sampled instances per class in a mini-batch is $w := \lfloor b/c \rfloor$. We denote the anchor, positive, and negative instances in the $i$-th class by $\mathbf{x}_a^i$, $\mathbf{x}_p^i$, and $\mathbf{x}_n^i$, respectively, and their deep embeddings by $\mathbf{y}_a^i$, $\mathbf{y}_p^i$, and $\mathbf{y}_n^i$, respectively.

2 Offline Triplet Mining

In the offline triplet mining approach, the processing of data is not performed during the triplet network training but beforehand. The extreme distances are calculated only once, on the whole training dataset rather than in the mini-batches. The histopathology patterns in the input space cannot be distinguished, especially for visually similar tissues [17]. Hence, we work on the extreme distances in a feature space trained using the class labels. The block diagram of the proposed offline triplet mining is depicted in Fig. 1. In the following, we explain the steps of mining in detail.
Training Supervised Feature Space:
We first train a feature space in a supervised manner. For example, a deep network with a cross-entropy loss function can be used for training this space, where the embedding of the one-to-last layer is extracted. We want the feature space to use the labels to better discriminate classes by increasing their inter-class distances. Hence, we use a subset of the training data, denoted by $\mathcal{X}_1$, for training the supervised network. Distance Matrix in the Feature Space:
After training the supervised network, we embed another subset of the training data, denoted by $\mathcal{X}_2$ (where $\mathcal{X}_1 \cup \mathcal{X}_2 = \mathcal{X}$ and $\mathcal{X}_1 \cap \mathcal{X}_2 = \varnothing$), in the feature space. We compute a distance matrix on the embedded data in the feature space. Therefore, using the distance matrix, we can find cases with extreme distances. We consider every $\mathbf{x} \in \mathcal{X}_2$ as an anchor in a triplet, where its nearest or farthest neighbors from the same and other classes are considered as its positive and negative instances, respectively. We have four different cases with extreme distances, i.e., Easiest Positive and Easiest Negative (EPEN), Easiest Positive and Hardest Negative (EPHN), Hardest Positive and Easiest Negative (HPEN), and Hardest Positive and Hardest Negative (HPHN). We also have the assorted case, where one of the extreme cases is randomly selected for each triplet.

There might exist some outliers in the data whose embeddings fall far apart from the others. In that case, merely one single outlier may become the hardest negative for all anchors. We prevent this issue by a statistical test [18]: for every data instance in $\mathcal{X}_2$ embedded in the feature space, the distances from the other instances are standardized using Z-score normalization, and the instances having distances above the 99-th percentile (i.e., normalized distances above the threshold 2.33) are treated as outliers and excluded.

Fig. 2. Example patches for the different tissue types in the large CRC dataset.

Fig. 3. Examples for extreme distance triplets: (a) EPEN, (b) EPHN, (c) HPEN, and (d) HPHN.

Training the Triplet Network: After preparing the triplets in any extreme case, a triplet network [8] is trained using the triplets for learning an embedding space that better discriminates dissimilar instances while keeping similar instances close. We call the spaces learned by the supervised and triplet networks the feature space and the embedding space, respectively (see Fig. 1).
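The offline mining steps above (distance matrix, Z-score outlier test, extreme-distance selection) can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' implementation; the function and argument names (e.g., `offline_extreme_triplets`, `z_threshold`) are ours.

```python
import numpy as np

def offline_extreme_triplets(feats, labels, mode="EPEN", z_threshold=2.33):
    """Offline mining: build (anchor, positive, negative) index triplets
    from extreme distances computed once over the whole embedded set.

    feats:  (N, d) feature-space embeddings of X2;  labels: (N,) class ids;
    mode:   "EPEN", "EPHN", "HPEN", or "HPHN" (easiest/hardest pos/neg).
    """
    # Full pairwise squared Euclidean distance matrix.
    sq = np.sum(feats ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    np.fill_diagonal(dist, 0.0)

    # Statistical outlier test: Z-score each anchor's row of distances;
    # candidates whose normalized distance is extreme are excluded.
    z = (dist - dist.mean(axis=1, keepdims=True)) / dist.std(axis=1, keepdims=True)

    triplets = []
    for a in range(len(labels)):
        same = labels == labels[a]
        same[a] = False                      # the anchor is not its own positive
        valid = z[a] <= z_threshold          # drop statistically extreme candidates
        pos = np.where(same & valid)[0]
        neg = np.where(~same & valid)[0]
        neg = neg[neg != a]
        if len(pos) == 0 or len(neg) == 0:
            continue
        # Easiest positive = nearest same-class; hardest positive = farthest.
        p = pos[np.argmin(dist[a, pos])] if mode[:2] == "EP" else pos[np.argmax(dist[a, pos])]
        # Easiest negative = farthest other-class; hardest negative = nearest.
        n = neg[np.argmax(dist[a, neg])] if mode[2:] == "EN" else neg[np.argmin(dist[a, neg])]
        triplets.append((a, int(p), int(n)))
    return triplets
```

The resulting index triplets can then be fed to the triplet network as a fixed, pre-computed dataset, which is what makes the offline approach tractable.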
3 Online Triplet Mining

In the online triplet mining approach, data processing is performed during the training phase, within the mini-batches of data. In other words, the triplets are found in the mini-batch of data and are not fed as a triplet-form input to the network. Several online mining methods exist in the literature; they are introduced in the following. We also propose several new online mining methods based on extreme distances of the data instances in the mini-batch.
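Since online miners can only pair instances that co-occur in a batch, the mini-batches are typically sampled with a fixed number of instances per class ($w = \lfloor b/c \rfloor$ in the notation above), so that every anchor has in-batch positives and negatives. A minimal sketch of such a balanced sampler (the function name and interface are ours, not from the paper):

```python
import random

def balanced_batches(labels, batch_size, num_classes, seed=0):
    """Yield index mini-batches with w = batch_size // num_classes
    instances per class, so online miners can always form triplets
    inside the batch."""
    rng = random.Random(seed)
    w = batch_size // num_classes
    by_class = {c: [i for i, y in enumerate(labels) if y == c]
                for c in range(num_classes)}
    while True:
        batch = []
        for c in range(num_classes):
            batch.extend(rng.sample(by_class[c], w))  # w per class, no repeats
        rng.shuffle(batch)
        yield batch
```

With the paper's setup of 9 tissue classes and a batch size of 45, such a sampler yields 5 instances per class per batch.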
Batch All [9]:
One of the online methods, which considers all anchor-positive and anchor-negative pairs in the mini-batch. Its loss function is in the regular triplet-loss format, summed over all the triplets in the mini-batch, formulated as

$$\mathcal{L}_{\text{BA}} := \sum_{i=1}^{c} \sum_{\substack{j=1 \\ j \neq i}}^{c} \sum_{a=1}^{w} \sum_{\substack{p=1 \\ p \neq a}}^{w} \sum_{n=1}^{w} \big[ m + \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) \big]_+, \quad (1)$$

where $m$ is the margin between positives and negatives and $[\cdot]_+ := \max(\cdot, 0)$ is the standard hinge loss.
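For illustration, Eq. (1) can be evaluated directly from a mini-batch's pairwise distance matrix. A naive, loop-based NumPy sketch for clarity (not the authors' code):

```python
import numpy as np

def batch_all_loss(emb, labels, margin=0.25):
    """Batch All (Eq. 1): sum the hinge loss over every valid
    (anchor, positive, negative) triplet in the mini-batch."""
    sq = np.sum(emb ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T  # squared L2 distances
    same = labels[:, None] == labels[None, :]
    n = len(labels)
    total = 0.0
    for a in range(n):
        for p in range(n):
            if p == a or not same[a, p]:
                continue                 # p must be a distinct same-class positive
            for ng in range(n):
                if same[a, ng]:
                    continue             # ng must belong to another class
                total += max(0.0, margin + d[a, p] - d[a, ng])
    return total
```

In practice this triple loop is vectorized, but the triple sum mirrors the three summations over anchors, positives, and negatives in Eq. (1).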
Batch Semi-Hard [8]:
The hardest (nearest) negative instance in the mini-batch that is still farther than the positive is selected. Its loss function is

$$\mathcal{L}_{\text{BSH}} := \sum_{i=1}^{c} \sum_{a=1}^{w} \sum_{\substack{p=1 \\ p \neq a}}^{w} \Big[ m + \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \min_{\substack{j \in \{1,\dots,c\} \setminus \{i\} \\ n \in \{1,\dots,w\}}} \big\{ \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) \,\big|\, \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) > \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) \big\} \Big]_+. \quad (2)$$

Batch Hard [10]:
The Hardest Positive and Hardest Negative (HPHN), which are the farthest positive and the nearest negative in the mini-batch, are selected. Hence, its loss function is

$$\mathcal{L}_{\text{BH}} := \sum_{i=1}^{c} \sum_{a=1}^{w} \Big[ m + \max_{p \in \{1,\dots,w\} \setminus \{a\}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \min_{\substack{j \in \{1,\dots,c\} \setminus \{i\} \\ n \in \{1,\dots,w\}}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) \Big]_+. \quad (3)$$

NCA [12]:
The softmax form [19] is used instead of the regular triplet loss [8]. It considers all possible negatives in the mini-batch for an anchor by

$$\mathcal{L}_{\text{NCA}} := -\sum_{i=1}^{c} \sum_{a=1}^{w} \ln \Bigg( \frac{\exp\big(-\mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i)\big)}{\sum_{j=1, j \neq i}^{c} \sum_{n=1}^{w} \exp\big(-\mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j)\big)} \Bigg), \quad (4)$$

where $\ln(\cdot)$ is the natural logarithm and $\exp(\cdot)$ is the exponential function.

Proxy-NCA [13]:
A set of proxies $\mathcal{P}$, e.g., the centers of the classes, with cardinality equal to the number of classes, is used. For memory efficiency, an embedding $\mathbf{y}$ is assigned to a proxy as $\Pi(\mathbf{y}) := \arg\min_{\pi \in \mathcal{P}} \mathcal{D}(\mathbf{y}, \pi)$. PNCA uses the proxies of the positive and negatives in the NCA loss:

$$\mathcal{L}_{\text{PNCA}} := -\sum_{i=1}^{c} \sum_{a=1}^{w} \ln \Bigg( \frac{\exp\big(-\mathcal{D}(\mathbf{y}_a^i, \Pi(\mathbf{y}_p^i))\big)}{\sum_{j=1, j \neq i}^{c} \sum_{n=1}^{w} \exp\big(-\mathcal{D}(\mathbf{y}_a^i, \Pi(\mathbf{y}_n^j))\big)} \Bigg). \quad (5)$$

Easy Positive [15]:
Let $\mathbf{y}_{ep}^i := \arg\min_{p \in \{1,\dots,w\} \setminus \{a\}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i)$ be the easiest (nearest) positive for the anchor. If the embeddings are normalized and fall on a unit hyper-sphere, the loss in the EP method is

$$\mathcal{L}_{\text{EP}} := -\sum_{i=1}^{c} \sum_{a=1}^{w} \ln \Bigg( \frac{\exp(\mathbf{y}_a^{i\top} \mathbf{y}_{ep}^i)}{\exp(\mathbf{y}_a^{i\top} \mathbf{y}_{ep}^i) + \sum_{j=1, j \neq i}^{c} \sum_{n=1}^{w} \exp(\mathbf{y}_a^{i\top} \mathbf{y}_n^j)} \Bigg). \quad (6)$$

Our experiments showed that for the colorectal cancer (CRC) histopathology dataset [20], the performance improves if the inner products in Eq. (6) are replaced with minus distances. We call the EP method with distances EP-D, whose loss function is

$$\mathcal{L}_{\text{EP-D}} := -\sum_{i=1}^{c} \sum_{a=1}^{w} \ln \Bigg( \frac{\exp\big(-\mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_{ep}^i)\big)}{\exp\big(-\mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_{ep}^i)\big) + \sum_{j=1, j \neq i}^{c} \sum_{n=1}^{w} \exp\big(-\mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j)\big)} \Bigg). \quad (7)$$

Fig. 4. Chord diagrams of the negatives in the offline mining based on extreme distances: (a) nearest negatives (EPHN & HPHN), (b) farthest negatives (EPEN & HPEN), and (c) assorted. A flow from $i$ to $j$ means that $i$ takes $j$ as a negative.

Distance Weighted Sampling [16]:
The distribution of the pairwise distances is proportional to $q(\mathcal{D}(\mathbf{y}_i, \mathbf{y}_j)) := \mathcal{D}(\mathbf{y}_i, \mathbf{y}_j)^{n-2} \big(1 - \tfrac{1}{4}\mathcal{D}(\mathbf{y}_i, \mathbf{y}_j)^2\big)^{(n-3)/2}$, where $n$ denotes the dimensionality of the embedding space [16]. For a triplet, the negative sample is drawn as $n^* \sim P(n \,|\, a) \propto \min\big(\lambda, q^{-1}(\mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j))\big), \forall j \neq i$. The loss function in the DWS method is

$$\mathcal{L}_{\text{DWS}} := \sum_{i=1}^{c} \sum_{a=1}^{w} \sum_{\substack{p=1 \\ p \neq a}}^{w} \big[ m + \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_{n^*}) \big]_+. \quad (8)$$

Extreme Distances:
We propose four additional online methods based on extreme distances. We consider every instance once as an anchor in the mini-batch and take its nearest/farthest same-class instance as the easiest/hardest positive and its nearest/farthest other-class instance as the hardest/easiest negative. Hence, four different cases, i.e., EPEN, EPHN, HPEN, and HPHN, exist. The inspiration for the extreme values, especially the farthest ones, was opposition-based learning [21,22]. HPHN is equivalent to BH, which has already been explained. We can also have a mixture of these four cases (i.e., the assorted case), where one of the cases is randomly considered for every anchor in the mini-batch. The proposed online mining loss functions are as follows:

$$\mathcal{L}_{\text{EPEN}} := \sum_{i=1}^{c} \sum_{a=1}^{w} \Big[ m + \min_{p \in \{1,\dots,w\} \setminus \{a\}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \max_{\substack{j \in \{1,\dots,c\} \setminus \{i\} \\ n \in \{1,\dots,w\}}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) \Big]_+, \quad (9)$$

$$\mathcal{L}_{\text{EPHN}} := \sum_{i=1}^{c} \sum_{a=1}^{w} \Big[ m + \min_{p \in \{1,\dots,w\} \setminus \{a\}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \min_{\substack{j \in \{1,\dots,c\} \setminus \{i\} \\ n \in \{1,\dots,w\}}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) \Big]_+, \quad (10)$$

$$\mathcal{L}_{\text{HPEN}} := \sum_{i=1}^{c} \sum_{a=1}^{w} \Big[ m + \max_{p \in \{1,\dots,w\} \setminus \{a\}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \max_{\substack{j \in \{1,\dots,c\} \setminus \{i\} \\ n \in \{1,\dots,w\}}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) \Big]_+, \quad (11)$$

$$\mathcal{L}_{\text{Assorted}} := \sum_{i=1}^{c} \sum_{a=1}^{w} \Big[ m + \min\!/\!\max_{p \in \{1,\dots,w\} \setminus \{a\}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_p^i) - \min\!/\!\max_{\substack{j \in \{1,\dots,c\} \setminus \{i\} \\ n \in \{1,\dots,w\}}} \mathcal{D}(\mathbf{y}_a^i, \mathbf{y}_n^j) \Big]_+, \quad (12)$$

where $\min/\max$ denotes a random selection between the minimum and maximum operators.

Table 1. Results of offline triplet mining on the training and test data

            Train                               Test
          R@1    R@4    R@8    R@16   Acc.    R@1    R@4    R@8    R@16   Acc.
EPEN      92.60  97.66  98.85  99.48  95.87   89.86  96.78  98.20  99.11  94.58
EPHN
HPEN      93.22  96.93  97.71  98.35  96.16   87.11  97.01  98.83  99.59  94.10
HPHN      81.62  89.73  93.15  95.78  91.19   42.71  71.07  86.13  95.32  71.25
assorted

Table 2. Results of online triplet mining on the training and test data

            Train                               Test
          R@1    R@4    R@8    R@16   Acc.    R@1    R@4    R@8    R@16   Acc.
BA [9]    95.13  98.45  99.20  99.60  97.73   82.42  93.94  96.93  98.58  90.85
BSH [8]   95.83  98.77
assorted

4 Experiments

Dataset:
We used the large colorectal cancer (CRC) histopathology dataset [20] with 100,000 stain-normalized 224 × 224 patches. The large CRC dataset includes nine classes of tissues, namely adipose, background, debris, lymphocytes (lymph), mucus, smooth muscle, normal colon mucosa (normal), cancer-associated stroma, and colorectal adenocarcinoma epithelium (tumor). Some example patches for these tissue types are illustrated in Fig. 2.
Experimental Setup:
We split the data into 70K, 15K, and 15K sets of patches for $\mathcal{X}_1$, $\mathcal{X}_2$, and the test data, denoted by $\mathcal{X}_t$, respectively. We used ResNet-18 [23] as the backbone of both the supervised network and the triplet network. For the sake of a fair comparison, the mini-batch sizes in the offline and online mining approaches were set to 48 (16 sets of triplets) and 45 (5 samples per each of the 9 classes), respectively, which are roughly equal. The learning rate, the maximum number of epochs, and the margin in the triplet loss were $10^{-}$, 50, and 0.25, respectively. The dimensionality of both the feature space and the embedding space was 128.
Offline Patches with Extreme Distance:
Figure 3 depicts some examples of the offline-created triplets with extreme distances in the feature space. The nearest/farthest positives and negatives are visually similar/dissimilar to the anchor patches, as expected. This shows that the learned feature space is a satisfactory subspace for feature extraction, which is reasonably compatible with the visual patterns.
Relation of the Colorectal Tissues:
The chord diagrams of negatives with extreme distances in offline mining are illustrated in Fig. 4. In both the nearest and farthest negatives, the background and normal tissues have not been negatives of any anchor. Some stroma and debris patches are the nearest negatives for smooth muscle, as are adipose for background patches, and lymph, mucus, and tumor for normal patches. This stems from the fact that these patches' patterns are hard to discriminate, especially tumor versus normal and stroma and debris versus smooth muscle. Among the farthest negatives, lymph, debris, mucus, stroma, and tumor are negatives of smooth muscle, as are debris, smooth muscle, and lymph for adipose texture, and adipose and smooth muscle for normal patches. This is meaningful since they have different patterns. Different types of negatives are selected in the assorted case, which is a mixture of the nearest and farthest negative patches. This gives more variety to the triplets, so that the network sees different cases in training.
Offline versus Online Embedding:
The evaluations of the embedding spaces found by the different offline and online methods are reported in Tables 1 and 2, respectively. The Recall@K (with ranks 1, 4, 8, and 16) and closest-neighbor accuracy metrics are reported.

Fig. 5. The top 10 retrievals (left to right) of a tumor patch query for different loss functions. The patches with no frame are tumor patches.
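The Recall@K and closest-neighbor accuracy metrics used in Tables 1 and 2 can be computed directly from the embeddings. A minimal NumPy sketch under the common definition (a query counts as a hit if at least one of its K nearest neighbors shares its class); the function name is ours:

```python
import numpy as np

def recall_at_k(emb, labels, ks=(1, 4, 8, 16)):
    """Recall@K over an embedded set: a query scores 1 if any of its K
    nearest neighbors (excluding itself) carries the same label.
    Recall@1 coincides with closest-neighbor accuracy."""
    sq = np.sum(emb ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(d, np.inf)          # never retrieve the query itself
    order = np.argsort(d, axis=1)        # neighbors sorted nearest-first
    hit = labels[order] == labels[:, None]
    return {k: float(np.mean(hit[:, :k].any(axis=1))) for k in ks}
```

The same sorted-neighbor lists also yield the top-10 retrievals shown in Fig. 5.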
In offline mining, HPHN has the weakest performance on both the training and test sets, suggesting that either the architecture or the embedding dimensionality is too small for these strictly hard cases, i.e., the network might be under-parameterized. We performed another experiment using ResNet-50 to see whether a more complicated architecture would help [10]. The results showed that, for the same maximum number of epochs, either increasing the embedding dimensionality to 512 or utilizing the ResNet-50 architecture increased the accuracy by 4%. The test accuracy in online mining is not as promising as in offline mining because in online mining we only select a small portion of each class in a mini-batch. The chance of having the most dissimilar/similar patches in a mini-batch is much lower than when we select the triplets in an offline manner. In other words, mining in mini-batches strongly depends upon a representative population of every class in each batch. Besides, the slightly higher training accuracy of the online manner compared to offline mining can be a herald of overfitting in online mining.

Tables 1 and 2 show that the easiest negatives yield comparable results. This is because the histopathology patches (specifically in this dataset) may have small intra-class variance for most of the tissue types (e.g., lympho tissue) and large intra-class variance for some others (e.g., normal tissue). Moreover, there is a small inter-class variance in these patches (with similar patterns, e.g., the tumor and normal tissue types are visually similar); hence, using the easy negatives does not drop the performance drastically. Moreover, as seen in Fig. 4, the hardest negatives might not be perfect candidates for negative patches in histopathology data, because many patches from different tissue types erroneously include shared textures from the patching stage [20]. In addition, the small inter-class variance explains why the hardest negatives struggle to reach the best performance, as also reported in [10]. Furthermore, the literature has shown that triplets created based on the easiest extreme distances can avoid over-clustering and yield better performance [15], which is also corroborated by our results. The assorted approach also has decent performance because both the inter-class and intra-class variances are considered.

Finally, the offline and online approaches can be compared in terms of batch size. Increasing the batch size can make the training of the network intractable [13]. On the contrary, a larger batch size implies a better statistical population of the data, i.e., a decent representative of every class. An ideal method has a large batch size without sacrificing tractability. Offline mining can be considered a generalization of online mining in which the mini-batch size is the number of all instances; the offline approach remains tractable because the triplets are made in pre-processing. As Tables 1 and 2 show, the offline and online mining approaches have comparable performance. The promising performance of the online approach has already been investigated in the literature. Here, we also show the promising performance of the offline approach, which stems from a good statistical representation obtained by working on the whole data population. In addition, Table 2 shows that the assorted case can result in an acceptable embedding because it contains different cases of extreme distances of histopathology patches.
Retrieval of Histopathology Patches:
Finally, in Fig. 5, we report the top retrievals for a sample tumor query [24]. As the figure shows, EPEN, HPEN, and the assorted case have the fewest false retrievals among the offline methods. In online mining, BSH, DWS, EPEN, and HPEN have the best performance. These findings coincide with the results in Tables 1 and 2, which show that these methods had better performance. Comparing the offline and online methods in Fig. 5 shows that more online approaches than offline ones have false retrievals, demonstrating that the offline methods benefit from a better statistical population of the data.
5 Conclusion

In this paper, we comprehensively analyzed the offline and online approaches for colorectal histopathology data. We investigated twelve online and five offline mining approaches, including the state-of-the-art triplet mining methods and extreme-distance cases. We explained the performance of offline and online mining in terms of histopathology data patterns. Offline mining was interpreted as a tractable generalization of online mining, in which the statistical population of the data is better captured for triplet mining. We also explored the relations of the colorectal tissues in terms of extreme distances.

One possible future direction is to improve upon the existing triplet sampling methods, such as [16], for online mining and to apply them to histopathology data. One can consider dynamic updates of the probabilistic density functions of the mini-batches to sample triplets from the embedding space. This dynamic sampling may improve the embedding of histopathology data by exploring more of the embedding space in a stochastic manner.
References
1. Tizhoosh, H.R., Pantanowitz, L.: Artificial intelligence and digital pathology: challenges and opportunities. Journal of Pathology Informatics (2018)
2. Sikaroudi, M., Safarpoor, A., Ghojogh, B., Shafiei, S., Crowley, M., Tizhoosh, H.: Supervision and source domain impact on representation learning: A histopathology case study. In: 2020 International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE (2020)
3. Ghojogh, B., Sikaroudi, M., Shafiei, S., Tizhoosh, H., Karray, F., Crowley, M.: Fisher discriminant triplet and contrastive losses for training siamese networks. In: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE (2020)
4. Teh, E.W., Taylor, G.W.: Metric learning for patch classification in digital pathology. In: Medical Imaging with Deep Learning (MIDL) Conference (2019)
5. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop. Volume 2 (2015)
6. Medela, A., Picon, A., Saratxaga, C.L., Belar, O., Cabezón, V., Cicchi, R., Bilbao, R., Glover, B.: Few shot learning in histopathological images: reducing the need of labeled data on biological datasets. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI), IEEE (2019) 1860–1864
7. Wang, J., Fang, Z., Lang, N., Yuan, H., Su, M.Y., Baldi, P.: A multi-resolution approach for spinal metastasis detection using deep siamese neural networks. Computers in Biology and Medicine (2017) 137–146
8. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015) 815–823
9. Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance comparison for person re-identification. Pattern Recognition (10) (2015) 2993–3003
10. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)
11. Peng, T., Boxberg, M., Weichert, W., Navab, N., Marr, C.: Multi-task learning of a deep k-nearest neighbour network for histopathological image classification and retrieval. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2019) 676–684
12. Goldberger, J., Hinton, G.E., Roweis, S.T., Salakhutdinov, R.R.: Neighbourhood components analysis. In: Advances in Neural Information Processing Systems (2005) 513–520
13. Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies. In: Proceedings of the IEEE International Conference on Computer Vision (2017) 360–368
14. Teh, E.W., Taylor, G.W.: Learning with less data via weakly labeled patch classification in digital pathology. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), IEEE (2020) 471–475
15. Xuan, H., Stylianou, A., Pless, R.: Improved embeddings with easy positive triplet mining. In: The IEEE Winter Conference on Applications of Computer Vision (2020) 2474–2482
16. Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: Proceedings of the IEEE International Conference on Computer Vision (2017) 2840–2848
17. Jimenez-del Toro, O., Otálora, S., Andersson, M., Eurén, K., Hedlund, M., Rousson, M., Müller, H., Atzori, M.: Analysis of histopathology images: From traditional machine learning to deep learning. In: Biomedical Texture Analysis. Elsevier (2017) 281–314
18. Aggarwal, C.C.: Outlier Analysis. 2 edn. Springer International Publishing (2017)
19. Ye, M., Zhang, X., Yuen, P.C., Chang, S.F.: Unsupervised embedding learning via invariant and spreading instance feature. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019) 6210–6219
20. Kather, J.N., Krisam, J., Charoentong, P., Luedde, T., Herpel, E., Weis, C.A., Gaiser, T., Marx, A., Valous, N.A., Ferber, D., et al.: Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Medicine 16 (2019)