IEEE Transactions on Pattern Analysis and Machine Intelligence | 2021
Improving Deep Metric Learning by Divide and Conquer
Abstract
Deep metric learning aims at learning a mapping from the input domain to an embedding space in which semantically similar objects are located close together and dissimilar objects far from one another. However, while the embedding space learns to mimic the user-provided similarity on the training data, it should also generalize to novel categories not seen during training. Besides the user-provided training labels, many additional visual factors (such as viewpoint changes or shape peculiarities) exist and imply different notions of similarity between objects, affecting generalization to novel images. Existing approaches, however, usually learn a single embedding space directly on all available training data; such a space struggles to encode all the different types of relationships and does not generalize well. We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts. We successively focus on smaller subsets of the training data, reducing their variance and learning a different embedding subspace for each subset. Moreover, the subspaces are learned jointly so that they cover not only the intricacies but also the breadth of the data. Our approach significantly improves upon the state of the art in image retrieval and clustering on the CUB200-2011, CARS196, SOP, In-shop Clothes, and VehicleID datasets.
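The core idea above (divide the data into subsets, assign each subset its own slice of the embedding dimensions, and train the slices jointly) can be illustrated with a minimal NumPy sketch. This is only a toy illustration under simplifying assumptions, not the paper's implementation: the embeddings are random stand-ins for a network's output, the clustering is a few hand-rolled k-means iterations, and the per-learner loss is left as a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, K = 120, 8, 4          # samples, embedding dim, number of sub-learners
X = rng.normal(size=(N, D))  # toy embeddings standing in for a network's output

# --- Divide: cluster the data into K subsets (a few k-means iterations) ---
centers = X[rng.choice(N, K, replace=False)]
for _ in range(10):
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(assign == k):
            centers[k] = X[assign == k].mean(axis=0)

# --- Conquer: give each cluster its own slice of the embedding dimensions ---
dims_per_learner = D // K
slices = [slice(k * dims_per_learner, (k + 1) * dims_per_learner)
          for k in range(K)]

def sub_embedding(x, k):
    """Embedding used by learner k: only its own slice of dimensions."""
    return x[:, slices[k]]

# Each learner would minimize a metric loss (e.g. triplet) on its own subset;
# the actual loss and backprop through the shared network are omitted here.
for k in range(K):
    subset = X[assign == k]
    emb = sub_embedding(subset, k)
    # ... compute a metric loss on `emb` and update the shared network ...

# At test time the full representation concatenates all subspaces again.
full = np.concatenate([sub_embedding(X, k) for k in range(K)], axis=1)
```

Because every learner only sees a lower-variance subset of the data and only controls a fraction of the dimensions, the concatenated embedding can encode several notions of similarity at once, which is the intuition behind the paper's divide-and-conquer scheme.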