Archive | 2021

Ultra-Fine-Grained Visual Categorization

 

Abstract


Ultra-fine-grained visual categorization (ultra-FGVC) identifies objects at a granularity so fine that even human experts can hardly identify or describe the visual differences. As a pioneering computer vision task, ultra-FGVC has significant potential in AI-driven agriculture and smart farming. However, it remains an open research problem, mainly because of three challenges: 1) the absence of ultra-fine-grained image datasets; 2) only a few samples per category, which is beyond the ability of most convolutional neural network methods, since these favor large training sets; 3) inter-class differences among ultra-fine-grained images (e.g., cultivars in the same species) that are orders of magnitude smaller than those in current FGVC tasks (e.g., species).

This thesis reports our efforts towards closing this research gap and addressing the challenging ultra-FGVC task. To address the lack of benchmark datasets for ultra-FGVC, we introduce the ultra-fine-grained (UFG) image dataset, a large-scale collection of 47,114 images from 3,526 categories. All images in the proposed UFG image dataset are uniquely annotated with genotype-based labels rather than human-observation-based labels. Together with an extensive evaluation of state-of-the-art fine-grained classification methods on the proposed UFG image dataset, we establish a benchmark dataset and baselines for large-scale ultra-FGVC tasks.

To address the lack of technical solutions, we present a series of methods for the ultra-FGVC tasks. Our proposed methods fall broadly into two groups.
In the first group, we focus on addressing ultra-FGVC efficiently and effectively via newly proposed geometric shape descriptors: 1) We introduce a novel Multi-Orientation Region Transform (MORT), which characterizes contour and structure features simultaneously for image classification; 2) We then propose a multiscale contour steered region integral, which further improves the performance of MORT by incorporating a 2D Fourier transform to provide a more comprehensive feature description; 3) We propose a Block Diagonal Symmetric Positive Definite Matrix Lie Algebra (BDSPDMLA) for shape representation and classification. The proposed BDSPDMLA addresses the computational bottleneck of Riemannian-framework-based methods and allows a more discriminative shape description by fusing information from various regions.

In the second group, we introduce novel convolutional neural network based methods for the ultra-FGVC tasks: 1) We propose a novel random mask covariance network, which integrates an auxiliary self-supervised learning module with a powerful in-image data augmentation scheme for ultra-FGVC. Specifically, we first uniformly partition input images into patches and then augment the data by randomly shuffling and masking these patches. On top of that, we introduce an auxiliary self-supervised learning module that predicts the spatial covariance context of these patches to increase the discriminability of our network for classification; 2) We introduce a weakly supervised part segmentation framework that simultaneously learns to segment parts and identify objects in an end-to-end manner using only image-level category labels.
A novel bilateral asymmetry loss function is proposed to guide the part segmentation, encoding the magnitude of part self-similarity in the network learning.

We believe the proposed UFG image dataset and evaluation protocols can serve as a benchmark platform that may advance visual classification research from approaching human performance to exceeding human ability, by providing artificial intelligence (AI) benchmark data that are not limited by the labels of human intelligence (HI). More importantly, the very encouraging experimental results of the proposed methods, compared with state-of-the-art benchmarks, demonstrate their superiority and potential for ultra-FGVC, pushing the research boundary forward from fine-grained to ultra-fine-grained visual categorization.
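
The in-image augmentation described in the abstract (uniformly partitioning an image into patches, then randomly shuffling and masking them) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function name `shuffle_and_mask_patches`, the grid size, the mask ratio, and the choice to mask by zero-filling are all assumptions made for illustration only.

```python
import numpy as np


def shuffle_and_mask_patches(image, grid=4, mask_ratio=0.25, rng=None):
    """Hypothetical helper: shuffle a grid of patches and zero out a random subset.

    image      : array of shape (H, W) or (H, W, C); H and W must be divisible by grid
    grid       : number of patches per side (assumed value, not from the thesis)
    mask_ratio : fraction of patch slots to blank out (assumed value)
    Returns the augmented image, the patch permutation, and the masked slots.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % grid or w % grid:
        raise ValueError("image height and width must be divisible by grid")
    ph, pw = h // grid, w // grid

    # Uniformly partition the image into patches, in row-major order.
    patches = [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw].copy()
        for r in range(grid)
        for c in range(grid)
    ]
    n = len(patches)

    # order[slot] is the index of the source patch placed at that slot.
    order = rng.permutation(n)

    # Pick the slots to mask; zero-filling is an assumption of this sketch.
    n_masked = int(round(mask_ratio * n))
    masked = set(rng.choice(n, size=n_masked, replace=False).tolist())

    out = np.empty_like(image)
    for slot, src in enumerate(order.tolist()):
        r, c = divmod(slot, grid)
        patch = np.zeros_like(patches[src]) if slot in masked else patches[src]
        out[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = patch
    return out, order, sorted(masked)
```

Returning the permutation and the masked slots alongside the image is a design choice of this sketch: a self-supervised auxiliary task, such as the patch-context prediction mentioned in the abstract, would need some record of how the patches were rearranged.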

DOI 10.25904/1912/4178
Language English
