Lingxi Xie
Tsinghua University
Publications
Featured research published by Lingxi Xie.
IEEE Transactions on Image Processing | 2014
Lingxi Xie; Qi Tian; Meng Wang; Bo Zhang
In image classification tasks, one of the most successful algorithms is the bag-of-features (BoF) model. Although the BoF model has many advantages, such as simplicity, generality, and scalability, it still suffers from several drawbacks, including the limited semantic description of local descriptors, the lack of robust structures upon single visual words, and the absence of efficient spatial weighting. To overcome these shortcomings, various techniques have been proposed, such as extracting multiple descriptors, spatial context modeling, and interest region detection. Though they have been proven to improve the BoF model to some extent, there is still no coherent scheme to integrate the individual modules together. To address the problems above, we propose a novel framework with spatial pooling of complementary features. Our model extends the traditional BoF model in three aspects. First, we propose a new scheme for combining texture and edge-based local features at the descriptor extraction level. Next, we build geometric visual phrases to model spatial context upon complementary features for mid-level image representation. Finally, based on a smoothed edge map, a simple and effective spatial weighting scheme is performed to capture image saliency. We test the proposed framework on several benchmark data sets for image classification. The extensive results show the superior performance of our algorithm over state-of-the-art methods.
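A minimal sketch of the edge-based spatial weighting step described above, assuming local features are already quantized to visual words: each feature's vote into the BoF histogram is scaled by the smoothed edge strength at its location. The Gaussian smoothing scale and the weighting rule are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def weighted_bof_histogram(word_ids, keypoints, edge_map,
                           vocab_size, sigma=5.0):
    """Pool quantized descriptors into a histogram, weighting each
    local feature by the smoothed edge strength at its location."""
    saliency = gaussian_filter(edge_map.astype(np.float64), sigma=sigma)
    hist = np.zeros(vocab_size)
    for word, (x, y) in zip(word_ids, keypoints):
        hist[word] += saliency[y, x]          # edge-weighted vote
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Toy usage: 3 features on a 10x10 edge map, vocabulary of 5 words.
edges = np.zeros((10, 10)); edges[4:6, 4:6] = 1.0
h = weighted_bof_histogram([0, 2, 2], [(5, 5), (1, 1), (4, 5)], edges, 5)
```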
International Conference on Multimedia Retrieval | 2015
Lingxi Xie; Bo Zhang; Qi Tian
In this paper, we demonstrate that the essentials of image classification and retrieval are the same, since both tasks can be tackled by measuring the similarity between images. To this end, we propose ONE (Online Nearest-neighbor Estimation), a unified algorithm for both image classification and retrieval. ONE is surprisingly simple: it involves only manual object definition, regional description, and nearest-neighbor search. We take advantage of PCA and PQ approximation, as well as GPU parallelization, to scale our algorithm up to large-scale image search. Experimental results verify that ONE achieves state-of-the-art accuracy on a wide range of image classification and retrieval benchmarks.
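A hedged sketch of the nearest-neighbor estimation idea behind ONE: score each class by the average distance from the query's regional descriptors to their nearest neighbors in that class's descriptor pool. Function and variable names are illustrative; the PCA/PQ compression and GPU parallelization mentioned in the abstract are omitted for brevity.

```python
import numpy as np

def one_classify(query_descs, class_pools):
    """query_descs: (m, d) regional descriptors of the query image.
    class_pools: dict mapping class label -> (n_c, d) descriptor matrix."""
    scores = {}
    for label, pool in class_pools.items():
        # pairwise squared distances, shape (m, n_c)
        d2 = ((query_descs[:, None, :] - pool[None, :, :]) ** 2).sum(-1)
        scores[label] = d2.min(axis=1).mean()   # mean NN distance
    return min(scores, key=scores.get)          # smallest distance wins

rng = np.random.default_rng(0)
pools = {c: rng.normal(c, 1.0, size=(50, 8)) for c in (0, 1)}
print(one_classify(rng.normal(1.0, 1.0, size=(5, 8)), pools))
```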
International Conference on Computer Vision | 2013
Lingxi Xie; Qi Tian; Shuicheng Yan; Bo Zhang
As a special topic in computer vision, fine-grained visual categorization (FGVC) has been attracting growing attention in recent years. Unlike traditional image classification tasks, in which objects have large inter-class variation, the visual concepts in fine-grained datasets, such as hundreds of bird species, often have very similar semantics. Due to the large inter-class similarity, it is very difficult to classify the objects without locating truly discriminative features; it therefore becomes more important for the algorithm to make full use of part information in order to train a robust model. In this paper, we propose a powerful framework named Hierarchical Part Matching (HPM) to cope with fine-grained classification tasks. We extend the Bag-of-Features (BoF) model by integrating several novel modules into the image representation, including foreground inference and segmentation, Hierarchical Structure Learning (HSL), and Geometric Phrase Pooling (GPP). We verify in experiments that our algorithm achieves state-of-the-art classification accuracy on the Caltech-UCSD Birds-200-2011 dataset by making full use of the ground-truth part annotations.
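A minimal sketch of the Geometric Phrase Pooling (GPP) module as read from this abstract: each local feature forms a "geometric phrase" with its k nearest spatial neighbors, and their coding vectors are max-pooled together. The exact phrase definition in the paper may differ; this is a guess at the spirit of the module, with illustrative names.

```python
import numpy as np

def geometric_phrase_pooling(codes, positions, k=3):
    """codes: (n, v) coding responses of n local features.
    positions: (n, 2) keypoint coordinates."""
    n = len(codes)
    pooled = np.empty_like(codes)
    for i in range(n):
        d = np.linalg.norm(positions - positions[i], axis=1)
        phrase = np.argsort(d)[:k + 1]             # self + k neighbors
        pooled[i] = codes[phrase].max(axis=0)      # max-pool the phrase
    return pooled.max(axis=0)                      # image-level pooling
```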
Computer Vision and Pattern Recognition | 2014
Lingxi Xie; Jingdong Wang; Baining Guo; Bo Zhang; Qi Tian
Scene recognition is a basic task towards image understanding. Spatial Pyramid Matching (SPM) has been shown to be an efficient solution for spatial context modeling. In this paper, we introduce an alternative approach, Orientational Pyramid Matching (OPM), for orientational context modeling. Our approach is motivated by the observation that the 3D orientations of objects are a crucial factor in discriminating indoor scenes. The novelty lies in that OPM uses 3D orientations to form the pyramid and produce the pooling regions, unlike SPM, which uses spatial positions to form the pyramid. Experimental results on challenging scene classification tasks show that OPM achieves performance comparable to SPM, and that OPM and SPM make complementary contributions, so that their combination gives state-of-the-art performance.
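A hedged sketch contrasting SPM-style spatial binning with OPM-style orientational binning. The per-feature 3D orientation estimation is assumed to be given, and the bin layouts and pyramid levels are illustrative choices, not the paper's configuration.

```python
import numpy as np

def pyramid_histogram(word_ids, bin_coords, vocab_size, levels=(1, 2, 4)):
    """Generic pyramid pooling: bin_coords in [0,1)^2 are either spatial
    positions (SPM) or normalized orientation angles (OPM)."""
    feats = []
    for g in levels:                       # g x g bins at each level
        hist = np.zeros((g, g, vocab_size))
        cells = np.minimum((bin_coords * g).astype(int), g - 1)
        for w, (i, j) in zip(word_ids, cells):
            hist[i, j, w] += 1
        feats.append(hist.ravel())
    return np.concatenate(feats)
```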
Computer Vision and Image Understanding | 2014
Lingxi Xie; Qi Tian; Wengang Zhou; Bo Zhang
Near-duplicate image search in very large Web databases has been a hot topic in recent years. Traditional methods widely adopt the Bag-of-Visual-Words (BoVW) model and the inverted index structure. Despite their simplicity, efficiency, and scalability, these algorithms depend heavily on the accurate matching of local features. However, many factors in real applications limit the descriptive power of low-level features, and therefore cause the search results to suffer from unsatisfactory precision and recall. To overcome these shortcomings, it is reasonable to re-rank the initial search results using post-processing approaches such as spatial verification, query expansion, and diffusion-based algorithms. In this paper, we investigate the re-ranking problem from a graph-based perspective. We construct ImageWeb, a sparse graph consisting of all the images in the database, in which two images are connected if and only if one is ranked among the top results of the other's initial search. Based on the ImageWeb, we use HITS, a query-dependent algorithm, to re-rank the images according to their affinity values. We verify that it is possible to discover the nature of image relationships for search result refinement without using any handcrafted methods such as spatial verification. We also consider some trade-off strategies to intuitively guide the selection of search parameters. Experiments are conducted on large-scale image datasets with more than one million images. Our algorithm achieves state-of-the-art search performance with very fast online speed.
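A minimal sketch of the ImageWeb idea: connect image i to image j if j appears in the top-k of i's initial search result, then run the standard HITS updates and re-rank by authority score. The value of k and the iteration count are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def imageweb_rerank(initial_ranks, k=10, iters=50):
    """initial_ranks: (n, n) integer array; row i lists image ids in
    image i's initial ranked order. Returns ids sorted by authority."""
    n = len(initial_ranks)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, initial_ranks[i, :k]] = 1.0       # sparse top-k edges
    auth, hub = np.ones(n), np.ones(n)
    for _ in range(iters):                     # standard HITS updates
        auth = A.T @ hub; auth /= np.linalg.norm(auth)
        hub = A @ auth;   hub /= np.linalg.norm(hub)
    return np.argsort(-auth)
```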
Computer Vision and Pattern Recognition | 2016
Lingxi Xie; Jingdong Wang; Zhen Wei; Meng Wang; Qi Tian
For a long time, we have been combating overfitting in the CNN training process with model regularization, including weight decay, model averaging, data augmentation, etc. In this paper, we present DisturbLabel, an extremely simple algorithm which randomly replaces a fraction of the labels with incorrect values in each iteration. Although it seems counter-intuitive to intentionally generate incorrect training labels, we show that DisturbLabel prevents the network training from overfitting by implicitly averaging over exponentially many networks which are trained with different label sets. To the best of our knowledge, DisturbLabel is the first work to add noise at the loss layer. Meanwhile, DisturbLabel cooperates well with Dropout to provide complementary regularization. Experiments demonstrate competitive recognition results on several popular image recognition datasets.
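A near-direct sketch of the DisturbLabel step: in each iteration, replace a small fraction of ground-truth labels with labels drawn uniformly at random. The noise rate alpha is a hyper-parameter; 0.1 here is only an example value, and drawing uniformly over all classes is a simplification of the paper's sampling distribution.

```python
import numpy as np

def disturb_labels(labels, num_classes, alpha=0.1, rng=None):
    """Return a copy of `labels` with roughly alpha of them randomized."""
    rng = rng or np.random.default_rng()
    noisy = labels.copy()
    mask = rng.random(len(labels)) < alpha
    noisy[mask] = rng.integers(0, num_classes, size=mask.sum())
    return noisy

# Per-iteration usage inside a training loop (sketch):
#   batch_labels = disturb_labels(batch_labels, num_classes=10)
#   loss = criterion(model(batch_inputs), batch_labels)
```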
IEEE Transactions on Multimedia | 2015
Lingxi Xie; Jingdong Wang; Bo Zhang; Qi Tian
Large-scale image search has been attracting considerable attention from both academia and industry. The conventional bag-of-visual-words (BoVW) model with an inverted index has proven efficient at retrieving near-duplicate images, but it is less capable of discovering fine-grained concepts in the query and returning semantically matched search results. In this paper, we suggest that instance search should return not only near-duplicate images but also fine-grained results, which is usually the actual intention of a user. We propose a new and interesting problem named fine-grained image search, in which we prefer those images containing the same fine-grained concept as the query. We formulate the problem by constructing a hierarchical database and defining an evaluation method. We then introduce a baseline system that uses fine-grained classification scores to represent and co-index images, so that semantic attributes are better incorporated in the online querying stage. Large-scale experiments reveal that promising search results are achieved with reasonable time and memory consumption. We hope this paper will lay the foundation for future work on image search, and we look forward to more follow-up efforts along this research topic as well as commercial fine-grained image search engines.
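A speculative sketch of co-indexing with fine-grained classification scores: each database image is indexed not only by its visual words but also by its top-scoring fine-grained classes, so the online stage can retrieve semantically matched results. All names here are illustrative, not the paper's system.

```python
from collections import defaultdict

def build_co_index(image_words, image_class_scores, top_c=3):
    """image_words: {img_id: set of visual word ids};
    image_class_scores: {img_id: {class_id: score}}."""
    inverted = defaultdict(set)
    for img, words in image_words.items():
        for w in words:
            inverted[('word', w)].add(img)      # visual entries
        top = sorted(image_class_scores[img].items(),
                     key=lambda kv: -kv[1])[:top_c]
        for c, _ in top:                        # semantic co-index entries
            inverted[('class', c)].add(img)
    return inverted
```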
ACM Multimedia | 2012
Lingxi Xie; Qi Tian; Bo Zhang
The Bag-of-Features (BoF) model has played an important role in image representation for many multimedia applications. It has been extensively applied to many tasks, including image classification, image retrieval, and scene understanding. Despite the advantages of this model, such as simplicity, efficiency, and generality, there are notable drawbacks, including the limited semantic expressiveness of local descriptors and the lack of robust structures upon single visual words. To overcome these problems, various techniques have been proposed, such as multiple descriptors, spatial context modeling, and interest region detection. Though they have been proven to improve the BoF model to some extent, there is still no coherent scheme to integrate the individual modules. To address the problems above, we propose a novel framework with spatial pooling of heterogeneous features. Our framework differs from the traditional Bag-of-Features model in three aspects. First, we propose a new scheme for combining texture and edge-based local features at the descriptor extraction level. Next, we build geometric visual phrases to model spatial context upon heterogeneous features for mid-level representation of images. Finally, based on a smoothed edge map, a simple and effective spatial weighting scheme is performed on our mid-level image representation. We test our integrated framework on several benchmark datasets for image classification and retrieval applications. The extensive results show the superior performance of our algorithm over state-of-the-art methods.
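A hedged sketch of the descriptor-level combination mentioned above: for each keypoint, a texture descriptor (e.g., SIFT) and an edge-based descriptor are L2-normalized and concatenated into one heterogeneous vector. The balancing weight is an assumption, not the paper's setting.

```python
import numpy as np

def fuse_descriptors(texture_desc, edge_desc, edge_weight=1.0):
    """Both inputs: (n, d_*) descriptor matrices for the same keypoints."""
    def l2norm(x):
        return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    return np.hstack([l2norm(texture_desc),
                      edge_weight * l2norm(edge_desc)])
```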
Computer Vision and Pattern Recognition | 2016
Lingxi Xie; Liang Zheng; Jingdong Wang; Alan L. Yuille; Qi Tian
An increasing number of computer vision tasks can be tackled with deep features, which are the intermediate outputs of a pre-trained Convolutional Neural Network. Despite the astonishing performance, deep features extracted from low-level neurons are still unsatisfactory, arguably because they cannot access the spatial context contained in the higher layers. In this paper, we present InterActive, a novel algorithm which computes the activeness of neurons and network connections. Activeness is propagated through a neural network in a top-down manner, carrying high-level context and improving the descriptive power of low-level and mid-level neurons. Visualization indicates that neuron activeness can be interpreted as spatially weighted neuron responses. We achieve state-of-the-art classification performance on a wide range of image datasets.
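A loose sketch of the top-down idea, implemented here with a gradient-based proxy: the importance of each low-level activation is taken as the (rectified) gradient of the sum of high-layer activations with respect to it, and used to re-weight the low-level features before pooling. This is an interpretation of the abstract, not the paper's exact activeness formula; the layer indices are arbitrary.

```python
import torch
import torchvision.models as models

model = models.vgg16(weights=None).features.eval()
x = torch.randn(1, 3, 224, 224)

feats = {}
def save(name):
    def hook(module, inputs, output):
        output.retain_grad()          # keep grad on intermediate tensor
        feats[name] = output
    return hook

model[10].register_forward_hook(save('low'))   # a low/mid conv layer
out = model(x)                                 # top-layer feature maps
out.sum().backward()                           # top-down signal

low = feats['low']
weighted = low * low.grad.clamp(min=0)   # spatially re-weighted responses
descriptor = weighted.mean(dim=(2, 3))   # pooled image descriptor
```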
Medical Image Computing and Computer-Assisted Intervention | 2017
Yuyin Zhou; Lingxi Xie; Wei Shen; Yan Wang; Elliot K. Fishman; Alan L. Yuille
Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy for some small organs (e.g., the pancreas) is sometimes unsatisfactory, arguably because deep networks are easily disrupted by the complex and variable background regions, which occupy a large fraction of the input volume. In this paper, we formulate this problem as a fixed-point model which uses a predicted segmentation mask to shrink the input region. This is motivated by the fact that a smaller input region often leads to more accurate segmentation. In the training process, we use the ground-truth annotation to generate accurate input regions and optimize the network weights. At the testing stage, we fix the network parameters and update the segmentation results in an iterative manner. We evaluate our approach on the NIH pancreas segmentation dataset and outperform the state-of-the-art by more than 4%, measured by the average Dice-Sørensen coefficient (DSC). In addition, we report 62.43% DSC in the worst case, which guarantees the reliability of our approach in clinical applications.
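A schematic sketch of the fixed-point testing procedure on a single 2D slice: predict a mask, crop the input to the padded bounding box of the mask, re-predict, and repeat until the mask stabilizes. Here `segment` stands for any trained segmentation network; the names and the padding margin are illustrative.

```python
import numpy as np

def fixed_point_segment(image, segment, margin=20, max_iters=10):
    """image: 2D array; segment: callable mapping an image region to a
    binary mask of the same shape. Returns the converged full-size mask."""
    mask = segment(image)                        # coarse first pass
    for _ in range(max_iters):
        ys, xs = np.where(mask > 0)
        if len(ys) == 0:
            break                                # nothing detected
        y0 = max(ys.min() - margin, 0); y1 = min(ys.max() + margin, mask.shape[0])
        x0 = max(xs.min() - margin, 0); x1 = min(xs.max() + margin, mask.shape[1])
        new_mask = np.zeros_like(mask)
        new_mask[y0:y1, x0:x1] = segment(image[y0:y1, x0:x1])
        if (new_mask == mask).all():             # fixed point reached
            break
        mask = new_mask
    return mask
```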