Jia Li
Chinese Academy of Sciences
Publications

Featured research published by Jia Li.
International Journal of Computer Vision | 2010
Jia Li; Yonghong Tian; Tiejun Huang; Wen Gao
In this paper, we present a probabilistic multi-task learning approach for visual saliency estimation in video. In our approach, the problem of visual saliency estimation is modeled by simultaneously considering the stimulus-driven and task-related factors in a probabilistic framework. In this framework, a stimulus-driven component simulates the low-level processes in human vision system using multi-scale wavelet decomposition and unbiased feature competition; while a task-related component simulates the high-level processes to bias the competition of the input features. Different from existing approaches, we propose a multi-task learning algorithm to learn the task-related “stimulus-saliency” mapping functions for each scene. The algorithm also learns various fusion strategies, which are used to integrate the stimulus-driven and task-related components to obtain the visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. Experimental results show that our approach outperforms eight state-of-the-art approaches remarkably.
ACM Multimedia | 2010
Haonan Yu; Jia Li; Yonghong Tian; Tiejun Huang
Automatic interesting object extraction is widely used in many image applications. Among various extraction approaches, saliency-based ones usually perform better since they accord well with human visual perception. However, nearly all existing saliency-based approaches suffer from the integrity problem, namely, the extracted result is either a small part of the object (referred to as sketch-like) or a large region that contains some redundant part of the background (referred to as envelope-like). In this paper, we propose a novel object extraction approach by integrating two kinds of complementary saliency maps (i.e., sketch-like and envelope-like maps). In our approach, the extraction process is decomposed into two sub-processes, one used to extract a high-precision result based on the sketch-like map, and the other used to extract a high-recall result based on the envelope-like map. Then a classification step is used to extract an exact object based on the two results. By transferring the complex extraction task to an easier classification problem, our approach can effectively break down the integrity problem. Experimental results show that the proposed approach outperforms six state-of-the-art saliency-based methods remarkably in automatic object extraction, and is even comparable to some interactive approaches.
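One possible reading of the two sub-processes above is a trimap: pixels that score high on the sketch-like map are treated as foreground, pixels that score low on the envelope-like map are treated as background, and only the remaining pixels are left for the classification step. The sketch below illustrates that reading only. The function name `build_trimap`, the use of fixed thresholds, and the threshold values are assumptions for illustration, not details taken from the paper.

```python
def build_trimap(sketch_map, envelope_map, hi_thresh=0.7, lo_thresh=0.3):
    """Label each pixel 'fg', 'bg', or 'unknown' from two saliency maps.

    Both maps are lists of rows with values in [0, 1]. The thresholds are
    hypothetical; the abstract does not say how the two results are extracted.
    """
    trimap = []
    for sketch_row, envelope_row in zip(sketch_map, envelope_map):
        row = []
        for s, e in zip(sketch_row, envelope_row):
            if s >= hi_thresh:
                row.append("fg")       # high-precision evidence of the object
            elif e < lo_thresh:
                row.append("bg")       # outside the high-recall region
            else:
                row.append("unknown")  # left for the classification step
        trimap.append(row)
    return trimap


sketch = [[0.9, 0.2, 0.1]]
envelope = [[0.95, 0.6, 0.1]]
print(build_trimap(sketch, envelope))
```

Under this reading, the classifier only has to decide the "unknown" pixels, which is one way the extraction task could reduce to an easier classification problem.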
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015
Jia Li; Ling-Yu Duan; Xiaowu Chen; Tiejun Huang; Yonghong Tian
There are two sides to every story of visual saliency modeling in the frequency domain. On the one hand, image saliency can be effectively estimated by applying simple operations to the frequency spectrum. On the other hand, it is still unclear which part of the frequency spectrum contributes the most to popping-out targets and suppressing distractors. Toward this end, this paper tentatively explores the secret of image saliency in the frequency domain. From the results obtained in several qualitative and quantitative experiments, we find that the secret of visual saliency may mainly hide in the phases of intermediate frequencies. To explain this finding, we reinterpret the concept of discrete Fourier transform from the perspective of template-based contrast computation and thus develop several principles for designing the saliency detector in the frequency domain. Following these principles, we propose a novel approach to design the saliency detector under the assistance of prior knowledge obtained through both unsupervised and supervised learning processes. Experimental results on a public image benchmark show that the learned saliency detector outperforms 18 state-of-the-art approaches in predicting human fixations.
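The finding about the phases of intermediate frequencies can be illustrated with a small experiment: discard the amplitude spectrum, keep the phase only within a band of intermediate frequencies, and transform back. The sketch below does this with a naive DFT on a tiny image. It is not the learned detector proposed in the paper; the function names, the band limits `lo` and `hi`, and the squared-magnitude output are assumptions chosen for illustration.

```python
import cmath
import math


def dft2(grid, inverse=False):
    """Naive separable 2D DFT of a small square grid (list of lists)."""
    n = len(grid)
    sign = 1 if inverse else -1

    def dft1(vec):
        out = []
        for k in range(n):
            total = 0
            for t in range(n):
                total += vec[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
            out.append(total / n if inverse else total)
        return out

    rows = [dft1(row) for row in grid]
    cols = [dft1([rows[r][c] for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


def band_phase_saliency(image, lo, hi):
    """Keep unit-magnitude phase inside the radial band [lo, hi], then invert."""
    n = len(image)
    spectrum = dft2(image)
    kept = []
    for u in range(n):
        row = []
        for v in range(n):
            # Distance from the zero frequency, accounting for wrap-around.
            radius = math.hypot(min(u, n - u), min(v, n - v))
            z = spectrum[u][v]
            if lo <= radius <= hi and abs(z) > 1e-12:
                row.append(z / abs(z))  # phase only, amplitude discarded
            else:
                row.append(0j)
        kept.append(row)
    recon = dft2(kept, inverse=True)
    return [[abs(x) ** 2 for x in row] for row in recon]


# A flat background with one bright pixel acting as the pop-out target.
n = 16
image = [[0.2] * n for _ in range(n)]
image[5][9] = 1.0
saliency = band_phase_saliency(image, 2.0, 5.0)
peak = max(((r, c) for r in range(n) for c in range(n)),
           key=lambda rc: saliency[rc[0]][rc[1]])
print(peak)
```

In this toy case the response peaks at the bright pixel. Real images would need a fast FFT and band limits chosen from data rather than by hand.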
International Conference on Multimedia and Expo | 2009
Jia Li; Yonghong Tian; Tiejun Huang; Wen Gao
Recently, visual saliency has drawn great research interest in the field of computer vision and multimedia. Various approaches aiming at calculating visual saliency have been proposed. To evaluate these approaches, several datasets have been presented for visual saliency in images. However, there are few datasets to capture spatiotemporal visual saliency in video. Intuitively, visual saliency in video is strongly affected by temporal context and might vary significantly even in visually similar frames. In this paper, we present an extensive dataset with 7.5-hour videos to capture spatiotemporal visual saliency. The salient regions in frames sequentially sampled from these videos are manually labeled by 23 subjects and then averaged to generate the ground-truth saliency maps. We also present three metrics to evaluate competing approaches. Several typical algorithms were evaluated on the dataset. The experimental results show that this dataset is very suitable for evaluating visual saliency. We also discover some interesting findings that would be addressed in future research. Currently, the dataset is freely available online together with the source code for evaluation.
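The ground-truth construction described above, averaging the regions labeled by several subjects, can be sketched as a per-pixel mean over binary masks. The function name `average_labels` and the binary-mask representation are assumptions for illustration; the abstract does not specify the labeling format.

```python
def average_labels(masks):
    """Average per-subject binary masks into one saliency map in [0, 1].

    masks: list of 2D lists (one per subject) holding 1 for pixels the
    subject labeled as salient and 0 otherwise.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) / n for c in range(cols)]
            for r in range(rows)]


# Four hypothetical subjects labeling a 2x2 frame.
masks = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
    [[1, 0], [0, 1]],
    [[1, 0], [0, 0]],
]
print(average_labels(masks))
```

Each value is the fraction of subjects who marked that pixel, so pixels everyone agrees on reach 1.0 and disputed pixels get intermediate values.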
IEEE Transactions on Circuits and Systems for Video Technology | 2011
Jia Li; Yonghong Tian; Tiejun Huang; Wen Gao
Visual saliency plays an important role in various video applications such as video retargeting and intelligent video advertising. However, existing visual saliency estimation approaches often construct a unified model for all scenes, thus leading to poor performance for the scenes with diversified contents. To solve this problem, we propose a multi-task rank learning approach which can be used to infer multiple saliency models that apply to different scene clusters. In our approach, the problem of visual saliency estimation is formulated in a pair-wise rank learning framework, in which the visual features can be effectively integrated to distinguish salient targets from distractors. A multi-task learning algorithm is then presented to infer multiple visual saliency models simultaneously. By an appropriate sharing of information across models, the generalization ability of each model can be greatly improved. Extensive experiments on a public eye-fixation dataset show that our multi-task rank learning approach outperforms 12 state-of-the-art methods remarkably in visual saliency estimation.
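The pair-wise rank learning idea can be sketched with a linear scorer trained so that each salient target scores higher than its paired distractor. The code below covers a single task with a hinge-style update and leaves out the multi-task information sharing entirely. The function names, the learning rate, and the toy features are assumptions for illustration rather than the paper's formulation.

```python
def score(weights, features):
    """Linear saliency score of one feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_ranker(pairs, dim, epochs=200, lr=0.1):
    """Learn weights so each target outranks its paired distractor.

    pairs: list of (target_features, distractor_features) tuples.
    Uses a simple hinge-style update with a unit margin (an assumption).
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for target, distractor in pairs:
            margin = score(weights, target) - score(weights, distractor)
            if margin < 1.0:
                # Ranking violated or too close: move along the difference.
                weights = [w + lr * (t - d)
                           for w, t, d in zip(weights, target, distractor)]
    return weights


# Toy data: the first feature is informative, the second is mostly noise.
pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.8, 0.5], [0.2, 0.4]),
    ([0.7, 0.1], [0.3, 0.2]),
]
weights = train_pairwise_ranker(pairs, dim=2)
print(all(score(weights, t) > score(weights, d) for t, d in pairs))
```

A multi-task variant would train one such scorer per scene cluster while sharing information across them, which this sketch does not attempt.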
IEEE Transactions on Image Processing | 2012
Jia Li; Dong Xu; Wen Gao
Visual saliency is a useful clue to depict visually important image/video contents in many multimedia applications. In visual saliency estimation, a feasible solution is to learn a “feature-saliency” mapping model from the user data obtained by manually labeling activities or eye-tracking devices. However, label ambiguities may also arise due to the inaccurate and inadequate user data. To process the noisy training data, we propose a multi-instance learning to rank approach for visual saliency estimation. In our approach, the correlations between various image patches are incorporated into an ordinal regression framework. By iteratively refining a ranking model and relabeling the image patches with respect to their mutual correlations, the label ambiguities can be effectively removed from the training data. Consequently, visual saliency can be effectively estimated by the ranking model, which can pop out real targets and suppress real distractors. Extensive experiments on two public image data sets show that our approach outperforms 11 state-of-the-art methods remarkably in visual saliency estimation.
Science in China Series F: Information Sciences | 2011
Tiejun Huang; Yonghong Tian; Jia Li; Haonan Yu
General object recognition and image understanding is recognized as a dramatic goal for computer vision and multimedia retrieval. In spite of the great efforts devoted in the last two decades, it still remains an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects by selective attention modeling and pairwise local invariant feature matching on a large image set in an unsupervised manner. Towards this end, it can be formulated as a four-layer bottom-up model, i.e., salient region detection, object segmentation, automatic object discovering and visual dictionary construction. By exploiting multi-task learning methods to model visual saliency simultaneously with the bottom-up and top-down factors, the lowest layer can effectively detect salient objects in an image. The second layer exploits a simple yet effective learning approach to generate two complementary maps from several raw saliency maps, which then can be utilized to segment the salient objects precisely from a complex scene. For the third layer, we have also implemented an unsupervised approach to automatically discover general objects from a large image set by pairwise matching with local invariant features. Afterwards, visual dictionary construction can be implemented by using many state-of-the-art algorithms and tools available nowadays.
IEEE Signal Processing Letters | 2010
Jia Li; Yonghong Tian; Tiejun Huang; Wen Gao
This paper presents a cost-sensitive rank learning approach for visual saliency estimation. This approach avoids the explicit selection of positive and negative samples, which is often used by existing learning-based visual saliency estimation approaches. Instead, both the positive and unlabeled data are directly integrated into a rank learning framework in a cost-sensitive manner. Compared with existing approaches, the rank learning framework can take the influences of both the local visual attributes and the pair-wise contexts into account simultaneously. Experimental results show that our algorithm outperforms several state-of-the-art approaches remarkably in visual saliency estimation.
International Conference on Image Processing | 2008
Jia Li; Yonghong Tian; Tiejun Huang; Wen Gao
Text segmentation, also called text binarization, is usually an essential step for text information extraction from images and videos. However, most existing text segmentation methods have difficulties in extracting multi-polarity texts, that is, texts with multiple colors or intensities in the same line. In this paper, we propose a novel algorithm for multi-polarity text segmentation based on graph theory. By representing a text image with an undirected weighted graph and partitioning it iteratively, a multi-polarity text image can be effectively split into several single-polarity text images. These text images are then segmented by single-polarity text segmentation algorithms. Experiments on thousands of multi-polarity text images show that our algorithm can effectively segment multi-polarity texts.
ACM Multimedia | 2010
Min Wang; Jia Li; Tiejun Huang; Yonghong Tian; Ling-Yu Duan; Guochen Jia
Visual saliency can be a useful tool for image content analysis such as automatic image cropping and image compression. Most existing methods for visual saliency detection are related to the receptive field model. In this paper, we propose a bottom-up model which introduces 2D Log-Gabor wavelets for saliency detection. Compared with the traditional model of the receptive field, the 2D Log-Gabor wavelets can better simulate the biological characteristics of the simple cortical cell in the receptive field. Moreover, we also incorporate the influence of center bias into our model, which is a common phenomenon that directs visual attention to the center of images in natural scenes. Experimental results show that our approach outperforms three state-of-the-art approaches remarkably.
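Two ingredients named above can be sketched compactly: the radial transfer function of a Log-Gabor filter, which is a Gaussian on a logarithmic frequency axis, and a center-bias weighting. The sketch below uses the standard Log-Gabor radial form; the parameter values, the Gaussian shape of the center prior, and the multiplicative combination are assumptions for illustration, since the abstract does not specify them.

```python
import math


def log_gabor_radial(f, f0, sigma_ratio=0.55):
    """Radial Log-Gabor response at frequency f for center frequency f0.

    sigma_ratio controls the bandwidth (0.55 is an assumed value). The
    response is defined as zero at f = 0, so the filter has no DC component.
    """
    if f <= 0:
        return 0.0
    return math.exp(-(math.log(f / f0) ** 2) / (2 * math.log(sigma_ratio) ** 2))


def center_bias(rows, cols, spread=0.3):
    """Gaussian prior that peaks at the image center (assumed shape)."""
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    prior = []
    for r in range(rows):
        dy = (r - cy) / (spread * rows)
        prior.append([math.exp(-(dy ** 2 + ((c - cx) / (spread * cols)) ** 2) / 2)
                      for c in range(cols)])
    return prior


def apply_center_bias(saliency, spread=0.3):
    """Multiply a saliency map by the center prior, pixel by pixel."""
    prior = center_bias(len(saliency), len(saliency[0]), spread)
    return [[s * p for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(saliency, prior)]


print(log_gabor_radial(0.1, 0.1))            # response at the center frequency
biased = apply_center_bias([[1.0] * 5 for _ in range(5)])
print(biased[2][2] > biased[0][0])           # center is weighted above the corner
```

A full model would build a bank of such filters over several scales and orientations in the 2D frequency plane, which is beyond this sketch.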
