Tiejun Huang
Peking University
Publication
Featured research published by Tiejun Huang.
International Journal of Computer Vision | 2010
Jia Li; Yonghong Tian; Tiejun Huang; Wen Gao
In this paper, we present a probabilistic multi-task learning approach for visual saliency estimation in video. In our approach, the problem of visual saliency estimation is modeled by simultaneously considering stimulus-driven and task-related factors in a probabilistic framework. In this framework, a stimulus-driven component simulates the low-level processes in the human visual system using multi-scale wavelet decomposition and unbiased feature competition, while a task-related component simulates the high-level processes that bias the competition of the input features. Different from existing approaches, we propose a multi-task learning algorithm to learn the task-related “stimulus-saliency” mapping functions for each scene. The algorithm also learns various fusion strategies, which are used to integrate the stimulus-driven and task-related components to obtain the final visual saliency. Extensive experiments were carried out on two public eye-fixation datasets and one regional saliency dataset. Experimental results show that our approach remarkably outperforms eight state-of-the-art approaches.
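As a rough illustration of the fusion the abstract describes, the sketch below combines a normalized bottom-up (stimulus-driven) map with a weighted task-related bias under a single fusion weight. The function names, the simple min-max normalization, and the weight values are illustrative placeholders, not the paper's wavelet features, learned mapping functions, or fusion strategies.

```python
import numpy as np

def bottom_up_saliency(feature_maps):
    """Unbiased feature competition: min-max normalize each feature map and
    average them (a stand-in for the paper's multi-scale wavelet front end)."""
    norm = [(f - f.min()) / (np.ptp(f) + 1e-8) for f in feature_maps]
    return np.mean(norm, axis=0)

def task_related_bias(feature_maps, weights):
    """Task-related component: a per-feature weighting (assumed to be learned
    offline per scene) that biases the competition of the input features."""
    w = np.asarray(weights, dtype=float)
    w = w / (w.sum() + 1e-8)
    return np.tensordot(w, np.stack(feature_maps), axes=1)

def fuse(feature_maps, weights, alpha=0.5):
    """Combine the two components; alpha stands in for the learned fusion strategy."""
    return alpha * bottom_up_saliency(feature_maps) + \
           (1 - alpha) * task_related_bias(feature_maps, weights)

# toy example: two 8x8 "feature maps"
maps = [np.random.rand(8, 8), np.random.rand(8, 8)]
saliency_map = fuse(maps, weights=[0.7, 0.3], alpha=0.6)
```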
International Conference on Data Mining | 2006
Chong Huang; Yonghong Tian; Zhi Zhou; Charles X. Ling; Tiejun Huang
Keyphrases play a key role in text indexing, summarization and categorization. However, most existing keyphrase extraction approaches require human-labeled training sets. In this paper, we propose an automatic keyphrase extraction algorithm that can be used in both supervised and unsupervised tasks. The algorithm treats each document as a semantic network, and the structural dynamics of this network are used to extract keyphrases (key nodes) without supervision. Experiments demonstrate that, in unsupervised tasks, the proposed algorithm improves effectiveness by 50% and efficiency by 30% on average, and performs comparably to supervised extractors. Moreover, by applying this algorithm to supervised tasks, we develop a classifier with an overall accuracy of up to 80%.
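A minimal sketch of the general idea, assuming a word co-occurrence graph stands in for the semantic network and the change in average clustering after removing a node stands in for the paper's structural-dynamics measure; the window size and ranking criterion are illustrative, not the authors' exact algorithm.

```python
import networkx as nx

def extract_keyphrases(tokens, window=3, top_k=5):
    """Build a co-occurrence (semantic) network over the tokens, then rank nodes
    by how strongly their removal perturbs a global connectivity statistic --
    a simplified stand-in for the paper's structural-dynamics measure."""
    g = nx.Graph()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if w != v:
                g.add_edge(w, v)
    base = nx.average_clustering(g)
    impact = {}
    for node in list(g.nodes):
        h = g.copy()
        h.remove_node(node)
        impact[node] = abs(base - (nx.average_clustering(h) if h.number_of_nodes() else 0.0))
    return sorted(impact, key=impact.get, reverse=True)[:top_k]

print(extract_keyphrases(
    "keyphrase extraction treats each document as a semantic network of words".split()))
```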
Computer Vision and Pattern Recognition | 2016
Peixi Peng; Tao Xiang; Yaowei Wang; Massimiliano Pontil; Shaogang Gong; Tiejun Huang; Yonghong Tian
Most existing person re-identification (Re-ID) approaches follow a supervised learning framework, in which a large number of labelled matching pairs are required for training. This severely limits their scalability in real-world applications. To overcome this limitation, we develop a novel cross-dataset transfer learning approach to learn a discriminative representation. It is unsupervised in the sense that the target dataset is completely unlabelled. Specifically, we present a multi-task dictionary learning method that is able to learn a dataset-shared but target-data-biased representation. Experimental results on five benchmark datasets demonstrate that the method significantly outperforms the state-of-the-art.
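The sketch below illustrates the flavour of the approach under strong simplifications: a single dictionary is learned on pooled source and target features, and the unlabelled target set is then sparse-coded over it. The actual method learns a dataset-shared but target-data-biased dictionary with multi-task coupling terms, which this toy version omits; all data and parameter values here are placeholders.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

# toy stand-ins for labelled source features and unlabelled target features
rng = np.random.default_rng(0)
X_source = rng.normal(size=(200, 64))
X_target = rng.normal(size=(150, 64))

# learn a dictionary on the union of both datasets (the "dataset-shared" part);
# the paper additionally biases it toward the target data, which is omitted here
dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=200, random_state=0)
dico.fit(np.vstack([X_source, X_target]))

# sparse codes of the unlabelled target images serve as the Re-ID representation
codes = sparse_encode(X_target, dico.components_, algorithm="lasso_lars", alpha=1.0)
print(codes.shape)  # (150, 32)
```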
IEEE Transactions on Image Processing | 2014
Xianguo Zhang; Tiejun Huang; Yonghong Tian; Wen Gao
The exponential growth of surveillance videos presents an unprecedented challenge for high-efficiency surveillance video coding technology. Compared with the existing coding standards, which were basically developed for generic videos, surveillance video coding should be designed to make the best use of the special characteristics of surveillance videos (e.g., the relatively static background). To do so, this paper first conducts two analyses on how to improve the background and foreground prediction efficiencies in surveillance video coding. Following the analysis results, we propose a background-modeling-based adaptive prediction (BMAP) method. In this method, all blocks to be encoded are first classified into three categories. Then, according to the category of each block, two novel inter predictions are selectively utilized, namely the background reference prediction (BRP), which uses the background modeled from the original input frames as the long-term reference, and the background difference prediction (BDP), which predicts the current data in the background difference domain. For background blocks, the BRP can effectively improve the prediction efficiency by using the higher-quality background as the reference, whereas for foreground-background-hybrid blocks, the BDP can provide a better reference after subtracting the background pixels. Experimental results show that the BMAP can achieve at least twice the compression ratio of the AVC (MPEG-4 Advanced Video Coding) high profile on surveillance videos, at the cost of only a slight increase in encoding complexity. Moreover, for foreground coding performance, which is crucial to the subjective quality of moving objects in surveillance videos, BMAP also obtains remarkable gains over several state-of-the-art methods.
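A loose, codec-free sketch of the block-level decision the abstract describes: model a background, classify each block as background, foreground, or hybrid, and pick the prediction signal accordingly. The median background model, the thresholds, and the residual computations are illustrative stand-ins; a real encoder would perform motion search and rate-distortion optimization, which are omitted here.

```python
import numpy as np

def model_background(frames):
    """Very rough background model: per-pixel temporal median of the original
    input frames (the paper uses a dedicated background modelling stage)."""
    return np.median(np.stack(frames), axis=0)

def classify_block(block, bg_block, fg_thresh=10.0):
    """Split a block into background / foreground / hybrid by how many pixels
    deviate from the modeled background (thresholds are illustrative)."""
    fg_ratio = np.mean(np.abs(block - bg_block) > fg_thresh)
    if fg_ratio < 0.05:
        return "background"
    if fg_ratio > 0.95:
        return "foreground"
    return "hybrid"

def predict_block(block, bg_block, prev_block):
    """Choose a prediction signal per block; prev_block stands in for the
    motion-compensated reference from the previous frame."""
    kind = classify_block(block, bg_block)
    if kind == "background":
        # BRP: use the higher-quality modeled background as the reference
        return kind, block - bg_block
    if kind == "hybrid":
        # BDP: subtract the background from both current block and reference,
        # then predict in the background-difference domain (motion search omitted)
        return kind, (block - bg_block) - (prev_block - bg_block)
    # pure foreground: ordinary inter prediction from the previous frame
    return kind, block - prev_block
```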
ACM Multimedia | 2004
Shuqiang Jiang; Qixiang Ye; Wen Gao; Tiejun Huang
With the growing popularity of digitized sports video, automatic analysis is needed to facilitate semantic summarization and retrieval. The playfield plays a fundamental role in the automatic analysis of many sports programs, and many semantic clues can be inferred from the results of playfield segmentation. In this paper, a novel playfield segmentation method based on Gaussian mixture models (GMMs) is proposed. First, training pixels are automatically sampled from video frames. Then, assuming that field pixels are the dominant components in most of the video frames, we build GMMs of the field pixels and use these models to detect playfield pixels. Finally, a region-growing operation is employed to segment the playfield regions from the background. Experimental results show that the proposed method is robust to various sports videos, even under very poor grass field conditions. Based on the results of playfield segmentation, match situation analysis is investigated, which is also of interest to sports professionals and long-time fans. The results are encouraging.
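A compact sketch of the pipeline, assuming the dominant GMM component corresponds to the field colour and using a largest-connected-region step in place of the paper's region growing; the sampling step, component count, and likelihood quantile are illustrative.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def segment_playfield(frame_rgb, n_components=3, sample_step=4, keep_frac=0.6):
    """Fit a GMM on sub-sampled pixels (assuming field pixels dominate the frame),
    keep the pixels best explained by the heaviest-weight component, then keep
    the largest connected region as a crude stand-in for region growing."""
    h, w, _ = frame_rgb.shape
    pixels = frame_rgb.reshape(-1, 3).astype(float)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full", random_state=0)
    gmm.fit(pixels[::sample_step])
    dom = int(np.argmax(gmm.weights_))              # dominant component = presumed field colour
    logp = multivariate_normal(gmm.means_[dom], gmm.covariances_[dom],
                               allow_singular=True).logpdf(pixels)
    mask = (logp >= np.quantile(logp, 1.0 - keep_frac)).reshape(h, w)
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)    # largest region ~ the playfield

field_mask = segment_playfield(np.random.rand(120, 160, 3) * 255)
```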
IEEE Computer | 2010
Yonghong Tian; Jaideep Srivastava; Tiejun Huang; Noshir Contractor
The explosive growth of social multimedia content on the Internet is revolutionizing content distribution and social interaction. It has even led to a new research area, called social multimedia computing.
IEEE MultiMedia | 2014
Ling-Yu Duan; Jie Lin; Jie Chen; Tiejun Huang; Wen Gao
To ensure application interoperability, the MPEG Working Group has made great efforts in standardizing visual object search technologies. Moreover, the extraction and transmission of compact descriptors are valuable for next-generation mobile visual search applications. This article reviews the significant progress of MPEG Compact Descriptors for Visual Search (CDVS) in standardizing technologies that will enable efficient and interoperable design of visual search applications. In addition, the article presents the data collection and benchmark for location search and recognition under the MPEG CDVS evaluation framework.
IEEE Transactions on Image Processing | 2014
Rongrong Ji; Ling-Yu Duan; Jie Chen; Tiejun Huang; Wen Gao
Visual patterns, i.e., high-order combinations of visual words, contribute to a discriminative abstraction of the high-dimensional bag-of-words image representation. However, existing visual patterns are built upon the 2D photographic co-occurrences of visual words, which is ill-posed compared with their real-world 3D co-occurrences, since words from different objects or different depths might be incorrectly bound into an identical pattern. On the other hand, designing compact descriptors from the mined patterns is left open. To address both issues, in this paper we propose a novel compact bag-of-patterns (CBoP) descriptor with an application to low-bit-rate mobile landmark search. First, to overcome the ill-posed 2D photographic configuration, we build a 3D point cloud from the reference images of each landmark, so that more accurate pattern candidates can be extracted from the 3D co-occurrences of visual words. A novel gravity distance metric is then proposed to mine discriminative visual patterns. Second, we obtain a compact image description by introducing the CBoP descriptor, which is computed by sparse coding over the mined visual patterns so as to maximally reconstruct the original bag-of-words histogram with a minimum coding length. We developed a low-bit-rate mobile landmark search prototype, in which the CBoP descriptor is extracted and sent directly from the mobile end to reduce the query delivery latency. The CBoP performance is quantified on several large-scale benchmarks with comparisons to state-of-the-art compact descriptors, topic features, and hashing descriptors. We report accuracy comparable to the million-scale bag-of-words histogram over a million-scale visual vocabulary, with a much higher descriptor compression rate (approximately 100 bits) than the state-of-the-art bag-of-words compression scheme.
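The toy sketch below illustrates the CBoP coding step only: a bag-of-words histogram is sparse-coded over a small dictionary of patterns with a hard budget of nonzero coefficients, and only the nonzero indices and weights would be transmitted. Orthogonal matching pursuit is used here as a convenient sparse coder; the 3D point-cloud pattern mining and the gravity distance metric are not reproduced, and all data are random placeholders.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
vocab_size, n_patterns = 1000, 64

# toy stand-ins: a bag-of-words histogram and a dictionary of mined visual patterns
bow = rng.poisson(0.5, size=vocab_size).astype(float)
patterns = rng.random((vocab_size, n_patterns))   # each column = one pattern over the vocabulary

# sparse-code the histogram over the patterns with a hard budget of nonzeros,
# approximating "maximal reconstruction with minimal coding length"
coef = orthogonal_mp(patterns, bow, n_nonzero_coefs=8)

# the compact descriptor sent from the mobile end: indices + rounded weights of the nonzeros
nz = np.flatnonzero(coef)
descriptor = list(zip(nz.tolist(), np.round(coef[nz], 2).tolist()))
reconstruction_error = np.linalg.norm(bow - patterns @ coef)
```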
International Journal of Computer Vision | 2014
Jia Li; Yonghong Tian; Tiejun Huang
Visual saliency is a useful cue for locating conspicuous image content. To estimate saliency, many approaches have been proposed to detect unique or rare visual stimuli. However, such bottom-up solutions are often insufficient, since prior knowledge, which often indicates a biased selectivity on the input stimuli, is not taken into account. To solve this problem, this paper presents a novel approach to estimate image saliency by learning such prior knowledge. In our approach, the influences of the visual stimuli and the prior knowledge are jointly incorporated into a Bayesian framework. In this framework, the bottom-up saliency is calculated to pop out the visual subsets that are probably salient, while the prior knowledge is used to recover wrongly suppressed targets and inhibit improperly popped-out distractors. Compared with existing approaches, the prior knowledge used in our approach, including the foreground prior and the correlation prior, is statistically learned from 9.6 million images in an unsupervised manner. Experimental results on two public benchmarks show that such statistical priors are effective in modulating the bottom-up saliency, achieving impressive improvements over 10 state-of-the-art methods.
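As a toy illustration of the Bayesian modulation, the sketch below multiplies a bottom-up map by a foreground prior and renormalizes; the centre-biased Gaussian prior is a stand-in for the statistically learned foreground and correlation priors, which are not reproduced here.

```python
import numpy as np

def modulate(bottom_up, foreground_prior, eps=1e-8):
    """Bayesian-style modulation: treat the bottom-up map as a likelihood, the
    learned foreground prior as a prior, multiply, and renormalize.
    Both inputs are 2-D maps in [0, 1]; the real priors are learned offline."""
    posterior = (bottom_up + eps) * (foreground_prior + eps)
    return posterior / posterior.max()

# toy example: a centre-biased prior recovering a weakly salient central target
h = w = 64
yy, xx = np.mgrid[0:h, 0:w]
prior = np.exp(-(((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * (h / 4) ** 2)))
bottom_up = np.random.rand(h, w) * 0.3
bottom_up[28:36, 28:36] += 0.4          # weakly salient central target
saliency = modulate(bottom_up, prior)
```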
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015
Jia Li; Ling-Yu Duan; Xiaowu Chen; Tiejun Huang; Yonghong Tian
There are two sides to every story of visual saliency modeling in the frequency domain. On the one hand, image saliency can be effectively estimated by applying simple operations to the frequency spectrum. On the other hand, it is still unclear which part of the frequency spectrum contributes the most to popping out targets and suppressing distractors. Toward this end, this paper tentatively explores the secret of image saliency in the frequency domain. From the results obtained in several qualitative and quantitative experiments, we find that the secret of visual saliency may mainly hide in the phases of intermediate frequencies. To explain this finding, we reinterpret the concept of the discrete Fourier transform from the perspective of template-based contrast computation, and thus develop several principles for designing a saliency detector in the frequency domain. Following these principles, we propose a novel approach to designing the saliency detector with the assistance of prior knowledge obtained through both unsupervised and supervised learning processes. Experimental results on a public image benchmark show that the learned saliency detector outperforms 18 state-of-the-art approaches in predicting human fixations.
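To illustrate the finding (not the learned detector the paper proposes), the sketch below keeps only the phase of an intermediate frequency band: the amplitude spectrum is replaced by a band-pass mask, the image is reconstructed, squared, and smoothed. The band limits and smoothing scale are arbitrary choices.

```python
import numpy as np
from scipy import ndimage

def mid_frequency_phase_saliency(gray, low=0.02, high=0.25, sigma=3.0):
    """Keep only the phase of intermediate frequencies: band-pass the amplitude
    spectrum, reconstruct, square, and smooth. Band limits are in cycles per
    pixel and are illustrative."""
    f = np.fft.fft2(gray.astype(float))
    phase = np.angle(f)
    fy = np.fft.fftfreq(gray.shape[0])[:, None]
    fx = np.fft.fftfreq(gray.shape[1])[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    band = (radius >= low) & (radius <= high)      # intermediate frequencies only
    recon = np.fft.ifft2(band * np.exp(1j * phase))
    return ndimage.gaussian_filter(np.abs(recon) ** 2, sigma)

saliency = mid_frequency_phase_saliency(np.random.rand(128, 128))
```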