Publication


Featured research published by Jinhui Tang.


Conference on Image and Video Retrieval | 2009

NUS-WIDE: a real-world web image database from National University of Singapore

Tat-Seng Chua; Jinhui Tang; Haojie Li; Zhiping Luo; Yan-Tao Zheng

This paper introduces a web image dataset created by NUS's Lab for Media Search. The dataset includes: (1) 269,648 images and the associated tags from Flickr, with a total of 5,018 unique tags; (2) six types of low-level features extracted from these images, including 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments extracted over 5x5 fixed grid partitions, and 500-D bag of words based on SIFT descriptions; and (3) ground-truth for 81 concepts that can be used for evaluation. Based on this dataset, we highlight characteristics of Web image collections and identify four research issues on web image annotation and retrieval. We also provide baseline results for web image annotation by learning from the tags using the traditional k-NN algorithm. The benchmark results indicate that it is possible to learn effective models from a sufficiently large image dataset to facilitate general image retrieval.
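
As a rough illustration of the tag-based k-NN baseline mentioned above, the sketch below scores concepts for a query image by averaging the tag vectors of its nearest training images; the feature dimensionality, k, and the random toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np

def knn_tag_annotation(train_feats, train_tags, query_feat, k=50):
    """Predict concept scores for a query image by averaging the tag
    vectors of its k nearest training images (Euclidean distance).

    train_feats: (n, d) array of low-level features (e.g., a color histogram)
    train_tags:  (n, c) binary matrix, 1 if image i carries concept j
    query_feat:  (d,) feature vector of the query image
    """
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nn_idx = np.argsort(dists)[:k]             # indices of the k nearest images
    return train_tags[nn_idx].mean(axis=0)     # per-concept relevance scores

# toy usage with random data (shapes only loosely mimic the dataset)
rng = np.random.default_rng(0)
feats = rng.random((1000, 64))
tags = (rng.random((1000, 81)) > 0.95).astype(float)
scores = knn_tag_annotation(feats, tags, rng.random(64), k=50)
print(scores.argsort()[-5:])  # five highest-scoring concepts
```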


IEEE Transactions on Multimedia | 2009

Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation

Meng Wang; Xian-Sheng Hua; Jinhui Tang; Richang Hong

In the past few years, video annotation has benefited greatly from the progress of machine learning techniques. Recently, graph-based semi-supervised learning has gained much attention in this domain. However, the estimation of pairwise similarity, a crucial factor in these algorithms, has not been sufficiently studied. Generally, the similarity of two samples is estimated from the Euclidean distance between them. We show, however, that the similarity between two samples is not merely related to their distance but also to the distribution of the surrounding samples and labels. The traditional distance-based similarity measure can lead to high classification error rates even on several simple datasets. To address this issue, we propose a novel neighborhood similarity measure, which explores the local sample and label distributions. The neighborhood similarity between two samples simultaneously takes into account three characteristics: 1) their distance; 2) the distribution difference of the surrounding samples; and 3) the distribution difference of the surrounding labels. Extensive experiments demonstrate the superiority of neighborhood similarity over the existing distance-based similarity.
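
The sketch below illustrates the general idea of a similarity that looks beyond pairwise distance by also comparing the two samples' local sample and label distributions; the specific combination (a Gaussian kernel times two exponential penalty terms) is an assumption for illustration, not the paper's formulation.

```python
import numpy as np

def neighborhood_similarity(x_i, x_j, neighbors_i, neighbors_j,
                            labels_i, labels_j, sigma=1.0):
    """Illustrative similarity that down-weights the usual Gaussian kernel
    when the two samples' local neighborhoods differ.

    neighbors_*: (k, d) arrays of each sample's k nearest neighbors
    labels_*:    (k, c) label matrices (e.g., one-hot) of those neighbors
    """
    dist_term = np.exp(-np.sum((x_i - x_j) ** 2) / (2 * sigma ** 2))
    # difference between the local sample distributions (compared via their means)
    sample_gap = np.linalg.norm(neighbors_i.mean(axis=0) - neighbors_j.mean(axis=0))
    # difference between the local label distributions
    label_gap = np.linalg.norm(labels_i.mean(axis=0) - labels_j.mean(axis=0))
    return dist_term * np.exp(-sample_gap) * np.exp(-label_gap)
```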


ACM Transactions on Intelligent Systems and Technology | 2011

Image annotation by kNN-sparse graph-based label propagation over noisily tagged web images

Jinhui Tang; Shuicheng Yan; Tat-Seng Chua; Guo-Jun Qi; Ramesh Jain

In this article, we address the problem of annotating a large-scale image corpus by label propagation over noisily tagged web images. To annotate the images more accurately, we propose a novel kNN-sparse graph-based semi-supervised learning approach for harnessing the labeled and unlabeled data simultaneously. The sparse graph, constructed by datum-wise one-vs-kNN sparse reconstructions of all samples, removes most of the semantically unrelated links among the data, and thus it is more robust and discriminative than conventional graphs. Meanwhile, we apply approximate k-nearest-neighbor search to accelerate the sparse graph construction without losing its effectiveness. More importantly, we propose an effective training-label refinement strategy within this graph-based learning framework to handle noise in the training labels, by introducing a dual regularization on both the quantity and the sparsity of the noise. We conduct extensive experiments on a real-world image database consisting of 55,615 Flickr images and noisily tagged training labels. The results demonstrate both the effectiveness and efficiency of the proposed approach and its capability to deal with noise in the training labels.
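
A minimal sketch of this general construction, assuming features in a NumPy array and using scikit-learn's Lasso for the L1 reconstruction; the choices of alpha, k, and the simple iterative propagation update are illustrative, not the paper's exact formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso  # L1-regularized reconstruction

def knn_sparse_graph(X, k=20, alpha=0.01):
    """Build a sparse affinity graph: each sample is reconstructed from its
    k nearest neighbors with an L1 penalty, so most edge weights become zero."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(dists)[1:k + 1]              # skip the sample itself
        coef = Lasso(alpha=alpha, positive=True, max_iter=5000) \
                   .fit(X[nn].T, X[i]).coef_         # sparse reconstruction weights
        W[i, nn] = coef
    return np.maximum(W, W.T)                        # symmetrize

def propagate_labels(W, Y, iters=50, mu=0.99):
    """Simple iterative propagation F <- mu * S F + (1 - mu) * Y,
    with S the row-normalized graph and Y the (noisy) initial labels."""
    S = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    F = Y.copy()
    for _ in range(iters):
        F = mu * S @ F + (1 - mu) * Y
    return F
```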


IEEE Transactions on Multimedia | 2010

Image Annotation by Graph-Based Inference With Integrated Multiple/Single Instance Representations

Jinhui Tang; Haojie Li; Guo-Jun Qi; Tat-Seng Chua

In most learning-based image annotation approaches, images are represented using either multiple-instance (local) or single-instance (global) features. Their performance, however, is mixed: for certain concepts the single-instance representations of images are more suitable, while for others the multiple-instance representations are better. This paper therefore explores a unified learning framework that combines the multiple-instance and single-instance representations for image annotation. More specifically, we propose an integrated graph-based semi-supervised learning framework that utilizes these two types of representations simultaneously. We further explore three strategies to convert a multiple-instance representation into a single-instance one. Experiments conducted on the COREL image dataset demonstrate the effectiveness and efficiency of the proposed integrated framework and the conversion strategies.
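
For illustration only, the snippet below shows one trivial way to collapse a multiple-instance (bag-of-regions) representation into a single-instance vector by pooling; the pooling strategies shown are assumptions and not necessarily the three conversion strategies studied in the paper.

```python
import numpy as np

def bag_to_single_instance(instance_feats, strategy="mean"):
    """Collapse a bag of local (multiple-instance) feature vectors into one
    global (single-instance) vector.

    instance_feats: (m, d) array, one row per region/instance in the image
    """
    if strategy == "mean":
        return instance_feats.mean(axis=0)
    if strategy == "max":
        return instance_feats.max(axis=0)
    raise ValueError(f"unknown strategy: {strategy}")
```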


ACM Multimedia | 2010

W2Go: a travel guidance system by automatic landmark ranking

Yue Gao; Jinhui Tang; Qionghai Dai; Tat-Seng Chua; Ramesh Jain

In this paper, we present a travel guidance system, W2Go (Where to Go), which can automatically recognize and rank landmarks for travellers. In this system, a novel Automatic Landmark Ranking (ALR) method is proposed that utilizes the tag and geo-tag information of photos in Flickr and user knowledge from Yahoo Travel Guide. ALR selects popular tourist attractions (landmarks) based not only on the subjective opinions of travel editors, as is currently done on sites like WikiTravel and Yahoo Travel Guide, but also on a ranking derived from popularity among tourists. Our approach utilizes geo-tag information to locate the positions of the tag-indicated places, and computes the probability of a tag being a landmark/site name. For potential landmarks, impact factors are calculated from the frequency of tags, the number of users in Flickr, and user knowledge in Yahoo Travel Guide; the tags are then ranked by their impact factors. Several representative views for popular landmarks are generated from the crawled geo-tagged images to describe and present them in the context of information derived from several relevant reference sources. Experimental comparisons with other systems are conducted on eight famous cities around the world. User-based evaluation demonstrates the effectiveness of the proposed ALR method and the W2Go system.
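
A hedged sketch of an impact-factor style score combining tag frequency, Flickr user counts, and travel-guide knowledge; the log damping, the weights, and the toy candidates are assumptions, not the paper's actual formula or data.

```python
import math

def landmark_impact(tag_freq, num_users, guide_score, w=(0.4, 0.4, 0.2)):
    """Illustrative impact factor for a candidate landmark tag: how often the
    tag occurs, how many distinct Flickr users used it, and a score derived
    from travel-guide knowledge, combined with assumed weights."""
    return (w[0] * math.log1p(tag_freq)
            + w[1] * math.log1p(num_users)
            + w[2] * guide_score)

# rank candidate tags by impact factor (toy data)
candidates = {"eiffel tower": (5200, 1800, 0.9), "my cat": (300, 12, 0.0)}
ranked = sorted(candidates, key=lambda t: landmark_impact(*candidates[t]), reverse=True)
print(ranked)
```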


ACM Multimedia | 2009

Label to region by bi-layer sparsity priors

Xiaobai Liu; Bin Cheng; Shuicheng Yan; Jinhui Tang; Tat-Seng Chua; Hai Jin

In this work, we investigate how to automatically reassign the manually annotated image-level labels to contextually derived semantic regions. First, we propose a bi-layer sparse coding formulation for uncovering how an image or semantic region can be robustly reconstructed from the over-segmented image patches of an image set. We then harness it for the automatic label-to-region assignment of the entire image set. The solution to bi-layer sparse coding is obtained by convex l1-norm minimization. The underlying philosophy of bi-layer sparse coding is that an image or semantic region can be sparsely reconstructed via the atomic image patches belonging to images with common labels, while robustness in label propagation requires that these selected atomic patches come from very few images. Each layer of sparse coding produces the image-label assignment to the selected atomic patches and the merged candidate regions based on the shared image labels. The results of all bi-layer sparse codings over all candidate regions are then fused to obtain the full set of label-to-region assignments. In addition, the presented bi-layer sparse coding framework can be naturally applied to perform image annotation on new test images. Extensive experiments on three public image datasets clearly demonstrate the effectiveness of our proposed framework in both the label-to-region assignment and image annotation tasks.
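
To make the reconstruction idea concrete, the sketch below shows a single-layer simplification: a region is sparsely reconstructed from a dictionary of over-segmented patches, and its label is voted from the images whose patches receive nonzero weights. The use of scikit-learn's Lasso and the alpha value are assumptions, and the actual method uses a bi-layer formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def assign_label_to_region(region_feat, patch_feats, patch_image_labels, alpha=0.05):
    """Sparsely reconstruct a region from candidate patches, then vote its
    label from the source images of the patches with nonzero weights.

    region_feat:        (d,) feature of the region to label
    patch_feats:        (p, d) features of candidate patches from the image set
    patch_image_labels: length-p list of label sets of the patches' source images
    """
    coef = Lasso(alpha=alpha, positive=True, max_iter=5000) \
               .fit(patch_feats.T, region_feat).coef_
    votes = {}
    for w, labels in zip(coef, patch_image_labels):
        for lab in labels:
            votes[lab] = votes.get(lab, 0.0) + w   # weight each label by its coefficient
    return max(votes, key=votes.get) if votes else None
```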


Systems, Man and Cybernetics | 2009

Correlative Linear Neighborhood Propagation for Video Annotation

Jinhui Tang; Xian-Sheng Hua; Meng Wang; Zhiwei Gu; Guo-Jun Qi; Xiuqing Wu

Recently, graph-based semisupervised learning methods have been widely applied in the multimedia research area. However, for video semantic annotation in the multilabel setting, these methods neglect an important characteristic of video data: the semantic concepts appear correlatively and interact naturally with each other rather than existing in isolation. In this paper, we incorporate this semantic correlation into graph-based semisupervised learning and propose a novel method named correlative linear neighborhood propagation to improve annotation performance. Experiments conducted on the Text REtrieval Conference VIDeo Retrieval Evaluation (TRECVID) dataset demonstrate its effectiveness and efficiency.
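
An illustrative simplification of propagating multilabel scores while letting correlated concepts reinforce each other; the specific update (alternating graph smoothing with a label-correlation blend) and the parameters are assumptions, not the method's exact formulation.

```python
import numpy as np

def correlative_propagation(S, Y, C, iters=50, mu=0.9, lam=0.2):
    """Multilabel propagation that mixes graph smoothing over samples (S)
    with a concept-correlation matrix (C), so correlated concepts reinforce
    each other.

    S: (n, n) row-normalized sample affinity graph
    Y: (n, c) initial label matrix (1 for labeled positives, 0 otherwise)
    C: (c, c) row-normalized concept co-occurrence / correlation matrix
    """
    F = Y.copy()
    for _ in range(iters):
        F = mu * S @ F + (1 - mu) * Y      # smooth over the sample graph
        F = (1 - lam) * F + lam * F @ C    # blend in correlated concepts' scores
    return F
```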


ACM Transactions on Multimedia Computing, Communications, and Applications | 2011

Beyond search: Event-driven summarization for web videos

Jinhui Tang; Hung-Khoon Tan; Chong-Wah Ngo; Shuicheng Yan; Tat-Seng Chua

The explosive growth of Web videos raises the challenge of how to efficiently browse hundreds or even thousands of videos at a glance. Given an event-driven query, social media Web sites usually return a large number of videos that are diverse and noisy in a ranked list. Exploring such results is time-consuming and degrades the user experience. This article presents a novel scheme that summarizes the content of video search results by mining and threading “key” shots, so that users can get an overview of the main content of these videos at a glance. The proposed framework comprises four stages. First, given an event query, a set of Web videos is collected together with their ranking order and tags. Second, key-shots are established and ranked based on near-duplicate keyframe detection, and they are threaded in chronological order. Third, we analyze the tags associated with key-shots: irrelevant tags are filtered out via a representativeness and descriptiveness analysis, and the remaining tags are propagated among key-shots by random walk. Finally, summarization is formulated as an optimization framework that trades off the relevance of key-shots against a user-defined skimming ratio. We provide two types of summarization: video skimming and a visual-textual storyboard. We conduct user studies on twenty event queries covering over a hundred hours of videos crawled from YouTube. The evaluation demonstrates the feasibility and effectiveness of the proposed solution.
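
The random-walk tag propagation step might look roughly like the generic sketch below, which spreads tag relevance over a near-duplicate link graph with restarts; the restart weight and iteration count are illustrative assumptions.

```python
import numpy as np

def random_walk_tag_propagation(A, tag_scores, restart=0.15, iters=100):
    """Propagate tag relevance scores among key-shots over a near-duplicate
    link graph with a restart-style random walk.

    A:          (n, n) adjacency matrix of key-shots (near-duplicate links)
    tag_scores: (n, t) initial relevance of each tag for each key-shot
    """
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # transition matrix
    F = tag_scores.copy()
    for _ in range(iters):
        F = (1 - restart) * P @ F + restart * tag_scores
    return F
```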


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2009

Two-Dimensional Multilabel Active Learning with an Efficient Online Adaptation Model for Image Classification

Guo-Jun Qi; Xian-Sheng Hua; Yong Rui; Jinhui Tang; Hong-Jiang Zhang

Conventional active learning dynamically constructs the training set only along the sample dimension. While this is the right strategy for binary classification, it is suboptimal for multilabel image classification. We argue that for each selected sample, only some effective labels need to be annotated, while the others can be inferred by exploring the label correlations. The reason is that the contributions of different labels to minimizing the classification error differ due to the inherent label correlations. To this end, we propose to select sample-label pairs, rather than only samples, to minimize a multilabel Bayesian classification error bound. We call it two-dimensional active learning because it considers both the sample dimension and the label dimension. Furthermore, as the number of training samples increases rapidly over time due to active learning, it becomes intractable for an offline learner to retrain a new model on the whole training set. We therefore develop an efficient online learner that adapts the existing model to the new one by minimizing their model distance under a set of multilabel constraints. The effectiveness and efficiency of the proposed method are evaluated on two benchmark data sets and a realistic image collection from a real-world image-sharing Web site, Corbis.
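
As a loose illustration of selecting sample-label pairs rather than whole samples, the sketch below ranks pairs by a simple uncertainty score; this stands in for the multilabel Bayesian error bound used in the paper and is not its actual criterion.

```python
import numpy as np

def select_sample_label_pairs(probs, n_pairs=10):
    """Pick the (sample, label) pairs whose predicted probabilities are
    closest to 0.5, i.e., the most uncertain ones. Illustrates the
    'two-dimensional' idea of querying individual sample-label pairs.

    probs: (n, c) current model's probability of each label for each sample
    """
    uncertainty = -np.abs(probs - 0.5)                 # higher = more uncertain
    flat = np.argsort(uncertainty, axis=None)[::-1][:n_pairs]
    return [np.unravel_index(i, probs.shape) for i in flat]  # (sample, label) pairs
```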


Neurocomputing | 2010

View-based 3D model retrieval with probabilistic graph model

Yue Gao; Jinhui Tang; Haojie Li; Qionghai Dai; Naiyao Zhang

In this paper, we present a view-based 3D model retrieval algorithm using a probabilistic graph model. In this work, five circular camera arrays are employed, and five groups of views are captured from each 3D model. Each captured view set is modeled as a first-order Markov chain. The task of 3D model retrieval is formulated as a probabilistic analysis procedure, and the comparison between the query and other 3D models becomes the computation of the conditional probabilities of the database models given the query model. The goal of retrieval is then to find the model in the database with the maximum a posteriori probability given the query. We then present a solution for estimating these conditional probabilities. The proposed 3D model retrieval algorithm has been evaluated on the NTU 3D model database. Experimental results and comparisons with other methods show the effectiveness of the proposed approach.
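
A sketch of scoring a candidate model by the probability of the query's view sequence under a first-order Markov chain over the candidate's views, using a standard scaled forward recursion; the Gaussian view-similarity emission and the uniform initial prior are assumptions for illustration.

```python
import numpy as np

def view_sequence_log_prob(query_views, cand_views, trans, sigma=1.0):
    """Log-probability of the query's ordered views under a Markov chain
    whose states are the candidate model's views (scaled forward algorithm).

    query_views: (T, d) features of the query's captured views, in capture order
    cand_views:  (S, d) features of the candidate model's views
    trans:       (S, S) row-stochastic transition matrix over the candidate's views
    """
    # emission[t, s]: Gaussian similarity of query view t to candidate view s
    diffs = query_views[:, None, :] - cand_views[None, :, :]
    emission = np.exp(-np.linalg.norm(diffs, axis=2) ** 2 / (2 * sigma ** 2))

    alpha = emission[0] / len(cand_views)          # uniform initial state prior
    c = alpha.sum() + 1e-300
    log_prob = np.log(c)
    alpha /= c
    for t in range(1, len(query_views)):
        alpha = (alpha @ trans) * emission[t]      # forward step
        c = alpha.sum() + 1e-300
        log_prob += np.log(c)                      # accumulate scaling factors
        alpha /= c
    return log_prob
```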

Collaboration


Dive into Jinhui Tang's collaborations.

Top Co-Authors

Tat-Seng Chua | National University of Singapore
Guo-Jun Qi | University of Central Florida
Xian-Sheng Hua | Microsoft Research Asia (China)
Haojie Li | Dalian University of Technology
Shuicheng Yan | National University of Singapore
Meng Wang | Microsoft Research Asia (China)
Guangda Li | National University of Singapore
Zhiping Luo | National University of Singapore
Xiangbo Shu | Nanjing University of Science and Technology