Meng Wang
University of Science and Technology of China
Publication
Featured research published by Meng Wang.
acm multimedia | 2007
Meng Wang; Xian-Sheng Hua; Xun Yuan; Yan Song; Li-Rong Dai
Learning based semantic video annotation is a promising approach for enabling content-based video search. However, severe difficulties, such as insufficiency of training data and curse of dimensionality, are frequently encountered. This paper proposes a novel unified scheme, Optimized Multi-Graph-based Semi-Supervised Learning (OMG-SSL), to simultaneously attack these difficulties. Instead of only using a single graph, OMG-SSL integrates multiple graphs into a regularization and optimization framework to sufficiently explore their complementary nature. We then show that various crucial factors in video annotation, including multiple modalities, multiple distance metrics, and temporal consistency, in fact all correspond to different correlations among samples, and hence they can be represented by different graphs. Therefore, OMG-SSL is able to simultaneously deal with these factors within a unified framework. Experiments on the TRECVID benchmark demonstrate the effectiveness of our proposed approach.
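The multi-graph idea can be illustrated with a minimal sketch. It fuses two affinity graphs by a fixed weighted sum and then runs a standard graph-based label propagation. This is an assumption-laden simplification: OMG-SSL optimizes the graph weights inside its regularization framework, whereas here the weights are fixed by hand, and the names `combine_graphs`, `propagate`, `feature_graph` and `temporal_graph` are hypothetical.

```python
def combine_graphs(graphs, weights):
    """Fuse several affinity matrices into one by a weighted sum."""
    n = len(graphs[0])
    return [[sum(w * g[i][j] for g, w in zip(graphs, weights))
             for j in range(n)] for i in range(n)]


def propagate(affinity, labels, alpha=0.8, iterations=200):
    """Iterate f <- alpha * P f + (1 - alpha) * y, with P the row-normalized affinity."""
    n = len(labels)
    transition = []
    for row in affinity:
        total = sum(row)
        transition.append([v / total if total > 0 else 0.0 for v in row])
    scores = list(labels)
    for _ in range(iterations):
        scores = [alpha * sum(transition[i][j] * scores[j] for j in range(n))
                  + (1 - alpha) * labels[i] for i in range(n)]
    return scores


# Four shots: shot 0 is labeled positive (+1), shot 3 negative (-1),
# shots 1 and 2 are unlabeled (0). Both graphs are made-up toy data.
feature_graph = [[0, 1, 0, 0], [1, 0, 0.1, 0], [0, 0.1, 0, 1], [0, 0, 1, 0]]
temporal_graph = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
fused = combine_graphs([feature_graph, temporal_graph], weights=[0.5, 0.5])
scores = propagate(fused, labels=[1, 0, 0, -1])
```

In this toy setup the unlabeled shot next to the positive example ends with a positive score and the one next to the negative example with a negative score.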
ACM Transactions on Multimedia Computing, Communications, and Applications | 2008
Guo-Jun Qi; Xian-Sheng Hua; Yong Rui; Jinhui Tang; Tao Mei; Meng Wang; Hong-Jiang Zhang
Automatic video annotation is an important ingredient for semantic-level video browsing, search and navigation. Much attention has been paid to this topic in recent years. This research has evolved through two paradigms. In the first paradigm, each concept is individually annotated by a pre-trained binary classifier. However, this method ignores the rich information between the video concepts and achieves only limited success. Evolved from the first paradigm, the methods in the second paradigm add an extra step on top of the individual classifiers to fuse the multiple detections of the concepts. However, the performance of these methods can be degraded by errors that propagate from the first step to the second fusion step. In this article, a third paradigm of video annotation is proposed to address these problems. It simultaneously annotates the concepts and models the correlations between them in one step by the proposed Correlative Multilabel (CML) method, which benefits from the complementary information between different labels. Furthermore, since video clips are composed of temporally ordered frame sequences, we extend the proposed method to exploit the rich temporal information in the videos. Specifically, a temporal kernel is incorporated into the CML method based on the discriminative information between Hidden Markov Models (HMMs) that are learned from the videos. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set. The results show that the proposed method achieves superior performance.
acm multimedia | 2006
Meng Wang; Xian-Sheng Hua; Yan Song; Xun Yuan; Shipeng Li; Hong-Jiang Zhang
Insufficiency of labeled training data is a major obstacle for automatically annotating large-scale video databases with semantic concepts. Existing semi-supervised learning algorithms based on parametric models try to tackle this issue by incorporating the information in a large amount of unlabeled data. However, they rely on the assumption that the chosen generative model is correct, which usually cannot be satisfied in automatic video annotation due to the large variations of video semantic concepts. In this paper, we propose a novel semi-supervised learning algorithm, named Semi-Supervised Learning by Kernel Density Estimation (SSLKDE), which is based on a non-parametric method and therefore avoids the model assumption. While only labeled data are utilized in the classical Kernel Density Estimation (KDE) approach, in SSLKDE both labeled and unlabeled data are leveraged to estimate class conditional probability densities based on an extended form of KDE. We also investigate the connection between SSLKDE and existing graph-based semi-supervised learning algorithms. Experiments show that SSLKDE significantly outperforms existing supervised methods for video annotation.
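The contrast between classical KDE and a KDE that also uses unlabeled data can be sketched as follows. The abstract does not give the extended KDE formula, so the second step below is only one plausible reading: unlabeled points contribute kernel mass weighted by their current soft labels, and the estimate is iterated. The function names, the Gaussian kernel, the bandwidth and the toy data are all assumptions made for illustration.

```python
import math


def gaussian(u, bandwidth):
    """Unnormalized Gaussian kernel; the constant cancels in the posterior ratio."""
    return math.exp(-0.5 * (u / bandwidth) ** 2)


def kde_posteriors(labeled, unlabeled, bandwidth=1.0, iterations=20):
    """Estimate P(positive | x) for each unlabeled point.

    labeled: list of (x, y) with y in {0, 1}; unlabeled: list of x values.
    """
    def posterior(x, soft, skip=None):
        pos = sum(gaussian(x - xi, bandwidth) for xi, yi in labeled if yi == 1)
        tot = sum(gaussian(x - xi, bandwidth) for xi, _ in labeled)
        for k, xk in enumerate(unlabeled):
            if soft is None or k == skip:
                continue
            weight = gaussian(x - xk, bandwidth)
            pos += soft[k] * weight
            tot += weight
        return pos / tot if tot > 0 else 0.5

    # Step 1: classical KDE, labeled data only.
    soft = [posterior(x, None) for x in unlabeled]
    # Step 2 (assumed extension): the other unlabeled points vote with soft labels.
    for _ in range(iterations):
        soft = [posterior(x, soft, skip=k) for k, x in enumerate(unlabeled)]
    return soft


labeled = [(0.0, 1), (10.0, 0)]
unlabeled = [1.0, 2.0, 3.0, 4.0, 9.0]
posteriors = kde_posteriors(labeled, unlabeled)
```

Because each update averages over neighboring samples, this form also hints at the connection to graph-based semi-supervised learning that the abstract mentions.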
acm multimedia | 2007
Meng Wang; Tao Mei; Xun Yuan; Yan Song; Li-Rong Dai
Graph-based semi-supervised learning methods have been proven effective in tackling the difficulty of training data insufficiency in many practical applications such as video annotation. These methods are all based on the assumption that the labels of similar samples are close. However, as a crucial factor of these algorithms, the estimation of pairwise similarity has not been sufficiently studied. Usually, the similarity of two samples is estimated based on the Euclidean distance between them. We show that similarities are related not only to distances but also to the structures around the samples, and that a distance-based similarity measure may lead to high classification error rates even on several simple datasets. In this paper we propose a novel neighborhood similarity measure, which simultaneously takes into account both the distance between samples and the difference between the structures around the corresponding samples. Experiments on a synthetic dataset and the TRECVID benchmark demonstrate that the neighborhood similarity is superior to existing distance-based similarity.
international conference on semantic computing | 2007
Meng Wang; Xian-Sheng Hua; Yan Song; Jinhui Tang; Li-Rong Dai
Active learning methods have been widely applied to reduce human labeling effort in multimedia annotation tasks. However, in traditional methods multiple concepts are usually sequentially annotated, i.e., each concept is exhaustively annotated before proceeding to the next, without taking the learnabilities of different concepts into consideration. Furthermore, in most of these methods only a single modality is applied. This paper presents a novel multi-concept multi-modality active learning method which exchangeably annotates multiple concepts in the context of multi-modality. It iteratively selects a concept and a batch of unlabeled samples, and then these samples are annotated with the selected concept. After that, graph-based semi-supervised learning is conducted on each modality for the selected concept. The proposed method takes into account both the learnabilities of different concepts and the potentials of different modalities. Experimental results on the TRECVID 2005 benchmark have demonstrated its effectiveness and efficiency.
international conference on multimedia and expo | 2007
Richang Hong; Chao Wang; Yong Ge; Meng Wang; Xiuqing Wu; Rong Zhang
This paper proposes a novel multi-focus image fusion algorithm. Unlike traditional decision map based methods, our algorithm is based on salience preserving gradient, which can better emphasize the structure details of sources while preserving the color consistency. We first measure the salience map of the gradient from each source, and then use their saliency to modulate their contributions in computing the global statistics. Gradients with high saliency are properly highlighted in the target gradient, and thereby salient features in the sources are well preserved. Furthermore, we extend it to the color domain by proposing an importance-weight based trigonometric average method to merge the color components. Extensive experiments on several datasets have demonstrated the effectiveness of our approach.
international conference on multimedia and expo | 2006
Meng Wang; Xian-Sheng Hua; Li-Rong Dai; Yan Song
For automatic semantic annotation of large-scale video databases, the insufficiency of labeled training samples is a major obstacle. General semi-supervised learning algorithms can help solve the problem but the improvement is limited. In this paper, two semi-supervised learning algorithms, self-training and co-training, are enhanced by exploring the temporal consistency of semantic concepts in video sequences. In the enhanced algorithms, instead of individual shots, time-constraint shot clusters are taken as the basic sample units, in which most misclassifications can be corrected before they are applied for re-training, so that more accurate statistical models can be obtained. Experiments show that enhanced self-training/co-training significantly improves the performance of video annotation.
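The correction step can be sketched as a majority vote inside temporally contiguous shot clusters, applied to predicted labels before they are fed back for re-training. How the paper actually builds its time-constraint clusters is not stated in the abstract; grouping shots by a maximum time gap, the majority-vote rule, and the names `cluster_by_time` and `majority_relabel` are assumptions.

```python
from collections import Counter


def cluster_by_time(timestamps, max_gap):
    """Group consecutive shots whose start times differ by at most max_gap."""
    if not timestamps:
        return []
    clusters = [[0]]
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def majority_relabel(predictions, clusters):
    """Replace each shot's predicted label with the majority label of its cluster."""
    corrected = list(predictions)
    for cluster in clusters:
        label = Counter(predictions[i] for i in cluster).most_common(1)[0][0]
        for i in cluster:
            corrected[i] = label
    return corrected


# Toy data: shot start times in seconds and per-shot classifier outputs.
timestamps = [0, 1, 2, 3, 20, 21, 22]
predictions = [1, 1, 0, 1, 0, 0, 1]
clusters = cluster_by_time(timestamps, max_gap=5)
corrected = majority_relabel(predictions, clusters)
```

Here the isolated disagreeing prediction in each cluster is overwritten by the label of its temporal neighbors, so fewer wrong pseudo-labels would reach the re-training set.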
international conference on multimedia and expo | 2007
Xun Yuan; Xian-Sheng Hua; Meng Wang; Guo-Jun Qi; Xiuqing Wu
In image retrieval, concepts usually exist at the region level but are annotated at the image level, which leads to a major difficulty in learning the target concepts. In this paper, we formulate region-based image retrieval as a multiple-instance learning (MIL) problem, and propose an efficient and effective algorithm, named MI-AdaBoost, to solve it. The algorithm first maps each bag into a new bag feature space using a certain set of instance prototypes, and then adopts AdaBoost to select the bag features and build classifiers simultaneously. Experiments on both COREL and MUSK datasets show that the proposed scheme is much more efficient than some typical existing MIL algorithms while achieving comparable results.
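The first stage, mapping a bag of instances to a fixed-length feature vector, might look like the sketch below. The abstract does not say how a bag is compared with a prototype; using the minimum Euclidean distance from any instance in the bag to each prototype is an assumption borrowed from common MIL practice, and `bag_features` is a hypothetical name. The AdaBoost stage that selects among these features is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bag_features(bag, prototypes):
    """Map a bag (list of instance vectors) to one value per prototype.

    Feature k is the distance from prototype k to its closest instance
    in the bag, so the output length equals the number of prototypes.
    """
    return [min(euclidean(instance, p) for instance in bag) for p in prototypes]


# Toy example: an image (bag) with two regions (instances) and two prototypes.
bag = [(0.0, 0.0), (3.0, 4.0)]
prototypes = [(0.0, 0.0), (3.0, 0.0)]
features = bag_features(bag, prototypes)
```

After this mapping every image is a single vector regardless of how many regions it has, which is what allows an ordinary boosting learner to be applied to the bags.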
international conference on acoustics, speech, and signal processing | 2006
Yan Song; Xian-Sheng Hua; Li-Rong Dai; Meng Wang; Ren-Hua Wang
Given a large video database, connecting video segments with a set of semantic concepts with the least manual labor is an elementary step for video indexing and searching. Due to the large gap between high-level semantics and low-level features, automatic video annotation with high accuracy is a challenging task. In this paper, we propose a novel automatic video annotation framework, which improves the annotation performance by learning from unlabeled samples and exploring temporal relationships in video sequences. To effectively learn from unlabeled data, a sample selection scheme based on combining a set of complementary predictors is proposed, which iteratively refines the performance of the initial predictors. A filtering-based method is also applied to further improve the annotation accuracy, in which the temporal relationships in video are sufficiently exploited. Experimental results show that the proposed automatic video annotation method outperforms general supervised learning methods and co-training.
international conference on multimedia and expo | 2007
Meng Wang; Xian-Sheng Hua; Yan Song; Richang Hong; Li-Rong Dai
Eager learning methods, such as SVM, are widely applied in video annotation tasks for their strong performance. However, their computational costs are usually prohibitive on large datasets, especially when annotating a large lexicon of semantic concepts. This paper proposes a video annotation scheme based on lazy learning, built on a recently proposed improved Parzen window method, and shows that this scheme is much more computationally efficient and flexible. After building the pairwise relationships in the dataset, the annotation can be finished rapidly for each concept. Experiments show that the proposed method is much more efficient than SVM while retaining comparable performance.
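The efficiency argument can be made concrete with a small sketch: the pairwise kernel matrix is computed once and then reused for every concept, so annotating a new concept costs only weighted sums over that matrix. This uses a plain Parzen window estimate with a Gaussian kernel, not the improved variant the abstract refers to, and the function names, bandwidth and toy data are assumptions.

```python
import math


def pairwise_kernel(points, bandwidth=1.0):
    """Gaussian kernel value for every pair of samples, computed once."""
    return [[math.exp(-0.5 * ((a - b) / bandwidth) ** 2) for b in points]
            for a in points]


def annotate(kernel, labels):
    """Score each sample for one concept from the shared kernel matrix.

    labels holds 1 (positive), 0 (negative) or None (unlabeled) per sample.
    The score is the kernel-weighted fraction of positive labeled samples.
    """
    scores = []
    for row in kernel:
        pos = sum(row[j] for j, y in enumerate(labels) if y == 1)
        tot = sum(row[j] for j, y in enumerate(labels) if y is not None)
        scores.append(pos / tot if tot > 0 else 0.5)
    return scores


# One-dimensional toy features for four shots; the kernel is shared by all concepts.
points = [0.0, 0.2, 5.0, 5.1]
kernel = pairwise_kernel(points)
concept_labels = {
    "outdoor": [1, None, 0, None],
    "face": [0, None, 1, None],
}
results = {concept: annotate(kernel, y) for concept, y in concept_labels.items()}
```

No per-concept model is trained, so adding a concept to the lexicon means only supplying another label vector.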
