Publication


Featured research published by Genliang Guan.


Pattern Recognition | 2015

Video summarization via minimum sparse reconstruction

Shaohui Mei; Genliang Guan; Zhiyong Wang; Shuai Wan; Mingyi He; David Dagan Feng

The rapid growth of video data demands both effective and efficient video summarization methods so that users can quickly browse and comprehend large amounts of video content. In this paper, we formulate the video summarization task as a novel minimum sparse reconstruction (MSR) problem: the original video sequence should be reconstructed as well as possible from as few selected keyframes as possible. Unlike the recently proposed convex-relaxation-based sparse dictionary selection method, our method enforces the true sparsity constraint, the L0 norm, instead of the relaxed L2,1 norm, so that keyframes are directly selected as a sparse dictionary that can reconstruct all video frames well. Owing to the real-time efficiency of the proposed MSR principle, an online version is further developed. In addition, a percentage of reconstruction (POR) criterion is proposed to intuitively guide users in obtaining a summary of appropriate length. Experimental results on two benchmark datasets with various types of videos demonstrate that the proposed methods outperform the state of the art.

Highlights:
- A minimum sparse reconstruction (MSR) based video summarization (VS) model is constructed.
- An L0-norm-based constraint is imposed to ensure real sparsity.
- Two efficient and effective MSR-based VS algorithms are proposed for off-line and on-line applications, respectively.
- A scalable strategy is designed to provide flexibility for practical applications.
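The online variant lends itself to a compact sketch: stream the frames and promote a frame to keyframe whenever the current keyframe dictionary cannot reconstruct it well. Below is a minimal Python sketch assuming per-frame feature vectors; the tolerance and feature choice are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def online_msr(frame_stream, tol: float = 0.15):
    """Online keyframe selection sketch: a new frame becomes a keyframe when
    the current keyframe dictionary cannot reconstruct it within a relative
    error tolerance. `tol` and the feature extraction are assumptions."""
    keys, D = [], None              # keyframe indices; d x k dictionary
    for i, x in enumerate(frame_stream):   # x: 1-d feature vector per frame
        if D is not None:
            coef, *_ = np.linalg.lstsq(D, x, rcond=None)
            err = np.linalg.norm(D @ coef - x) / (np.linalg.norm(x) + 1e-12)
            if err <= tol:
                continue            # well reconstructed: not a keyframe
        keys.append(i)
        D = x[:, None] if D is None else np.hstack([D, x[:, None]])
    return keys

# usage: keys = online_msr(iter(features))  where features is an (n, d) array
```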


IEEE Transactions on Circuits and Systems for Video Technology | 2013

Keypoint-Based Keyframe Selection

Genliang Guan; Zhiyong Wang; Shiyang Lu; Jeremiah D. Deng; David Dagan Feng

Keyframe selection is crucial for effective and efficient video content analysis. While most existing approaches represent individual frames with global features, we propose, for the first time, a keypoint-based framework for the keyframe selection problem, so that local features can be employed in selecting keyframes. In general, the selected keyframes should be both representative of the video content and contain minimal redundancy. We therefore introduce two criteria, coverage and redundancy, based on keypoint matching in the selection process. Comprehensive experiments demonstrate that our approach outperforms the state of the art.
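The coverage and redundancy criteria can be illustrated with a short greedy selection loop. The sketch below is a rough approximation rather than the paper's method: OpenCV ORB keypoints stand in for the paper's local features, and Lowe's ratio test is used for matching.

```python
import cv2
import numpy as np

def keypoint_keyframes(frames, n_keyframes=5, ratio=0.75):
    """Greedy selection: each step adds the frame whose keypoints match many
    not-yet-selected frames (coverage) but few already-selected ones
    (redundancy). Thresholds and scoring are illustrative."""
    orb = cv2.ORB_create(nfeatures=500)
    descs = []
    for f in frames:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        _, d = orb.detectAndCompute(gray, None)
        descs.append(d if d is not None else np.zeros((0, 32), np.uint8))
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)

    def n_matches(a, b):
        if len(a) < 2 or len(b) < 2:
            return 0
        knn = bf.knnMatch(a, b, k=2)
        return sum(1 for m in knn
                   if len(m) == 2 and m[0].distance < ratio * m[1].distance)

    selected = []
    for _ in range(min(n_keyframes, len(frames))):
        def score(i):
            coverage = sum(n_matches(descs[i], descs[j])
                           for j in range(len(frames))
                           if j not in selected and j != i)
            redundancy = sum(n_matches(descs[i], descs[j]) for j in selected)
            return coverage - redundancy
        best = max((i for i in range(len(frames)) if i not in selected),
                   key=score)
        selected.append(best)
    return selected
```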


IEEE Transactions on Multimedia | 2014

A Bag-of-Importance Model With Locality-Constrained Coding Based Feature Learning for Video Summarization

Shiyang Lu; Zhiyong Wang; Tao Mei; Genliang Guan; David Dagan Feng

Video summarization helps users quickly comprehend video content. Recently, some studies have utilized local features to represent each video frame and formulated video summarization as a coverage problem over local features. However, the importance of individual local features has not been exploited. In this paper, we propose a novel Bag-of-Importance (BoI) model for static video summarization that identifies the frames with important local features as keyframes; this is one of the first studies to formulate video summarization at the local feature level instead of the global feature level. By representing each frame with local features, a video is characterized as a bag of local features weighted with individual importance scores, and frames with more important local features are more representative, where the representativeness of each frame is the aggregate weighted importance of the local features it contains. In addition, we propose to learn a transformation from a raw local feature to a more powerful sparse nonlinear representation for deriving the importance score of each local feature, rather than directly utilizing hand-crafted visual features as most existing approaches do. Specifically, we first employ locality-constrained linear coding (LCC) to project each local feature into a sparse transformed space. LCC is able to take advantage of the manifold geometric structure of the high-dimensional feature space and form the manifold of the low-dimensional transformed space with the coordinates of a set of anchor points. We then calculate the L2 norm of each anchor point as the importance score of each local feature projected to that anchor point, and the distribution of the importance scores of all local features in a video is obtained as the video's BoI representation. We further differentiate the importance of local features with a spatial weighting template that takes into account the perceptual differences among spatial regions of a frame. As a result, the proposed approach exploits both the inter-frame and intra-frame properties of feature representations and identifies keyframes capturing both the dominant content and discriminative details within a video. Experimental results on three video datasets across various genres demonstrate that the proposed approach clearly outperforms several state-of-the-art methods.
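The coding and importance-scoring step can be sketched with the standard analytical locality-constrained coding solution: code each local feature over its k nearest anchor points and take the L2 norm of the resulting code as its importance. This is a simplified reading of the paper's scheme; the codebook size and k below are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def lcc_importance(descriptors: np.ndarray, n_anchors: int = 64, k: int = 5):
    """Approximate LCC: code each local feature over its k nearest anchors
    (codebook from k-means) and use the L2 norm of the code as the feature's
    importance score. A sketch, not the paper's exact solver."""
    anchors = KMeans(n_clusters=n_anchors, n_init=4, random_state=0) \
        .fit(descriptors).cluster_centers_
    scores = np.empty(len(descriptors))
    for i, x in enumerate(descriptors):
        d2 = ((anchors - x) ** 2).sum(axis=1)
        nn = np.argsort(d2)[:k]                # k nearest anchor points
        B = anchors[nn] - x                    # shifted basis
        G = B @ B.T + 1e-6 * np.eye(k)         # regularized Gram matrix
        c = np.linalg.solve(G, np.ones(k))
        c /= c.sum()                           # codes sum to one
        scores[i] = np.linalg.norm(c)          # importance of this feature
    return scores

# a frame's representativeness is then the sum of its features' scores
```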


international conference on multimedia and expo | 2012

Video Summarization with Global and Local Features

Genliang Guan; Zhiyong Wang; Kaimin Yu; Shaohui Mei; Mingyi He; David Dagan Feng

Video summarization is crucial for effective and efficient access to video content, given the ever-increasing amount of video data. Most existing keyframe-based summarization approaches represent individual frames with global features, which neglects the local details of visual content. Considering that a video generally depicts a story through a number of scenes with different temporal orders and shooting angles, we formulate scene summarization as identifying a set of frames that best covers the keypoint pool constructed from the scene. Our approach is therefore a two-step process: identifying scenes and selecting representative content for each scene. Global features are utilized to identify scenes through clustering, owing to the visual similarity among video frames of the same scene, while local features are used to summarize each scene. We develop a keypoint-based keyframe selection method to identify the representative content of a scene, which allows users to flexibly tune the summarization length. Our preliminary results indicate that the proposed approach is promising and potentially robust to clustering-based scene identification.
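A minimal sketch of the two-step pipeline follows, assuming global features are already extracted and each frame's keypoints have been quantized into a set of visual-word ids (both assumptions, not the paper's exact features).

```python
import numpy as np
from sklearn.cluster import KMeans

def two_step_summary(global_feats, local_words, n_scenes=4, per_scene=2):
    """Two-step sketch: cluster frames into scenes on global features, then
    within each scene greedily pick the frames that cover the most of the
    scene's keypoint pool (sets of visual-word ids per frame)."""
    scenes = KMeans(n_clusters=n_scenes, n_init=4, random_state=0) \
        .fit_predict(global_feats)
    summary = []
    for s in range(n_scenes):
        idx = list(np.flatnonzero(scenes == s))
        covered = set()
        for _ in range(min(per_scene, len(idx))):
            # greedy max-coverage: frame adding the most uncovered words
            best = max(idx, key=lambda i: len(local_words[i] - covered))
            idx.remove(best)
            covered |= local_words[best]
            summary.append(int(best))
    return sorted(summary)

# usage: global_feats is an (n, d) array (e.g. color histograms);
# local_words is a list of sets of word ids, e.g. local_words[i] = {3, 17}
```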


ACM Transactions on Multimedia Computing, Communications, and Applications | 2014

A Top-Down Approach for Video Summarization

Genliang Guan; Zhiyong Wang; Shaohui Mei; Max Ott; Mingyi He; David Dagan Feng

While most existing video summarization approaches aim to identify important frames of a video from either a global or a local perspective, we propose a top-down approach consisting of scene identification followed by scene summarization. For scene identification, we represent each frame with global features and utilize a scalable clustering method. We then formulate scene summarization as choosing the frames that best cover a set of local descriptors with minimal redundancy. In addition, we develop a visual-word-based approach to make our method more computationally scalable. Experimental results on two benchmark datasets demonstrate that the proposed approach clearly outperforms the state of the art.
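The visual-word device that makes the coverage computation scalable can be sketched as a quantization step; the vocabulary size here is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_word_sets(frame_descriptors, vocab_size=1000):
    """Quantize raw local descriptors into visual words so that coverage can
    be computed on small integer sets instead of by costly descriptor
    matching. A sketch of the scalability idea only."""
    all_desc = np.vstack(frame_descriptors)
    vocab = MiniBatchKMeans(n_clusters=vocab_size, n_init=3,
                            random_state=0).fit(all_desc)
    # each frame becomes the set of word ids its descriptors fall into
    return [set(vocab.predict(d)) for d in frame_descriptors]
```

These word sets can then feed a greedy max-coverage selection like the one sketched above, replacing pairwise descriptor matching with cheap set operations.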


international conference on multimedia and expo | 2014

L2,0-constrained sparse dictionary selection for video summarization

Shaohui Mei; Genliang Guan; Zhiyong Wang; Mingyi He; Xian-Sheng Hua; David Dagan Feng

The ever-increasing volume of video content has created profound challenges for developing efficient video summarization (VS) techniques to access the data. Recent developments in sparse dictionary selection have demonstrated promising results for VS; however, the convex-relaxation-based solution cannot directly ensure the sparsity of the dictionary and selects keyframes from a local point of view. In this paper, an L2,0-constrained sparse dictionary selection model is proposed to reformulate the VS problem. In addition, a simultaneous orthogonal matching pursuit (SOMP) based method is proposed to obtain an approximate solution to the model without smoothing the penalty function, thus selecting keyframes from a global point of view. To allow intuitive and flexible configuration of the VS process, a percentage of residuals (POR) criterion is also developed to produce video summaries of different lengths. Experimental results demonstrate that our proposed method outperforms the state of the art.
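SOMP is straightforward to sketch when the dictionary candidates are the (normalized) frames themselves. The stopping rule below follows the POR idea described above, while the normalization details are assumptions.

```python
import numpy as np

def somp_select(X: np.ndarray, max_atoms: int = 10, por_target: float = 0.9):
    """Simultaneous OMP over the frames themselves: each step picks the
    frame most correlated with the joint residual of *all* frames, then
    re-fits the coefficients jointly."""
    D = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)  # unit atoms
    R = X.copy()                                # residual, one row per frame
    total = np.linalg.norm(X) ** 2
    selected = []
    for _ in range(max_atoms):
        corr = np.linalg.norm(R @ D.T, axis=0)  # joint correlation per atom
        corr[selected] = -1.0
        selected.append(int(np.argmax(corr)))
        A = D[selected].T                       # d x k sub-dictionary
        coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)
        R = X - (A @ coef).T
        if 1.0 - np.linalg.norm(R) ** 2 / total >= por_target:
            break                               # POR criterion reached
    return selected
```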


Multimedia Tools and Applications | 2013

Semantic context based refinement for news video annotation

Zhiyong Wang; Genliang Guan; Yu Qiu; Li Zhuo; Dagan Feng

Automatic video annotation aims to bridge the semantic gap and facilitate concept-based video retrieval by detecting high-level concepts in video data. Recently, utilizing context information has emerged as an important direction in this domain. In this paper, we present a novel video annotation refinement approach that utilizes extrinsic semantic context extracted from video subtitles together with intrinsic context among candidate annotation concepts. The extrinsic semantic context is formed by identifying a set of key terms from video subtitles. The semantic similarity between those key terms and the candidate annotation concepts is then exploited to refine the initial annotation results, whereas most existing approaches utilize textual information only heuristically. Similarity measures including Google distance and WordNet distance have been investigated for this refinement, in contrast to approaches that derive semantic relationships among concepts from given training datasets. Visualness is also utilized to discriminate individual terms for further refinement. In addition, the Random Walk with Restarts (RWR) technique is employed to perform a final refinement of the annotation results by exploring the inter-relationships among annotation concepts. Comprehensive experiments on the TRECVID 2005 dataset demonstrate the effectiveness of the proposed annotation approach and investigate the impact of various factors.
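The RWR refinement step has a well-known iterative form. The sketch below assumes a concept-affinity matrix W built from one of the semantic distances investigated above; its construction is omitted here.

```python
import numpy as np

def rwr_refine(W: np.ndarray, scores: np.ndarray, restart: float = 0.3,
               n_iter: int = 50):
    """Random Walk with Restarts over a concept-relationship graph: the
    initial detector scores form the restart distribution, and the walk
    propagates evidence between related concepts."""
    P = W / W.sum(axis=0, keepdims=True)   # column-stochastic transition
    q = scores / scores.sum()              # restart vector
    r = q.copy()
    for _ in range(n_iter):
        r = (1 - restart) * P @ r + restart * q
    return r                               # refined concept scores

# toy usage: 4 concepts, affinity from any semantic distance measure
W = np.array([[0, .8, .1, .1],
              [.8, 0, .2, .1],
              [.1, .2, 0, .9],
              [.1, .1, .9, 0]])
print(rwr_refine(W, np.array([.9, .2, .5, .1])))
```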


multimedia signal processing | 2008

Measuring semantic similarity between concepts in visual domain

Zhiyong Wang; Genliang Guan; Jiajun Wang; David Dagan Feng

Concept similarity has been intensively researched in the natural language processing domain due to its important role in applications such as language modeling and information retrieval. However, there are few studies on measuring concept similarity in the visual domain, even though concept-based multimedia information retrieval has attracted much attention. In this paper, we present a scalable framework for this purpose, which differs from traditional approaches that explore correlations among concepts in the image/video annotation domain. For each concept, a model based on feature distribution is built using sample images collected from the Internet, and the similarity between concepts is measured by the similarity between their models. Specifically, a Gaussian mixture model (GMM) is employed to model each concept, and two similarity measures are investigated. Experimental results on 13,974 images of 16 concepts collected through image search engines demonstrate that the measured similarity between concepts is very close to human perception. In addition, the entropy of GMM cluster distributions can be a good indicator for selecting concepts for image/video annotation.
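One plausible instantiation of model-based concept similarity (not necessarily either of the two measures investigated above) is a Monte Carlo symmetrized KL divergence between the fitted GMMs; the component count below is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def concept_similarity(feats_a, feats_b, n_components=3, n_mc=2000):
    """Model each concept's sample-image features with a GMM and score
    similarity via a Monte Carlo symmetrized KL divergence between the
    two models, mapped to (0, 1]."""
    gmm_a = GaussianMixture(n_components, random_state=0).fit(feats_a)
    gmm_b = GaussianMixture(n_components, random_state=0).fit(feats_b)

    def kl(p, q):                      # E_p[log p - log q], estimated
        x, _ = p.sample(n_mc)
        return float(np.mean(p.score_samples(x) - q.score_samples(x)))

    d = 0.5 * (kl(gmm_a, gmm_b) + kl(gmm_b, gmm_a))
    return np.exp(-d)                  # divergence -> similarity score
```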


acm multimedia | 2012

What is happening: annotating images with verbs

Gang Tian; Genliang Guan; Zhiyong Wang; Dagan Feng

Image annotation has been widely investigated to discover the semantics of an image. However, most existing algorithms focus on noun tags (e.g., concepts and objects). Since an image is a snapshot of a real-world event, annotating images with verbs enables a richer understanding of the image. In this paper, we propose a data-driven approach to verb-oriented image annotation. First, we obtain verb candidates by generating search queries for a given image from its initial noun tags and building a sentence corpus from the query results. We utilize visualness to filter out tags that are not visually presentable (e.g., pain) and divide tags into two categories (scene-based and object-based) to impose linguistic rules in verb extraction. We then re-rank the candidate verbs using the tag context discovered from images in the MIRFlickr dataset that are both semantically and visually similar to the given image. Our experimental results from a user study demonstrate that the proposed approach is promising.
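The final re-ranking step can be illustrated with a toy sketch that scores candidate verbs by their co-occurrence with the tags of similar images; retrieving those neighbour images is assumed to have happened upstream.

```python
from collections import Counter

def rerank_verbs(candidates, similar_image_tags):
    """Toy re-ranking: sort candidate verbs by how often they appear among
    the tags of semantically and visually similar images."""
    context = Counter(tag for tags in similar_image_tags for tag in tags)
    return sorted(candidates, key=lambda v: context[v], reverse=True)

# toy usage
print(rerank_verbs(["run", "sit", "fly"],
                   [["dog", "run", "park"], ["run", "grass"], ["sit"]]))
```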


multimedia signal processing | 2009

Improved concept similarity measuring in the visual domain

Genliang Guan; Zhiyong Wang; Qi Tian; David Dagan Feng

Exploring semantic similarity between concepts in the visual domain has a wide range of applications, such as natural language processing and multimedia retrieval, and in general requires both a large pool of sample images for each concept and a model to capture its visual characteristics. Instead of relying on high-quality, large-quantity sample data, which is very difficult to obtain, this paper proposes a novel method that improves concept similarity measurement by incorporating concept modeling into a data pruning process. First, a number of sampling concept models are obtained by sampling subsets from the sample dataset of each concept. Noisy samples are then discarded according to their probabilities under the sampling concept models. Experimental results on 31,275 web images of 38 concepts defined in LSCOM indicate that the concept similarity obtained through our approach is more consistent with human cognition. A concept hierarchy tree built from the 38 concepts and their similarities further demonstrates the effectiveness of the proposed method.
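The pruning idea reduces to a few lines: fit several sampling concept models on random subsets and drop the samples that are unlikely under them. All fractions and the GMM choice below are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def prune_noisy_samples(feats, n_models=5, subset_frac=0.6, keep_frac=0.8):
    """Data-pruning sketch: fit several 'sampling concept models' on random
    subsets of a concept's image features, then keep only the samples with
    high average log-likelihood under those models."""
    rng = np.random.default_rng(0)
    n = len(feats)
    loglik = np.zeros(n)
    for m in range(n_models):
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        gmm = GaussianMixture(n_components=3, random_state=m).fit(feats[idx])
        loglik += gmm.score_samples(feats)
    keep = np.argsort(loglik)[-int(keep_frac * n):]  # drop unlikely samples
    return np.sort(keep)
```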

Collaboration


Dive into Genliang Guan's collaborations.

Top Co-Authors

Mingyi He, Northwestern Polytechnical University
Shaohui Mei, Northwestern Polytechnical University
Kaimin Yu, Information Technology University
Shuai Wan, Northwestern Polytechnical University
Li Zhuo, Beijing University of Technology
Qiuxia Wu, South China University of Technology