Gangqiang Zhao
Nanyang Technological University
Publications
Featured research published by Gangqiang Zhao.
ACM Multimedia | 2007
Gangqiang Zhao; Ling Chen; Jie Song; Gencai Chen
Although dozens of vision-based 3D head tracking methods exist, none of them considers the problem of large motion, especially movement along the Z axis. In this paper we propose a novel tracking method that handles this problem using a Scale-Invariant Feature Transform (SIFT) based registration algorithm. Salient SIFT features are first detected and tracked between two images, and the 3D points corresponding to these features are then obtained from a stereo camera. With these 3D points, a registration algorithm in a RANSAC framework is employed to detect outliers and estimate the head pose. Performance evaluation shows accurate pose recovery (3° RMS) when the head undergoes large motion, even when the movement along the Z axis is about 150 cm.
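To make the registration step concrete, here is a minimal sketch (not the authors' implementation) of rigid-pose estimation from matched 3D points inside a RANSAC loop, using only NumPy; the inlier threshold and iteration count are illustrative assumptions.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src -> dst (Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix an accidental reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_registration(src, dst, iters=500, thresh=0.02):
    """Estimate the head-pose change from matched 3D SIFT points while rejecting outliers."""
    best_inliers = np.zeros(len(src), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)   # minimal sample
        R, t = rigid_transform(src[idx], dst[idx])
        err = np.linalg.norm((src @ R.T + t) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on all inliers of the best hypothesis
    return rigid_transform(src[best_inliers], dst[best_inliers]), best_inliers
```

The returned rotation and translation describe the inter-frame head motion; Euler angles can be extracted from the rotation matrix if a pose readout is needed.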
IEEE Transactions on Image Processing | 2012
Junsong Yuan; Gangqiang Zhao; Yun Fu; Zhu Li; Aggelos K. Katsaggelos; Ying Wu
Given a collection of images or a short video sequence, we define a thematic object as the key object that frequently appears and is representative of the visual content. Successful discovery of the thematic object is helpful for object search and tagging, video summarization and understanding, etc. However, this task is challenging because 1) there is no a priori knowledge of the thematic objects, such as their shapes, scales, locations, and times of re-occurrence, and 2) the thematic object of interest can undergo severe appearance variations due to viewpoint and lighting condition changes, scale variations, etc. Instead of using a top-down generative model to discover thematic visual patterns, we propose a novel bottom-up approach that gradually prunes uncommon local visual primitives and recovers the thematic objects. A multilayer candidate pruning procedure is designed to accelerate the image data mining process. Our solution can efficiently locate thematic objects of various sizes and can tolerate large appearance variations of the same thematic object. Experiments on challenging image and video data sets and comparisons with existing methods validate the effectiveness of our method.
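The following is a simplified, single-pass sketch of the bottom-up pruning idea (the paper's multilayer procedure is not reproduced): local descriptors are quantized into visual words, and words that re-occur in too few images are discarded so that the surviving primitives hint at the thematic object. The word count and support threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_visual_words(descriptors_per_image, n_words=200, min_support=0.5):
    """Discard visual words that appear in fewer than min_support of the images."""
    all_desc = np.vstack(descriptors_per_image)
    codebook = KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(all_desc)
    labels = [codebook.predict(d) for d in descriptors_per_image]
    # occurrence matrix: images x words (does word w occur in image i?)
    occ = np.stack([np.bincount(l, minlength=n_words) > 0 for l in labels])
    support = occ.mean(axis=0)            # fraction of images containing each word
    keep = support >= min_support         # prune uncommon primitives
    return keep, labels
```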
International Conference on Image Processing | 2012
Gangqiang Zhao; Junsong Yuan
This paper presents a novel road curb detection method using a 3D-LIDAR scanner. To detect the curbs, the ground points are first separated from the point cloud. The candidate curb points are then selected using three spatial cues: the elevation difference, the gradient value, and the normal orientation. Afterwards, false curb points caused by obstacles are removed using a short-term memory technique. Next, the curbs are fitted with a parabola model. Finally, a particle filter is used to smooth the curb detection result. The proposed approach was evaluated on a dataset collected by an autonomous ground vehicle driving around the Ford Research campus and downtown Dearborn. Our curb detection results are accurate and robust despite variations introduced by moving vehicles and pedestrians, static obstacles, road curvature changes, etc.
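A minimal sketch of two of the stages, assuming the ground has already been removed: candidate curb cells are selected from a grid by their elevation difference (a simplified stand-in for the paper's elevation/gradient/normal cues), and the lateral curb position is fitted with a parabola. Cell size and height thresholds are illustrative assumptions.

```python
import numpy as np

def detect_curb_candidates(points, cell=0.2, dz_min=0.05, dz_max=0.25):
    """Select grid cells whose elevation difference looks like a curb step.
    points: (N, 3) array of x, y, z coordinates in metres."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    keys, inv = np.unique(ij, axis=0, return_inverse=True)
    z_min = np.full(len(keys), np.inf)
    z_max = np.full(len(keys), -np.inf)
    np.minimum.at(z_min, inv, points[:, 2])
    np.maximum.at(z_max, inv, points[:, 2])
    dz = z_max - z_min
    mask = (dz > dz_min) & (dz < dz_max)      # curb-like height jump, not an obstacle
    return keys[mask] * cell                   # candidate cell origins (x, y)

def fit_curb(candidates):
    """Fit the lateral curb position as a parabola x = a*y^2 + b*y + c."""
    a, b, c = np.polyfit(candidates[:, 1], candidates[:, 0], deg=2)
    return a, b, c
```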
IEEE Transactions on Circuits and Systems for Video Technology | 2016
Jiong Yang; Gangqiang Zhao; Junsong Yuan; Xiaohui Shen; Zhe Lin; Brian L. Price; Jonathan Brandt
In this paper, we propose a new method for detecting primary objects in unconstrained videos in a completely automatic setting. Here, we define the primary object in a video as the object that appears saliently in most of the frames. Unlike previous works that consider only local saliency detection or common pattern discovery, the proposed method integrates the local visual/motion saliency extracted from each frame, global appearance consistency throughout the video, and a spatiotemporal smoothness constraint on object trajectories. We first identify a temporally coherent salient region throughout the whole video, and then explicitly learn a global appearance model to distinguish the primary object from the background. In order to obtain high-quality saliency estimations from both appearance and motion cues, we propose a novel self-adaptive saliency map fusion method that learns the reliability of saliency maps from labeled data. As a whole, our method can robustly localize and track primary objects in diverse video content, and handle challenges such as fast object and camera motion, large scale and appearance variation, background clutter, and pose deformation. Moreover, compared with existing approaches that assume the object is present in all the frames, our approach can naturally handle the case where the object is present in only part of the frames, e.g., when the object enters the scene in the middle of the video or leaves the scene before the video ends. We also propose a new video data set containing 51 videos for primary object detection with per-frame ground-truth labeling. Quantitative experiments on several challenging video data sets demonstrate the superiority of our method over recent state-of-the-art methods.
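The sketch below illustrates one way to learn saliency-map reliability from labeled data, using a per-pixel logistic regression over the cue maps; it is a stand-in for the paper's fusion scheme, and the array shapes and function names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_fusion_weights(saliency_maps, gt_masks):
    """Learn reliability weights for several saliency cues from labelled frames.
    saliency_maps: list of (n_cues, H, W) arrays; gt_masks: list of (H, W) binary masks."""
    X = np.concatenate([m.reshape(m.shape[0], -1).T for m in saliency_maps])
    y = np.concatenate([g.reshape(-1) for g in gt_masks])
    return LogisticRegression(max_iter=1000).fit(X, y)

def fuse_saliency(clf, maps):
    """Fuse one frame's cue maps (n_cues, H, W) into a single saliency map."""
    h, w = maps.shape[1:]
    prob = clf.predict_proba(maps.reshape(maps.shape[0], -1).T)[:, 1]
    return prob.reshape(h, w)
```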
ACM Multimedia | 2010
Gangqiang Zhao; Ling Chen; Gencai Chen; Junsong Yuan
Invariant feature descriptors such as SIFT and GLOH have been demonstrated to be very robust for image matching and object recognition. However, such descriptors are typically of high dimensionality, e.g., 128 dimensions in the case of SIFT. This limits the performance of feature matching techniques in terms of speed and scalability. A new compact feature descriptor, called Kernel Projection Based SIFT (KPB-SIFT), is presented in this paper. Like SIFT, our descriptor encodes the salient aspects of image information in the neighborhood of each feature point. However, instead of using SIFT's smoothed weighted histograms, we apply kernel projection techniques to orientation gradient patches. The resulting KPB-SIFT descriptor is more compact than the state of the art, does not require the pre-training step needed by PCA-based descriptors, and shows superior distinctiveness, invariance to scale, and tolerance to geometric distortions. We extensively evaluated the effectiveness of KPB-SIFT on datasets acquired under varying conditions.
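A rough sketch of the kernel-projection idea, under clearly stated assumptions: the specific kernels used by KPB-SIFT are not reproduced here, so a random orthonormal basis stands in for them, and the patch size and descriptor length are illustrative.

```python
import numpy as np

def kernel_projection_descriptor(patch, n_kernels=36, seed=0):
    """Compact descriptor sketch: project the orientation gradient patch of a
    keypoint onto a fixed set of kernels instead of building SIFT's histograms.
    patch: a small grayscale neighbourhood, e.g. 16x16 pixels."""
    gy, gx = np.gradient(patch.astype(float))
    grad = np.stack([gx, gy]).reshape(-1)             # flattened gradient patch
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((grad.size, n_kernels)))
    desc = grad @ basis                                # project onto the kernels
    return desc / (np.linalg.norm(desc) + 1e-8)        # normalise for matching
```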
ACM Multimedia | 2010
Gangqiang Zhao; Junsong Yuan
Discovering common objects that appear frequently in a number of images is a challenging problem, due to (1) the appearance variations of the same common object and (2) the enormous computational cost involved in exploring the huge solution space, including the location, scale, and number of common objects. We characterize each image as a collection of visual primitives and propose a novel bottom-up approach that gradually prunes local primitives to recover the whole common object. A multi-layer candidate pruning procedure is designed to accelerate the image data mining process. Our solution provides accurate localization of the common object and is thus able to crop the common objects despite their variations due to scale, viewpoint, and lighting condition changes. Moreover, it can extract common objects even from a small number of images. Experiments on challenging image and video datasets validate the effectiveness and efficiency of our method.
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery | 2014
Hongxing Wang; Gangqiang Zhao; Junsong Yuan
In image and video data, a visual pattern refers to a re‐occurring composition of visual primitives. Such visual patterns extract the essence of the image and video data and convey rich information. However, unlike frequent patterns in transaction data, there are considerable visual content variations and complex spatial structures among visual primitives, which make effective exploration of visual patterns a challenging task. Many methods have been proposed to address the problem of visual pattern discovery during the past decade. In this article, we provide a review of the major progress in visual pattern discovery. We categorize the existing methods into two groups: bottom‐up pattern discovery and top‐down pattern modeling. Bottom‐up pattern discovery starts with unordered visual primitives and merges them until larger visual patterns are found. In contrast, top‐down methods start by modeling visual primitive compositions and then infer the pattern discovery result. A summary of related applications is also presented. Finally, we identify open issues for future research. WIREs Data Mining Knowl Discov 2014, 4:24–37. doi: 10.1002/widm.1110
Journal of Visual Communication and Image Representation | 2014
Gangqiang Zhao; Xuhong Xiao; Junsong Yuan; Gee Wah Ng
Highlights: A geometry segmentation algorithm is proposed to parse scanner point clouds. An efficient multilayer perceptron classifier is trained to parse camera images. A fuzzy logic based fusion method is proposed to integrate the results of the two sensors. A Markov random field based temporal fusion method is proposed. The fused results are more reliable than those of the individual sensors.
Fusion of information gathered from multiple sources is essential to build a comprehensive situation picture for autonomous ground vehicles. In this paper, an approach that performs scene parsing and data fusion for a 3D-LIDAR scanner (Velodyne HDL-64E) and a video camera is described. First, a geometry segmentation algorithm is proposed for detecting obstacles and ground areas from data collected by the Velodyne scanner. Then, the corresponding image collected by the video camera is classified patch by patch into more detailed categories. After that, the parsing result of each frame is obtained by fusing the result for the Velodyne data and that for the image within a fuzzy logic inference framework. Finally, the parsing results of consecutive frames are smoothed by the Markov random field based temporal fusion method. The proposed approach has been evaluated on datasets collected by our autonomous ground vehicle testbed in both rural and urban areas. The fused results are more reliable than those acquired by analyzing only images or Velodyne data.
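As a rough illustration of per-cell fusion, the sketch below combines class memberships from the two sensors with simple fuzzy rules (min where both sensors observe, the single membership otherwise); the paper's actual inference rule base and the temporal MRF smoothing are not reproduced, and the array layout is an assumption.

```python
import numpy as np

def fuse_labels(lidar_conf, image_conf):
    """Fuse per-cell class memberships from the LIDAR segmentation and the image
    classifier. Inputs: (H, W, n_classes) arrays in [0, 1]; NaN marks cells a
    sensor did not observe."""
    both = (~np.isnan(lidar_conf).any(axis=-1, keepdims=True)
            & ~np.isnan(image_conf).any(axis=-1, keepdims=True))
    lidar = np.nan_to_num(lidar_conf, nan=0.0)
    image = np.nan_to_num(image_conf, nan=0.0)
    # fuzzy AND where both sensors agree to observe, otherwise take the one available
    fused = np.where(both, np.minimum(lidar, image), np.maximum(lidar, image))
    return fused.argmax(axis=-1), fused     # hard labels + fused memberships
```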
Computer Vision and Pattern Recognition | 2013
Gangqiang Zhao; Junsong Yuan; Gang Hua
A topical video object refers to an object that is frequently highlighted in a video. It could be, e.g., the product logo or the leading actor/actress in a TV commercial. We propose a topic model that incorporates a word co-occurrence prior for efficient discovery of topical video objects from a set of key frames. Previous work using topic models, such as Latent Dirichlet Allocation (LDA), for video object discovery often takes a bag-of-visual-words representation, which ignores important co-occurrence information among the local features. We show that such data-driven, bottom-up co-occurrence information can conveniently be incorporated into LDA with a Gaussian Markov prior, which combines top-down probabilistic topic modeling with bottom-up priors in a unified model. Our experiments on challenging videos demonstrate that the proposed approach can discover different types of topical objects despite variations in scale, viewpoint, color, and lighting, or even partial occlusions. The efficacy of the co-occurrence prior is clearly demonstrated in comparisons with topic models without such priors.
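For orientation, here is a minimal sketch of the baseline that the co-occurrence prior extends: a bag-of-visual-words representation of key frames fed into standard LDA. The Gaussian Markov co-occurrence prior itself is the paper's contribution and is not part of scikit-learn's LDA, so it is not shown; codebook and topic sizes are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def discover_topics(descriptors_per_frame, n_words=500, n_topics=5):
    """Bag-of-visual-words + plain LDA over key frames (baseline only)."""
    codebook = KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(
        np.vstack(descriptors_per_frame))
    # word-count vector per key frame
    counts = np.stack([np.bincount(codebook.predict(d), minlength=n_words)
                       for d in descriptors_per_frame])
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
    return lda, codebook
```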
International Conference on Data Mining | 2011
Gangqiang Zhao; Junsong Yuan
Videos of one category usually contain the same thematic pattern, e.g., the spin action in skating videos. The discovery of the thematic pattern is essential to understanding and summarizing the video content. This paper addresses two critical issues in mining thematic video patterns: (1) automatic discovery of thematic patterns without any training or supervision information, and (2) accurate localization of the occurrences of all thematic patterns in videos. The major contributions are two-fold. First, we formulate thematic video pattern discovery as a cohesive sub-graph selection problem that finds a subset of visual words that are spatio-temporally collocated. Spatio-temporal branch-and-bound search can then locate all instances accurately. Second, a novel method is proposed to efficiently find the cohesive sub-graph with the maximum overall mutual information score. Our experimental results on challenging commercial and action videos show that our approach can discover different types of thematic patterns despite variations in scale, viewpoint, color and lighting conditions, or partial occlusions. Our approach is also robust to videos with cluttered and dynamic backgrounds.
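As a rough stand-in for the cohesive sub-graph selection (the paper's efficient exact method is not reproduced), the sketch below greedily grows a set of visual words that maximizes the sum of pairwise mutual-information scores; the subgraph size limit is an illustrative assumption.

```python
import numpy as np

def cohesive_subgraph(mi, k=10):
    """Greedily select up to k visual words with high total pairwise mutual information.
    mi: symmetric (n_words, n_words) matrix of pairwise MI between word occurrences."""
    mi = np.asarray(mi, dtype=float)
    # seed with one endpoint of the strongest pairwise score
    selected = [int(np.unravel_index(np.argmax(mi), mi.shape)[0])]
    while len(selected) < k:
        gains = mi[:, selected].sum(axis=1)   # MI gained by adding each candidate word
        gains[selected] = -np.inf             # do not re-pick selected words
        best = int(np.argmax(gains))
        if gains[best] <= 0:                  # stop when no candidate adds cohesion
            break
        selected.append(best)
    return selected
```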