Publication


Featured research published by Guorong Cai.


Signal Processing | 2013

Perspective-SIFT: An efficient tool for low-altitude remote sensing image registration

Guorong Cai; Pierre-Marc Jodoin; Shaozi Li; Yundong Wu; Songzhi Su; Zhen-Kun Huang

This paper presents an automated image registration approach that is robust to perspective distortion. The state-of-the-art Affine-SIFT (ASIFT) method uses affine transforms to simulate various viewpoints and thereby increase the robustness of registration. However, an affine transformation does not follow the process by which real-world images are formed. To solve this problem, we propose a perspective scale-invariant feature transform (PSIFT) that uses homographic transformations to simulate perspective distortion. Like ASIFT, PSIFT is based on the scale-invariant feature transform (SIFT) and has a two-resolution scheme, namely a low-resolution phase and a high-resolution phase. The low-resolution phase of PSIFT simulates several image views under a perspective transformation by varying two camera-axis orientation parameters. Given those simulated images, SIFT is then used to extract features and find matches among them. In the high-resolution phase, the perspective transformations that yield the largest number of matches in the low-resolution stage are selected to generate SIFT features on the original images. Experimental results obtained on three categories of low-altitude remote sensing images and the Morel-Yu dataset show that PSIFT significantly outperforms the state-of-the-art ASIFT, SIFT, Random Ferns, Harris-Affine, MSER, and Hessian-Affine, especially when images suffer severe perspective distortion.
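
The low-resolution phase lends itself to a short sketch. The following Python snippet, a minimal sketch using OpenCV, simulates perspective views with camera-rotation homographies and scores each view by its SIFT match count; the angle grid, focal length, and ratio-test threshold are illustrative assumptions rather than the paper's exact parameters.

```python
import cv2
import numpy as np

def perspective_homography(w, h, tilt_deg, pan_deg, f=800.0):
    """Homography induced by rotating a virtual camera by (tilt, pan)."""
    tilt, pan = np.radians(tilt_deg), np.radians(pan_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(tilt), -np.sin(tilt)],
                   [0, np.sin(tilt), np.cos(tilt)]])
    Ry = np.array([[np.cos(pan), 0, np.sin(pan)],
                   [0, 1, 0],
                   [-np.sin(pan), 0, np.cos(pan)]])
    K = np.array([[f, 0, w / 2], [0, f, h / 2], [0, 0, 1]])  # assumed pinhole intrinsics
    return K @ Rx @ Ry @ np.linalg.inv(K)

def match_count(img_a, img_b, sift, matcher):
    """Count ratio-test SIFT matches between two images."""
    _, da = sift.detectAndCompute(img_a, None)
    _, db = sift.detectAndCompute(img_b, None)
    if da is None or db is None:
        return 0
    pairs = matcher.knnMatch(da, db, k=2)
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < 0.75 * p[1].distance)

def best_perspective(img_ref, img_tgt, angles=(-40, -20, 0, 20, 40)):
    """Low-resolution phase: try simulated views, keep the best (tilt, pan)."""
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher()
    h, w = img_tgt.shape[:2]
    best_score, best_angles = 0, None
    for tilt in angles:
        for pan in angles:
            H = perspective_homography(w, h, tilt, pan)
            warped = cv2.warpPerspective(img_tgt, H, (w, h))
            score = match_count(img_ref, warped, sift, matcher)
            if score > best_score:
                best_score, best_angles = score, (tilt, pan)
    return best_score, best_angles  # the winner seeds the high-resolution phase
```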


Multimedia Tools and Applications | 2017

Stratified pooling based deep convolutional neural networks for human action recognition

Sheng Yu; Yun Cheng; Songzhi Su; Guorong Cai; Shaozi Li

Video-based human action recognition is an active and challenging topic in computer vision. Over the last few years, deep convolutional neural networks (CNNs) have become the most popular method and achieved state-of-the-art performance on several datasets, such as HMDB-51 and UCF-101. Since each video yields a varying number of frame-level features, how to combine these features into a good video-level feature becomes a challenging task. Therefore, this paper proposes a novel action recognition method named stratified pooling, based on deep convolutional neural networks (SP-CNN). The process is mainly composed of five parts: (i) fine-tuning a pre-trained CNN on the target dataset; (ii) extracting frame-level features; (iii) reducing feature dimensionality with principal component analysis (PCA); (iv) stratified pooling of the frame-level features to obtain a video-level feature; and (v) multiclass classification with an SVM. Finally, experimental results on the HMDB-51 and UCF-101 datasets show that the proposed method outperforms the state of the art.
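
Steps (iii) through (v) can be sketched compactly. In the snippet below, the stratified pooling (splitting the frame sequence into temporal strata, averaging within each, and concatenating) is one plausible reading of the abstract rather than the paper's exact recipe, and the PCA dimension and strata count are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def stratified_pool(frame_feats, n_strata=4):
    """(n_frames, dim) frame features -> (n_strata * dim,) video feature."""
    strata = np.array_split(frame_feats, n_strata)   # temporal strata
    return np.concatenate([s.mean(axis=0) for s in strata])

def videos_to_features(videos, n_components=128, n_strata=4):
    """videos: list of (n_frames_i, dim) frame-level feature arrays."""
    pca = PCA(n_components=n_components).fit(np.vstack(videos))  # step (iii)
    X = np.stack([stratified_pool(pca.transform(v), n_strata)    # step (iv)
                  for v in videos])
    return X, pca

# step (v): multiclass SVM on the pooled video-level features
# X_train, pca = videos_to_features(train_videos)
# clf = SVC(kernel="linear").fit(X_train, train_labels)
```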


IEEE Signal Processing Letters | 2015

Novel Graph Cuts Method for Multi-Frame Super-Resolution

Dongxiao Zhang; Pierre-Marc Jodoin; Cuihua Li; Yundong Wu; Guorong Cai

In this letter, we propose a new graph-cuts multi-frame super-resolution method. The method is carried out in three steps. First, we project each high-resolution pixel p onto the low-resolution images and select the low-resolution pixels that fall within the zone of influence of p. Second, we weigh the contribution of those low-resolution pixels via a soft switching function and combine them to construct a virtual low-resolution pixel. Third, the high-resolution image is recovered by minimizing a maximum a posteriori Markov random field (MAP-MRF) energy function. This is done by approximating our energy function to make it graph-representable and minimizing it with the graph-cuts α-expansion algorithm. Experimental results show that our approach outperforms state-of-the-art methods.
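
The virtual low-resolution pixel construction (steps one and two) is easy to illustrate. The snippet below is a hedged sketch: the Gaussian soft-switching weight and the zone-of-influence radius are assumptions, and the paper's exact functions may differ.

```python
import numpy as np

def virtual_lr_pixel(p_hr, lr_samples, sigma=0.5, radius=1.0):
    """
    p_hr:       (x, y) position of a high-resolution pixel projected into
                low-resolution coordinates.
    lr_samples: list of ((x, y), intensity) low-resolution observations.
    Returns the soft-switched weighted average of the LR pixels whose
    centers fall within `radius` of the projection (its zone of influence).
    """
    p = np.asarray(p_hr, dtype=float)
    weights, values = [], []
    for pos, val in lr_samples:
        d = np.linalg.norm(np.asarray(pos, dtype=float) - p)
        if d <= radius:                                     # zone of influence
            weights.append(np.exp(-d**2 / (2 * sigma**2)))  # soft switch
            values.append(val)
    if not weights:
        return None        # no LR support for this high-resolution pixel
    w = np.asarray(weights)
    return float(np.dot(w, values) / w.sum())
```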


Neurocomputing | 2018

Discriminative parts learning for 3D human action recognition

Min Huang; Guorong Cai; Hong-Bo Zhang; Sheng Yu; Dongying Gong; Donglin Cao; Shaozi Li; Songzhi Su

Human action recognition from RGBD videos has recently attracted much attention in computer vision. Mainstream methods focus on designing highly discriminative features, which suffer from high dimensionality. In human experience, discriminative parts, such as the hands or legs, play an important role in identifying human actions. Motivated by this observation, we propose a Random Forest (RF) out-of-bag (OOB) estimation based approach to extract discriminative parts for each action. First, the features of each joint-based part are separately fed into the RF classifier, and the OOB estimate of each part is used to evaluate the discrimination of the joints in that part. Second, joints with high discrimination across the whole dataset are selected to build the feature, so feature dimensionality is reduced efficiently. Experiments conducted on the MSR Action 3D and MSR Daily Activity 3D datasets show that our proposed approach outperforms state-of-the-art methods in accuracy with lower feature dimensions.
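
The part-selection idea maps directly onto scikit-learn's out-of-bag estimate. The following is a minimal sketch: train a random forest on each part's features, rank parts by OOB accuracy, and keep only the most discriminative ones; the number of trees, the number of parts kept, and the feature shapes are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_discriminative_parts(part_features, labels, keep=5):
    """
    part_features: dict mapping part name -> (n_samples, dim) feature array.
    labels:        (n_samples,) action labels.
    Returns the `keep` parts with the highest out-of-bag accuracy.
    """
    scores = {}
    for part, X in part_features.items():
        rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                    random_state=0).fit(X, labels)
        scores[part] = rf.oob_score_    # OOB estimate of the part's discrimination
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores

# parts, oob = select_discriminative_parts(feats_by_part, y)
# final_feature = np.hstack([feats_by_part[p] for p in parts])
```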


IET Computer Vision | 2017

Meta-action descriptor for action recognition in RGBD video

Min Huang; Songzhi Su; Guorong Cai; Hong-Bo Zhang; Donglin Cao; Shaozi Li

Action recognition is one of the most active research topics in computer vision. Recent methods represent actions using global or local video features. These approaches, however, lack semantic structure and may not provide deep insight into the essence of an action. In this work, the authors argue that semantic clues, such as joint positions and part-level motion clustering, help identify actions. To this end, this study proposes a meta-action descriptor for action recognition in RGBD video. Specifically, two discrimination-based strategies, dynamic part clustering and discriminative part clustering, are introduced to improve accuracy. Experiments conducted on the MSR Action 3D dataset show that the proposed method significantly outperforms methods that do not use joint-position semantics.
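
One ingredient of such a descriptor, part-level motion clustering, can be sketched as follows: cluster per-part motion segments and describe a video by its histogram of cluster assignments. Plain k-means and the segment representation here are assumptions; the paper's clustering is discrimination-driven rather than generic.

```python
import numpy as np
from sklearn.cluster import KMeans

def meta_action_histogram(segments, kmeans):
    """segments: (n_segments, dim) motion features of one part in one video."""
    ids = kmeans.predict(segments)
    hist = np.bincount(ids, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)   # normalized meta-action histogram

# km = KMeans(n_clusters=20, n_init=10).fit(all_training_segments)
# descriptor = np.hstack([meta_action_histogram(s, km) for s in part_segments])
```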


Acta Automatica Sinica | 2014

A perspective invariant image matching algorithm

Guorong Cai; Shaozi Li; Yundong Wu; Songzhi Su; Shui-Li Chen

To solve the problems of affine transformation and discrete sampling in ASIFT (affine scale-invariant feature transform), this paper proposes PSIFT (perspective scale-invariant feature transform), which is based on particle swarm optimization. The proposed algorithm uses a virtual camera and homographic transforms to simulate perspective distortion among multi-view images. Particle swarm optimization is employed to determine the appropriate homography, which is decomposed into three rotation matrices. Experimental results obtained on three categories of low-altitude remote sensing images show that the proposed method significantly outperforms the state-of-the-art ASIFT, SIFT, Harris-Affine, and MSER, especially when images suffer severe perspective distortion.
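
The swarm search over the three rotation angles can be sketched directly. In the snippet below, each particle encodes candidate rotation angles for the homography, and its fitness would be a SIFT match count after warping (as in the Perspective-SIFT sketch above); the swarm size, inertia, and acceleration constants are conventional assumptions, not the paper's settings.

```python
import numpy as np

def pso_search(fitness, dim=3, n_particles=20, iters=30,
               bounds=(-45.0, 45.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness(angles)` over rotation angles within `bounds`."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))   # particle positions: angles
    vel = np.zeros_like(pos)
    pbest = pos.copy()                              # per-particle best positions
    pbest_f = np.array([fitness(p) for p in pos])
    g = pbest[pbest_f.argmax()].copy()              # global best position
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p) for p in pos])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        g = pbest[pbest_f.argmax()].copy()
    return g   # best rotation-angle triple found by the swarm
```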


Remote Sensing | 2018

A Robust Transform Estimator Based on Residual Analysis and Its Application on UAV Aerial Images

Guorong Cai; Songzhi Su; Chengcai Leng; Yundong Wu; Feng Lu

Estimating the transformation between two images of the same scene is a fundamental step for image registration, image stitching, and 3D reconstruction. State-of-the-art methods are mainly based on sorted residuals for generating hypotheses. This scheme has achieved encouraging results in many remote sensing applications. Unfortunately, mainstream residual-based methods may fail in estimating the transform between Unmanned Aerial Vehicle (UAV) low-altitude remote sensing images, because UAV images often contain repetitive patterns and severe viewpoint changes, which produce a lower inlier rate and a higher pseudo-outlier rate than other tasks. We performed extensive experiments and found that the main reason is that these methods compute feature-pair similarity within a fixed window, making them sensitive to the size of the residual window. To solve this problem, three schemes based on the distribution of residuals are proposed, called Relational Window (RW), Sliding Window (SW), and Reverse Residual Order (RRO). Specifically, RW employs a relaxed residual-window size to evaluate the highest similarity within a relaxed model length. SW fixes the number of overlapping models while varying the window size. RRO takes the permutation of residual values into consideration when measuring similarity, not only counting the number of overlapping structures but also penalizing reversals within them. Experimental results conducted on our own UAV high-resolution remote sensing images show that all three proposed strategies outperform traditional methods in the presence of severe perspective distortion due to viewpoint change.
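
The residual-analysis idea can be illustrated with a small sketch: rank the hypotheses for each correspondence by residual, then score two correspondences by the overlap of their top-ranked hypothesis lists, with a penalty for reversed order in the spirit of RRO. The window size and penalty weight below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def preference_order(residuals):
    """residuals: (n_points, n_hypotheses) -> per-point hypothesis indices
    sorted by residual, best first."""
    return np.argsort(residuals, axis=1)

def rro_similarity(order_a, order_b, window=20, penalty=0.5):
    """Overlap of the two top-`window` hypothesis lists, penalized by the
    number of pairwise order reversals inside the overlap (RRO-style)."""
    top_a, top_b = order_a[:window], order_b[:window]
    shared = [h for h in top_a if h in set(top_b)]
    if not shared:
        return 0.0
    ranks_b = [list(top_b).index(h) for h in shared]
    inversions = sum(1 for i in range(len(ranks_b))
                     for j in range(i + 1, len(ranks_b))
                     if ranks_b[i] > ranks_b[j])
    return max(0.0, (len(shared) - penalty * inversions) / window)

# orders = preference_order(residual_matrix)
# s = rro_similarity(orders[0], orders[1])
```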


IET Computer Vision | 2018

Combining 2D and 3D features to improve road detection based on stereo cameras

Guorong Cai; Songzhi Su; Wenli He; Yundong Wu; Shaozi Li

Road detection is a fundamental component of autonomous driving systems, since it provides the drivable space and candidate object regions for driving decisions. The core of road detection methods is extracting effective and discriminative features. Since two-dimensional (2D) and 3D features are complementary, the authors propose a robust multi-feature combination and optimisation framework for stereo image pairs, called Feature++. First, several 2D and 3D features, such as Gabor responses and plane parameters, are extracted after generating 2D super-pixels and a 3D depth image from stereo matching. Second, the combined features are fed into a three-layer shallow neural network classifier to decide whether a super-pixel is a road region. Finally, the classified results are further refined using a fully connected conditional random field (CRF) that takes image content into consideration. The authors extensively evaluate the performance of four 2D features, four 3D features, and their combinations. Experiments conducted on the KITTI ROAD benchmark show that (i) combining 2D and 3D features greatly improves road detection performance and (ii) using a CRF as a refinement step is necessary. Overall, the proposed Feature++ method outperforms most manually designed features and is comparable with state-of-the-art methods based on deep learning.
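
The combination-and-classification step can be sketched briefly: concatenate each super-pixel's 2D and 3D descriptors and classify road versus non-road with a shallow network. The feature names, layer size, and training details below are assumptions, and the CRF refinement stage is omitted.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def combine_features(feats_2d, feats_3d):
    """feats_2d: (n_superpixels, d2), feats_3d: (n_superpixels, d3).
    2D and 3D cues are complementary, so simply concatenate them."""
    return np.hstack([feats_2d, feats_3d])

# one hidden layer gives a three-layer (input-hidden-output) classifier
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)

# hypothetical feature arrays per super-pixel:
# X_train = combine_features(gabor_train, plane_train)
# clf.fit(X_train, is_road_train)
# road_prob = clf.predict_proba(combine_features(gabor_test, plane_test))[:, 1]
# road_prob would then be refined with a fully connected CRF.
```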


Concurrency and Computation: Practice and Experience | 2018

Cover patches: A general feature extraction strategy for spoofing detection

Guorong Cai; Songzhi Su; Chengcai Leng; Jipeng Wu; Yundong Wu; Shaozi Li

Face anti-spoofing has attracted much attention in security applications such as mobile payment and access control. Even so, face anti-spoofing remains a challenging task. Mainstream image-based spoofing-detection algorithms usually use global motion or texture information to distinguish whether an input face is live or fake. However, the performance of these methods is sensitive to lighting changes and to images acquired from different sensors. The main reason is that a spoofed face image often has slightly different texture in local areas, such as landmarks or salient regions of the face. To this end, this paper proposes a novel multi-patch feature extraction strategy for spoofing detection. First, a set of patches following a specific combination scheme is selected to cover the face image. Second, features such as hand-crafted Gray Level Co-occurrence Matrix (GLCM) features, Local Binary Patterns (LBP), or deep features are extracted from these patches. Third, all features are combined into a global descriptor of the face image and fed into an SVM classifier for spoofing detection. Experimental results show that the proposed strategy effectively improves the accuracy of spoofed-face detection on four widely used anti-spoofing databases.
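
As a concrete instance of the patch strategy, the sketch below tiles a face image with a fixed grid of patches, extracts an LBP histogram from each, and concatenates them into a global descriptor for an SVM. The grid size and LBP parameters are illustrative assumptions; the paper's combination scheme of covering patches may differ.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def patch_lbp_descriptor(face, grid=(3, 3), n_points=8, radius=1):
    """face: 2D grayscale array -> concatenated per-patch LBP histograms."""
    feats = []
    for row in np.array_split(face, grid[0], axis=0):
        for patch in np.array_split(row, grid[1], axis=1):
            lbp = local_binary_pattern(patch, n_points, radius, "uniform")
            # uniform LBP with P points takes values in [0, P + 1]
            hist, _ = np.histogram(lbp, bins=n_points + 2,
                                   range=(0, n_points + 2), density=True)
            feats.append(hist)
    return np.concatenate(feats)

# X = np.stack([patch_lbp_descriptor(f) for f in faces])
# clf = SVC(kernel="rbf").fit(X, is_live)   # live vs. spoofed labels
```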


ACM Transactions on Multimedia Computing, Communications, and Applications | 2018

Multifeature Selection for 3D Human Action Recognition

Min Huang; Songzhi Su; Hong-Bo Zhang; Guorong Cai; Dongying Gong; Donglin Cao; Shaozi Li

In mainstream approaches to 3D human action recognition, depth and skeleton features are combined to improve recognition accuracy. However, this strategy results in high feature dimensionality and low discrimination due to redundant feature vectors. To address this drawback, a multi-feature selection approach for 3D human action recognition is proposed in this paper. First, three novel single-modal features are proposed to describe depth appearance, depth motion, and skeleton motion. Second, the classification entropy of a random forest is used to evaluate the discrimination of the depth-appearance-based features. Finally, one of the three features is selected to recognize each sample according to this discrimination evaluation. Experimental results show that the proposed multi-feature selection approach significantly outperforms approaches based on single-modal features and feature fusion.
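
The entropy-driven selection can be sketched as follows: a random forest scores a sample under one modality, and the entropy of its class posterior decides whether that modality is trusted or the next one is consulted. The entropy threshold and the fallback order are assumptions, not the paper's exact rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classification_entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    p = probs[probs > 0]
    return float(-(p * np.log2(p)).sum())

def predict_with_selection(sample_feats, clfs, order, threshold=1.0):
    """
    sample_feats: dict modality -> 1D feature vector for one sample.
    clfs:         dict modality -> fitted RandomForestClassifier.
    order:        modalities to try, e.g. ["depth_appearance",
                  "depth_motion", "skeleton_motion"] (hypothetical names);
                  fall through to the next while entropy stays high.
    """
    for modality in order:
        probs = clfs[modality].predict_proba([sample_feats[modality]])[0]
        if classification_entropy(probs) < threshold or modality == order[-1]:
            return clfs[modality].classes_[probs.argmax()], modality
```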
