Yue Ming
Beijing University of Posts and Telecommunications
Publications
Featured research published by Yue Ming.
Neurocomputing | 2014
Yue Ming
A new framework is proposed for 3D face recognition, called Rigid-area Orthogonal Spectral Regression (ROSR). We utilize depth images of the rigid area of the 3D face for efficient discriminant feature extraction. The framework can effectively estimate the regression matrix that describes intrinsic facial surface features. Large expressions, treated as non-rigid transformations, along with data noise, are the major obstacles that significantly deteriorate the facial linear structure. In our framework, we first utilize curvature information to remove the non-rigid areas from the 3D face images. Orthogonality minimizes reconstruction error, and spectral regression accurately describes the manifold structure of the samples; the ROSR framework combines these advantages for 3D face recognition. Additionally, regression analysis is much faster than the traditional methods. The CASIA, Bosphorus and FRGC 3D face databases are used for experimental evaluation. Compared with other commonly used algorithms, our framework consistently performs better in terms of efficiency and robustness.
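As a rough illustration of the regression-then-orthogonalize idea, the sketch below assumes `X` holds row-wise rigid-area depth features and `labels` gives class identities; the function name, the supervised affinity choice and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import eigh

# Minimal sketch of orthogonal spectral regression (assumed inputs):
# X is (n_samples, n_features) rigid-area depth features and labels
# is (n_samples,) class ids. Not the paper's implementation.
def orthogonal_spectral_regression(X, labels, n_components, alpha=0.01):
    # Supervised affinity graph: connect samples of the same class.
    W = (labels[:, None] == labels[None, :]).astype(float)
    D = np.diag(W.sum(axis=1))
    # Spectral embedding: solve W y = lambda D y; the eigenvectors of
    # the largest eigenvalues serve as regression responses.
    _, vecs = eigh(W, D)
    Y = vecs[:, -(n_components + 1):-1]  # skip the trivial constant one
    # Regression: ridge-regress the responses onto the features.
    A = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)
    # Orthogonalize the projection vectors (QR), which is the property
    # the abstract credits with minimizing reconstruction error.
    Q, _ = np.linalg.qr(A)
    return Q  # (n_features, n_components) projection matrix
```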
international conference on digital signal processing | 2015
Lei Tian; Chunxiao Fan; Yue Ming; Yi Jin
High-level features can represent the semantics of the original data, offering a plausible way to avoid hand-crafted features for face recognition. This paper proposes an effective deep learning framework that stacks the output features learned at each stage of a Convolutional Neural Network (CNN). Unlike traditional deep learning networks, we use Principal Component Analysis (PCA) to obtain the filter kernels of the convolutional layers; the resulting model is named the Stacked PCA Network (SPCANet). Our SPCANet model follows the basic architecture of the CNN, comprising three layers in each stage: a convolutional filter layer, a nonlinear processing layer and a feature pooling layer. Firstly, in the convolutional filter layer of our model, PCA instead of stochastic gradient descent (SGD) is employed to learn the filter kernels, and the output of all cascaded convolutional filter layers is used as the input to the nonlinear processing layer. Secondly, the nonlinear processing layer is also simplified: we use a hashing method for nonlinear processing. Thirdly, block-based histograms are employed in the feature pooling layer instead of max-pooling. In the final output layer, the outputs of all stages are stacked together as the final feature of our model. Extensive experiments conducted on many different face recognition scenarios demonstrate the effectiveness of our proposed approach.
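The PCA filter-learning step can be sketched compactly: collect mean-removed patches from the training images and take the leading principal components as convolution kernels. The helper below is a minimal illustration with assumed shapes and names, not the authors' code.

```python
import numpy as np

# Minimal sketch of PCA filter learning for one convolutional stage.
# images: iterable of 2D grayscale arrays; returns n_filters kernels
# of size k x k. Illustrative assumptions throughout.
def pca_filters(images, k=7, n_filters=8):
    patches = []
    for img in images:
        H, W = img.shape
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())   # remove the patch mean
    P = np.stack(patches)                      # (n_patches, k*k)
    # The leading right singular vectors are the principal components
    # of the patch set; reshape each into a k x k filter kernel.
    _, _, Vt = np.linalg.svd(P, full_matrices=False)
    return Vt[:n_filters].reshape(n_filters, k, k)
```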
Neurocomputing | 2015
Yue Ming
The era of Big Data brings increasingly high demands for fine-motion analysis, such as hand activity recognition. In real-world scenarios, however, hand activity recognition faces severe challenges from variations in illumination, pose and occlusion. Depth acquisition provides an effective way to address these issues. In this paper, a complete hand activity recognition framework combining depth information is presented for fine-motion analysis. First, an improved graph-cuts method is introduced for hand localization and tracking over time. Then, combining 3D geometric characteristics with prior information on hand behavior, a 3D Mesh MoSIFT feature descriptor is proposed to represent the discriminative properties of hand activity. Simulation orthogonal matching pursuit (SOMP) is used to encode the visual codewords. Experiments are based on publicly available depth datasets (the ChaLearn gesture dataset and our captured RGB-D dataset). Compared with previous popular approaches, our framework consistently performs better for real-world fine-motion analysis applications in terms of effectiveness, robustness and universality.
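For the coding step, plain orthogonal matching pursuit greedily picks codebook atoms and refits the coefficients after each pick; variants such as simultaneous OMP share one support across a group of descriptors. A minimal single-descriptor sketch, with assumed names and unit-norm codebook columns, not the paper's SOMP implementation:

```python
import numpy as np

# Minimal orthogonal matching pursuit sketch: encode one descriptor x
# over a codebook D with unit-norm columns. Illustrative only.
def omp_encode(D, x, n_nonzero):
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Refit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    code[support] = coef
    return code
```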
PLOS ONE | 2015
Yue Ming; Guangchao Wang; Chunxiao Fan
With the rapid development of 3D somatosensory technology, human behavior recognition has become an important research field, and human behavior feature analysis has evolved from traditional 2D features to 3D features. In order to improve the performance of human activity recognition, a human behavior recognition method is proposed based on hybrid texture-edge local pattern coding for feature extraction and the integration of RGB and depth video information. The method performs background subtraction on the RGB and depth video sequences, extracts and integrates history images of the behavior outlines, and then carries out feature extraction and classification, achieving rapid and efficient recognition of behavior videos. A large number of experiments show that the proposed method is faster and achieves a higher recognition rate, and that it is robust to differences in environmental color, lighting and other factors. Moreover, the hybrid texture-edge uniform local binary pattern feature can be used in most 3D behavior recognition tasks.
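Since the texture half of the hybrid feature builds on uniform local binary patterns, a minimal 8-neighbour LBP sketch is shown below (plain LBP codes only, without the uniform-pattern binning or the edge component; shapes and names are assumptions):

```python
import numpy as np

# Minimal 8-neighbour LBP sketch: each interior pixel is compared with
# its 8 neighbours and the comparison bits are packed into one byte.
# The paper's hybrid descriptor adds uniform-pattern binning and an
# edge component on top of this basic coding.
def lbp8(img):
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << np.uint8(bit)
    return code
```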
Neurocomputing | 2016
Yue Ming; Xiaopeng Hong
In this paper, we design a unified 3D face authentication system for practical use. First, we propose a facial depth recovery method that constructs a facial depth map from stereoscopic videos. It effectively utilizes prior facial information and incorporates a visibility term to classify static and dynamic pixels for robust depth estimation. Secondly, in order to make 3D face authentication more accurate and consistent, we present intrinsic-scale feature detection for interest points on 3D facial mesh regions. Then, a novel feature descriptor, the Local Mesh Scale-Invariant Feature Transform (LMSIFT), is proposed to reflect the differing recognition power of different facial regions. Finally, sparse optimization over a visual codebook is applied to 3D face learning. We evaluate our approach on publicly available 3D face databases and self-collected realistic scene databases. We also develop an interactive education system to investigate its performance in practice, which demonstrates the high performance of the proposed approach for accurate 3D face authentication. Compared with previous popular approaches, our system has consistently better performance in terms of effectiveness, robustness and universality.
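For context on the depth-recovery step, the simplest stereo baseline is sum-of-absolute-differences block matching over a rectified pair; the paper's method goes well beyond this by adding facial priors and the visibility term, so the sketch below is only the naive starting point, with all names and parameters assumed.

```python
import numpy as np

# Deliberately naive SAD block-matching baseline for a rectified
# grayscale stereo pair; the paper's recovery method adds facial
# priors and a visibility term on top of depth estimation like this.
def block_match_disparity(left, right, max_disp=64, k=5):
    H, W = left.shape
    pad = k // 2
    disp = np.zeros((H, W), dtype=np.int32)
    for y in range(pad, H - pad):
        for x in range(pad + max_disp, W - pad):
            patch = left[y - pad:y + pad + 1, x - pad:x + pad + 1]
            costs = [np.abs(patch - right[y - pad:y + pad + 1,
                                          x - d - pad:x - d + pad + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))   # best-matching shift
    return disp
```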
Journal of Electronic Imaging | 2016
Lei Tian; Chunxiao Fan; Yue Ming
It is well known that higher-level features can represent the abstract semantics of the original data. We propose a multi-scale deep learning network that learns a set of high-level feature representations through each stage of a convolutional neural network for face recognition, named the Multi-Scaled Principal Component Analysis (PCA) Network (MS-PCANet). There are two main differences between our model and traditional deep learning networks. On the one hand, we obtain the prefixed filter kernels by learning the principal components of image patches using PCA, nonlinearly process the convolutional results using simple binary hashing, and pool them using the spatial pyramid pooling method. On the other hand, in our model the output features of several stages are fed to the classifier. Combining feature representations from multiple stages provides multi-scale features to the classifier, since the features in later stages are more global and invariant than those in earlier stages. Therefore, our MS-PCANet feature compactly encodes both holistic abstract information and local specific information. Extensive experimental results show that our MS-PCANet model can efficiently extract high-level feature representations and outperforms state-of-the-art face/expression recognition methods on multiple benchmark face-related datasets across modalities.
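A minimal sketch of the hashing-and-pooling stage follows: threshold each filter response map at zero, pack the bits into integer codes, and histogram the codes over a spatial pyramid. In MS-PCANet the resulting histograms from several stages are concatenated before classification; all names and the grid levels below are illustrative assumptions.

```python
import numpy as np

# Minimal binary hashing + spatial pyramid pooling sketch.
# feature_maps: (n_filters, H, W) real-valued convolution outputs of
# one stage. Returns one concatenated histogram feature. Illustrative.
def hash_and_pool(feature_maps, levels=(1, 2, 4)):
    bits = (np.asarray(feature_maps) > 0).astype(np.uint32)
    codes = np.zeros(bits.shape[1:], dtype=np.uint32)
    for b, plane in enumerate(bits):       # pack one bit per filter map
        codes |= plane << np.uint32(b)
    n_bins = 1 << bits.shape[0]
    H, W = codes.shape
    feats = []
    for g in levels:                       # 1x1, 2x2, 4x4 pyramid grids
        for i in range(g):
            for j in range(g):
                cell = codes[i * H // g:(i + 1) * H // g,
                             j * W // g:(j + 1) * W // g]
                feats.append(np.bincount(cell.ravel(), minlength=n_bins))
    return np.concatenate(feats)
```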
Neurocomputing | 2016
Lei Tian; Chunxiao Fan; Yue Ming
Binary feature descriptors have been widely used in the computer vision field due to their excellent discriminative power and strong robustness, and local binary patterns (LBP) and its variants have proven to be effective face descriptors. However, the forms of such binary feature descriptors are predefined by hand, which requires strong domain knowledge to design them. In this paper, we propose a simple and efficient iterative quantization binary codes (IQBC) feature learning method to learn a discriminative binary face descriptor in a data-driven way. Firstly, similar to the traditional LBP method, we extract patch-wise pixel difference vectors (PDVs) by computing and concatenating the differences between a center patch and its neighboring patches. Then, inspired by multi-class spectral clustering and the orthogonal Procrustes problem, both widely used in image retrieval, we learn an optimized rotation that minimizes the quantization error of mapping the data to the vertices of a zero-centered binary hypercube, using an iterative quantization scheme. In other words, we learn a feature mapping that projects these pixel difference vectors into low-dimensional binary vectors. IQBC can be used with an unsupervised data embedding method such as principal component analysis (PCA) or a supervised one such as canonical correlation analysis (CCA), yielding IQBC-PCA and IQBC-CCA. Lastly, we cluster and pool the projected binary codes into a histogram-based feature that describes the co-occurrence of binary codes, and take this histogram as the final feature representation for each face image. We investigate the performance of IQBC-PCA and IQBC-CCA on the FERET, CAS-PEAL-R1, LFW and PaSC databases. Extensive experimental results demonstrate that our IQBC descriptor outperforms other state-of-the-art face descriptors.
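The rotation-learning step is the classic iterative quantization alternation: fix the rotation and quantize to the nearest hypercube vertex, then fix the codes and solve an orthogonal Procrustes problem for the rotation. A minimal sketch over zero-centered, PCA-projected PDVs (the input name and iteration count are assumptions):

```python
import numpy as np

# Minimal iterative quantization sketch. V: (n, c) zero-centered,
# PCA-projected pixel difference vectors. Learns an orthogonal
# rotation R minimizing ||sign(VR) - VR||_F. Illustrative only.
def iterative_quantization(V, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a random orthogonal rotation.
    R, _ = np.linalg.qr(rng.standard_normal((V.shape[1], V.shape[1])))
    for _ in range(n_iter):
        B = np.sign(V @ R)                 # fix R: quantize to vertices
        U, _, Vt = np.linalg.svd(V.T @ B)  # fix B: Procrustes step
        R = U @ Vt
    return R, np.sign(V @ R)               # rotation and binary codes
```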
Multimedia Tools and Applications | 2017
Lei Tian; Chunxiao Fan; Yue Ming
Local feature descriptors have been widely used in the computer vision field due to their excellent discriminative power and strong robustness. However, the forms of such local descriptors are predefined by hand, which requires strong domain knowledge to design them. In this paper, we propose a simple and efficient Spherical Hashing based Binary Codes (SHBC) feature learning method to learn a discriminative and robust binary face descriptor in a data-driven way. Firstly, we extract patch-wise pixel difference vectors (PDVs) by computing the differences between a center patch and its neighboring patches. Then, inspired by the fact that a hypersphere defines a much tighter closed region in the original data space than a hyperplane, we learn a hypersphere-based hashing function that maps these PDVs into low-dimensional binary codes through an efficient iterative optimization process, which achieves both a balanced partition of the data points per bit and independence between hashing functions. In order to better capture the semantic information of the dataset, SHBC can also be combined with a supervised data embedding method such as Canonical Correlation Analysis (CCA), yielding Supervised SHBC (S-SHBC). Lastly, we cluster and pool the learned binary codes into a histogram-based feature that describes the co-occurrence of binary codes, and take this histogram as the final feature representation for each face image. We investigate the performance of SHBC and S-SHBC on the FERET, CAS-PEAL-R1, LFW and PaSC databases. Extensive experimental results demonstrate that our SHBC descriptor outperforms other state-of-the-art face descriptors.
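The hashing function itself is easy to sketch: each bit tests whether a point falls inside one hypersphere, and setting each radius to the median distance to its center yields the balanced 50/50 bit split. The full method also iteratively adjusts the centers for pairwise independence between bits, which this illustration (assumed names throughout) omits.

```python
import numpy as np

# Minimal spherical hashing sketch. X: (n, d) pixel difference vectors.
# The real SHBC training also iteratively adjusts the centers so that
# bits are pairwise independent; this illustration omits that step.
def fit_spheres(X, n_bits, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_bits, replace=False)].astype(float)
    dists = np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    # Median distance as radius => each bit is 1 for half the data.
    radii = np.median(dists, axis=0)
    return centers, radii

def spherical_hash(X, centers, radii):
    # Bit i is 1 iff the point lies inside hypersphere i.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return (d2 <= radii[None, :] ** 2).astype(np.uint8)   # (n, n_bits)
```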
international conference on intelligent robotics and applications | 2015
Lei Tian; Chunxiao Fan; Yue Ming; Jiakun Shi
In this work, we take advantage of the strength of spectral graph theory in classification and propose a novel deep learning framework for face analysis called the Spectral Regression Discriminant Analysis Network (SRDANet). Our SRDANet model shares the basic architecture of the Convolutional Neural Network (CNN), comprising three basic components: a convolutional filter layer, a nonlinear processing layer and a feature pooling layer. It differs from traditional deep learning networks in that, in our convolutional layer, we extract leading eigenvectors from patches of the facial image and use them as filter kernels, instead of randomly initializing kernels and updating them by stochastic gradient descent (SGD). The output of all cascaded convolutional filter layers is used as the input to the nonlinear processing layer, where we use a hashing method for nonlinear processing. In the feature pooling layer, block-based histograms are employed to pool the output features instead of max-pooling. Finally, the output of the feature pooling layer is taken as the final feature output of our model. Unlike previous single-task research on face analysis, our proposed approach demonstrates excellent performance in face recognition and expression recognition with 2D/3D facial images simultaneously. Extensive experiments conducted on many different face analysis databases demonstrate the efficiency of our proposed SRDANet model: Extended Yale B, PIE and ORL are used for 2D face recognition, FRGC v2 for 3D face recognition, and BU-3DFE for 3D expression recognition.
international conference on human system interactions | 2015
Chunxiao Fan; Lei Tian; Guangchao Wang; Yue Ming; Jiakun Shi; Yi Jin
Nowadays, more and more activity recognition algorithms improve recognition performance by combining RGB and depth information. Although the space-time volume (STV) algorithm and space-time local feature algorithms can combine RGB and depth information effectively, they have their own defects: they are computationally expensive and are not suitable for modeling non-periodic activities. In this paper, we propose a novel algorithm for three-dimensional human activity recognition that combines spatial-domain local texture features with spatio-temporal local texture features. On the one hand, to extract spatial local texture features, we apply ViBe (Visual Background extractor) and a binarization operator to the RGB and depth image sequences, obtain the RGB-MOHBBI and depth-MOHBBI respectively, and intersect them; we then extract LBP features from the mixed MOHBBI to describe the spatial-domain features. On the other hand, we follow the same background subtraction and binarization procedure on the RGB and depth image sequences to obtain spatio-temporal local texture features: we project the three-dimensional image volume onto the X-T and Y-T planes to obtain the spatio-temporal behavior-change images (sketched below), to which we apply the LBP operator to extract features representing human activity in the spatio-temporal domain. Finally, we combine the two LBP-based local features into one integrated feature as the final output of our model. Extensive experiments are conducted on the BUPT Arm Activity Dataset and the BUPT Arm And Finger Activity Dataset. The experimental results demonstrate that the proposed algorithm effectively compensates for the deficiencies of traditional activity recognition algorithms and delivers excellent results on databases of various complexities.
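A minimal sketch of the projection step referenced above, assuming the frames have already been background-subtracted and binarized into a (T, H, W) volume; names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the spatio-temporal projection step: volume is a
# (T, H, W) stack of binarized foreground masks. Projecting along the
# Y and X axes gives the X-T and Y-T behavior-change images, to which
# an LBP operator can then be applied.
def xt_yt_projections(volume):
    xt = volume.max(axis=1)   # (T, W): X-T plane
    yt = volume.max(axis=2)   # (T, H): Y-T plane
    return xt, yt
```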