Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiaopeng Hong is active.

Publication


Featured research published by Xiaopeng Hong.


Image and Vision Computing | 2014

A review of recent advances in visual speech decoding

Ziheng Zhou; Guoying Zhao; Xiaopeng Hong; Matti Pietikäinen

Abstract Visual speech information plays an important role in automatic speech recognition (ASR) especially when audio is corrupted or even inaccessible. Despite the success of audio-based ASR, the problem of visual speech decoding remains widely open. This paper provides a detailed review of recent advances in this research area. In comparison with the previous survey [97] which covers the whole ASR system that uses visual speech information, we focus on the important questions asked by researchers and summarize the recent studies that attempt to answer them. In particular, there are three questions related to the extraction of visual features, concerning speaker dependency, pose variation and temporal information, respectively. Another question is about audio-visual speech fusion, considering the dynamic changes of modality reliabilities encountered in practice. In addition, the state-of-the-art on facial landmark localization is briefly introduced in this paper. Those advanced techniques can be used to improve the region-of-interest detection, but have been largely ignored when building a visual-based ASR system. We also provide details of audio-visual speech databases. Finally, we discuss the remaining challenges and offer our insights into the future research on visual speech decoding.


Computer Vision and Pattern Recognition | 2009

Sigma Set: A small second order statistical region descriptor

Xiaopeng Hong; Hong Chang; Shiguang Shan; Xilin Chen; Wen Gao

Given an image region, second order statistics can be used to construct a descriptor for object representation. One example is the covariance matrix descriptor, which shows high discriminative power and good robustness in many computer vision applications. However, operations on covariance matrices, which lie on a Riemannian manifold, are usually computationally demanding. This paper proposes a novel second order statistics based region descriptor, named “Sigma Set”, in the form of a small set of vectors, which can be uniquely constructed through Cholesky decomposition of the covariance matrix. Sigma Set is low-dimensional, powerful, and robust. Moreover, compared with the covariance matrix, Sigma Set is not only more efficient in distance evaluation and average calculation, but also easier to enrich with first order statistics. Experimental results in texture classification and object tracking verify the effectiveness and efficiency of this novel object descriptor.
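A minimal sketch of the idea behind Sigma Set, assuming a covariance-of-features region descriptor: the Cholesky factor of the covariance matrix yields a small set of vectors that reconstructs it exactly (all names and the scaling convention here are illustrative, not the authors' code).

```python
import numpy as np

# Build a d x d covariance matrix C from per-pixel feature vectors,
# then factor it as C = L L^T via Cholesky decomposition.
rng = np.random.default_rng(0)
features = rng.standard_normal((500, 5))   # 500 pixels, 5 features each
C = np.cov(features, rowvar=False)         # second-order statistics

L = np.linalg.cholesky(C)                  # lower-triangular factor
d = C.shape[0]
alpha = np.sqrt(d)
# The "sigma set": 2d vectors (scaled columns of L and their negatives).
sigma_set = np.concatenate([alpha * L.T, -alpha * L.T])

# Sanity check: the vector set uniquely encodes C.
reconstructed = sigma_set.T @ sigma_set / (2 * d)
error = np.abs(reconstructed - C).max()
```

Because the descriptor is a plain set of Euclidean vectors, distances and averages can be computed with ordinary vector arithmetic instead of Riemannian operations on the manifold of covariance matrices.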


IEEE Transactions on Image Processing | 2014

Combining LBP Difference and Feature Correlation for Texture Description

Xiaopeng Hong; Guoying Zhao; Matti Pietikäinen; Xilin Chen

Effective characterization of texture images requires exploiting multiple visual cues from the image appearance. The local binary pattern (LBP) and its variants have achieved great success in texture description. However, because the LBP(-like) feature is an index of discrete patterns rather than a numerical feature, it is difficult to combine the LBP(-like) feature with other discriminative ones in a compact descriptor. To overcome the problem derived from the nonnumerical constraint of the LBP, this paper proposes a numerical variant accordingly, named the LBP difference (LBPD). The LBPD characterizes the extent to which one LBP varies from the average local structure of an image region of interest. It is simple, rotation invariant, and computationally efficient. To achieve enhanced performance, we combine the LBPD with other discriminative cues via a covariance matrix. The proposed descriptor, termed the covariance and LBPD descriptor (COV-LBPD), is able to capture the intrinsic correlation between the LBPD and other features in a compact manner. Experimental results show that the COV-LBPD achieves promising results on publicly available data sets.
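An illustrative sketch of the LBPD idea described above, not the authors' implementation: each pixel's 8-neighbour binary pattern is kept as a 0/1 vector rather than encoded as a discrete index, and the LBPD value measures how far that vector lies from the region's average pattern, giving a numerical feature that can enter a covariance matrix.

```python
import numpy as np

def binary_patterns(img):
    """Return the 8-neighbour binary comparison vector at each interior pixel."""
    c = img[1:-1, 1:-1]
    neighbours = [img[0:-2, 0:-2], img[0:-2, 1:-1], img[0:-2, 2:],
                  img[1:-1, 2:],   img[2:, 2:],     img[2:, 1:-1],
                  img[2:, 0:-2],   img[1:-1, 0:-2]]
    return np.stack([(n >= c).astype(float) for n in neighbours], axis=-1)

rng = np.random.default_rng(1)
img = rng.random((32, 32))

B = binary_patterns(img)                  # (30, 30, 8) binary vectors
mean_pattern = B.reshape(-1, 8).mean(0)   # average local structure of the region
# LBPD: distance of each pixel's pattern from the average pattern.
lbpd = np.linalg.norm(B - mean_pattern, axis=-1)
```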


Neurocomputing | 2016

Dynamic texture and scene classification by transferring deep image features

Xianbiao Qi; Chun-Guang Li; Guoying Zhao; Xiaopeng Hong; Matti Pietikäinen

Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, existing approaches suffer from sensitivity to varying illumination, viewpoint changes, or camera motion, and/or from a lack of spatial information. Inspired by the success of deep structures in image classification, we attempt to leverage a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges in training a deep structure, we propose to transfer prior knowledge from the image domain to the video domain. More specifically, we apply a well-trained Convolutional Neural Network (ConvNet) as a feature extractor to obtain mid-level features from each frame, and then form the video-level representation by concatenating the first and second order statistics over the mid-level features. We term this two-level feature extraction scheme the Transferred ConvNet Feature (TCoF). Moreover, we explore two different implementations of the TCoF scheme, i.e., the spatial TCoF and the temporal TCoF. In the spatial TCoF, the mean-removed frames are used as the inputs of the ConvNet, whereas in the temporal TCoF, the differences between two adjacent frames are used as the inputs. We systematically evaluate the proposed spatial and temporal TCoF schemes on three benchmark data sets, DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance.
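A minimal sketch of the video-level pooling step described above, with random vectors standing in for the per-frame ConvNet features (the feature dimension and frame count are illustrative): the video representation concatenates first-order (mean) and second-order (covariance) statistics of the mid-level features.

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in for mid-level ConvNet features: 40 frames, 16-d each.
frame_features = rng.standard_normal((40, 16))

mean_vec = frame_features.mean(axis=0)        # first-order statistics
cov = np.cov(frame_features, rowvar=False)    # second-order statistics
iu = np.triu_indices(cov.shape[0])            # covariance is symmetric:
tcof = np.concatenate([mean_vec, cov[iu]])    # keep only the upper triangle
```

With a 16-d feature, the descriptor has 16 + 16*17/2 = 152 dimensions regardless of the video's length, which is what makes the pooled representation usable by a fixed-size classifier.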


Neurocomputing | 2016

Spontaneous facial micro-expression analysis using Spatiotemporal Completed Local Quantized Patterns

Xiaohua Huang; Guoying Zhao; Xiaopeng Hong; Wenming Zheng; Matti Pietikäinen

Spontaneous facial micro-expression analysis has become an active task for recognizing the suppressed and involuntary facial expressions shown on human faces. Recently, Local Binary Pattern from Three Orthogonal Planes (LBP-TOP) has been employed for micro-expression analysis. However, LBP-TOP suffers from two critical problems that degrade the performance of micro-expression analysis: it extracts appearance and motion features from only the sign-based difference between two pixels, ignoring other useful information, and it uses classical pattern types that may not be optimal for local structure in some applications. This paper proposes SpatioTemporal Completed Local Quantization Patterns (STCLQP) for facial micro-expression analysis. Firstly, STCLQP extracts three kinds of information: sign, magnitude, and orientation components. Secondly, an efficient vector quantization and codebook selection are developed for each component in the appearance and temporal domains to learn compact and discriminative codebooks that generalize classical pattern types. Finally, based on the discriminative codebooks, spatiotemporal features of the sign, magnitude, and orientation components are extracted and fused. Experiments are conducted on three publicly available facial micro-expression databases, yielding some interesting findings about the neighboring patterns and the component analysis. Compared with the state of the art, experimental results demonstrate that STCLQP achieves a substantial improvement in analyzing facial micro-expressions.

Highlights:
- We propose a spatiotemporal completed local quantized pattern for micro-expression analysis.
- We use three kinds of information: the sign-based, magnitude-based, and orientation-based differences of pixels for LBP.
- We use an efficient vector quantization and discriminative codebook selection to make LBP-TOP more discriminative and compact.
- We evaluate the framework on three publicly available facial micro-expression databases.
- We evaluate the influence of parameters, different components, and codebook selection on the spatiotemporal completed local quantized pattern.
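An illustrative decomposition of local pixel differences into sign, magnitude, and orientation components, in the spirit of completed LBP variants; the neighbourhood, the mean threshold for the magnitude component, and all names here are assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(5)
img = rng.random((16, 16))

center = img[1:-1, 1:-1]
dx = img[1:-1, 2:] - center   # difference to the right neighbour
dy = img[2:, 1:-1] - center   # difference to the lower neighbour

# Sign component: direction of the local difference (the part classic LBP keeps).
sign = (dx >= 0).astype(int)
# Magnitude component: is the difference larger than the average magnitude?
magnitude = (np.abs(dx) >= np.abs(dx).mean()).astype(int)
# Orientation component: angle of the local gradient-like difference vector.
orientation = np.arctan2(dy, dx)
```

Each component would then be vector-quantized against its own learned codebook and the resulting histograms fused, per the STCLQP pipeline described above.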


International Conference on Image Analysis and Processing | 2013

Age Estimation Using Local Binary Pattern Kernel Density Estimate

Juha Ylioinas; Abdenour Hadid; Xiaopeng Hong; Matti Pietikäinen

We propose a novel kernel method for constructing local binary pattern statistics for facial representation in human age estimation. For age estimation, we use the de facto support vector regression technique. The main contributions of our work are (i) an evaluation of a pose correction method based on simple image flipping and (ii) a comparison of two local binary pattern based facial representations, namely a spatially enhanced histogram and a novel kernel density estimate. Our single- and cross-database experiments indicate that the kernel density estimate based representation yields better estimation accuracy than the corresponding histogram one, which we regard as a very interesting finding. Overall, the constructed age estimation system performs comparably to state-of-the-art methods. We use a well-defined evaluation protocol, allowing a fair comparison of our results.


Neurocomputing | 2014

Structured partial least squares for simultaneous object tracking and segmentation

Bineng Zhong; Xiaotong Yuan; Rongrong Ji; Yan Yan; Zhen Cui; Xiaopeng Hong; Yan Chen; Tian Wang; Duansheng Chen; Jiaxin Yu

Segmentation-based tracking methods are a class of powerful tracking methods that have been highly successful in alleviating model drift during online learning of the trackers. These methods typically include a detection component and a segmentation component, in which the tracked objects are first located by detection; the detection results then guide the segmentation process to reduce the noise in the training data. However, one limitation is that the processes of detection and segmentation are treated entirely separately: drift in detection may affect the results of segmentation, which in turn aggravates the tracker's drift. In this paper, we propose a novel method to address this limitation by incorporating structured labeling information into the partial least squares analysis algorithm for simultaneous object tracking and segmentation. This allows novel structured labeling constraints to be placed directly on the tracked objects, providing a useful contour constraint that alleviates the drifting problem. We show through both visual results and quantitative measurements on challenging sequences that our method produces more robust tracking results while obtaining accurate object segmentation.


International Conference on Multimodal Interfaces | 2014

Improved Spatiotemporal Local Monogenic Binary Pattern for Emotion Recognition in The Wild

Xiaohua Huang; Qiuhai He; Xiaopeng Hong; Guoying Zhao; Matti Pietikäinen

Local binary pattern from three orthogonal planes (LBP-TOP) has been widely used for emotion recognition in the wild. However, it suffers from illumination and pose changes. This paper focuses on the robustness of LBP-TOP in unconstrained environments. A recently proposed method, the spatiotemporal local monogenic binary pattern (STLMBP), was verified to work promisingly under different illumination conditions. This paper therefore proposes an improved spatiotemporal feature descriptor based on STLMBP. The improved descriptor uses not only magnitude and orientation but also phase information, which provides complementary cues. In detail, the magnitude, orientation, and phase images are obtained with an effective monogenic filter, and the multiple feature vectors are fused by multiple kernel learning. STLMBP and the proposed method are evaluated on the Acted Facial Expressions in the Wild dataset as part of the 2014 Emotion Recognition in the Wild Challenge, where they achieve competitive results, with accuracy gains of 6.35% and 7.65% over the challenge video baseline (LBP-TOP).


Computer Vision and Pattern Recognition | 2016

Recurrent Convolutional Neural Network Regression for Continuous Pain Intensity Estimation in Video

Jing Zhou; Xiaopeng Hong; Fei Su; Guoying Zhao

Automatic pain intensity estimation holds a significant position in the healthcare and medical field. Traditional static methods extract features from the frames of a video separately, which results in unstable changes and peaks among adjacent frames. To overcome this problem, we propose a real-time regression framework based on a recurrent convolutional neural network for automatic frame-level pain intensity estimation. Given vector sequences of AAM-warped facial images, we use a sliding-window strategy to obtain fixed-length input samples for the recurrent network. We then carefully design the architecture of the recurrent network to output continuous-valued pain intensity. The proposed end-to-end pain intensity regression framework can predict the pain intensity of each frame by considering a sufficiently long history of frames while limiting the scale of the parameters within the model. Our method achieves promising results in both accuracy and running speed on the published UNBC-McMaster Shoulder Pain Expression Archive Database.
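A sketch of the sliding-window step described above: a variable-length sequence of per-frame feature vectors is cut into fixed-length, overlapping windows that a recurrent network can consume. The window length, stride, and feature dimension here are illustrative choices, not values from the paper.

```python
import numpy as np

def sliding_windows(seq, length, stride=1):
    """Cut a (frames, dim) sequence into overlapping fixed-length windows."""
    return np.stack([seq[i:i + length]
                     for i in range(0, len(seq) - length + 1, stride)])

rng = np.random.default_rng(3)
video = rng.standard_normal((100, 64))       # 100 frames of 64-d warped-face features
windows = sliding_windows(video, length=16)  # (85, 16, 64) fixed-length inputs
```

Each window would receive the pain-intensity label of its last frame, so that the network predicts a frame-level score while still seeing the preceding temporal context.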


ACM Transactions on Accessible Computing | 2016

Isolated Sign Language Recognition with Grassmann Covariance Matrices

Hanjie Wang; Xiujuan Chai; Xiaopeng Hong; Guoying Zhao; Xilin Chen

In this article, to utilize long-term dynamics over an isolated sign sequence, we propose a covariance matrix based representation that naturally fuses information from multimodal sources. To tackle the drawback of the commonly used Riemannian metric, the proximity of covariance matrices is measured on the Grassmann manifold. However, the inherent Grassmann metric cannot be applied directly to the covariance matrix. We solve this problem by evaluating and selecting the most significant singular vectors of the covariance matrices of sign sequences. The resulting compact representation is called the Grassmann covariance matrix. Finally, the Grassmann metric is used as a kernel for the support vector machine, which enables learning the signs in a discriminative manner. To validate the proposed method, we collect three challenging sign language datasets, on which comprehensive evaluations show that the proposed method outperforms state-of-the-art methods in both accuracy and computational cost.
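A sketch of the idea above, under stated assumptions: keep the top-k singular vectors of each sequence's covariance matrix as an orthonormal basis (a point on the Grassmann manifold), then compare two sequences through the principal angles between those subspaces. The choice of k and of the projection-metric distance are illustrative, not necessarily the paper's.

```python
import numpy as np

def grassmann_point(features, k=3):
    """Top-k singular vectors of the covariance matrix: a Grassmannian point."""
    C = np.cov(features, rowvar=False)
    U, _, _ = np.linalg.svd(C)
    return U[:, :k]

def projection_distance(U1, U2):
    """Distance from the principal angles between two k-dim subspaces."""
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    cosines = np.clip(cosines, -1.0, 1.0)
    k = U1.shape[1]
    return np.sqrt(max(k - np.sum(cosines**2), 0.0))

rng = np.random.default_rng(4)
seq_a = rng.standard_normal((60, 10))   # stand-in multimodal sign features
seq_b = rng.standard_normal((60, 10))
d_ab = projection_distance(grassmann_point(seq_a), grassmann_point(seq_b))
d_aa = projection_distance(grassmann_point(seq_a), grassmann_point(seq_a))
```

A distance of this form can be turned into a kernel (e.g. a Gaussian of the squared distance) for an SVM, matching the discriminative learning step described in the abstract.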

Collaboration


Dive into Xiaopeng Hong's collaborations.

Top Co-Authors

Xilin Chen (Chinese Academy of Sciences)
Yue Ming (Beijing University of Posts and Telecommunications)
Chunxiao Fan (Beijing University of Posts and Telecommunications)
Lei Tian (Beijing University of Posts and Telecommunications)