Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiangyu Zhu is active.

Publication


Featured research published by Xiangyu Zhu.


Computer Vision and Pattern Recognition | 2015

Person re-identification by Local Maximal Occurrence representation and metric learning

Shengcai Liao; Yang Hu; Xiangyu Zhu; Stan Z. Li

Person re-identification is an important technique towards the automatic search of a person's presence in a surveillance video. Two fundamental problems are critical for person re-identification: feature representation and metric learning. An effective feature representation should be robust to illumination and viewpoint changes, and a discriminant metric should be learned to match various person images. In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). The LOMO feature analyzes the horizontal occurrence of local features and maximizes the occurrence to make a representation stable against viewpoint changes. In addition, to handle illumination variations, we apply the Retinex transform and a scale-invariant texture operator. To learn a discriminant metric, we propose to learn a discriminant low-dimensional subspace by cross-view quadratic discriminant analysis and, simultaneously, a QDA metric on the derived subspace. We also present a practical computation method for XQDA, as well as its regularization. Experiments on four challenging person re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show that the proposed method improves the state-of-the-art rank-1 identification rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively.
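
The horizontal max-occurrence pooling at the heart of LOMO can be sketched in a few lines. The snippet below is a toy illustration rather than the authors' code; it assumes local histograms have already been computed on a grid of overlapping patches (in the paper, HSV color histograms and SILTP texture descriptors after a Retinex transform).

```python
import numpy as np

def lomo_horizontal_max(patch_histograms):
    # patch_histograms: (rows, cols, bins), one local histogram per
    # patch location. For each horizontal strip, keep the per-bin
    # maximum over all patches in that row, which stabilizes the
    # descriptor against horizontal shifts caused by viewpoint change.
    return patch_histograms.max(axis=1)  # -> (rows, bins)

# Hypothetical usage: a 10x10 grid of patches with 32-bin histograms.
hists = np.random.rand(10, 10, 32)
feature = lomo_horizontal_max(hists).ravel()  # final LOMO-style vector
```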


Computer Vision and Pattern Recognition | 2016

Face Alignment Across Large Poses: A 3D Solution

Xiangyu Zhu; Zhen Lei; Xiaoming Liu; Hailin Shi; Stan Z. Li

Face alignment, which fits a face model to an image and extracts the semantic meanings of facial pixels, has been an important topic in the computer vision community. However, most algorithms are designed for faces in small to medium poses (below 45 degrees), lacking the ability to align faces in large poses up to 90 degrees. The challenges are three-fold. First, the commonly used landmark-based face model assumes that all the landmarks are visible and is therefore not suitable for profile views. Second, the face appearance varies more dramatically across large poses, ranging from the frontal view to the profile view. Third, labelling landmarks in large poses is extremely challenging since the invisible landmarks have to be guessed. In this paper, we propose a solution to these three problems in a new alignment framework, called 3D Dense Face Alignment (3DDFA), in which a dense 3D face model is fitted to the image via a convolutional neural network (CNN). We also propose a method to synthesize large-scale training samples in profile views to solve the third problem of data labelling. Experiments on the challenging AFLW database show that our approach achieves significant improvements over state-of-the-art methods.
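
A rough, hypothetical sketch of the cascade idea follows: each stage regresses a residual update to the model parameters from features of the image rendered under the current fit (the paper feeds CNNs with PNCC-style inputs). Every component below is a toy stand-in, not the published architecture.

```python
import numpy as np

def toy_features(image, params):
    # Stand-in for rendering the current fit (e.g. PNCC) and pooling.
    return image.mean() * np.ones_like(params) + params

class ToyStage:
    # A tiny linear map standing in for one CNN stage of the cascade.
    def __init__(self, rng):
        self.W = rng.standard_normal((8, 8)) * 0.01
    def __call__(self, feat):
        return self.W @ feat

def fit_cascade(image, stages, params):
    for stage in stages:
        params = params + stage(toy_features(image, params))  # residual update
    return params

rng = np.random.default_rng(0)
image = rng.random((64, 64))
params = np.zeros(8)  # pose + shape coefficients of the face model
print(fit_cascade(image, [ToyStage(rng) for _ in range(3)], params))
```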


Computer Vision and Pattern Recognition | 2015

High-fidelity Pose and Expression Normalization for face recognition in the wild

Xiangyu Zhu; Zhen Lei; Junjie Yan; Dong Yi; Stan Z. Li

Pose and expression normalization is a crucial step to recover the canonical view of faces under arbitrary conditions, so as to improve face recognition performance. An ideal normalization method should be automatic, database independent, and high-fidelity, preserving the face appearance with few artifacts and little information loss. However, most normalization methods fail to satisfy one or more of these goals. In this paper, we propose a High-fidelity Pose and Expression Normalization (HPEN) method based on the 3D Morphable Model (3DMM), which can automatically generate a natural face image in frontal pose and neutral expression. Specifically, we first make a landmark-marching assumption to describe the non-correspondence between 2D and 3D landmarks caused by pose variations and propose a pose-adaptive 3DMM fitting algorithm. Second, we mesh the whole image into a 3D object and eliminate the pose and expression variations using an identity-preserving 3D transformation. Finally, we propose an inpainting method based on Poisson editing to fill the invisible region caused by self-occlusion. Extensive experiments on Multi-PIE and LFW demonstrate that the proposed method significantly improves face recognition performance and outperforms state-of-the-art methods in both constrained and unconstrained environments.
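
The final step relies on Poisson editing. As a hedged illustration, OpenCV's seamlessClone implements the same Poisson blending idea; the file names, mask region, and center below are hypothetical, and this is not the authors' implementation.

```python
import cv2
import numpy as np

src = cv2.imread("visible_texture.png")    # hypothetical inputs
dst = cv2.imread("normalized_face.png")
mask = np.zeros(src.shape[:2], np.uint8)
mask[100:200, 100:200] = 255               # region to fill, for example
center = (150, 150)                        # where the patch lands in dst
# Poisson blending: gradients of src are preserved, boundaries match dst.
result = cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("blended.png", result)
```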


European Conference on Computer Vision | 2016

Embedding Deep Metric for Person Re-identification: A Study Against Large Variations

Hailin Shi; Yang Yang; Xiangyu Zhu; Shengcai Liao; Zhen Lei; Wei-Shi Zheng; Stan Z. Li

Person re-identification is challenging due to the large variations of pose, illumination, occlusion, and camera view. Owing to these variations, the pedestrian data is distributed as highly curved manifolds in the feature space, despite the feature extraction capability of current convolutional neural networks (CNNs). However, the distribution is unknown, so it is difficult to use the geodesic distance when comparing two samples. In practice, current deep embedding methods use the Euclidean distance for training and testing. On the other hand, manifold learning methods suggest using the Euclidean distance in the local range, combined with the graphical relationship between samples, to approximate the geodesic distance. From this point of view, selecting suitable positive (i.e., intra-class) training samples within a local range is critical for training the CNN embedding, especially when the data has large intra-class variations. In this paper, we propose a novel moderate positive sample mining method to train a robust CNN for person re-identification, dealing with the problem of large variation. In addition, we improve the learning by a metric weight constraint, so that the learned metric has better generalization ability. Experiments show that these two strategies are effective in learning robust deep metrics for person re-identification, and accordingly our deep model significantly outperforms state-of-the-art methods on several person re-identification benchmarks. The study presented in this paper may therefore be useful in inspiring new designs of deep models for person re-identification.
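
A hypothetical sketch of the mining rule described above: among an anchor's positive candidates, keep only those no farther than the nearest negative (the local range), then pick the hardest of them. The Euclidean distance matches the deep embedding setting; the fallback is an illustrative assumption.

```python
import numpy as np

def mine_moderate_positive(anchor, positives, negatives):
    d_pos = np.linalg.norm(positives - anchor, axis=1)
    d_neg = np.linalg.norm(negatives - anchor, axis=1)
    in_range = d_pos <= d_neg.min()       # positives inside the local range
    if not in_range.any():
        return positives[d_pos.argmin()]  # fallback: the easiest positive
    idx = np.where(in_range)[0]
    return positives[idx[d_pos[idx].argmax()]]  # hardest moderate positive
```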


Computer Vision and Pattern Recognition | 2015

Object detection by labeling superpixels

Junjie Yan; Yinan Yu; Xiangyu Zhu; Zhen Lei; Stan Z. Li

Object detection is often conducted by object proposal generation and classification sequentially. This paper handles object detection in a superpixel-oriented manner instead of the proposal-oriented one. Specifically, it casts object detection as a multi-label superpixel labeling problem by minimizing an energy function. It uses a data cost term to capture the appearance, a smoothness cost term to encode the spatial context, and a label cost term to favor compact detections. The data cost is learned through a convolutional neural network, and the parameters in the labeling model are learned through a structural SVM. Compared with methods based on proposal generation and classification, the proposed superpixel labeling method can naturally detect objects missed by the proposal generation step and capture the global image context to infer overlapping objects. The proposed method shows its advantage on Pascal VOC and ImageNet. Notably, it performs better than the ImageNet ILSVRC2014 winner GoogLeNet (45.0% vs. 43.9% mAP) with much shallower and fewer CNNs.
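
The energy can be written as the sum of the three terms named above. The toy sketch below, with hypothetical costs and a Potts-style smoothness penalty, shows only the structure; in the paper the data costs come from a CNN and the term weights from a structural SVM.

```python
def energy(labels, data_cost, adjacency, smooth_cost, label_cost):
    # labels: {superpixel: label}; adjacency: pairs of neighboring superpixels.
    e = sum(data_cost[s][labels[s]] for s in labels)                   # appearance
    e += sum(smooth_cost(labels[s], labels[t]) for s, t in adjacency)  # context
    e += label_cost * len(set(labels.values()))                        # compactness
    return e

labels = {0: "person", 1: "person", 2: "background"}
data_cost = {0: {"person": 0.1, "background": 0.9},
             1: {"person": 0.2, "background": 0.8},
             2: {"person": 0.7, "background": 0.3}}
adjacency = [(0, 1), (1, 2)]
smooth = lambda a, b: 0.0 if a == b else 0.5  # Potts-style penalty
print(energy(labels, data_cost, adjacency, smooth, label_cost=0.25))
```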


IEEE International Conference on Automatic Face & Gesture Recognition | 2015

Discriminative 3D morphable model fitting

Xiangyu Zhu; Junjie Yan; Dong Yi; Zhen Lei; Stan Z. Li

This paper presents a novel discriminative method for estimating 3D shape from a single image with a 3D Morphable Model (3DMM). Until now, most traditional 3DMM fitting methods have depended on the analysis-by-synthesis framework, which searches for the best parameters by minimizing the difference between the input image and the model appearance. They are highly sensitive to initialization and have to rely on stochastic optimization to handle the local-minimum problem, which is usually a time-consuming process. To solve this problem, we take a different direction and estimate the shape parameters by learning a regressor instead of minimizing the appearance difference. Compared with the traditional analysis-by-synthesis framework, the new discriminative approach makes it possible to utilize large databases to train a robust fitting model that can reconstruct shape from image features accurately and efficiently. We compare our method with two popular 3DMM fitting algorithms on the FRGC database. Experimental results show that our approach significantly outperforms the state-of-the-art in terms of efficiency, robustness, and accuracy.
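
The discriminative idea, learning a direct mapping from image features to shape parameters instead of optimizing an appearance difference, can be illustrated with any multi-output regressor. The sketch below uses ridge regression on synthetic data purely as a stand-in for the learned fitting model.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 128))       # image features (synthetic)
y_train = rng.standard_normal((1000, 40))        # 3DMM shape coefficients
model = Ridge(alpha=1.0).fit(X_train, y_train)   # the "fitting regressor"
shape_params = model.predict(rng.standard_normal((1, 128)))  # one new image
```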


IEEE International Conference on Automatic Face & Gesture Recognition | 2017

Multi-modality Network with Visual and Geometrical Information for Micro Emotion Recognition

Jianzhu Guo; Jinlin Wu; Jun Wan; Xiangyu Zhu; Zhen Lei; Stan Z. Li

Micro emotion recognition is a very challenging problem because of the subtle appearance variations among different facial expression classes. To deal with this problem, we propose a multi-modality convolutional neural network (CNN) based on visual and geometrical information. The visual face image and the structured geometry are embedded into a unified network, and the recognition accuracy benefits from the fused information. The proposed network includes two branches. The first branch extracts visual features from color face images, and the other branch extracts geometry features from 68 facial landmarks. Both visual and geometry features are then concatenated into a long vector, which is fed to a hinge loss layer. Compared with a CNN architecture that uses only face images, our method is more effective and achieves better performance. In the final testing phase of the Micro Emotion Challenge, our method took first place with a misclassification of 80.212137.
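
A minimal sketch of the two-branch design in PyTorch, with illustrative layer sizes: a small CNN on the color image, an MLP on the 68 landmark coordinates, feature concatenation, and a hinge-style loss (torch.nn.MultiMarginLoss stands in for the hinge loss layer).

```python
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.visual = nn.Sequential(            # visual branch: face image
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 64))
        self.geometry = nn.Sequential(          # geometry branch: landmarks
            nn.Flatten(), nn.Linear(68 * 2, 64), nn.ReLU())
        self.classifier = nn.Linear(64 + 64, n_classes)

    def forward(self, image, landmarks):
        fused = torch.cat([self.visual(image), self.geometry(landmarks)], dim=1)
        return self.classifier(fused)           # scores fed to the hinge loss

net = TwoBranchNet(n_classes=5)
loss_fn = nn.MultiMarginLoss()                  # hinge loss over class scores
scores = net(torch.randn(2, 3, 64, 64), torch.randn(2, 68, 2))
loss = loss_fn(scores, torch.tensor([0, 3]))
```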


IEEE Signal Processing Letters | 2017

Cross-Modality Face Recognition via Heterogeneous Joint Bayesian

Hailin Shi; Xiaobo Wang; Dong Yi; Zhen Lei; Xiangyu Zhu; Stan Z. Li

In many face recognition applications, the modalities of face images in the gallery and probe sets are different, which is known as heterogeneous face recognition. How to reduce the feature gap between images from different modalities is a critical issue in developing a highly accurate face recognition algorithm. Recently, joint Bayesian (JB) has demonstrated superior performance on general face recognition compared to traditional discriminant analysis methods such as subspace learning. However, the original JB treats the two input samples equally, does not take into account the modality difference between them, and may be suboptimal for the heterogeneous face recognition problem. In this work, we extend the original JB by modeling the gallery and probe images with two different Gaussian distributions, yielding a heterogeneous joint Bayesian (HJB) formulation for cross-modality face recognition. The proposed HJB explicitly models the modality difference of image pairs and, therefore, is able to better discriminate same/different face pairs. Extensive experiments on visible versus near-infrared and ID photo versus spot face recognition problems show the superiority of HJB over previous methods.
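
For background, the standard joint Bayesian model that HJB extends writes a feature as identity plus within-class variation, x = mu + eps with mu ~ N(0, S_mu) and eps ~ N(0, S_eps), and verifies a pair by the log-likelihood ratio between the same-identity and different-identity hypotheses. The sketch below computes that ratio numerically for the symmetric, single-modality case; HJB's modality-specific Gaussians are not reproduced here.

```python
import numpy as np

def jb_score(x1, x2, S_mu, S_eps):
    # log P(x1, x2 | same identity) - log P(x1, x2 | different identity)
    d = len(x1)
    T = S_mu + S_eps
    Sigma_same = np.block([[T, S_mu], [S_mu, T]])   # cross-covariance is S_mu
    Sigma_diff = np.block([[T, np.zeros((d, d))],   # cross-covariance is zero
                           [np.zeros((d, d)), T]])
    z = np.concatenate([x1, x2])

    def log_gauss(z, S):  # zero-mean Gaussian; constants cancel in the ratio
        _, logdet = np.linalg.slogdet(S)
        return -0.5 * (logdet + z @ np.linalg.solve(S, z))

    return log_gauss(z, Sigma_same) - log_gauss(z, Sigma_diff)

# Toy usage with random positive-definite covariances.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)); S_mu = B @ B.T + np.eye(4)
C = rng.standard_normal((4, 4)); S_eps = C @ C.T + np.eye(4)
print(jb_score(rng.standard_normal(4), rng.standard_normal(4), S_mu, S_eps))
```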


Chinese Conference on Biometric Recognition | 2017

Detecting Face with Densely Connected Face Proposal Network

Shifeng Zhang; Xiangyu Zhu; Zhen Lei; Hailin Shi; Xiaobo Wang; Stan Z. Li

Accuracy and efficiency are two conflicting challenges for face detection, since effective models tend to be computationally prohibitive. To address these conflicting challenges, our core idea is to shrink the input image and focus on detecting small faces. Specifically, we propose a novel face detector, dubbed the Densely Connected Face Proposal Network (DCFPN), with high performance as well as real-time speed on CPU devices. On the one hand, we carefully design a lightweight but powerful fully convolutional network with both efficiency and accuracy in mind. On the other hand, we use a dense anchor strategy and propose a fair L1 loss function to handle small faces well. As a consequence, our method can detect faces at 30 FPS on a single 2.60 GHz CPU core and at 250 FPS on a GPU for VGA-resolution images. We achieve state-of-the-art performance on the AFW, PASCAL Face, and FDDB datasets.
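
The dense anchor idea can be pictured as tiling anchors of several small scales at a fine stride over the shrunken input, so small faces are covered by enough anchors. The stride and scales below are illustrative guesses, and the fair L1 loss is not reproduced.

```python
def dense_anchors(img_w, img_h, stride=4, scales=(16, 32, 64)):
    # Center one anchor per scale at every stride-spaced grid position.
    anchors = []
    for y in range(0, img_h, stride):
        for x in range(0, img_w, stride):
            for s in scales:
                anchors.append((x - s / 2, y - s / 2, s, s))  # (x, y, w, h)
    return anchors

print(len(dense_anchors(640, 480)))  # anchor count for a VGA image
```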


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

Face Alignment In Full Pose Range: A 3D Total Solution

Xiangyu Zhu; Xiaoming Liu; Zhen Lei; Stan Z. Li

Face alignment, which fits a face model to an image and extracts the semantic meanings of facial pixels, has been an important topic in the computer vision community. However, most algorithms are designed for faces in small to medium poses (yaw angle smaller than 45 degrees), and lack the ability to align faces in large poses up to 90 degrees. The challenges are three-fold. First, the commonly used landmark-based face model assumes that all the landmarks are visible and is therefore not suitable for large poses. Second, the face appearance varies more drastically across large poses, from the frontal view to the profile view. Third, labelling landmarks in large poses is extremely challenging since the invisible landmarks have to be guessed. In this paper, we propose to tackle these three challenges in a new alignment framework termed 3D Dense Face Alignment (3DDFA), in which a dense 3D Morphable Model (3DMM) is fitted to the image via cascaded convolutional neural networks. We also utilize 3D information to synthesize face images in profile views to provide abundant training samples. Experiments on the challenging AFLW database show that the proposed approach achieves significant improvements over state-of-the-art methods.
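
The dense 3DMM at the core of 3DDFA follows the standard linear formulation: a mean shape plus identity and expression bases weighted by the coefficients being regressed. The sketch below uses toy dimensions and random bases in place of a real model.

```python
import numpy as np

n_vertices, n_id, n_exp = 100, 40, 10
rng = np.random.default_rng(0)
mean_shape = np.zeros(3 * n_vertices)
id_basis = rng.standard_normal((3 * n_vertices, n_id))    # identity basis
exp_basis = rng.standard_normal((3 * n_vertices, n_exp))  # expression basis
alpha = rng.standard_normal(n_id)   # identity coefficients (regressed)
beta = rng.standard_normal(n_exp)   # expression coefficients (regressed)
S = mean_shape + id_basis @ alpha + exp_basis @ beta
vertices = S.reshape(n_vertices, 3)  # reconstructed dense face mesh
```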

Collaboration


Dive into Xiangyu Zhu's collaborations.

Top Co-Authors

Stan Z. Li (Chinese Academy of Sciences)
Zhen Lei (Chinese Academy of Sciences)
Hailin Shi (Chinese Academy of Sciences)
Dong Yi (Chinese Academy of Sciences)
Shengcai Liao (Chinese Academy of Sciences)
Xiaobo Wang (Chinese Academy of Sciences)
Shifeng Zhang (Chinese Academy of Sciences)
Junjie Yan (Chinese Academy of Sciences)
Jianzhu Guo (Chinese Academy of Sciences)
Yang Yang (Chinese Academy of Sciences)