Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Xi Zhou is active.

Publication


Featured research published by Xi Zhou.


European Conference on Computer Vision | 2010

Image classification using super-vector coding of local image descriptors

Xi Zhou; Kai Yu; Tong Zhang; Thomas S. Huang

This paper introduces a new framework for image classification using local visual descriptors. The pipeline first performs a non-linear feature transformation on the descriptors, then aggregates the results to form image-level representations, and finally applies a classification model. For all three steps we propose novel solutions that make our approach appealing in theory, more scalable in computation, and transparent in classification. Our experiments demonstrate that the proposed classification method achieves state-of-the-art accuracy on the well-known PASCAL benchmarks.
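The three-stage pipeline is straightforward to prototype. Below is a minimal Python sketch of one common instantiation, hard-assignment super-vector coding over a k-means codebook; the codebook size, the scaling constant s, and the hard assignment are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def supervector_code(descriptors, centers, s=1.0):
    """Hard-assignment super-vector coding: each descriptor activates only
    its nearest codeword, and the image-level code stacks, per codeword, a
    scaled occupancy term and the averaged residual (x - c_k)."""
    K, d = centers.shape
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(1)                   # nearest codeword per descriptor
    code = np.zeros(K * (d + 1))
    for k in range(K):
        mask = assign == k
        p_k = mask.mean()                      # occupancy of codeword k
        if p_k == 0:
            continue
        resid = (descriptors[mask] - centers[k]).mean(0)
        code[k * (d + 1)] = s * np.sqrt(p_k)   # scaled occupancy entry
        code[k * (d + 1) + 1:(k + 1) * (d + 1)] = np.sqrt(p_k) * resid
    return code

# toy usage: learn a codebook from pooled descriptors, encode one image
rng = np.random.default_rng(0)
pool = rng.normal(size=(1000, 8))              # pooled local descriptors
centers = KMeans(n_clusters=16, n_init=4,
                 random_state=0).fit(pool).cluster_centers_
print(supervector_code(rng.normal(size=(200, 8)), centers).shape)  # (144,)
```

The resulting image-level codes would then feed a linear classifier, which is what makes the approach scalable.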


International Conference on Computer Vision | 2009

Hierarchical Gaussianization for image classification

Xi Zhou; Na Cui; Zhen Li; Feng Liang; Thomas S. Huang

In this paper, we propose a new image representation that captures both appearance and spatial information for image classification. First, we model the feature vectors, from the whole corpus, from each image, and at each individual patch, in a Bayesian hierarchical framework using mixtures of Gaussians. After this hierarchical Gaussianization, each image is represented by a Gaussian mixture model (GMM) for its appearance and several Gaussian maps for its spatial layout. We then extract the appearance information from the GMM parameters, and the spatial information from global and local statistics over the Gaussian maps. Finally, we employ a supervised dimension reduction technique called discriminant attribute projection (DAP) to remove noise directions and further enhance the discriminating power of our representation. We show that the traditional histogram representation and spatial pyramid matching are special cases of our hierarchical Gaussianization. We compare our new representation with other approaches in scene classification, object recognition and face recognition, and our performance ranks among the top in all three tasks.
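A minimal sketch of the appearance half of this representation, assuming a standard MAP mean-adaptation step with relevance factor tau (an illustrative choice); the spatial Gaussian maps and the DAP projection are omitted, and the data below are random stand-ins.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapted_supervector(patches, ubm, tau=10.0):
    """MAP-adapt the means of a corpus-level GMM to one image's patch
    descriptors and stack them into an appearance supervector; tau is the
    usual relevance factor controlling adaptation strength."""
    resp = ubm.predict_proba(patches)            # (n_patches, K) posteriors
    n_k = resp.sum(0)                            # soft counts per component
    f_k = resp.T @ patches / np.maximum(n_k, 1e-8)[:, None]  # weighted means
    alpha = (n_k / (n_k + tau))[:, None]         # data vs. prior interpolation
    adapted = alpha * f_k + (1 - alpha) * ubm.means_
    return adapted.ravel()                       # K * d appearance vector

# toy usage: corpus-level GMM, then one image's adapted representation
rng = np.random.default_rng(0)
corpus = rng.normal(size=(2000, 16))             # descriptors from all images
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(corpus)
print(map_adapted_supervector(rng.normal(size=(150, 16)), ubm).shape)  # (128,)
```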


Computer Vision and Pattern Recognition | 2008

Regression from patch-kernel

Shuicheng Yan; Xi Zhou; Ming Liu; Mark Hasegawa-Johnson; Thomas S. Huang

In this paper, we present a patch-based regression framework for human age and head pose estimation. First, each image is encoded as an ensemble of orderless coordinate patches, whose global distribution is described by a Gaussian mixture model (GMM); each image is then expressed as a specific distribution model by Maximum a Posteriori adaptation from the global GMM. Next, a patch-kernel is designed to characterize the Kullback-Leibler divergence between the models derived for any two images, and its discriminating power is further enhanced by a weak learning process called inter-modality similarity synchronization. Finally, kernel regression is employed for the ultimate age or head pose estimate. These three stages are complementary to each other and jointly minimize the regression error. The effectiveness of this framework is validated by three experiments: 1) on the YAMAHA aging database, our solution yields a more than 50% reduction in age estimation error compared with the best reported results; 2) on the FG-NET aging database, our solution based on raw image features performs even better than state-of-the-art algorithms that require fine face alignment for extracting warped appearance features; and 3) on the CHIL head pose database, our solution significantly outperforms the best result reported in the CLEAR07 evaluation.
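For mean-adapted GMMs that share weights and diagonal covariances, a standard upper bound on the KL divergence induces a linear kernel on the stacked adapted means. The sketch below pairs that kernel with Nadaraya-Watson regression as a stand-in for the final stage; the inter-modality similarity synchronization step is not reproduced, and all shapes and the bandwidth h are assumptions.

```python
import numpy as np

def supervector_kernel(mu_a, mu_b, weights, inv_var):
    """Linear kernel between two mean-adapted GMMs that share mixture
    weights and diagonal covariances. It corresponds to the standard
    upper bound on the KL divergence between the adapted models (a
    sketch, not the paper's exact patch-kernel).
    mu_a, mu_b: (K, d) adapted means; weights: (K,); inv_var: (K, d)."""
    return float(np.sum(weights[:, None] * mu_a * inv_var * mu_b))

def kernel_regression(query_mu, train_mus, train_y, weights, inv_var, h=2.0):
    """Nadaraya-Watson regression with a Gaussian kernel on the distance
    induced by the supervector kernel: a stand-in for the final stage."""
    def k(a, b):
        return supervector_kernel(a, b, weights, inv_var)
    d2 = np.array([k(query_mu, query_mu) + k(m, m) - 2 * k(query_mu, m)
                   for m in train_mus])
    w = np.exp(-d2 / (2 * h ** 2))
    return float(w @ train_y / w.sum())

# toy usage: predict an age for a query image from 30 "training" images
rng = np.random.default_rng(0)
K, d = 4, 6
weights = np.full(K, 1 / K)
inv_var = np.ones((K, d))                    # shared diagonal precisions
train_mus = [rng.normal(size=(K, d)) for _ in range(30)]
train_y = rng.uniform(20, 60, size=30)       # hypothetical ages
query = rng.normal(size=(K, d))
print(kernel_regression(query, train_mus, train_y, weights, inv_var))
```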


Pattern Recognition Letters | 2010

Real-world acoustic event detection

Xiaodan Zhuang; Xi Zhou; Mark Hasegawa-Johnson; Thomas S. Huang

Acoustic Event Detection (AED) aims to identify both the timestamps and the types of events in an audio stream. This becomes very challenging when going beyond restricted highlight events and well-controlled recordings. We propose extracting discriminative features for AED using a boosting approach; these outperform classical speech perceptual features such as Mel-frequency cepstral coefficients and log-frequency filterbank parameters. We also propose statistical models better suited to the task. First, a tandem connectionist-HMM approach combines the sequence modeling capability of the HMM with the high-accuracy context-dependent discriminative capability of an artificial neural network trained under the minimum cross entropy criterion. Second, an SVM-GMM-supervector approach uses noise-adaptive kernels that better approximate the KL divergence between feature distributions in different audio segments. Experiments on the CLEAR 2007 AED Evaluation set-up demonstrate that the presented features and models yield over 45% relative performance improvement, and also outperform the best system in the CLEAR AED Evaluation, on the detection of twelve general acoustic events in a real seminar environment.
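The tandem idea is simple to demonstrate: train a frame-level neural network with a cross-entropy objective, then treat its log posteriors as observation features for a downstream HMM. A hedged scikit-learn sketch follows; the frame dimensionality, the twelve-class labels, and the random stand-in data are assumptions, and the HMM stage itself is omitted.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Tandem sketch: train a frame-level network on labelled frames, then use
# its log class posteriors as observation features for a downstream HMM
# (the HMM stage is omitted here). Frame dimension, class count, and the
# random stand-in data are assumptions.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, 24))        # e.g. filterbank frames
labels = rng.integers(0, 12, size=5000)     # 12 acoustic event classes

# MLPClassifier optimizes cross entropy, matching the tandem recipe.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200,
                    random_state=0).fit(frames, labels)
tandem = np.log(mlp.predict_proba(frames) + 1e-10)
print(tandem.shape)                         # (5000, 12) per-frame features
```

In practice the log posteriors are usually decorrelated (e.g. by PCA) before HMM training.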


IEEE International Conference on Automatic Face & Gesture Recognition | 2008

Multi-view facial expression recognition

Yuxiao Hu; Zhihong Zeng; Lijun Yin; Xiaozhou Wei; Xi Zhou; Thomas S. Huang

The ability to handle multi-view facial expressions is important for computers to understand affective behavior in less constrained environments. However, most existing methods for facial expression recognition are based on near-frontal face data and are likely to fail in non-frontal facial expression analysis. In this paper, we investigate the analysis of multi-view facial expressions. Three local patch descriptors (HoG, LBP, and SIFT) are used to extract facial features, which are the inputs to a nearest-neighbor indexing method that identifies facial expressions. We also investigate the influence of feature dimension reduction (PCA, LDA, and LPP) and classifier fusion on recognition performance. We test our approaches on multi-view data generated from the BU-3DFE 3D facial expression database, which includes 100 subjects with 6 emotions and 4 intensity levels. Our extensive person-independent experiments suggest that the SIFT descriptor outperforms HoG and LBP, and that LPP outperforms PCA and LDA in this application. However, classifier fusion does not show a significant advantage over the SIFT-only classifier.
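A minimal sketch of the recognition back-end: project descriptor features to a lower-dimensional space, then classify by nearest neighbour. PCA stands in for the projection because it ships with scikit-learn; the paper's preferred LPP projection, the SIFT extraction, and the data below are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Reduce descriptor features, then classify expressions by nearest neighbour.
# Random vectors stand in for pooled SIFT descriptors; 6 basic emotions.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 128))
y_train = rng.integers(0, 6, size=600)
X_test = rng.normal(size=(100, 128))

clf = make_pipeline(PCA(n_components=40), KNeighborsClassifier(n_neighbors=1))
clf.fit(X_train, y_train)
print(clf.predict(X_test)[:10])             # predicted expression labels
```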


International Conference on Acoustics, Speech, and Signal Processing | 2008

Feature analysis and selection for acoustic event detection

Xiaodan Zhuang; Xi Zhou; Thomas S. Huang; Mark Hasegawa-Johnson

Speech perceptual features, such as Mel-frequency cepstral coefficients (MFCC), have been widely used in acoustic event detection. However, the spectral structure of acoustic events differs from that of speech, which degrades the performance of such speech feature sets. We propose quantifying the discriminative capability of each feature component according to an approximated Bayesian accuracy and deriving a discriminative feature set for acoustic event detection. Compared to MFCC, feature sets derived using the proposed approaches achieve about a 30% relative accuracy improvement in acoustic event detection.
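One way to realize such a criterion is to fit one-dimensional Gaussians per class to each feature component and score the component by one minus the Bhattacharyya bound on the Bayes error. The sketch below is an assumed instantiation of the idea, with equal class priors and toy two-class data.

```python
import numpy as np

def per_dim_bayes_accuracy(x0, x1):
    """Approximate per-component Bayes accuracy for two classes via 1-D
    Gaussian fits and the Bhattacharyya bound on the Bayes error
    (equal priors assumed)."""
    m0, m1 = x0.mean(), x1.mean()
    v0, v1 = x0.var() + 1e-8, x1.var() + 1e-8
    # Bhattacharyya distance between two 1-D Gaussians
    bd = 0.25 * (m0 - m1) ** 2 / (v0 + v1) \
         + 0.5 * np.log((v0 + v1) / (2 * np.sqrt(v0 * v1)))
    return 1.0 - 0.5 * np.exp(-bd)           # 1 - Bayes-error upper bound

# toy data: 13 components (MFCC-like), with component 3 made informative
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 13))
y = rng.integers(0, 2, size=400)
X[y == 1, 3] += 2.0
scores = np.array([per_dim_bayes_accuracy(X[y == 0, j], X[y == 1, j])
                   for j in range(X.shape[1])])
print(np.argsort(scores)[::-1][:5])          # indices of the 5 best components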


Multimodal Technologies for Perception of Humans | 2008

HMM-Based Acoustic Event Detection with AdaBoost Feature Selection

Xi Zhou; Xiaodan Zhuang; Ming Liu; Hao Tang; Mark Hasegawa-Johnson; Thomas S. Huang

Because of the spectral difference between speech and acoustic events, we propose using the Kullback-Leibler distance to quantify the discriminant capability of each speech feature component in acoustic event detection. Based on these distances, we use AdaBoost to select a discriminant feature set and demonstrate that this feature set outperforms classical speech feature sets such as MFCC in one-pass HMM-based acoustic event detection. We implement an HMM-based acoustic event detection system with lattice rescoring, using a feature set selected by the above AdaBoost-based approach.
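AdaBoost over one-feature decision stumps gives a natural feature selector: the components the ensemble actually splits on form the selected set. A hedged scikit-learn sketch follows; the KL-distance pre-ranking is not reproduced, and the toy data and ensemble size are assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Boost one-feature decision stumps, then read off which components the
# ensemble split on. Toy two-class data with one planted feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = rng.integers(0, 2, size=500)
X[y == 1, 7] += 1.5

ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=25, random_state=0).fit(X, y)
selected = sorted({int(s.tree_.feature[0]) for s in ada.estimators_
                   if s.tree_.feature[0] >= 0})
print(selected)                              # feature indices chosen by boosting
```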


International Conference on Pattern Recognition | 2008

Face age estimation using patch-based hidden Markov model supervectors

Xiaodan Zhuang; Xi Zhou; Mark Hasegawa-Johnson; Thomas S. Huang

Recent studies of patch-based Gaussian mixture model (GMM) approaches to face age estimation show promising results. We propose using a hidden Markov model (HMM) supervector to represent face image patches, improving on the previous GMM supervector approach by capturing the spatial structure of human faces and loosening the assumption that face patches are identically distributed within a face image. The Euclidean distance between the HMM supervectors constructed from two face images measures the similarity of the faces; it derives from an approximation to the Kullback-Leibler divergence between the joint patch distributions, with implicit unsupervised alignment of the different regions of the two faces. The proposed HMM supervector approach compares favorably with the GMM supervector approach in face age estimation on a large face dataset.
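A sketch of the construction, assuming the hmmlearn library: every image re-estimates only the Gaussian means of a shared HMM (a rough stand-in for the paper's MAP adaptation, so states stay aligned across images) and stacks them into a supervector; the Euclidean distance between supervectors is then the similarity used for regression. All model sizes and data are toy assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed external dependency

def hmm_supervector(patch_seq, shared):
    """Re-estimate only the Gaussian means of a shared HMM on one face's
    patch sequence and stack them into a supervector. Starting every
    image from the same shared model keeps states roughly aligned; this
    short means-only EM run (params="m") is a rough stand-in for the
    paper's MAP adaptation."""
    m = GaussianHMM(n_components=shared["K"], covariance_type="diag",
                    params="m", init_params="", n_iter=3)
    m.startprob_ = shared["startprob"]
    m.transmat_ = shared["transmat"]
    m.means_ = shared["means"].copy()
    m.covars_ = shared["covars"]
    m.fit(patch_seq)                          # (n_patches, d) in scan order
    return m.means_.ravel()

rng = np.random.default_rng(0)
K, d = 4, 8
shared = {"K": K,
          "startprob": np.full(K, 1 / K),
          "transmat": np.full((K, K), 1 / K),
          "means": rng.normal(size=(K, d)),
          "covars": np.ones((K, d))}
sv_a = hmm_supervector(rng.normal(size=(60, d)), shared)
sv_b = hmm_supervector(rng.normal(size=(60, d)), shared)
print(np.linalg.norm(sv_a - sv_b))           # face similarity for regression
```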


ACM Transactions on Multimedia Computing, Communications, and Applications | 2014

“Wow! You Are So Beautiful Today!”

Luoqi Liu; Junliang Xing; Si Liu; Hui Xu; Xi Zhou; Shuicheng Yan

Beauty e-Experts, a fully automatic system for makeover recommendation and synthesis, is developed in this work. The system simultaneously considers many kinds of makeover items for hairstyle and makeup. Given a user-provided frontal face image with short/bound hair and no/light makeup, the Beauty e-Experts system not only recommends the most suitable hairdo and makeup, but also synthesizes the virtual hairdo and makeup effects. To acquire enough knowledge for beauty modeling, we built the Beauty e-Experts Database, which contains 1,505 female photos annotated with a variety of attributes taking different discrete values. We organize these attributes into two categories: beauty attributes and beauty-related attributes. Beauty attributes are those that can change during the makeover process and thus need to be recommended by the system. Beauty-related attributes are those that cannot be changed during the makeover process but can help the system perform recommendation. Based on this database, two problems are addressed for the Beauty e-Experts system: what to recommend and how to wear it, which mirror the process of selecting a hairstyle and cosmetics in daily life. For the what-to-recommend problem, we propose a multiple tree-structured supergraph model to explore the complex relationships among high-level beauty attributes, mid-level beauty-related attributes, and low-level image features. Based on this model, the most compatible beauty attributes for a given facial image can be efficiently inferred. For the how-to-wear-it problem, an effective and efficient facial image synthesis module is designed to seamlessly synthesize the recommended makeovers into the user's facial image. We have conducted extensive experiments on test images under various conditions to evaluate and analyze the proposed system. The experimental results clearly demonstrate its effectiveness and efficiency.
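Inference in a single tree-structured graph over discrete attributes can be sketched with max-product dynamic programming: unary scores (from image features) and pairwise compatibility tables are combined from the leaves up to the root. The Python sketch below is an assumed, simplified stand-in for one of the paper's supergraph trees; the tree shape, state count, and random potentials are illustrative only.

```python
import numpy as np

# One tree over discrete attributes: parent -> children, root = 0.
rng = np.random.default_rng(0)
tree = {0: [1, 2], 1: [], 2: [3], 3: []}
n_states = 4                                  # discrete values per attribute
unary = {v: rng.normal(size=n_states) for v in tree}          # image evidence
pair = {(p, c): rng.normal(size=(n_states, n_states))         # compatibility
        for p in tree for c in tree[p]}

def upward(v):
    """Max-product pass: best achievable score of the subtree rooted at v,
    as a vector indexed by the state of v."""
    score = unary[v].copy()
    for c in tree[v]:
        child = upward(c)                     # best subtree score per child state
        # for each state of v, pick the best child state under the pairwise table
        score += (pair[(v, c)] + child[None, :]).max(axis=1)
    return score

print(upward(0).max())   # score of the jointly most compatible assignment
# Backtracking pointers (omitted) would recover the attribute values themselves.
```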


ACM Multimedia | 2013

Wow! You are so beautiful today!

Luoqi Liu; Hui Xu; Junliang Xing; Si Liu; Xi Zhou; Shuicheng Yan

Beauty e-Experts, a fully automatic system for hairstyle and facial makeup recommendation and synthesis, is developed in this work. Given a user-provided frontal face image with short/bound hair and no/light makeup, the Beauty e-Experts system can not only recommend the most suitable hairdo and makeup, but also show the synthetic effects. To obtain enough knowledge for beauty modeling, we build the Beauty e-Experts Database, which contains 1,505 attractive female photos annotated with a variety of beauty attributes and beauty-related attributes. Based on this database, two problems are considered for the Beauty e-Experts system: what to recommend and how to wear it, which mirror the process of selecting a hairstyle and cosmetics in daily life. For the what-to-recommend problem, we propose a multiple tree-structured supergraph model to explore the complex relationships among high-level beauty attributes, mid-level beauty-related attributes and low-level image features; based on this model, the most compatible beauty attributes for a given facial image can be efficiently inferred. For the how-to-wear problem, an effective and efficient facial image synthesis module is designed to seamlessly synthesize the recommended hairstyle and makeup into the user's facial image. Extensive experimental evaluations and analysis on test images under various conditions demonstrate the effectiveness of the proposed system.

Collaboration


Dive into Xi Zhou's collaborations.

Top Co-Authors

Jiang-Jing Lv (Chinese Academy of Sciences)
Xiang-Dong Zhou (Chinese Academy of Sciences)
Shuicheng Yan (National University of Singapore)
Junliang Xing (Chinese Academy of Sciences)
Xiaohu Shao (Chinese Academy of Sciences)
Cheng Cheng (Chinese Academy of Sciences)
Yanfei Liu (Chinese Academy of Sciences)
Yuanqian Li (Chinese Academy of Sciences)
Yun Fu (Northeastern University)
Michael E. Kuhl (Rochester Institute of Technology)