Publication


Featured research published by Zhiwu Huang.


International Conference on Multimodal Interfaces | 2014

Combining Multiple Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild

Mengyi Liu; Ruiping Wang; Shaoxin Li; Shiguang Shan; Zhiwu Huang; Xilin Chen

In this paper, we present the method behind our submission to the Emotion Recognition in the Wild Challenge (EmotiW 2014). The challenge is to automatically classify the emotions acted by human subjects in video clips recorded in real-world environments. In our method, each video clip is represented by three types of image set models (i.e., linear subspace, covariance matrix, and Gaussian distribution), all of which can be viewed as points residing on Riemannian manifolds. Different Riemannian kernels are then employed on these set models for similarity/distance measurement. For classification, three types of classifiers, i.e., kernel SVM, logistic regression, and partial least squares, are investigated for comparison. Finally, an optimal fusion of classifiers learned from different kernels and different modalities (video and audio) is conducted at the decision level to further boost performance. We perform an extensive evaluation on the challenge data (including the validation set and blind test set) and evaluate the effects of the different strategies in our pipeline. The final recognition accuracy reached 50.4% on the test set, a significant gain of 16.7 percentage points over the challenge baseline of 33.7%.
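As a rough illustration of the three set models named above, the sketch below builds a linear subspace, a covariance matrix, and a Gaussian from a set of frame features, plus one standard Riemannian (Log-Euclidean) kernel on covariance matrices. All helper names are hypothetical; this is not the authors' code.

```python
# Sketch: the three image-set models named in the abstract, computed from
# a set of d-dimensional frame features stacked as the rows of X (n x d),
# plus one common Riemannian kernel (Log-Euclidean RBF) on covariances.
# Hypothetical helper names; not the authors' released code.
import numpy as np

def set_models(X, k=5, eps=1e-6):
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # 1) Linear subspace: top-k principal directions of the centered set,
    #    a point on the Grassmann manifold G(k, d).
    U, _, _ = np.linalg.svd(Xc.T, full_matrices=False)
    subspace = U[:, :k]
    # 2) Covariance matrix, regularized to be SPD.
    cov = Xc.T @ Xc / max(n - 1, 1) + eps * np.eye(d)
    # 3) Gaussian distribution, summarized by its (mean, covariance) pair.
    return subspace, cov, (mu, cov)

def log_euclidean_kernel(C1, C2, gamma=1e-2):
    """RBF kernel on the Log-Euclidean distance ||logm(C1) - logm(C2)||_F,
    one standard way to build a kernel on SPD matrices."""
    def logm(C):
        w, V = np.linalg.eigh(C)
        return (V * np.log(np.maximum(w, 1e-12))) @ V.T
    dist = np.linalg.norm(logm(C1) - logm(C2), 'fro')
    return np.exp(-gamma * dist ** 2)
```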


Computer Vision and Pattern Recognition | 2015

Projection Metric Learning on Grassmann Manifold with Application to Video-Based Face Recognition

Zhiwu Huang; Ruiping Wang; Shiguang Shan; Xilin Chen

In video-based face recognition, great success has been achieved by representing videos as linear subspaces, which typically lie in a special type of non-Euclidean space known as the Grassmann manifold. To leverage kernel-based methods developed for Euclidean spaces, several recent approaches embed the Grassmann manifold into a high-dimensional Hilbert space by exploiting the well-established Projection Metric, which can approximate the Riemannian geometry of the Grassmann manifold. Nevertheless, they inevitably introduce the drawbacks of traditional kernel-based methods, such as the implicit map and high computational cost, to the Grassmann manifold. To overcome these limitations, we propose a novel method to learn the Projection Metric directly on the Grassmann manifold rather than in a Hilbert space. From the perspective of manifold learning, our method can be regarded as performing a geometry-aware dimensionality reduction from the original Grassmann manifold to a lower-dimensional, more discriminative Grassmann manifold where more favorable classification can be achieved. Experiments on several real-world video face datasets demonstrate that the proposed method yields competitive performance compared with state-of-the-art algorithms.
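For reference, the Projection Metric mentioned above has a simple closed form between two subspaces given as orthonormal basis matrices; a minimal numpy sketch (my own illustration, not the paper's implementation):

```python
# Sketch: the Projection Metric between two subspaces, each given as a
# d x k matrix with orthonormal columns (a point on the Grassmann manifold).
import numpy as np

def projection_metric(Y1, Y2):
    # d_p(Y1, Y2) = 2^(-1/2) * ||Y1 Y1^T - Y2 Y2^T||_F
    return np.linalg.norm(Y1 @ Y1.T - Y2 @ Y2.T, 'fro') / np.sqrt(2)

# Example: two random 3-dimensional subspaces of R^10.
rng = np.random.default_rng(0)
Y1, _ = np.linalg.qr(rng.standard_normal((10, 3)))
Y2, _ = np.linalg.qr(rng.standard_normal((10, 3)))
print(projection_metric(Y1, Y2))
```

Roughly speaking, the geometry-aware dimensionality reduction described in the abstract learns a transformation that maps each basis Y to a smaller (re-orthonormalized) basis on a lower-dimensional Grassmann manifold before this distance is computed.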


Computer Vision and Pattern Recognition | 2015

Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets

Wen Wang; Ruiping Wang; Zhiwu Huang; Shiguang Shan; Xilin Chen

This paper presents a method named Discriminant Analysis on Riemannian manifold of Gaussian distributions (DARG) to solve the problem of face recognition with image sets. Our goal is to capture the underlying data distribution in each set and thus facilitate more robust classification. To this end, we represent each image set as a Gaussian Mixture Model (GMM) comprising a number of Gaussian components with prior probabilities, and seek to discriminate Gaussian components from different classes. In the light of information geometry, the Gaussians lie on a specific Riemannian manifold. To encode this Riemannian geometry properly, we investigate several distances between Gaussians and further derive a series of provably positive-definite probabilistic kernels. Through these kernels, a weighted Kernel Discriminant Analysis is finally devised that treats the Gaussians in the GMMs as samples and their prior probabilities as sample weights. The proposed method is evaluated on face identification and verification tasks on four of the largest and most challenging databases, YouTube Celebrities, COX, YouTube Face DB, and the Point-and-Shoot Challenge, to demonstrate its superiority over the state of the art.
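One well-known way to treat Gaussians as points on a Riemannian manifold, used in this line of work, is to embed N(mu, Sigma) on R^d as a (d+1)x(d+1) symmetric positive-definite matrix, after which SPD distances and kernels (e.g., the Log-Euclidean kernel sketched earlier) apply. The sketch below is my hedged illustration of that embedding, not necessarily the exact variant used in DARG.

```python
# Sketch: embedding a Gaussian N(mu, Sigma) on R^d as a (d+1)x(d+1) SPD
# matrix so that distances between Gaussians reduce to distances between
# SPD matrices. Hedged illustration; the paper studies several distances
# between Gaussians, and this covers only one route.
import numpy as np

def gaussian_to_spd(mu, Sigma):
    d = mu.shape[0]
    P = np.block([[Sigma + np.outer(mu, mu), mu[:, None]],
                  [mu[None, :], np.ones((1, 1))]])
    # Determinant normalization, as in the Lovric et al. embedding.
    return np.linalg.det(Sigma) ** (-2.0 / (d + 1)) * P
```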


Computer Vision and Pattern Recognition | 2014

Learning Euclidean-to-Riemannian Metric for Point-to-Set Classification

Zhiwu Huang; Ruiping Wang; Shiguang Shan; Xilin Chen

In this paper, we focus on the problem of point-to-set classification, where single points are matched against sets of correlated points. Since the points commonly lie in a Euclidean space while the sets are typically modeled as elements of a Riemannian manifold, they can be treated as Euclidean points and Riemannian points, respectively. To learn a metric between such heterogeneous points, we propose a novel Euclidean-to-Riemannian metric learning framework. Specifically, by exploiting typical Riemannian metrics, the Riemannian manifold is first embedded into a high-dimensional Hilbert space, which reduces the gap between the heterogeneous spaces while respecting the Riemannian geometry of the manifold. The final distance metric is then learned by pursuing multiple transformations from the Hilbert space and the original Euclidean space (or its corresponding Hilbert space) to a common Euclidean subspace, where classical Euclidean distances between the transformed heterogeneous points can be measured. Extensive experiments clearly demonstrate the superiority of the proposed approach over state-of-the-art methods.
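Once those transformations are learned, the point-to-set distance reduces to an ordinary Euclidean distance in the common subspace; a minimal sketch of that final step only (hypothetical names; the learning objective is omitted):

```python
# Sketch of only the final step: the learned transforms send both
# heterogeneous points into a common r-dimensional Euclidean subspace,
# where an ordinary distance is measured. V, W, and kY are hypothetical.
import numpy as np

def point_to_set_distance(x, kY, V, W):
    """x: Euclidean feature of the point, shape (d,).
    kY: kernel representation of the set's Riemannian point against the
        training sets, shape (m,).
    V: (d, r) transform for Euclidean points.
    W: (m, r) transform for kernelized Riemannian points."""
    return np.linalg.norm(V.T @ x - W.T @ kY)
```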


International Conference on Multimodal Interfaces | 2013

Partial least squares regression on Grassmannian manifold for emotion recognition

Mengyi Liu; Ruiping Wang; Zhiwu Huang; Shiguang Shan; Xilin Chen

In this paper, we propose a method for video-based human emotion recognition. For each video clip, all frames are represented as an image set, which can be modeled as a linear subspace embedded in a Grassmannian manifold. After feature extraction, a class-specific one-to-rest Partial Least Squares (PLS) model is learned on the video and audio data respectively, to distinguish each class from the other, easily confused ones. Finally, an optimal fusion of the classifiers learned from both modalities (video and audio) is conducted at the decision level. Our method is evaluated on the Emotion Recognition In The Wild Challenge (EmotiW 2013), and experimental results on both the validation set and the blind test set are presented for comparison. The final accuracy achieved on the test set outperforms the baseline by 26%.
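A rough sketch of the class-specific one-to-rest PLS scoring described above, using scikit-learn's PLSRegression on generic feature vectors (illustrative only; the paper's Grassmannian feature extraction and audio/video fusion are omitted):

```python
# Sketch: class-specific one-to-rest PLS. For each class, a PLS regressor
# is fit against +1/-1 targets; the class with the highest regression
# response wins. Illustrative only, on generic feature vectors.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def one_vs_rest_pls_predict(X_train, y_train, X_test, n_components=4):
    classes = np.unique(y_train)
    scores = np.zeros((X_test.shape[0], classes.size))
    for j, c in enumerate(classes):
        targets = np.where(y_train == c, 1.0, -1.0)  # one class vs. the rest
        pls = PLSRegression(n_components=n_components)
        pls.fit(X_train, targets)
        scores[:, j] = pls.predict(X_test).ravel()
    return classes[np.argmax(scores, axis=1)]        # highest response wins
```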


IEEE International Conference on Automatic Face and Gesture Recognition | 2015

Report on the FG 2015 Video Person Recognition Evaluation

J. Ross Beveridge; Hao Zhang; Bruce A. Draper; Patrick J. Flynn; Zhen-Hua Feng; Patrik Huber; Josef Kittler; Zhiwu Huang; Shaoxin Li; Yan Li; Meina Kan; Ruiping Wang; Shiguang Shan; Xilin Chen; Haoxiang Li; Gang Hua; Vitomir Struc; Janez Krizaj; Changxing Ding; Dacheng Tao; P. Jonathon Phillips

This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted, high-quality video camera; the second contained videos acquired from five different handheld video cameras. Each experiment contained 1,401 videos of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The video handheld experiment was also included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition; the top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing.


IEEE Transactions on Image Processing | 2015

A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database

Zhiwu Huang; Shiguang Shan; Ruiping Wang; Haihong Zhang; Shihong Lao; Alifu Kuerban; Xilin Chen

Face recognition with still face images has been widely studied, while research on video-based face recognition remains relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively taking video or still images as query or target. To the best of our knowledge, few datasets and evaluation protocols have been established to benchmark all three scenarios. To facilitate the study of this topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with the three video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, to benchmark the three scenarios on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method and experimentally show that it can serve as a promising baseline for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and that COX Face DB is a good benchmark database for evaluation.


Pattern Recognition | 2015

Face recognition on large-scale video in the wild with hybrid Euclidean-and-Riemannian metric learning

Zhiwu Huang; Ruiping Wang; Shiguang Shan; Xilin Chen

Face recognition on large-scale video in the wild is becoming increasingly important due to the ubiquity of video data captured by surveillance cameras, handheld devices, Internet uploads, and other sources. By treating each video as one image set, set-based methods have recently achieved great success in video-based face recognition. In the wild, however, videos often contain extremely complex data variations and thus pose a major set-modeling challenge for set-based methods. In this paper, we propose a novel Hybrid Euclidean-and-Riemannian Metric Learning (HERML) method to fuse multiple statistics of an image set. Specifically, we represent each image set simultaneously by its mean, covariance matrix, and Gaussian distribution, which generally complement each other for set modeling. Fusing them is not trivial, however, since the mean, covariance matrix, and Gaussian model typically lie in heterogeneous spaces equipped with Euclidean or Riemannian metrics. Therefore, we first implicitly map the original statistics into high-dimensional Hilbert spaces by exploiting Euclidean and Riemannian kernels. With a LogDet-divergence-based objective function, the hybrid kernels are then fused by our hybrid metric learning framework, which can efficiently perform the fusion on large-scale videos. The proposed method is evaluated on four public and challenging large-scale video face datasets. Extensive experimental results demonstrate a clear superiority over state-of-the-art set-based methods for large-scale video-based face recognition.

Highlights:
Each image set is represented by its mean, covariance, and Gaussian distribution for discriminant information.
Heterogeneous Euclidean and Riemannian kernels are exploited and fused.
Clear superiority over state-of-the-art set-based methods is achieved in testing.
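The sketch below builds two of the heterogeneous kernels named in the abstract (a linear kernel on means and a Log-Euclidean RBF kernel on covariances) and combines them with a simple weighted sum; HERML's actual LogDet-divergence metric learning over these kernels is not reproduced here.

```python
# Sketch: building and summing heterogeneous kernels over per-set
# statistics. Illustrative only; the weights here are fixed by hand,
# whereas HERML learns the fusion via LogDet-based metric learning.
import numpy as np

def logm(C, eps=1e-12):
    w, V = np.linalg.eigh(C)
    return (V * np.log(np.maximum(w, eps))) @ V.T

def hybrid_kernel(means, covs, gamma=1e-2, weights=(0.5, 0.5)):
    n = len(means)
    M = np.stack(means)
    K_mean = M @ M.T                      # Euclidean statistic: linear kernel
    logs = [logm(C) for C in covs]        # Riemannian statistic
    K_cov = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(logs[i] - logs[j], 'fro')
            K_cov[i, j] = np.exp(-gamma * d ** 2)
    return weights[0] * K_mean + weights[1] * K_cov
```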


Computer Vision and Pattern Recognition | 2015

Face video retrieval with image query via hashing across Euclidean space and Riemannian manifold

Yan Li; Ruiping Wang; Zhiwu Huang; Shiguang Shan; Xilin Chen

Retrieving videos of a specific person given his/her face image as a query is becoming more and more appealing for applications like smart movie fast-forwarding and suspect search. It also forms an interesting but challenging computer vision task, as the visual data to be matched, i.e., a still image and a video clip, are usually represented quite differently. Typically, a face image is represented as a point (i.e., a vector) in Euclidean space, while a video clip is modeled as a point (e.g., a covariance matrix) on some particular Riemannian manifold, in light of the latter's recent promising success. This incurs a new hashing-based retrieval problem of matching two heterogeneous representations, one in Euclidean space and one on a Riemannian manifold. This work makes the first attempt to embed the two heterogeneous spaces into a common discriminant Hamming space. Specifically, we propose Hashing across Euclidean space and Riemannian manifold (HER), deriving a unified framework that first embeds the two spaces into corresponding reproducing kernel Hilbert spaces and then iteratively optimizes the intra- and inter-space Hamming distances in a max-margin framework to learn the hash functions for the two spaces. Extensive experiments demonstrate the impressive superiority of our method over state-of-the-art hash learning methods.
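A minimal sketch of the retrieval step only, assuming two already-learned projections that map kernelized image features and kernelized video (Riemannian) features into a common b-bit Hamming space; learning the projections (HER's max-margin optimization) is omitted, and all names are hypothetical.

```python
# Sketch: cross-space hashing retrieval. Both modalities are hashed by
# thresholding linear projections of their kernel features, then videos
# are ranked by Hamming distance to the query image's code.
import numpy as np

def hash_codes(K, W):
    """K: (n, m) kernel features; W: (m, b) learned projection."""
    return (K @ W > 0).astype(np.uint8)

def retrieve(query_code, video_codes):
    """Rank videos by Hamming distance to the query image's code."""
    dists = np.count_nonzero(video_codes != query_code, axis=1)
    return np.argsort(dists)
```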


Asian Conference on Computer Vision | 2012

Benchmarking still-to-video face recognition via partial and local linear discriminant analysis on COX-S2V dataset

Zhiwu Huang; Shiguang Shan; Haihong Zhang; Shihong Lao; Alifu Kuerban; Xilin Chen

In this paper, we explore the real-world Still-to-Video (S2V) face recognition scenario, where only very few (in many cases, a single) still images per person are enrolled into the gallery, while one or multiple video clips can usually be captured as the probe. A typical application of S2V is mug-shot-based watch-list screening. Generally, in this scenario, the still image(s) are collected under a controlled environment and are thus of high quality and resolution, in frontal view, with normal lighting and neutral expression. On the contrary, the testing video frames are of low resolution and low quality, possibly blurred, and captured under poor lighting in non-frontal views. We reveal that S2V face recognition has been heavily overlooked in the past; therefore, we provide a benchmark in terms of both a large-scale dataset and a new solution to the problem. Specifically, we collect (and release) a new dataset named COX-S2V, which contains 1,000 subjects, each with one high-quality photo and four video clips captured to simulate a video surveillance scenario. Together with the database, a clear evaluation protocol is designed for benchmarking. In addition, to address this problem we further propose a novel method named Partial and Local Linear Discriminant Analysis (PaLo-LDA). We then evaluate the method on COX-S2V and compare it with several classic methods, including LDA, LPP, and ScSR. The evaluation results not only show the grand challenge posed by COX-S2V, but also validate the effectiveness of the proposed PaLo-LDA method over the competing methods.

Collaboration


Dive into Zhiwu Huang's collaborations.

Top Co-Authors

Shiguang Shan, Chinese Academy of Sciences
Xilin Chen, Chinese Academy of Sciences
Ruiping Wang, Chinese Academy of Sciences
Mengyi Liu, Chinese Academy of Sciences
Shaoxin Li, Chinese Academy of Sciences