Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Zibo Meng is active.

Publication


Featured research published by Zibo Meng.


Computer Vision and Pattern Recognition | 2014

Facial Expression Recognition via a Boosted Deep Belief Network

Ping Liu; Shizhong Han; Zibo Meng; Yan Tong

A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, feature set, and classifier to achieve good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) for performing the three training stages iteratively in a unified loopy framework. Through the proposed BDBN framework, a set of features, which is effective to characterize expression-related facial appearance/shape changes, can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and more importantly, the discriminative capabilities of selected features are strengthened as well according to their relative importance to the strong classifier via a joint fine-tune process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded dramatic improvements in facial expression analysis.
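
To make the boosting idea above concrete, here is a minimal, illustrative sketch (not the authors' BDBN code): discrete AdaBoost over single-feature decision stumps applied to pre-extracted features, so that boosting simultaneously selects features and builds the strong classifier. The fixed median threshold and the function names are simplifications introduced for this sketch; the joint fine-tuning of the deep features, central to the BDBN, is omitted here.

# Minimal sketch (not the authors' code): discrete AdaBoost over one-feature
# decision stumps, illustrating how boosting can jointly select features and
# form a strong classifier, as in the boosting stage of the BDBN.
import numpy as np

def train_boosted_stumps(X, y, n_rounds=20):
    """X: (n_samples, n_features) learned features; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # sample weights
    stumps = []                          # (feature index, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):               # each weak learner uses a single feature
            thr = np.median(X[:, j])     # simplified fixed threshold
            for pol in (1, -1):
                pred = pol * np.sign(X[:, j] - thr)
                pred[pred == 0] = pol
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
        w *= np.exp(-alpha * y * pred)          # emphasize misclassified samples
        w /= w.sum()
        stumps.append((j, thr, pol, alpha))
    return stumps

def predict_boosted_stumps(stumps, X):
    score = np.zeros(X.shape[0])
    for j, thr, pol, alpha in stumps:
        pred = pol * np.sign(X[:, j] - thr)
        pred[pred == 0] = pol
        score += alpha * pred
    return np.sign(score)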


European Conference on Computer Vision | 2014

Feature Disentangling Machine - A Novel Approach of Feature Selection and Disentangling in Facial Expression Analysis

Ping Liu; Joey Tianyi Zhou; Ivor W. Tsang; Zibo Meng; Shizhong Han; Yan Tong

Studies in psychology show that not all facial regions are of importance in recognizing facial expressions and different facial regions make different contributions in various facial expressions. Motivated by this, a novel framework, named Feature Disentangling Machine (FDM), is proposed to effectively select active features characterizing facial expressions. More importantly, the FDM aims to disentangle these selected features into non-overlapped groups, in particular, common features that are shared across different expressions and expression-specific features that are discriminative only for a target expression. Specifically, the FDM integrates sparse support vector machine and multi-task learning in a unified framework, where a novel loss function and a set of constraints are formulated to precisely control the sparsity and naturally disentangle active features. Extensive experiments on two well-known facial expression databases have demonstrated that the FDM outperforms the state-of-the-art methods for facial expression analysis. More importantly, the FDM achieves an impressive performance in a cross-database validation, which demonstrates the generalization capability of the selected features.
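
A rough sketch of the disentangling idea under stated assumptions: each expression's classifier weights are modeled as a shared common part plus a sparse expression-specific part, trained with a hinge loss and l1 penalties. This only approximates the FDM objective; the exact loss, constraints, and solver in the paper differ, and the function name and penalty weights here are hypothetical.

# Illustrative sketch only: a multi-task hinge loss whose weights split into a
# common part shared by all expressions and sparse expression-specific parts,
# echoing the common / expression-specific disentangling of the FDM.
import torch

def fdm_style_loss(X, Y, W_common, W_specific, lam_c=1e-3, lam_s=1e-2):
    """X: (n, d) features; Y: (n, T) labels in {-1, +1} for T expressions.
    W_common: (d,) shared weights; W_specific: (d, T) per-expression weights."""
    W = W_common.unsqueeze(1) + W_specific            # (d, T) effective weights
    scores = X @ W                                    # (n, T) decision values
    hinge = torch.clamp(1.0 - Y * scores, min=0).mean()
    sparsity = lam_c * W_common.abs().sum() + lam_s * W_specific.abs().sum()
    return hinge + sparsity

# usage sketch: make W_common / W_specific leaf tensors with requires_grad=True
# and minimize fdm_style_loss with any torch optimizer.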


IEEE International Conference on Automatic Face and Gesture Recognition | 2017

Identity-Aware Convolutional Neural Network for Facial Expression Recognition

Zibo Meng; Ping Liu; Jie Cai; Shizhong Han; Yan Tong

Facial expression recognition suffers under real-world conditions, especially on unseen subjects due to high inter-subject variations. To alleviate variations introduced by personal attributes and achieve better facial expression recognition performance, a novel identity-aware convolutional neural network (IACNN) is proposed. In particular, a CNN with a new architecture is employed as individual streams of a bi-stream identity-aware network. An expression-sensitive contrastive loss is developed to measure the expression similarity to ensure the features learned by the network are invariant to expression variations. More importantly, an identity-sensitive contrastive loss is proposed to learn identity-related information from identity labels to achieve identity-invariant expression recognition. Extensive experiments on three public databases including a spontaneous facial expression database have shown that the proposed IACNN achieves promising results in the real world.
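
A minimal sketch of the two contrastive terms described above, assuming a standard contrastive-loss form; the margin, weighting, and function names are illustrative rather than the paper's exact formulation, and the bi-stream CNN producing the embeddings is assumed.

# Rough sketch of the expression-sensitive and identity-sensitive contrastive
# terms (margins and weighting are illustrative, not the paper's exact loss).
import torch
import torch.nn.functional as F

def contrastive(f1, f2, same, margin=1.0):
    """f1, f2: (n, d) embeddings from the two streams; same: (n,) float tensor,
    1.0 if the pair shares the attribute (expression or identity), else 0.0."""
    d = F.pairwise_distance(f1, f2)
    pos = same * d.pow(2)                                     # pull matching pairs together
    neg = (1 - same) * torch.clamp(margin - d, min=0).pow(2)  # push mismatched pairs apart
    return 0.5 * (pos + neg).mean()

def iacnn_style_loss(expr_f1, expr_f2, id_f1, id_f2, same_expr, same_id):
    # expression-sensitive term + identity-sensitive term
    return contrastive(expr_f1, expr_f2, same_expr) + contrastive(id_f1, id_f2, same_id)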


European Conference on Computer Vision | 2014

Video-Based Action Detection Using Multiple Wearable Cameras

Kang Zheng; Yuewei Lin; Youjie Zhou; Dhaval Salvi; Xiaochuan Fan; Dazhou Guo; Zibo Meng; Song Wang

This paper focuses on developing a new approach for video-based action detection, where a set of temporally synchronized videos is taken by multiple wearable cameras from different and varying views, and our goal is to accurately localize the starting and ending time of each instance of the actions of interest in such videos. Compared with traditional approaches based on fixed-camera videos, this new approach incorporates the visual attention of the camera wearers and allows for action detection in a larger area, although it brings in new challenges such as unconstrained camera motion. In this approach, we leverage the multi-view information and the temporal synchronization of the input videos for more reliable action detection. Specifically, we detect and track the focal character in each video and conduct action recognition only for the focal character in each temporal sliding window. To more accurately localize the starting and ending time of actions, we develop a strategy that may merge temporally adjacent sliding windows when detecting durative actions, and non-maximally suppress temporally adjacent sliding windows when detecting momentary actions. Finally, we propose a voting scheme to integrate the detection results from multiple videos for more accurate action detection. For the experiments, we collect a new dataset of multiple wearable-camera videos that reflects the complex scenarios in practice.
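
A simplified sketch of two of the post-processing steps described above, window merging for durative actions and cross-camera voting, assuming a per-window classifier has already produced binary detections; the helper names and the majority-vote threshold are hypothetical.

# Simplified sketch of two post-processing steps; per-window action scores are
# assumed to come from a separate classifier not shown here.
def merge_adjacent_windows(windows):
    """windows: sorted list of (start, end) positive detections in one video.
    Overlapping or touching windows are merged, as for durative actions."""
    merged = []
    for s, e in windows:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def vote_across_videos(per_video_labels, min_votes):
    """per_video_labels: list of per-frame 0/1 label lists, one per synchronized
    camera. A frame keeps a detection only if enough cameras agree."""
    n_frames = len(per_video_labels[0])
    fused = []
    for t in range(n_frames):
        votes = sum(labels[t] for labels in per_video_labels)
        fused.append(1 if votes >= min_votes else 0)
    return fused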


International Conference on Image Processing | 2014

Facial grid transformation: A novel face registration approach for improving facial action unit recognition

Shizhong Han; Zibo Meng; Ping Liu; Yan Tong

Face registration is a major and critical step for face analysis. Existing facial activity recognition systems often employ coarse face alignment based on a few fiducial points such as eyes and extract features from equal-sized grid. Such extracted features are susceptible to variations in face pose, facial deformation, and person-specific geometry. In this work, we propose a novel face registration method named facial grid transformation to improve feature extraction for recognizing facial Action Units (AUs). Based on the transformed grid, novel grid edge features are developed to capture local facial motions related to AUs. Extensive experiments on two well-known AU-coded databases have demonstrated that the proposed method yields significant improvements over the methods based on equal-sized grid on both posed and more importantly, spontaneous facial displays. Furthermore, the proposed method also outperforms the state-of-the-art methods using either coarse alignment or mesh-based face registration.
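
A minimal sketch, assuming the grid edge features take the form of differences between descriptors of neighboring grid cells (here, simple mean intensities); the landmark-driven grid transformation itself is not reproduced, and the exact descriptor used in the paper may differ.

# Sketch only: per-cell descriptors on a face grid and "edge" features formed
# as differences between neighboring cells, capturing local appearance changes.
import numpy as np

def grid_cell_means(face, rows, cols):
    """face: 2-D grayscale face image; returns (rows, cols) mean intensities."""
    h, w = face.shape
    ys = np.linspace(0, h, rows + 1, dtype=int)
    xs = np.linspace(0, w, cols + 1, dtype=int)
    cells = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            cells[i, j] = face[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    return cells

def grid_edge_features(cells):
    """Differences along horizontal and vertical grid edges."""
    horiz = cells[:, 1:] - cells[:, :-1]
    vert = cells[1:, :] - cells[:-1, :]
    return np.concatenate([horiz.ravel(), vert.ravel()])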


Information Reuse and Integration | 2017

Detecting Small Signs from Large Images

Zibo Meng; Xiaochuan Fan; Xin Chen; Min Chen; Yan Tong

In the past decade, Convolutional Neural Networks (CNNs) have been demonstrated to be successful for object detection. However, the size of the network input is limited by the amount of memory available on GPUs. Moreover, performance degrades when detecting small objects. To alleviate the memory usage and improve the performance of detecting small traffic signs, we propose an approach for detecting small traffic signs from large images under real-world conditions. In particular, large images are broken into small patches as input to a Small-Object-Sensitive-CNN (SOS-CNN) modified from a Single Shot Multibox Detector (SSD) framework with a VGG-16 network as the base network to produce patch-level object detection results. Scale invariance is achieved by applying the SOS-CNN on an image pyramid. Then, image-level object detection is obtained by projecting all the patch-level detection results to the image at the original scale. Experimental results on a real-world traffic sign dataset have demonstrated the effectiveness of the proposed method in terms of detection accuracy and recall, especially for signs with small sizes.
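
A sketch of the tiling and back-projection logic only, under the assumption of a fixed patch size and stride; detect_patch stands in for the SOS-CNN (an SSD-style detector) and is not implemented here, and the final cross-patch non-maximum suppression is left out.

# Sketch of breaking a large image into patches and mapping patch-level boxes
# back to full-image, original-scale coordinates; border handling is omitted.
def tile_image(h, w, patch=512, stride=400):
    """Yield top-left corners of overlapping patches covering an h x w image."""
    for y in range(0, max(h - patch, 0) + 1, stride):
        for x in range(0, max(w - patch, 0) + 1, stride):
            yield y, x

def detect_large_image(image, detect_patch, patch=512, stride=400, scale=1.0):
    """Run a patch-level detector (assumed) on each crop of one pyramid level
    (scale) and project its boxes back to original-image coordinates."""
    h, w = image.shape[:2]
    boxes = []
    for y, x in tile_image(h, w, patch, stride):
        crop = image[y:y + patch, x:x + patch]
        for (x1, y1, x2, y2, score) in detect_patch(crop):
            boxes.append(((x + x1) / scale, (y + y1) / scale,
                          (x + x2) / scale, (y + y2) / scale, score))
    return boxes  # a cross-patch NMS would typically follow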


ICMI '18 Proceedings of the 20th ACM International Conference on Multimodal Interaction | 2018

Group-Level Emotion Recognition using Deep Models with A Four-stream Hybrid Network

Ahmed Shehab Khan; Zhiyuan Li; Jie Cai; Zibo Meng; James O'Reilly; Yan Tong

Group-level Emotion Recognition (GER) in the wild is a challenging task gaining lots of attention. Most recent works utilized two channels of information, a channel involving only faces and a channel containing the whole image, to solve this problem. However, modeling the relationship between faces and scene in a global image remains challenging. In this paper, we proposed a novel face-location aware global network, capturing the face location information in the form of an attention heatmap to better model such relationships. We also proposed a multi-scale face network to infer the group-level emotion from individual faces, which explicitly handles high variance in image and face size, as images in the wild are collected from different sources with different resolutions. In addition, a global blurred stream was developed to explicitly learn and extract the scene-only features. Finally, we proposed a four-stream hybrid network, consisting of the face-location aware global stream, the multi-scale face stream, a global blurred stream, and a global stream, to address the GER task, and showed the effectiveness of our method in the GER sub-challenge, a part of the sixth Emotion Recognition in the Wild (EmotiW 2018) [10] Challenge. The proposed method achieved 65.59% and 78.39% accuracy on the testing and validation sets, respectively, and ranked third on the leaderboard.
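
Two small pieces of the pipeline above, sketched under assumptions: a face-location attention heatmap built from detected face boxes, and a simple weighted averaging of the four streams' class scores. The stream networks themselves, and how the heatmap is injected into the global stream, are assumed and not shown.

# Illustrative sketch: face-location heatmap from detected boxes and a simple
# late fusion of per-stream class scores (stream networks assumed).
import numpy as np

def face_location_heatmap(h, w, face_boxes):
    """face_boxes: list of (x1, y1, x2, y2) in pixel coordinates.
    Returns an (h, w) attention map that is 1 inside faces and 0 elsewhere."""
    heat = np.zeros((h, w), dtype=np.float32)
    for x1, y1, x2, y2 in face_boxes:
        heat[int(y1):int(y2), int(x1):int(x2)] = 1.0
    return heat

def fuse_streams(stream_logits, weights=None):
    """stream_logits: list of (n_classes,) score arrays, one per stream."""
    logits = np.stack(stream_logits)
    if weights is None:
        weights = np.full(len(stream_logits), 1.0 / len(stream_logits))
    fused = (weights[:, None] * logits).sum(axis=0)
    return int(np.argmax(fused))   # predicted group-level emotion class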


IEEE Transactions on Affective Computing | 2017

Listen to Your Face: Inferring Facial Action Units from Audio Channel

Zibo Meng; Shizhong Han; Yan Tong

Extensive efforts have been devoted to recognizing facial action units (AUs). However, it is still challenging to recognize AUs from spontaneous facial displays especially when they are accompanied by speech. Different from all prior work that utilized visual observations for facial AU recognition, this paper presents a novel approach that recognizes speech-related AUs exclusively from audio signals based on the fact that facial activities are highly correlated with voice during speech. Specifically, dynamic and physiological relationships between AUs and phonemes are modeled through a continuous time Bayesian network (CTBN); then AU recognition is performed by probabilistic inference via the CTBN model. A pilot audiovisual AU-coded database has been constructed to evaluate the proposed audio-based AU recognition framework. The database consists of a “clean” subset with frontal and neutral faces and a challenging subset collected with large head movements and occlusions. Experimental results on this database show that the proposed CTBN model achieves promising recognition performance for 7 speech-related AUs and outperforms both the state-of-the-art visual-based and audio-based methods especially for those AUs that are activated at low intensities or “hardly visible” in the visual channel. The improvement is more impressive on the challenging subset, where the visual-based approaches suffer significantly.
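
The paper performs inference in a continuous time Bayesian network; as a rough discrete-time stand-in only, the sketch below runs HMM-style forward filtering of a binary AU state given a phoneme sequence, with assumed transition and emission tables. It is not the CTBN model, just an illustration of probabilistic AU inference from an audio-derived phoneme stream.

# Simplified, discrete-time stand-in for the CTBN inference: forward filtering
# of a binary AU state given observed phonemes (tables are assumed inputs).
import numpy as np

def au_forward_filter(phonemes, trans, emit, prior):
    """phonemes: sequence of phoneme indices.
    trans: (2, 2) AU transition probabilities, trans[i, j] = P(au_t=j | au_{t-1}=i).
    emit: (2, n_phonemes) P(phoneme | AU state).
    prior: (2,) initial AU state distribution.
    Returns the per-frame posterior P(AU active | phonemes so far)."""
    belief = np.array(prior, dtype=float)
    posteriors = []
    for ph in phonemes:
        belief = trans.T @ belief      # predict the next AU state
        belief *= emit[:, ph]          # weight by how well each state explains the phoneme
        belief /= belief.sum()         # normalize to a distribution
        posteriors.append(float(belief[1]))
    return posteriors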


International Journal of Multimedia Data Engineering and Management | 2016

Audiovisual Facial Action Unit Recognition using Feature Level Fusion

Min Chen; Zibo Meng; Shizhong Han; Yan Tong

Recognizing facial actions is challenging, especially when they are accompanied with speech. Instead of employing information solely from the visual channel, this work aims to exploit information from both visual and audio channels in recognizing speech-related facial action units (AUs). In this work, two feature-level fusion methods are proposed. The first method is based on a kind of human-crafted visual feature. The other method utilizes visual features learned by a deep convolutional neural network (CNN). For both methods, features are independently extracted from visual and audio channels and aligned to handle the difference in time scales and the time shift between the two signals. These temporally aligned features are integrated via feature-level fusion for AU recognition. Experimental results on a new audiovisual AU-coded dataset have demonstrated that both fusion methods outperform their visual counterparts in recognizing speech-related AUs. The improvement is more impressive with occlusions on the facial images, which would not affect the audio channel.
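
A sketch of the alignment and feature-level fusion step, assuming audio features arrive at a higher frame rate than visual features and that the time shift between the two signals is known or estimated elsewhere; nearest-frame alignment is a simplification of whatever alignment the paper uses.

# Sketch: align audio features to the visual frame rate (with an optional time
# shift) and concatenate them per frame for feature-level fusion.
import numpy as np

def fuse_audio_visual(audio_feats, audio_fps, visual_feats, visual_fps, shift_s=0.0):
    """audio_feats: (Ta, Da); visual_feats: (Tv, Dv).
    Returns (Tv, Da + Dv) frame-aligned, concatenated features."""
    Tv = visual_feats.shape[0]
    v_times = np.arange(Tv) / visual_fps + shift_s        # visual frame timestamps
    a_idx = np.clip(np.round(v_times * audio_fps).astype(int),
                    0, audio_feats.shape[0] - 1)          # nearest audio frame
    aligned_audio = audio_feats[a_idx]
    return np.concatenate([aligned_audio, visual_feats], axis=1)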


Neural Information Processing Systems | 2016

Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition

Shizhong Han; Zibo Meng; Ahmed Shehab Khan; Yan Tong

Collaboration


Dive into Zibo Meng's collaboration.

Top Co-Authors

Yan Tong, University of South Carolina
Shizhong Han, University of South Carolina
Ping Liu, University of South Carolina
Jie Cai, University of South Carolina
Ahmed Shehab Khan, University of South Carolina
James O'Reilly, University of South Carolina
Min Chen, University of Washington
Zhiyuan Li, University of South Carolina
Dazhou Guo, University of South Carolina
Dhaval Salvi, University of South Carolina