Xing Zhang
Binghamton University
Publications
Featured research published by Xing Zhang.
IEEE International Conference on Automatic Face & Gesture Recognition | 2013
Xing Zhang; Lijun Yin; Jeffrey F. Cohn; Shaun J. Canavan; Michael Reale; Andy Horowitz; Peng Liu
Facial expression is central to human experience. Its efficient and valid measurement is a challenge that automated facial image analysis seeks to address. Most publicly available databases are limited to 2D static images or video of posed facial behavior. Because posed and un-posed (aka “spontaneous”) facial expressions differ along several dimensions including complexity and timing, well-annotated video of un-posed facial behavior is needed. Moreover, because the face is a three-dimensional deformable object, 2D video may be insufficient, and therefore 3D video archives are needed. We present a newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action.
IEEE International Conference on Automatic Face & Gesture Recognition | 2013
Michael Reale; Xing Zhang; Lijun Yin
In this paper, we propose a new, compact, 4D spatio-temporal “Nebula” feature to improve expression and facial movement analysis performance. Given a spatio-temporal volume, the data is voxelized and fit to a cubic polynomial. A label is assigned based on the principal curvature values, and the polar angles of the direction of least curvature are computed. The labels and angles for each feature are used to build a histogram for each region of the face. The concatenated histograms from each region give us our final feature vector. This feature description is tested on the posed expression database BU-4DFE and on a new 4D spontaneous expression database. Various region configurations, histogram sizes, and feature parameters are tested, including a non-dynamic version of the approach. The LBP-TOP approach on the texture image as well as on the depth image is also tested for comparison. The onsets of the six canonical expressions are classified for 100 subjects in BU-4DFE, while the onset, offset, and non-existence of 12 Action Units (AUs) are classified for 16 subjects from our new spontaneous database. For posed expression recognition, the Nebula feature approach shows improvement over LBP-TOP on the depth images and significant improvement over the non-dynamic 3D-only approach. Moreover, the Nebula feature performs better for AU classification than the compared approaches for 11 of the AUs tested, in terms of accuracy as well as Area Under the Receiver Operating Characteristic Curve (AUC).
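As a rough illustration of the Nebula-style construction described above, the numpy sketch below voxelizes a depth sequence into an (H, W, T) volume, uses second-order derivatives in place of the paper's cubic-polynomial fit, labels voxels by eigenvalue signs, and builds per-region histograms over the curvature label and the polar angle of the least-curvature direction. The grid size, labeling rule, and bin counts are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a Nebula-like 4D feature (assumptions: depth frames stacked
# into an (H, W, T) volume; Hessian eigenvalues stand in for the principal
# curvatures of the fitted cubic polynomial; azimuth angle omitted for brevity).
import numpy as np

def nebula_like_feature(volume, grid=(4, 4), angle_bins=8):
    """volume: (H, W, T) float array of depth values over time."""
    # Hessian of the volume via repeated finite differences.
    gx, gy, gt = np.gradient(volume)
    H = np.empty(volume.shape + (3, 3))
    for i, g in enumerate((gx, gy, gt)):
        H[..., i, 0], H[..., i, 1], H[..., i, 2] = np.gradient(g)
    H = 0.5 * (H + np.swapaxes(H, -1, -2))            # symmetrize

    evals, evecs = np.linalg.eigh(H)                  # per-voxel eigen-decomposition
    # Crude 2-bit curvature label from eigenvalue signs (illustrative rule).
    signs = np.sign(evals[..., 1:])
    label = (signs[..., 0] > 0).astype(int) * 2 + (signs[..., 1] > 0).astype(int)
    # Direction of least curvature: eigenvector of the smallest-|eigenvalue|.
    idx = np.argmin(np.abs(evals), axis=-1)
    least = np.take_along_axis(evecs, idx[..., None, None], axis=-1)[..., 0]
    theta = np.arccos(np.clip(np.abs(least[..., 2]), 0, 1))   # polar angle

    # Per-region histograms over (label, polar angle), concatenated.
    Hh, Ww = volume.shape[:2]
    feats = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            sl = (slice(r * Hh // grid[0], (r + 1) * Hh // grid[0]),
                  slice(c * Ww // grid[1], (c + 1) * Ww // grid[1]))
            hist, _, _ = np.histogram2d(label[sl].ravel(), theta[sl].ravel(),
                                        bins=[4, angle_bins],
                                        range=[[0, 4], [0, np.pi]])
            feats.append(hist.ravel() / max(hist.sum(), 1))
    return np.concatenate(feats)

# Example: a random 64x64 depth sequence of 9 frames -> a (512,) feature vector.
print(nebula_like_feature(np.random.rand(64, 64, 9)).shape)
```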
Computer Vision and Pattern Recognition | 2016
Zheng Zhang; Jeffrey M. Girard; Yue Wu; Xing Zhang; Peng Liu; Umur A. Ciftci; Shaun J. Canavan; Michael Reale; Andrew Horowitz; Huiyuan Yang; Jeffrey F. Cohn; Qiang Ji; Lijun Yin
Emotion is expressed in multiple modalities, yet most research has considered at most one or two. This stems in part from the lack of large, diverse, well-annotated, multimodal databases with which to develop and test algorithms. We present a well-annotated, multimodal, multidimensional spontaneous emotion corpus of 140 participants. Emotion inductions were highly varied. Data were acquired from a variety of sensors of the face that included high-resolution 3D dynamic imaging, high-resolution 2D video, and thermal (infrared) sensing, and contact physiological sensors that included electrical conductivity of the skin, respiration, blood pressure, and heart rate. Facial expression was annotated for both the occurrence and intensity of facial action units from 2D video by experts in the Facial Action Coding System (FACS). The corpus further includes derived features from 3D, 2D, and IR (infrared) sensors and baseline results for facial expression and action unit detection. The entire corpus will be made available to the research community.
Computer Vision and Pattern Recognition | 2012
Shaun J. Canavan; Yi Sun; Xing Zhang; Lijun Yin
This paper presents a novel dynamic curvature based approach (dynamic shape-index based approach) for 3D face analysis. This method is inspired by the idea of 2D dynamic texture and 3D surface descriptors. Dynamic texture (DT) based approaches [30][31][32] encode and model local texture features along the temporal axis, and have achieved great success in 2D facial expression recognition. In this paper, we propose a so-called Dynamic Curvature (DC) approach for 3D facial activity analysis. To do so, the 3D dynamic surface is described by its surface curvature-based shape-index information. The surface features are characterized in local regions along the temporal axis. A dynamic curvature descriptor is constructed from local regions as well as temporal domains. To locate the local regions, we also apply a 3D tracking-model based method for detecting and tracking 3D features across 3D dynamic sequences. Our method is validated through experiments on 3D facial activity analysis for distinguishing neutral vs. non-neutral expressions, prototypic expressions, and their intensities.
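To make the shape-index side of this concrete, the sketch below computes the shape index of a depth map from its principal curvatures and then builds per-region histograms over a sequence of frames. The region grid, bin count, and the assumption that each frame is a depth map z(x, y) are illustrative simplifications, not the paper's descriptor.

```python
# Minimal sketch of a shape-index based dynamic-curvature style descriptor.
import numpy as np

def shape_index(depth):
    """Shape index in [0, 1] of a depth map, from its principal curvatures."""
    zy, zx = np.gradient(depth)
    zyy, zyx = np.gradient(zy)
    zxy, zxx = np.gradient(zx)
    denom = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / denom**2                        # Gaussian curvature
    Hm = ((1 + zx**2) * zyy - 2 * zx * zy * zxy
          + (1 + zy**2) * zxx) / (2 * denom**1.5)              # mean curvature
    disc = np.sqrt(np.maximum(Hm**2 - K, 0))
    k1, k2 = Hm + disc, Hm - disc                              # principal curvatures
    return 0.5 - np.arctan2(k1 + k2, k1 - k2) / np.pi

def dynamic_curvature(frames, grid=(4, 4), bins=16):
    """frames: list of (H, W) depth maps -> concatenated SI histograms over time."""
    feats = []
    for depth in frames:
        si = shape_index(depth)
        H, W = si.shape
        for r in range(grid[0]):
            for c in range(grid[1]):
                patch = si[r * H // grid[0]:(r + 1) * H // grid[0],
                           c * W // grid[1]:(c + 1) * W // grid[1]]
                hist, _ = np.histogram(patch, bins=bins, range=(0, 1))
                feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

# Example: 6 random depth frames of size 64x64 -> one descriptor vector.
print(dynamic_curvature([np.random.rand(64, 64) for _ in range(6)]).shape)
```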
Computer Vision and Image Understanding | 2015
Shaun J. Canavan; Peng Liu; Xing Zhang; Lijun Yin
We propose a method for landmark localization on 3D and 4D range data. A new shape index-based statistical shape model is proposed. Five benchmark 3D/4D face databases are tested. The accuracy of the landmarks is compared to ground truth data and to state-of-the-art methods. The efficacy of the landmarks is validated through expression analysis and pose estimation. In this paper we propose a novel method for detecting and tracking facial landmark features on 3D static and 3D dynamic (a.k.a. 4D) range data. Our proposed method involves fitting a shape index-based statistical shape model (SI-SSM) with both global and local constraints to the input range data. Our proposed model makes use of the global shape of the facial data as well as local patches, consisting of shape index values, around landmark features. The shape index is used due to its invariance to both lighting and pose changes. The fitting is performed by finding the correlation between the shape model and the input range data. The performance of our proposed method is evaluated in terms of various geometric data qualities, including data with noise, incompletion, occlusion, rotation, and various facial motions. The accuracy of detected features is compared to the ground truth data as well as to state-of-the-art results. We test our method on five publicly available 3D/4D databases: BU-3DFE, BU-4DFE, BP4D-Spontaneous, FRGC 2.0, and the Eurecom Kinect Face Dataset. The efficacy of the detected landmarks is validated through applications to geometric-based facial expression classification for both posed and spontaneous expressions, and to head pose estimation. Our method compares favorably with state-of-the-art feature tracking methods.
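A rough sketch of the two ingredients named above follows, under simplifying assumptions: landmarks are 2D coordinates on a shape-index image, the statistical shape model is a plain PCA over aligned landmark vectors (the global constraint), and the local constraint is normalized cross-correlation of shape-index patch templates. The actual SI-SSM fitting and its global/local weighting are more involved than this.

```python
import numpy as np

class SimpleShapeModel:
    def __init__(self, train_shapes, keep=0.98):
        """train_shapes: (N, L, 2) array of aligned landmark sets."""
        X = train_shapes.reshape(len(train_shapes), -1).astype(float)
        self.mean = X.mean(axis=0)
        _, s, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        var = s**2 / len(X)
        k = int(np.searchsorted(np.cumsum(var) / var.sum(), keep)) + 1
        self.P, self.sigma = Vt[:k], np.sqrt(var[:k])

    def project(self, shape):
        """Global constraint: pull a candidate shape back toward the model."""
        b = self.P @ (shape.ravel() - self.mean)
        b = np.clip(b, -3 * self.sigma, 3 * self.sigma)   # limit mode weights
        return (self.mean + self.P.T @ b).reshape(-1, 2)

def local_search(si_image, templates, init, radius=6):
    """Local constraint: move each landmark to the best NCC match of its
    shape-index patch template inside a small search window."""
    out = init.astype(float).copy()
    p = templates[0].shape[0] // 2                     # templates are odd-sized
    for i, (x, y) in enumerate(np.round(init).astype(int)):
        t = templates[i] - templates[i].mean()
        best = -np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                patch = si_image[y + dy - p:y + dy + p + 1,
                                 x + dx - p:x + dx + p + 1]
                if patch.shape != t.shape:
                    continue                           # window fell off the image
                pc = patch - patch.mean()
                ncc = (pc * t).sum() / (np.linalg.norm(pc) * np.linalg.norm(t) + 1e-8)
                if ncc > best:
                    best, out[i] = ncc, (x + dx, y + dy)
    return out

# One fitting iteration: local patch correlation, then global model projection.
# ssm = SimpleShapeModel(train_shapes); fitted = ssm.project(local_search(si, T, init))
```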
Face and Gesture 2011 | 2011
Yanhui Huang; Xing Zhang; Yangyu Fan; Lijun Yin; Lee M. Seversky; Tao Lei; Weijun Dong
3D face scans have been widely used for face modeling and face analysis. Because face scans provide variable point clouds across frames, they may not capture complete facial data or may lack point-to-point correspondences across scans, which makes such data difficult to use for analysis. This paper presents an efficient approach to represent facial shapes from face scans through the reconstruction of face models based on regional information and a generic model. A hybrid approach using two vertex mapping algorithms, displacement mapping and point-to-surface mapping, and a regional blending algorithm are proposed to reconstruct the facial surface detail. The resulting models can represent individual facial shapes consistently and adaptively, establishing facial point correspondence across individual models. The accuracy of the generated models is evaluated quantitatively. The applicability of the models is validated through an application to 3D facial expression recognition based on the static 3DFE and dynamic 4DFE databases. A comparison with the state of the art is also reported.
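For readers who want a concrete picture, the sketch below loosely illustrates the two vertex-mapping ideas named above, with heavy simplifications: the scan is treated as a raw point cloud, "point-to-surface" mapping is approximated by nearest-neighbor lookup, "displacement" mapping moves each generic-model vertex along its normal by the projected offset to the nearest scan point, and regional blending weights are assumed given. None of this reproduces the paper's actual algorithm.

```python
import numpy as np
from scipy.spatial import cKDTree   # nearest-neighbor queries

def point_to_surface(generic_verts, scan_points):
    """Snap each generic-model vertex (N, 3) to its nearest scan point (M, 3)."""
    _, idx = cKDTree(scan_points).query(generic_verts)
    return scan_points[idx]

def displacement_map(generic_verts, vert_normals, scan_points):
    """Move each vertex along its unit normal toward the nearest scan point."""
    _, idx = cKDTree(scan_points).query(generic_verts)
    offset = ((scan_points[idx] - generic_verts) * vert_normals).sum(axis=1)
    return generic_verts + offset[:, None] * vert_normals

def blend(regional_a, regional_b, weights):
    """Per-vertex regional blending of two reconstructions (weights in [0, 1])."""
    return weights[:, None] * regional_a + (1 - weights[:, None]) * regional_b
```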
International Conference on Multimedia and Expo | 2013
Shaun J. Canavan; Xing Zhang; Lijun Yin
In this paper, we propose a novel method for detecting and tracking landmark facial features on purely geometric 3D and 4D range models. Our proposed method involves fitting a new multi-frame constrained 3D temporal deformable shape model (TDSM) to range data sequences. We consider this a temporal based deformable model as we concatenate consecutive deformable shape models into a single model driven by the appearance of facial expressions. This allows us to simultaneously fit multiple models over a sequence of time with one TDSM. To our knowledge, it is the first work to address multiple shape models as a whole to track 3D dynamic range sequences without assistance of any texture information. The accuracy of the tracking results is evaluated by comparing the detected landmarks to the ground truth. The efficacy of the 3D feature detection and tracking over range model sequences has also been validated through an application in 3D geometric based face and expression analysis and expression sequence segmentation. We tested our method on the publicly available databases, BU-3DFE [15], BU-4DFE [16], and FRGC 2.0 [12]. We also validated our approach on our newly developed 3D dynamic spontaneous expression database [17].
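The core idea of concatenating consecutive shape models can be sketched compactly: landmark sets from w consecutive frames are stacked into a single vector and one PCA model is learned over those stacked vectors, so fitting constrains the whole window at once. The window length, alignment step, and fitting energy below are simplifications of the paper's TDSM, shown only to illustrate the construction.

```python
import numpy as np

def build_tdsm(sequences, window=3):
    """sequences: list of (T, L, 2) landmark tracks -> (mean, modes) over windows."""
    samples = []
    for seq in sequences:
        for t in range(len(seq) - window + 1):
            samples.append(seq[t:t + window].ravel())   # stack w frames into one vector
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[: max(1, int((s > 1e-8).sum() // 2))]   # keep leading modes

def fit_window(mean, modes, observed):
    """Project an observed window of shapes (w, L, 2) onto the temporal model."""
    b = modes @ (observed.ravel() - mean)
    return (mean + modes.T @ b).reshape(observed.shape)
```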
IEEE International Conference on Automatic Face & Gesture Recognition | 2015
Xing Zhang; Lijun Yin; Jeffrey F. Cohn
Automatic pain expression recognition is a challenging task for pain assessment and diagnosis. Conventional 2D-based approaches to automatic pain detection lack robustness to the moderate-to-large head pose variation and changes in illumination that are common in real-world settings, and with few exceptions omit potentially informative temporal information. In this paper, we propose an innovative 3D binary edge feature (3D-BE) to represent high-resolution 3D dynamic facial expression. To exploit temporal information, we apply a latent-dynamic conditional random field approach with the 3D-BE. The resulting pain expression detection system shows that the 3D-BE represents pain-related facial features well, and illustrates the potential of non-contact pain detection from 3D facial expression data.
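The abstract does not spell out how the 3D-BE is constructed, so the following is only a loose illustration of one plausible reading: render the 3D face as a depth image, threshold its gradient magnitude into a binary edge map per frame, and flatten the maps into a per-frame feature sequence that a temporal classifier (the paper uses a latent-dynamic CRF, not shown here) could consume. Both the thresholding rule and the feature layout are assumptions.

```python
import numpy as np

def binary_edge_map(depth, thresh=0.02):
    """Binary edge map of a depth image via thresholded gradient magnitude."""
    gy, gx = np.gradient(depth)
    return (np.hypot(gx, gy) > thresh).astype(np.uint8)

def edge_feature_sequence(frames, thresh=0.02):
    """Stack per-frame edge maps into a (T, H*W) sequence of binary features."""
    return np.stack([binary_edge_map(f, thresh).ravel() for f in frames])
```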
International Symposium on Visual Computing | 2014
Xing Zhang; Lijun Yin; Daniel Hipp; Peter Gerhardstein
In this paper, we applied a reverse correlation approach to study the features that humans use to categorize facial expressions. The well-known portrait of the Mona Lisa was used as the base image to investigate the features differentiating happy and sad expressions. The base image was blended with sinusoidal noise masks to create the stimuli. Observers were required to view each image and categorize it as happy or sad. Analysis of responses using superimposed classification images revealed both the locations and the identity of the information used to represent each expression. To further investigate the results, a neural network based classifier was developed to identify the expression of the superimposed images from a machine learning perspective, which showed that the pattern humans use to perceive each expression is also recognized by machines.
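A small sketch of the reverse-correlation analysis described above: each trial shows the base image blended with a sinusoidal noise mask, and the classification image is the average noise on "happy" responses minus the average noise on "sad" responses. The noise model and blending weight below are illustrative choices, not the paper's exact stimulus parameters.

```python
import numpy as np

def sinusoidal_noise(shape, n_components=60, rng=None):
    """Sum of random-orientation, random-phase sinusoids, scaled to [-1, 1]."""
    rng = np.random.default_rng(rng)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    noise = np.zeros(shape)
    for _ in range(n_components):
        f = rng.uniform(0.01, 0.2)              # spatial frequency, cycles/pixel
        theta = rng.uniform(0, np.pi)
        phase = rng.uniform(0, 2 * np.pi)
        noise += np.sin(2 * np.pi * f * (xx * np.cos(theta) + yy * np.sin(theta)) + phase)
    return noise / np.abs(noise).max()

def classification_image(noises, responses):
    """noises: (N, H, W) masks shown; responses: length-N list of 'happy'/'sad'."""
    noises, responses = np.asarray(noises), np.asarray(responses)
    happy = noises[responses == "happy"].mean(axis=0)
    sad = noises[responses == "sad"].mean(axis=0)
    return happy - sad          # positive regions tended to drive 'happy' judgments

# Stimulus for one trial: base image (grayscale, in [0, 1]) blended with noise.
# stimulus = np.clip(base + 0.3 * sinusoidal_noise(base.shape), 0, 1)
```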
International Conference on Multimedia and Expo | 2010
Xing Zhang; Lijun Yin; Peter Gerhardstein; Daniel Hipp
Humans are able to recognize facial expressions of emotion from faces displaying a large set of confounding variables, including age, gender, ethnicity and other factors. Much work has been dedicated to characterizing the process by which this highly developed capacity functions. In this paper, we propose to investigate local expression-driven features important to distinguishing facial expressions using the so-called ‘Bubbles’ technique [4]. The Bubbles technique is a form of Gaussian masking used to reveal the information contributing to human perceptual categorization. We conducted experiments with both human observers and machine classifiers. Observers were asked to view bubble-masked expression images and identify their categories. By collecting responses from observers and analyzing them statistically, we can find the facial features that humans employ for identifying different expressions. Humans appear to extract and use localized information specific to each expression for recognition. Additionally, we verify the findings by selecting the resulting features for expression classification using a conventional expression recognition algorithm with a public facial expression database.
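The masking idea is simple to sketch: an image is revealed only through a few randomly placed Gaussian apertures, and responses are aggregated over many such masks to see which regions support correct categorization. In the toy code below, the aperture count, width, and the "correct minus average" diagnostic map are illustrative choices, not the published Bubbles analysis.

```python
import numpy as np

def bubble_mask(shape, n_bubbles=5, sigma=12.0, rng=None):
    """Sum of Gaussian apertures at random centers, clipped to [0, 1]."""
    rng = np.random.default_rng(rng)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape)
    for _ in range(n_bubbles):
        cy, cx = rng.uniform(0, shape[0]), rng.uniform(0, shape[1])
        mask += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(mask, 0, 1)

def diagnostic_map(masks, correct):
    """Average mask on correct trials minus the overall average: positive regions
    tended to be visible when the observer answered correctly."""
    masks, correct = np.asarray(masks), np.asarray(correct, dtype=bool)
    return masks[correct].mean(axis=0) - masks.mean(axis=0)

# One trial's stimulus: face image (grayscale in [0, 1]) seen through the mask.
# stimulus = face * bubble_mask(face.shape)
```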