Publication


Featured research published by Wen-Sheng Chu.


Computer Vision and Pattern Recognition | 2013

Selective Transfer Machine for Personalized Facial Action Unit Detection

Wen-Sheng Chu; Fernando De la Torre; Jeffrey F. Cohn

Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers, and neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. While a possible solution would be to train person-specific classifiers, that is often neither feasible nor theoretically compelling. The alternative we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). We introduce a transductive learning method, which we refer to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods on three major databases: CK+, GEMEP-FERA, and RU-FACS. STM outperformed generic classifiers in all.
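
As a rough illustration of the personalization idea (not the authors' joint optimization of sample weights and the classifier), the sketch below re-weights training frames by their similarity to the unlabeled test subject and fits a weighted classifier. The Gaussian weighting, the bandwidth parameter, and the use of scikit-learn's LogisticRegression are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def personalize(X_train, y_train, X_test, bandwidth=1.0):
    """Crude stand-in for STM: weight each training frame by its Gaussian
    similarity to the mean of the (unlabeled) test subject's frames, then
    fit a weighted classifier."""
    diff = X_train - X_test.mean(axis=0)
    weights = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * bandwidth ** 2))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train, sample_weight=weights)
    return clf
```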


Computer Vision and Pattern Recognition | 2015

Video co-summarization: Video summarization by visual co-occurrence

Wen-Sheng Chu; Yale Song; Alejandro Jaimes

We present video co-summarization, a novel perspective on video summarization that exploits visual co-occurrence across multiple videos. Motivated by the observation that important visual concepts tend to appear repeatedly across videos of the same topic, we propose to summarize a video by finding shots that co-occur most frequently across videos collected using a topic keyword. The main technical challenge is dealing with the sparsity of co-occurring patterns among the hundreds to possibly thousands of irrelevant shots in the videos being considered. To deal with this challenge, we developed a Maximal Biclique Finding (MBF) algorithm that is optimized to find sparsely co-occurring patterns, discarding less co-occurring patterns even if they are dominant in one video. Our algorithm is parallelizable with closed-form updates, and thus can easily scale up to handle a large number of videos simultaneously. We demonstrate the effectiveness of our approach on motion capture and self-compiled YouTube datasets. Our results suggest that summaries generated by visual co-occurrence tend to match human-generated summaries more closely than those produced by several popular unsupervised techniques.
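
For intuition only, here is a toy greedy search for a large biclique in a binary shot-similarity matrix. It is not the paper's MBF algorithm (which uses closed-form, parallelizable updates); the seeding order and the area-based acceptance rule are assumptions made for the sketch.

```python
import numpy as np

def greedy_biclique(B):
    """Toy greedy search for a large biclique in a binary shot-similarity
    matrix B (rows: shots of video 1, columns: shots of video 2)."""
    best_rows, best_cols = [], []
    order = np.argsort(B.sum(axis=1))[::-1]          # try the densest rows first
    for seed in order:
        rows, cols = [seed], set(np.flatnonzero(B[seed]))
        for r in order:
            if r == seed:
                continue
            new_cols = cols & set(np.flatnonzero(B[r]))
            # Add the row only if the biclique area does not shrink.
            if (len(rows) + 1) * len(new_cols) >= len(rows) * len(cols):
                rows.append(r)
                cols = new_cols
        if len(rows) * len(cols) > len(best_rows) * len(best_cols):
            best_rows, best_cols = rows, sorted(cols)
    return best_rows, best_cols
```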


Computer Vision and Pattern Recognition | 2015

Joint patch and multi-label learning for facial action unit detection

Kaili Zhao; Wen-Sheng Chu; Fernando De la Torre; Jeffrey F. Cohn; Honggang Zhang

The face is one of the most powerful channels of nonverbal communication. The most commonly used taxonomy to describe facial behavior is the Facial Action Coding System (FACS). FACS segments the visible effects of facial muscle activation into 30+ action units (AUs). AUs, which may occur alone or in thousands of combinations, can describe nearly all possible facial expressions. Most existing methods for automatic AU detection treat the problem using one-vs-all classifiers and fail to exploit dependencies among AUs and facial features. We introduce joint patch and multi-label learning (JPML) to address these issues. JPML leverages group sparsity by selecting a sparse subset of facial patches while learning a multi-label classifier. In four of five comparisons on three diverse datasets, CK+, GFT, and BP4D, JPML produced the highest average F1 scores in comparison with the state of the art.
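
The group-sparsity ingredient can be illustrated with a single proximal-gradient step on a multi-label least-squares surrogate, where each group collects the weights belonging to one facial patch. This is only a sketch of that building block, not the JPML objective; the loss, step sizes, and function names are assumptions.

```python
import numpy as np

def group_prox(W, groups, lam):
    """Group soft-thresholding: shrink each patch's block of weights, zeroing
    out whole patches whose norm falls below lam (the group-sparsity effect)."""
    W = W.copy()
    for g in groups:                        # g: row indices of one facial patch
        norm = np.linalg.norm(W[g])
        W[g] = 0.0 if norm < lam else W[g] * (1.0 - lam / norm)
    return W

def jpml_step(W, X, Y, groups, lr=1e-3, lam=0.1):
    """One proximal-gradient step on a multi-label least-squares surrogate."""
    grad = X.T @ (X @ W - Y) / len(X)       # gradient of 0.5 * ||XW - Y||^2 / n
    return group_prox(W - lr * grad, groups, lr * lam)
```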


International Conference on Computer Vision | 2013

Facial Action Unit Event Detection by Cascade of Tasks

Xiaoyu Ding; Wen-Sheng Chu; Fernando De la Torre; Jeffrey F. Cohn; Qiao Wang

Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive and negative examples, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines different tasks (i.e., frame, segment, and transition) for AU event detection. We train CoT in a sequential manner that embraces diversity, which ensures robustness and generalization to unseen data. In addition to conventional frame-based metrics that evaluate frames independently, we propose a new event-based metric that evaluates detection performance at the event level. We show that CoT consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics across three public datasets that differ in complexity: CK+, FERA, and RU-FACS.
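
The event-based evaluation can be pictured with a simple interval-matching F1, where a predicted event counts as a hit if it overlaps a ground-truth event above a threshold. The paper's exact event-based metric may be defined differently; the IoU criterion and threshold below are assumptions for the example.

```python
def event_f1(pred_events, true_events, min_overlap=0.5):
    """Toy event-level F1 over (start, end) intervals: a prediction is a hit if
    its intersection-over-union with some ground-truth event reaches min_overlap."""
    def iou(a, b):
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    hits = sum(any(iou(p, t) >= min_overlap for t in true_events) for p in pred_events)
    found = sum(any(iou(p, t) >= min_overlap for p in pred_events) for t in true_events)
    precision = hits / len(pred_events) if pred_events else 0.0
    recall = found / len(true_events) if true_events else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```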


European Conference on Computer Vision | 2012

Unsupervised temporal commonality discovery

Wen-Sheng Chu; Feng Zhou; Fernando De la Torre

Unsupervised discovery of commonalities in images has recently attracted much interest due to the need to find correspondences in large amounts of visual data. A natural extension, and a relatively unexplored problem, is how to discover common semantic temporal patterns in videos: given two or more videos, find the subsequences that contain similar visual content in an unsupervised manner. We call this problem Temporal Commonality Discovery (TCD). The naive exhaustive-search approach to TCD has computational complexity quadratic in the length of each sequence, making it impractical for sequences of typical length. This paper proposes an efficient branch-and-bound (B&B) algorithm to tackle the TCD problem. We derive tight bounds for classical distances between the temporal bag-of-words histograms of two segments, including the l1, intersection, and χ2 distances. Using these bounds, the B&B algorithm can efficiently find the globally optimal solution. Our algorithm is general and can be applied to any feature that has been quantized into histograms. Experiments on finding common facial actions in video and common human actions in motion capture data demonstrate the benefits of our approach. To the best of our knowledge, this is the first work that addresses unsupervised discovery of common events in videos.
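
To make the setup concrete, the sketch below implements the three histogram distances mentioned above and the naive exhaustive search over fixed-length segment pairs whose quadratic cost motivates the branch-and-bound algorithm. It does not implement the paper's bounds or B&B search, and the fixed segment length is an assumption for the example.

```python
import numpy as np

def l1(p, q):
    return np.abs(p - q).sum()

def intersection(p, q):
    return -np.minimum(p, q).sum()        # negated so that smaller means more similar

def chi2(p, q):
    return 0.5 * ((p - q) ** 2 / (p + q + 1e-12)).sum()

def naive_tcd(bow1, bow2, length, dist=chi2):
    """Exhaustive search over all pairs of fixed-length segments: O(n1 * n2).
    bow1, bow2: per-frame bag-of-words count matrices (frames x vocabulary)."""
    best, best_pair = np.inf, None
    for i in range(len(bow1) - length + 1):
        h1 = bow1[i:i + length].sum(axis=0)
        for j in range(len(bow2) - length + 1):
            h2 = bow2[j:j + length].sum(axis=0)
            d = dist(h1, h2)
            if d < best:
                best, best_pair = d, (i, j)
    return best_pair, best
```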


Computer Vision and Pattern Recognition | 2016

Deep Region and Multi-label Learning for Facial Action Unit Detection

Kaili Zhao; Wen-Sheng Chu; Honggang Zhang

Region learning (RL) and multi-label learning (ML) have recently attracted increasing attention in the field of facial Action Unit (AU) detection. Knowing that AUs are active on sparse facial regions, RL aims to identify these regions for better specificity. On the other hand, strong statistical evidence of AU correlations suggests that ML is a natural way to model the detection task. In this paper, we propose Deep Region and Multi-label Learning (DRML), a unified deep network that simultaneously addresses these two problems. One crucial aspect of DRML is a novel region layer that uses feed-forward functions to induce important facial regions, forcing the learned weights to capture structural information of the face. Our region layer serves as an alternative design between locally connected layers (i.e., kernels confined to individual pixels) and conventional convolution layers (i.e., kernels shared across the entire image). Unlike previous studies that solve RL and ML alternately, DRML by construction addresses both problems, allowing the two seemingly unrelated problems to interact more directly. The complete network is end-to-end trainable and automatically learns representations robust to variations inherent within a local region. Experiments on the BP4D and DISFA benchmarks show that DRML achieves the highest average F1-score and AUC within and across datasets in comparison with alternative methods.
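
A simplified region layer can be sketched in PyTorch as a grid of cell-wise independent convolutions with a residual connection, illustrating the design point between locally connected layers and shared-kernel convolutions. This is not the layer as specified in the paper; the grid size, channel counts, and residual form are assumptions.

```python
import torch
import torch.nn as nn

class RegionLayer(nn.Module):
    """Simplified region layer: split the feature map into a grid of cells and
    apply an independent 3x3 convolution to each cell, with a residual add.
    Assumes the spatial dimensions are divisible by `grid`."""
    def __init__(self, channels, grid=4):
        super().__init__()
        self.grid = grid
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            )
            for _ in range(grid * grid)
        )

    def forward(self, x):
        _, _, h, w = x.shape
        gh, gw = h // self.grid, w // self.grid
        rows = []
        for i in range(self.grid):
            cols = []
            for j in range(self.grid):
                cell = x[:, :, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                cols.append(cell + self.convs[i * self.grid + j](cell))
            rows.append(torch.cat(cols, dim=3))
        return torch.cat(rows, dim=2)

# Example: RegionLayer(32, grid=4)(torch.randn(2, 32, 64, 64)) keeps the input shape.
```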


IEEE Signal Processing Letters | 2011

Fast and Robust Circular Object Detection With Probabilistic Pairwise Voting

Lili Pan; Wen-Sheng Chu; Jason M. Saragih; Fernando De la Torre; Mei Xie

Accurate and efficient detection of circular objects in images is a challenging computer vision problem. Existing circular object detection methods can be broadly classified into two categories: voting-based and maximum likelihood estimation (MLE)-based. The former is robust to noise, but its computational complexity and memory requirements are high. On the other hand, MLE-based methods (e.g., robust least-squares fitting) are more computationally efficient but sensitive to noise, and cannot detect multiple circles. This letter proposes Probabilistic Pairwise Voting (PPV), a fast and robust algorithm for circular object detection based on an extension of the Hough Transform. The main contributions are threefold. 1) We formulate circular object detection as finding the intersection of lines in the three-dimensional parameter space (i.e., the center and radius of the circle). 2) We propose a probabilistic pairwise voting scheme to robustly discover circular objects under occlusion, image noise, and moderate shape deformations. 3) We use a mode-finding algorithm to efficiently find multiple circular objects. We demonstrate the benefits of our approach on two real-world problems: 1) detecting circular objects in natural images, and 2) localizing the iris in face images.
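
The pairwise voting idea can be illustrated by intersecting the normal lines of pairs of edge points to propose circle centers and radii. The sketch below only accumulates raw (cx, cy, r) votes and leaves picking the strongest candidates to a mode-finding step; it omits PPV's probabilistic weighting, and the input format and tolerance are assumptions.

```python
import numpy as np

def pairwise_circle_votes(points, normals, eps=1e-6):
    """points, normals: (N, 2) arrays of edge positions and unit gradient
    directions. For each pair, intersect the two normal lines to propose a
    circle center, and take the mean distance to the two points as the radius."""
    votes = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            p1, d1 = points[i], normals[i]
            p2, d2 = points[j], normals[j]
            A = np.column_stack([d1, -d2])       # solve p1 + t1*d1 = p2 + t2*d2
            if abs(np.linalg.det(A)) < eps:      # nearly parallel normals: skip
                continue
            t1, _ = np.linalg.solve(A, p2 - p1)
            center = p1 + t1 * d1
            r = 0.5 * (np.linalg.norm(center - p1) + np.linalg.norm(center - p2))
            votes.append([center[0], center[1], r])
    return np.asarray(votes)
```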


IEEE Transactions on Image Processing | 2016

Joint Patch and Multi-label Learning for Facial Action Unit and Holistic Expression Recognition

Kaili Zhao; Wen-Sheng Chu; Fernando De la Torre; Jeffrey F. Cohn; Honggang Zhang

Most action unit (AU) detection methods use one-versus-all classifiers without considering dependencies between features or AUs. In this paper, we introduce a joint patch and multi-label learning (JPML) framework that models the structured joint dependencies behind features, AUs, and their interplay. In particular, JPML leverages group sparsity to identify important facial patches, and learns a multi-label classifier constrained by the likelihood of co-occurring AUs. To describe this likelihood, we derive two AU relations, positive correlation and negative competition, by statistically analyzing more than 350,000 video frames annotated with multiple AUs. To the best of our knowledge, this is the first work that jointly addresses patch learning and multi-label learning for AU detection. In addition, we show that JPML can be extended to recognize holistic expressions by learning common and specific patches, which afford a more compact representation than standard expression recognition methods. We evaluate JPML on three benchmark datasets, CK+, BP4D, and GFT, using within- and cross-dataset scenarios. In four of five experiments, JPML achieved the highest average F1 scores in comparison with baseline and alternative methods that use either patch learning or multi-label learning alone.
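
As a rough sketch of how pairwise AU relations might be read off a large set of multi-label annotations, the snippet below thresholds the phi coefficient (Pearson correlation of binary labels) into positively correlated and competing pairs. The paper's statistical criteria are likely different; the cutoffs here are assumptions.

```python
import numpy as np

def au_relations(Y, pos_thresh=0.3, neg_thresh=-0.1):
    """Y: (frames x AUs) binary annotation matrix; assumes every AU occurs at
    least once. Pairs with a high phi coefficient are treated as positively
    correlated; strongly negative pairs as competing."""
    phi = np.corrcoef(Y, rowvar=False)
    n_aus = Y.shape[1]
    pairs = [(i, j) for i in range(n_aus) for j in range(i + 1, n_aus)]
    positive = [(i, j) for i, j in pairs if phi[i, j] > pos_thresh]
    negative = [(i, j) for i, j in pairs if phi[i, j] < neg_thresh]
    return positive, negative
```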


IEEE International Conference on Automatic Face and Gesture Recognition | 2017

Learning Spatial and Temporal Cues for Multi-Label Facial Action Unit Detection

Wen-Sheng Chu; Fernando De la Torre; Jeffrey F. Cohn

Facial action units (AUs) are the fundamental units for decoding human facial expressions. At least three aspects affect the performance of automated AU detection: spatial representation, temporal modeling, and AU correlation. Unlike most studies that tackle these aspects separately, we propose a hybrid network architecture to jointly model them. Specifically, spatial representations are extracted by a Convolutional Neural Network (CNN), which, as analyzed in this paper, is able to reduce person-specific biases caused by hand-crafted descriptors (e.g., HOG and Gabor). To model temporal dependencies, Long Short-Term Memory (LSTM) networks are stacked on top of these representations, regardless of the lengths of the input videos. The outputs of the CNNs and LSTMs are further aggregated into a fusion network to produce per-frame predictions of 12 AUs. Our network naturally addresses the three issues together and yields superior performance compared to existing methods that consider these issues independently. Extensive experiments were conducted on two large spontaneous datasets, GFT and BP4D, with more than 400,000 frames coded with 12 AUs. On both datasets, we report improvements over a standard multi-label CNN and feature-based state-of-the-art methods. Finally, we provide visualizations of the learned AU models, which, to the best of our knowledge, reveal for the first time how machines see AUs.
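
A minimal version of such a hybrid could look like the PyTorch sketch below: a per-frame CNN encoder, an LSTM over the frame features, and a fusion layer producing per-frame probabilities for 12 AUs. The layer sizes, fusion by concatenation, and all names are assumptions, and this is far smaller than the network used in the paper.

```python
import torch
import torch.nn as nn

class HybridAUNet(nn.Module):
    """Toy CNN + LSTM hybrid producing per-frame multi-label AU probabilities."""
    def __init__(self, num_aus=12, feat_dim=128, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                                # per-frame spatial encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # temporal modeling
        self.fusion = nn.Linear(feat_dim + hidden, num_aus)      # fuse both streams

    def forward(self, frames):                                   # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)    # (B, T, feat_dim)
        temporal, _ = self.lstm(feats)                           # (B, T, hidden)
        return torch.sigmoid(self.fusion(torch.cat([feats, temporal], dim=-1)))
```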


IEEE International Conference on Automatic Face and Gesture Recognition | 2017

Sayette Group Formation Task (GFT) Spontaneous Facial Expression Database

Jeffrey M. Girard; Wen-Sheng Chu; László A. Jeni; Jeffrey F. Cohn

Despite the important role that facial expressions play in interpersonal communication and our knowledge that interpersonal behavior is influenced by social context, no currently available facial expression database includes multiple interacting participants. The Sayette Group Formation Task (GFT) database addresses the need for well-annotated video of multiple participants during unscripted interactions. The database includes 172,800 video frames from 96 participants in 32 three-person groups. To aid in the development of automated facial expression analysis systems, GFT includes expert annotations of FACS occurrence and intensity, facial landmark tracking, and baseline results for linear SVM, deep learning, active patch learning, and personalized classification. Baseline performance is quantified and compared using identical partitioning and a variety of metrics (including means and confidence intervals). The highest performance scores were found for the deep learning and active patch learning methods. Learn more at http://osf.io/7wcyz.
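
In the spirit of the baseline reporting described above (means with confidence intervals over identical partitions), here is a small helper that computes a mean and a bootstrap confidence interval over per-participant scores. The bootstrap procedure and its parameters are assumptions, not the database's documented protocol.

```python
import numpy as np

def mean_with_ci(scores, n_boot=10000, alpha=0.05, seed=0):
    """Mean and (1 - alpha) bootstrap confidence interval over per-participant scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    boots = rng.choice(scores, size=(n_boot, len(scores)), replace=True).mean(axis=1)
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)
```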

Collaboration


Dive into Wen-Sheng Chu's collaborations.

Top Co-Authors

Jeffrey F. Cohn

Carnegie Mellon University

László A. Jeni

Carnegie Mellon University
