
Publication


Featured research published by Yingli Tian.


IEEE International Conference on Automatic Face and Gesture Recognition | 2000

Comprehensive database for facial expression analysis

Takeo Kanade; Jeffrey F. Cohn; Yingli Tian

Within the past decade, significant effort has occurred in developing methods of facial expression analysis. Because most investigators have used relatively limited data sets, the generalizability of these various methods remains unknown. We describe the problem space for facial expression analysis, which includes level of description, transitions among expressions, eliciting conditions, reliability and validity of training and test data, individual differences in subjects, head orientation and scene complexity, image characteristics, and relation to non-verbal behavior. We then present the CMU-Pittsburgh AU-Coded Face Expression Image Database, which currently includes 2105 digitized image sequences from 182 adult subjects of varying ethnicity, performing multiple tokens of most primary FACS action units. This database is the most comprehensive testbed to date for comparative studies of facial expression analysis.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001

Recognizing action units for facial expression analysis

Yingli Tian; Takeo Kanade; Jeffrey F. Cohn

Most automatic expression analysis systems attempt to recognize a small set of prototypic expressions, such as happiness, anger, surprise, and fear. Such prototypic expressions, however, occur rather infrequently. Human emotions and intentions are more often communicated by changes in one or a few discrete facial features. In this paper, we develop an Automatic Face Analysis (AFA) system to analyze facial expressions based on both permanent facial features (brows, eyes, mouth) and transient facial features (deepening of facial furrows) in a nearly frontal-view face image sequence. The AFA system recognizes fine-grained changes in facial expression into action units (AUs) of the Facial Action Coding System (FACS), instead of a few prototypic expressions. Multistate face and facial component models are proposed for tracking and modeling the various facial features, including lips, eyes, brows, cheeks, and furrows. During tracking, detailed parametric descriptions of the facial features are extracted. With these parameters as the inputs, a group of action units (neutral expression, six upper face AUs and 10 lower face AUs) are recognized whether they occur alone or in combinations. The system has achieved average recognition rates of 96.4 percent (95.4 percent if neutral expressions are excluded) for upper face AUs and 96.7 percent (95.6 percent with neutral expressions excluded) for lower face AUs. The generalizability of the system has been tested by using independent image databases collected and FACS-coded for ground-truth by different research teams.
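The final classification stage described here can be illustrated with a small sketch: assuming the tracked parametric descriptions of the permanent and transient features are already available as a fixed-length vector per frame, a multi-label classifier maps them to AU activations. The feature layout, the scikit-learn MLP, and the toy training data below are illustrative assumptions, not the AFA system's actual implementation.

```python
# Minimal sketch of the last stage described above: mapping tracked facial-feature
# parameters to FACS action-unit labels. The feature vector layout and the use of
# scikit-learn's MLPClassifier are assumptions for illustration, not the AFA system itself.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.multiclass import OneVsRestClassifier

# Hypothetical training data: each row is a parametric description of one frame
# (e.g. lip height/width, eye openness, brow distance, furrow presence flags),
# and each label vector marks which AUs are active (multi-label, AUs may combine).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 15))          # 15 geometric/transient parameters per frame
y_train = rng.integers(0, 2, size=(200, 6))   # 6 upper-face AUs, present (1) or absent (0)

# One binary classifier per AU so that combinations of AUs can be recognized together.
clf = OneVsRestClassifier(MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))
clf.fit(X_train, y_train)

x_frame = rng.normal(size=(1, 15))            # parameters extracted from a new frame
print("active AUs:", clf.predict(x_frame)[0]) # 0/1 vector over the 6 modeled AUs
```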


Image and Vision Computing | 2006

Appearance models for occlusion handling

Andrew W. Senior; Arun Hampapur; Yingli Tian; Lisa M. Brown; Sharath Pankanti; Ruud M. Bolle

Objects in the world exhibit complex interactions. When captured in a video sequence, some interactions manifest themselves as occlusions. A visual tracking system must be able to track objects which are partially or even fully occluded. In this paper we present a method of tracking objects through occlusions using appearance models. These models are used to localize objects during partial occlusions, detect complete occlusions, and resolve the depth ordering of objects during occlusions. This paper presents a tracking system which successfully deals with complex real-world interactions, as demonstrated on the PETS 2001 dataset.
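As a rough illustration of the idea, the sketch below keeps a per-pixel appearance model for each tracked object and uses it to decide depth ordering during an occlusion. The running-average update rule, the Gaussian-style likelihood, and the toy data are assumptions made for the example, not the formulation used in the paper.

```python
# A minimal sketch of per-pixel appearance models for occlusion reasoning,
# assuming object bounding-box tracks are already available. The running-average
# update and the likelihood-based depth ordering are simplified illustrations.
import numpy as np

class AppearanceModel:
    def __init__(self, patch, alpha=0.05):
        self.mean = patch.astype(np.float32)   # per-pixel RGB appearance
        self.alpha = alpha                     # learning rate for slow updates

    def update(self, patch, visible_mask):
        # Only pixels believed to be visible (not occluded) update the model.
        m = visible_mask[..., None].astype(np.float32)
        self.mean = (1 - self.alpha * m) * self.mean + self.alpha * m * patch

    def likelihood(self, patch):
        # Higher when the observed patch matches the stored appearance.
        err = np.mean((patch.astype(np.float32) - self.mean) ** 2)
        return np.exp(-err / (2 * 25.0 ** 2))

def depth_order(models, patch):
    """During an occlusion, the model that explains the shared pixels best is
    assumed to be in front; the others are behind it."""
    scores = [m.likelihood(patch) for m in models]
    return int(np.argmax(scores))

# Usage with toy data: two overlapping objects competing for the same patch.
rng = np.random.default_rng(1)
obj_a = AppearanceModel(rng.integers(0, 255, (32, 32, 3)))
obj_b = AppearanceModel(rng.integers(0, 255, (32, 32, 3)))
observed = obj_a.mean + rng.normal(0, 5, (32, 32, 3))   # looks like object A
print("front object:", "A" if depth_order([obj_a, obj_b], observed) == 0 else "B")
```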


Computer Vision and Pattern Recognition | 2005

Robust and efficient foreground analysis for real-time video surveillance

Yingli Tian; Max Lu; Arun Hampapur

We present a new method to robustly and efficiently analyze the foreground while modeling the background of a fixed camera view using mixtures of Gaussians and multiple cues. The background is modeled by three Gaussian mixtures, as in the work of Stauffer and Grimson (1999). Intensity and texture information are then integrated to remove shadows and to make the algorithm robust to quick lighting changes. For foreground analysis, the same Gaussian mixture model is employed to detect static foreground regions without using any tracking or motion information. Whole static regions are then pushed back into the background model to avoid a common problem in background subtraction: fragmentation (one object becomes multiple parts). The method was tested in our real-time video surveillance system. It is robust and runs at about 130 fps for color images and 150 fps for grayscale images of size 160x120 on a 2 GHz Pentium IV machine with MMX optimization.
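A minimal sketch of this pipeline is given below, using OpenCV's MOG2 subtractor as a stand-in for the paper's mixture-of-Gaussians model. Detecting static foreground by per-pixel persistence counting, and pushing it back by re-learning a composited frame, are simplified illustrations of the push-back idea; the video filename and thresholds are hypothetical.

```python
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
static_count = None        # per-pixel count of consecutive foreground frames
STATIC_FRAMES = 150        # ~5 s at 30 fps: after this, treat the region as static

cap = cv2.VideoCapture("surveillance.avi")   # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = subtractor.apply(frame)             # 255 = foreground, 127 = shadow, 0 = background
    moving = (fg == 255).astype(np.uint8)    # drop detected shadows from the mask

    if static_count is None:
        static_count = np.zeros(moving.shape, np.int32)
    static_count = np.where(moving == 1, static_count + 1, 0)

    static_mask = (static_count > STATIC_FRAMES).astype(np.uint8)
    if static_mask.any():
        # Push only long-static regions back into the background model: splice the
        # static pixels of the current frame into the learned background image and
        # absorb that composite with a high learning rate, so a stopped object does
        # not fragment later detections.
        bg = subtractor.getBackgroundImage()
        composite = np.where(static_mask[..., None] == 1, frame, bg)
        subtractor.apply(composite, learningRate=0.5)
        static_count[static_mask == 1] = 0
cap.release()
```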


ACM Multimedia | 2012

Recognizing actions using depth motion maps-based histograms of oriented gradients

Xiaodong Yang; Chenyang Zhang; Yingli Tian

In this paper, we propose an effective method to recognize human actions from sequences of depth maps, which provide additional body shape and motion information for action recognition. In our approach, we project depth maps onto three orthogonal planes and accumulate global activities through entire video sequences to generate the Depth Motion Maps (DMM). Histograms of Oriented Gradients (HOG) are then computed from the DMM as the representation of an action video. The recognition results on the Microsoft Research (MSR) Action3D dataset show that our approach significantly outperforms the state-of-the-art methods, even though our representation is much more compact. In addition, we investigate how many frames are required in our framework to recognize actions on the MSR Action3D dataset. We observe that a short sub-sequence of 30-35 frames is sufficient to achieve results comparable to those obtained on entire video sequences.
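The DMM-HOG representation can be sketched directly from the abstract, assuming the depth clip is available as a (T, H, W) array. The depth quantization, the way the side and top views are formed, and the HOG parameters below are illustrative choices rather than the paper's exact settings.

```python
import numpy as np
from skimage.feature import hog

def side_top_views(frame, bins=64, max_depth=4000.0):
    """Binary occupancy maps of one depth frame seen from the side (rows vs. depth)
    and from the top (depth vs. columns); zero-depth pixels are treated as empty."""
    H, W = frame.shape
    d = np.clip((frame / max_depth * (bins - 1)).astype(int), 0, bins - 1)
    ys, xs = np.nonzero(frame > 0)
    side = np.zeros((H, bins), np.float32)
    top = np.zeros((bins, W), np.float32)
    side[ys, d[ys, xs]] = 1.0
    top[d[ys, xs], xs] = 1.0
    return side, top

def depth_motion_maps(depth_video):
    """Accumulate absolute frame-to-frame differences of the front, side and top
    views over the whole clip, yielding three Depth Motion Maps."""
    front = depth_video.astype(np.float32)
    sides, tops = zip(*(side_top_views(f) for f in depth_video))
    maps = []
    for stack in (front, np.stack(sides), np.stack(tops)):
        maps.append(np.abs(np.diff(stack, axis=0)).sum(axis=0))
    return maps

def dmm_hog_descriptor(depth_video):
    feats = [hog(m, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
             for m in depth_motion_maps(depth_video)]
    return np.concatenate(feats)   # one compact vector describing the whole clip

# Toy usage: a random depth clip stands in for an MSR Action3D sequence.
clip = np.random.default_rng(0).integers(0, 4000, size=(40, 120, 160)).astype(np.float32)
print("descriptor length:", dmm_hog_descriptor(clip).shape[0])
```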


Computer Vision and Pattern Recognition | 2012

EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor

Xiaodong Yang; Yingli Tian

In this paper, we propose an effective method to recognize human actions from 3D positions of body joints. With the release of RGBD sensors and the associated SDK, human body joints can be extracted in real time with reasonable accuracy. In our method, we propose a new type of feature based on position differences of joints, EigenJoints, which combines action information including static posture, motion, and offset. We further employ the Naïve-Bayes-Nearest-Neighbor (NBNN) classifier for multi-class action classification. The recognition results on the Microsoft Research (MSR) Action3D dataset demonstrate that our approach significantly outperforms the state-of-the-art methods. In addition, we investigate how many frames are necessary for our method to recognize actions on the MSR Action3D dataset. We observe that 15-20 frames are sufficient to achieve results comparable to those obtained using entire video sequences.
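A minimal sketch of EigenJoints with an NBNN classifier follows, assuming each frame provides a (20, 3) array of joint positions. The pairwise-difference feature layout, the PCA dimensionality, and the toy skeleton data are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA

def frame_feature(joints_t, joints_prev, joints_first):
    """Concatenate pairwise joint differences within the current frame (posture),
    against the previous frame (motion) and against the first frame (offset)."""
    pairs = list(combinations(range(joints_t.shape[0]), 2))
    fcc = np.concatenate([joints_t[i] - joints_t[j] for i, j in pairs])
    fcp = (joints_t - joints_prev).ravel()
    fci = (joints_t - joints_first).ravel()
    return np.concatenate([fcc, fcp, fci])

def eigenjoints(skeleton_seq, pca):
    feats = [frame_feature(skeleton_seq[t], skeleton_seq[max(t - 1, 0)], skeleton_seq[0])
             for t in range(len(skeleton_seq))]
    return pca.transform(np.asarray(feats))      # one low-dimensional descriptor per frame

def nbnn_classify(test_frames, class_frame_feats):
    """Naive-Bayes-Nearest-Neighbor: sum, over test frames, the squared distance to
    the nearest training frame of each class; pick the class with the smallest total."""
    costs = {}
    for label, train in class_frame_feats.items():
        d2 = ((test_frames[:, None, :] - train[None, :, :]) ** 2).sum(-1)
        costs[label] = d2.min(axis=1).sum()
    return min(costs, key=costs.get)

# Toy usage: random skeleton clips stand in for MSR Action3D sequences.
rng = np.random.default_rng(0)
train_clips = {lab: [rng.normal(size=(30, 20, 3)) for _ in range(3)] for lab in ("wave", "punch")}

# Fit PCA on raw per-frame features from all training clips (the "Eigen" step).
raw = np.array([frame_feature(c[t], c[max(t - 1, 0)], c[0])
                for cs in train_clips.values() for c in cs for t in range(len(c))])
pca = PCA(n_components=32).fit(raw)

class_feats = {lab: np.vstack([eigenjoints(c, pca) for c in cs])
               for lab, cs in train_clips.items()}
test_clip = rng.normal(size=(30, 20, 3))
print("predicted action:", nbnn_classify(eigenjoints(test_clip, pca), class_feats))
```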


IEEE Transactions on Image Processing | 2011

Text String Detection From Natural Scenes by Structure-Based Partition and Grouping

Chucai Yi; Yingli Tian

Text information in natural scene images serves as important clues for many image-based applications such as scene understanding, content-based image retrieval, assistive navigation, and automatic geocoding. However, locating text in a complex background with multiple colors is a challenging task. In this paper, we explore a new framework to detect text strings with arbitrary orientations in complex natural scene images. Our proposed framework of text string detection consists of two steps: 1) image partition to find text character candidates based on local gradient features and color uniformity of character components, and 2) character candidate grouping to detect text strings based on joint structural features of text characters in each text string, such as character size differences, distances between neighboring characters, and character alignment. By assuming that a text string has at least three characters, we propose two algorithms of text string detection: 1) an adjacent character grouping method and 2) a text line grouping method. The adjacent character grouping method calculates the sibling groups of each character candidate as string segments and then merges the intersecting sibling groups into text strings. The text line grouping method performs a Hough transform to fit text lines among the centroids of text candidates. Each fitted text line describes the orientation of a potential text string. The detected text string is represented by a rectangular region covering all characters whose centroids are cascaded in its text line. To improve efficiency and accuracy, our algorithms are carried out at multiple scales. The proposed methods outperform the state-of-the-art results on the public Robust Reading Dataset, which contains text only in horizontal orientation. Furthermore, the effectiveness of our methods in detecting text strings with arbitrary orientations is evaluated on the Oriented Scene Text Dataset, which we collected and which contains text strings in non-horizontal orientations.
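The text line grouping step can be sketched as follows, assuming character candidates have already been detected and reduced to centroid points. A brute-force collinearity test over centroid pairs stands in for the Hough transform used in the paper, and the distance tolerance is an illustrative value.

```python
import numpy as np

def group_text_lines(centroids, dist_tol=5.0, min_chars=3):
    """Group character centroids that are nearly collinear into candidate text lines
    (a text string is assumed to have at least three characters)."""
    centroids = np.asarray(centroids, dtype=float)
    n = len(centroids)
    used, lines = set(), []
    for i in range(n):
        for j in range(i + 1, n):
            p, q = centroids[i], centroids[j]
            direction = q - p
            norm = np.linalg.norm(direction)
            if norm == 0:
                continue
            normal = np.array([-direction[1], direction[0]]) / norm
            # Perpendicular distance of every centroid to the line through p and q.
            dists = np.abs((centroids - p) @ normal)
            members = [k for k in range(n) if dists[k] < dist_tol and k not in used]
            if len(members) >= min_chars:
                used.update(members)
                lines.append(members)
    return lines   # each entry is a list of centroid indices forming one text string

# Toy usage: seven centroids, five of them on one slanted line plus two outliers.
pts = [(10, 12), (30, 20), (50, 28), (70, 36), (90, 44), (15, 90), (80, 5)]
print(group_text_lines(pts))   # expect one group containing the five collinear points
```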


IEEE International Conference on Automatic Face and Gesture Recognition | 2002

Evaluation of Gabor-wavelet-based facial action unit recognition in image sequences of increasing complexity

Yingli Tian; Takeo Kanade; Jeffrey F. Cohn

Previous work suggests that Gabor-wavelet-based methods can achieve high sensitivity and specificity for emotion-specified expressions (e.g., happy, sad) and single action units (AUs) of the Facial Action Coding System (FACS). This paper evaluates a Gabor-wavelet-based method to recognize AUs in image sequences of increasing complexity. A recognition rate of 83% is obtained for three single AUs when image sequences contain homogeneous subjects and no observable head motion. The accuracy of AU recognition decreases to 32% when the number of AUs increases to nine and the image sequences include AU combinations, head motion, and non-homogeneous subjects. For comparison, an average recognition rate of 87.6% is achieved with a geometry-feature-based method. The best recognition rate, 92.7%, is obtained by combining Gabor wavelets and geometry features.
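The Gabor feature extraction that such an evaluation builds on can be sketched with OpenCV's Gabor kernels, sampling filter responses at facial landmark points and training a simple classifier. The filter-bank parameters, landmark coordinates, and linear SVM below are illustrative assumptions, not the paper's exact setup.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def gabor_bank(n_orientations=8, scales=(4, 8, 16)):
    kernels = []
    for lam in scales:
        for k in range(n_orientations):
            theta = np.pi * k / n_orientations
            kernels.append(cv2.getGaborKernel((31, 31), sigma=lam / 2.0, theta=theta,
                                              lambd=lam, gamma=0.5, psi=0))
    return kernels

def landmark_gabor_features(gray_face, landmarks, kernels):
    """Filter the face with every Gabor kernel and sample the responses at the
    given landmark points (e.g. brow corners, eyelids, mouth corners)."""
    responses = [cv2.filter2D(gray_face.astype(np.float32), cv2.CV_32F, k) for k in kernels]
    feats = [abs(r[y, x]) for r in responses for (x, y) in landmarks]
    return np.array(feats)

# Toy usage: random faces and landmark points stand in for FACS-coded sequences.
rng = np.random.default_rng(0)
kernels = gabor_bank()
landmarks = [(20, 30), (44, 30), (32, 48)]               # hypothetical (x, y) points
X = np.stack([landmark_gabor_features(rng.integers(0, 255, (64, 64)).astype(np.uint8),
                                      landmarks, kernels) for _ in range(20)])
y = rng.integers(0, 3, size=20)                          # three single AUs as class labels
clf = SVC(kernel="linear").fit(X, y)
print("predicted AU class:", clf.predict(X[:1])[0])
```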


Computer Vision and Pattern Recognition | 2004

Evaluation of Face Resolution for Expression Analysis

Yingli Tian

Most automatic facial expression analysis (AFEA) systems attempt to recognize facial expressions from data collected in a highly controlled environment with very high-resolution frontal faces (face regions greater than 200 x 200 pixels). In real environments, however, face images are often at lower resolution and accompanied by head motion, and it is unclear how AFEA systems perform on low-resolution face images. The general approach to AFEA consists of three steps: face acquisition, facial feature extraction, and facial expression recognition. This paper explores the effects of different image resolutions on each step of facial expression analysis. Different approaches are compared for face detection, face data extraction, and expression recognition. A total of five resolutions of the head region are studied (288x384, 144x192, 72x96, 36x48, and 18x24) based on a widely used public database. The lower-resolution images are down-sampled from the originals.
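Producing the studied resolutions amounts to down-sampling the original head region, as in this minimal sketch; the (rows, columns) reading of the listed sizes and the choice of area interpolation are assumptions for illustration.

```python
import cv2
import numpy as np

# Sizes from the abstract, read here as (rows, columns); that ordering is an assumption.
RESOLUTIONS = [(288, 384), (144, 192), (72, 96), (36, 48), (18, 24)]

def resolution_pyramid(head_region):
    """Down-sample the original head region to each studied resolution."""
    pyramid = {}
    for h, w in RESOLUTIONS:
        pyramid[(h, w)] = cv2.resize(head_region, (w, h), interpolation=cv2.INTER_AREA)
    return pyramid

# Toy usage: a random image stands in for a head region cropped from the database.
head = np.random.default_rng(0).integers(0, 255, (288, 384, 3)).astype(np.uint8)
for size, img in resolution_pyramid(head).items():
    print(size, img.shape[:2])
```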


Journal of Visual Communication and Image Representation | 2014

Effective 3D action recognition using EigenJoints

Xiaodong Yang; Yingli Tian

Highlights: An effective method to recognize human actions using 3D skeleton joints. A new action feature descriptor, EigenJoints, for action recognition. An Accumulated Motion Energy (AME) method to perform informative frame selection. Our proposed approach significantly outperforms the state-of-the-art methods on three public datasets.

In this paper, we propose an effective method to recognize human actions using 3D skeleton joints recovered from the 3D depth data of RGBD cameras. We design a new action feature descriptor for action recognition based on differences of skeleton joints, i.e., EigenJoints, which combine action information including static posture, motion property, and overall dynamics. Accumulated Motion Energy (AME) is then proposed to perform informative frame selection, which is able to remove noisy frames and reduce computational cost. We employ the non-parametric Naive-Bayes-Nearest-Neighbor (NBNN) classifier to classify multiple actions. The experimental results on several challenging datasets demonstrate that our approach outperforms the state-of-the-art methods. In addition, we investigate how many frames are necessary for our method to perform classification in the scenario of online action recognition. We observe that the first 30-40% of frames are sufficient to achieve results comparable to those using entire video sequences on the MSR Action3D dataset.
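The AME-based frame selection can be sketched as follows, assuming the clip is a (T, H, W) depth array; the binarization threshold and the energy fraction kept are illustrative values rather than the paper's settings.

```python
import numpy as np

def accumulated_motion_energy(depth_video, diff_thresh=50.0):
    """Per-frame motion energy: count of pixels whose depth changed noticeably
    since the previous frame, accumulated over time."""
    diffs = np.abs(np.diff(depth_video.astype(np.float32), axis=0)) > diff_thresh
    energy = diffs.reshape(diffs.shape[0], -1).sum(axis=1)
    return np.concatenate([[0.0], np.cumsum(energy)])       # AME per frame, length T

def select_informative_frames(depth_video, keep_fraction=0.4):
    """Keep the earliest frames that already cover `keep_fraction` of the total
    accumulated motion energy (the abstract reports ~30-40% of frames suffice)."""
    ame = accumulated_motion_energy(depth_video)
    if ame[-1] == 0:
        return np.arange(len(depth_video))
    cutoff = keep_fraction * ame[-1]
    last = int(np.searchsorted(ame, cutoff)) + 1
    return np.arange(min(last, len(depth_video)))

# Toy usage: a random depth clip stands in for an MSR Action3D sequence.
clip = np.random.default_rng(0).integers(0, 4000, size=(60, 120, 160)).astype(np.float32)
print("selected frames:", len(select_informative_frames(clip)), "of", len(clip))
```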

Collaboration


Dive into Yingli Tian's collaborations.

Top Co-Authors

Xiaodong Yang, City University of New York

Chucai Yi, City College of New York

Chenyang Zhang, City College of New York