Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Haoxiang Li is active.

Publication


Featured research published by Haoxiang Li.


computer vision and pattern recognition | 2015

A convolutional neural network cascade for face detection

Haoxiang Li; Zhe Lin; Xiaohui Shen; Jonathan Brandt; Gang Hua

In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks (CNNs) with very powerful discriminative capability, while maintaining high runtime performance. The proposed CNN cascade operates at multiple resolutions, quickly rejects background regions in the fast low-resolution stages, and carefully evaluates a small number of challenging candidates in the last high-resolution stage. To improve localization effectiveness and reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade. The output of each calibration stage is used to adjust the detection window position for input to the subsequent stage. The proposed method runs at 14 FPS on a single CPU core for VGA-resolution images and at 100 FPS on a GPU, and achieves state-of-the-art detection performance on two public face detection benchmarks.
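
The staged reject-then-calibrate control flow is simple enough to sketch. Below is a minimal, illustrative Python sketch of the cascade logic, assuming toy stand-ins for the three detection networks and the calibration networks; all names, dimensions, and thresholds here are hypothetical, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_windows(n=1000):
    # Toy candidate windows: a box plus a feature vector standing in for pixels.
    return [{"box": rng.integers(0, 480, size=4), "feat": rng.normal(size=16)}
            for _ in range(n)]

def make_stage_net():
    # Stand-in for a detection CNN at one resolution: a random linear scorer.
    w = rng.normal(size=16)
    return lambda cand: float(w @ cand["feat"])

def calibration_net(cand):
    # Stand-in for the calibration CNN: nudge the window before the next stage.
    return {"box": cand["box"] + rng.integers(-2, 3, size=4), "feat": cand["feat"]}

def cascade(candidates, stage_nets, thresholds):
    for net, thr in zip(stage_nets, thresholds):
        # Cheap early stages reject most background windows ...
        candidates = [c for c in candidates if net(c) >= thr]
        # ... and each surviving window is re-positioned for the next stage.
        candidates = [calibration_net(c) for c in candidates]
    return candidates

stages = [make_stage_net() for _ in range(3)]   # analogues of the 12-, 24-, 48-nets
survivors = cascade(propose_windows(), stages, thresholds=[-0.5, 0.0, 0.5])
print(f"{len(survivors)} candidate windows survive the cascade")
```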


computer vision and pattern recognition | 2013

Probabilistic Elastic Matching for Pose Variant Face Verification

Haoxiang Li; Gang Hua; Zhe Lin; Jonathan Brandt; Jianchao Yang

Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic matching method. We take a part-based representation by extracting local features (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and location terms. Each Gaussian component builds the correspondence of a pair of features to be matched between two faces/face tracks. For face verification, we train an SVM on the vector concatenating the difference vectors of all the feature pairs to decide whether a pair of faces/face tracks is matched. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that our method outperforms the state-of-the-art under the most restricted protocol on Labeled Faces in the Wild (LFW) and the YouTube video face database by a significant margin.
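
To make the spatial-appearance GMM concrete, here is a minimal sketch using scikit-learn, with random vectors standing in for the dense LBP/SIFT descriptors; the dimensions and component count are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_patches, feat_dim, n_parts = 500, 59, 32

# Local descriptors from densely sampled patches, each augmented with its
# normalized (x, y) location so the GMM models a spatial-appearance space.
descs = rng.normal(size=(n_patches, feat_dim))
locs = rng.uniform(size=(n_patches, 2))
aug = np.hstack([descs, locs])

# Spherical components balance the appearance and location terms.
gmm = GaussianMixture(n_components=n_parts, covariance_type="spherical",
                      random_state=0).fit(aug)

# For one face: each component picks its maximum-likelihood patch, and the
# chosen descriptors are concatenated into the face's representation.
# (The per-component log-normalizer is constant over patches, so it is dropped.)
diff = aug[:, None, :] - gmm.means_[None, :, :]          # (patches, parts, dim)
log_density = -0.5 * (diff ** 2).sum(-1) / gmm.covariances_
best_patch = log_density.argmax(axis=0)                  # one patch per part
face_rep = descs[best_patch].ravel()                     # (n_parts * feat_dim,)
print(face_rep.shape)
```

The verification step described above (an SVM over concatenated difference vectors of matched feature pairs) is then a standard linear-SVM fit and is omitted here.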


Archive | 2016

Labeled Faces in the Wild: A Survey

Erik G. Learned-Miller; Gary B. Huang; Aruni RoyChowdhury; Haoxiang Li; Gang Hua

In 2007, Labeled Faces in the Wild was released in an effort to spur research in face recognition, specifically for the problem of face verification with unconstrained images. Since that time, more than 50 papers have been published that improve upon this benchmark in some respect. A remarkably wide variety of innovative methods have been developed to overcome the challenges presented in this database. As performance on some aspects of the benchmark approaches 100% accuracy, it seems appropriate to review this progress, derive what general principles we can from these works, and identify key future challenges in face recognition. In this survey, we review the contributions to LFW for which the authors have provided results to the curators (results found on the LFW results web page). We also review the cross-cutting topic of alignment and how it is used in various methods. We end with a brief discussion of recent databases designed to challenge the next generation of face recognition algorithms.


international conference on computer vision | 2013

Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

Haoxiang Li; Gang Hua; Zhe Lin; Jonathan Brandt; Jianchao Yang

We propose an unsupervised detector adaptation algorithm to adapt any offline-trained face detector to a specific collection of images, and hence achieve better accuracy. The core of our detector adaptation algorithm is a probabilistic elastic part (PEP) model, which is offline trained with a set of face examples. It produces a statistically aligned part-based face representation, namely the PEP representation. To adapt a general face detector to a collection of images, we compute the PEP representations of the candidate detections from the general face detector, train a discriminative classifier with the top positives and negatives, and then re-rank all the candidate detections with this classifier. This way, a face detector tailored to the statistics of the specific image collection is adapted from the original detector. We present extensive results on three datasets with two state-of-the-art face detectors. The significant improvement in detection accuracy over these state-of-the-art face detectors strongly demonstrates the efficacy of the proposed face detector adaptation algorithm.
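
The self-training re-ranking step can be sketched in a few lines. Here random features stand in for the PEP representations, and the top/bottom selection sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_cands, dim = 2000, 128

pep_feats = rng.normal(size=(n_cands, dim))      # stand-in PEP representations
det_scores = rng.normal(size=n_cands)            # generic detector's confidences

# Treat the detector's most/least confident candidates as pseudo-labels.
order = det_scores.argsort()
neg_idx, pos_idx = order[:200], order[-200:]
X = np.vstack([pep_feats[pos_idx], pep_feats[neg_idx]])
y = np.r_[np.ones(len(pos_idx)), np.zeros(len(neg_idx))]

# Collection-specific classifier, then re-rank every candidate detection.
clf = LinearSVC(C=1.0).fit(X, y)
adapted_scores = clf.decision_function(pep_feats)
reranked = adapted_scores.argsort()[::-1]
print("top candidate after adaptation:", reranked[0])
```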


asian conference on computer vision | 2014

Eigen-PEP for Video Face Recognition

Haoxiang Li; Gang Hua; Xiaohui Shen; Zhe Lin; Jonathan Brandt

To effectively solve the problem of large-scale video face recognition, we argue for a comprehensive, compact, and yet flexible representation of a face subject. It should comprehensively integrate the visual information from all relevant video frames of the subject in a compact form. It should also be flexible enough to be incrementally updated, incorporating new observations and retiring obsolete ones. In search of such a representation, we present the Eigen-PEP, which is built upon the recent success of the probabilistic elastic part (PEP) model. It first integrates the information from relevant video sources by part-based average pooling through the PEP model, which produces an intermediate high-dimensional, part-based, and pose-invariant representation. We then compress the intermediate representation through principal component analysis, keeping only a small number of principal eigen-dimensions (as few as 100). We evaluate the Eigen-PEP representation for video-based face verification and identification on the YouTube Faces dataset and a new Celebrity-1000 video face dataset, respectively. On YouTube Faces we further improve the state-of-the-art recognition accuracy. On Celebrity-1000 we lead the competing baselines by a significant margin while offering a scalable solution that is linear with respect to the number of subjects.
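
A minimal numpy sketch of the pool-then-project idea, assuming per-frame PEP vectors are already computed; the dimensions and the plain (unsupervised) PCA here are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, pep_dim, k = 50, 4096, 100

frames = rng.normal(size=(n_frames, pep_dim))    # stand-in per-frame PEP vectors

# Part-based average pooling across all relevant frames of the subject.
pooled = frames.mean(axis=0)

# PCA basis estimated from a (toy) training set of pooled representations.
train = rng.normal(size=(500, pep_dim))
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
eigen_pep = Vt[:k] @ (pooled - mean)             # keep ~100 eigen-dimensions

# The flexibility claim: averaging makes adding a new frame an O(pep_dim) update.
new_frame = rng.normal(size=pep_dim)
pooled = (n_frames * pooled + new_frame) / (n_frames + 1)
print(eigen_pep.shape)                           # (100,)
```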


ieee international conference on automatic face gesture recognition | 2015

Report on the FG 2015 Video Person Recognition Evaluation

J. Ross Beveridge; Hao Zhang; Bruce A. Draper; Patrick J. Flynn; Zhen-Hua Feng; Patrik Huber; Josef Kittler; Zhiwu Huang; Shaoxin Li; Yan Li; Meina Kan; Ruiping Wang; Shiguang Shan; Xilin Chen; Haoxiang Li; Gang Hua; Vitomir Struc; Janez Krizaj; Changxing Ding; Dacheng Tao; P. Jonathon Phillips

This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted high-quality video camera. The second contained videos acquired from five different handheld video cameras. Each experiment included 1401 videos of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The video handheld experiment was included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition. The top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing.


computer vision and pattern recognition | 2015

Hierarchical-PEP model for real-world face recognition

Haoxiang Li; Gang Hua

Pose variation remains one of the major factors adversely affecting the accuracy of real-world face recognition systems. Inspired by the recently proposed probabilistic elastic part (PEP) model and the success of deep hierarchical architectures in a number of visual tasks, we propose the Hierarchical-PEP model to approach the unconstrained face recognition problem. We apply the PEP model hierarchically to decompose a face image into face parts at different levels of detail to build pose-invariant part-based face representations. Following the hierarchy bottom-up, we stack the face part representations at each layer, discriminatively reduce their dimensionality, and hence aggregate the face part representations layer-by-layer to build a compact and invariant face representation. The Hierarchical-PEP model exploits the fine-grained structures of the face parts at different levels of detail to address pose variations. It is also guided by supervised information in constructing the face part/face representations. We empirically verify the Hierarchical-PEP model on two public benchmarks (i.e., LFW and YouTube Faces) and a face recognition challenge (i.e., the PaSC grand challenge) for image-based and video-based face verification. The state-of-the-art performance demonstrates the potential of our method.
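
A toy sketch of the bottom-up, layer-by-layer aggregation, with random projections standing in for the learned discriminative dimensionality reductions; all shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Part descriptors at three levels of detail, finest first (toy shapes).
levels = [rng.normal(size=(64, 128)),   # fine parts
          rng.normal(size=(16, 128)),   # mid-level parts
          rng.normal(size=(4, 128))]    # coarse parts

rep = np.empty(0)
for parts in levels:                    # walk the hierarchy bottom-up
    stacked = np.concatenate([parts.ravel(), rep])
    # Stand-in for the supervised reduction learned at each layer.
    proj = rng.normal(size=(256, stacked.size)) / np.sqrt(stacked.size)
    rep = proj @ stacked                # compact representation so far

print(rep.shape)                        # final compact face representation
```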


international joint conference on biometrics | 2014

The IJCB 2014 PaSC video face and person recognition competition

J. Ross Beveridge; Hao Zhang; Patrick J. Flynn; Yooyoung Lee; Venice Erin Liong; Jiwen Lu; Marcus de Assis Angeloni; Tiago de Freitas Pereira; Haoxiang Li; Gang Hua; Vitomir Struc; Janez Krizaj; P. Jonathon Phillips

The Point-and-Shoot Face Recognition Challenge (PaSC) is a performance evaluation challenge including 1401 videos of 265 people acquired with handheld cameras, depicting people engaged in activities with non-frontal head pose. This report summarizes the results of a competition using this challenge problem. In the Video-to-video Experiment, a person in a query video is recognized by comparing the query video to a set of target videos. Both target and query videos are drawn from the same pool of 1401 videos. In the Still-to-video Experiment, the person in a query video is recognized by comparing the query video to a larger target set consisting of still images. Algorithm performance is characterized by the verification rate at a false accept rate of 0.01 and the associated receiver operating characteristic (ROC) curves. Participants were provided eye coordinates for the video frames. Results were submitted by four institutions: (i) Advanced Digital Science Center, Singapore; (ii) CPqD, Brazil; (iii) Stevens Institute of Technology, USA; and (iv) University of Ljubljana, Slovenia. Most competitors demonstrated video face recognition performance superior to the baseline provided with PaSC. The results represent the best performance to date on the handheld video portion of the PaSC.
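
The reported metric is straightforward to compute from raw similarity scores. A minimal sketch with synthetic genuine/impostor score distributions (the distributions are made up for illustration):

```python
import numpy as np

def verification_rate_at_far(genuine, impostor, far=0.01):
    # Pick the threshold at which exactly `far` of impostor pairs are accepted,
    # then report the fraction of genuine pairs accepted at that threshold.
    threshold = np.quantile(impostor, 1.0 - far)
    return float((np.asarray(genuine) >= threshold).mean())

rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 1.0, size=5_000)    # toy match scores
impostor = rng.normal(0.0, 1.0, size=50_000)  # toy non-match scores
print(verification_rate_at_far(genuine, impostor, far=0.01))
```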


computer vision and pattern recognition | 2016

A Multi-level Contextual Model for Person Recognition in Photo Albums

Haoxiang Li; Jonathan Brandt; Zhe Lin; Xiaohui Shen; Gang Hua

In this work, we present a new framework for person recognition in photo albums that exploits contextual cues at multiple levels, spanning individual persons, individual photos, and photo groups. Through experiments, we show that the information available at each of these distinct contextual levels provides complementary cues as to person identities. At the person level, we leverage clothing and body appearance in addition to facial appearance, which compensates for instances where the face is not visible. At the photo level, we leverage a learned prior on the joint distribution of identities in the same photo to guide identity assignments. Going beyond a single photo, we infer natural groupings of photos with shared context in an unsupervised manner. By exploiting this shared contextual information, we are able to reduce the identity search space and exploit higher intra-personal appearance consistency within photo groups. Our new framework enables efficient use of these complementary multi-level contextual cues to improve overall recognition rates on the photo album person recognition task, as demonstrated through state-of-the-art results on a challenging public dataset. Our results outperform competing methods by a significant margin, while being computationally efficient and practical in a real-world application.
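
As a toy illustration of combining the three contextual levels, one can fuse per-level scores before assigning identities; the affinities and weights below are synthetic assumptions, not the paper's learned model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_faces, n_ids = 10, 5

person_ll = rng.normal(size=(n_faces, n_ids))  # face + clothing/body appearance
photo_ll = rng.normal(size=(n_faces, n_ids))   # prior over co-occurring identities
group_ll = rng.normal(size=(n_faces, n_ids))   # consistency within a photo group

# Simple weighted log-score fusion; the weights here are arbitrary.
fused = 1.0 * person_ll + 0.5 * photo_ll + 0.5 * group_ll
assignments = fused.argmax(axis=1)             # identity per detected person
print(assignments)
```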


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Probabilistic Elastic Part Model: A Pose-Invariant Representation for Real-World Face Verification

Haoxiang Li; Gang Hua

Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic part model. We extract local descriptors (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each descriptor with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of the face parts of all face images in the training corpus, namely the probabilistic elastic part (PEP) model. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and location terms, which naturally defines a part. Given one or multiple face images of the same subject, the PEP model builds its PEP representation by sequentially concatenating the descriptors identified by each Gaussian component in a maximum-likelihood sense. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that we achieve state-of-the-art face verification accuracy with the proposed representations on the Labeled Faces in the Wild (LFW) dataset, the YouTube video face database, and the CMU Multi-PIE dataset.

Collaboration


Dive into Haoxiang Li's collaborations.

Top Co-Authors


Hao Zhang

Colorado State University


P. Jonathon Phillips

National Institute of Standards and Technology


Janez Krizaj

University of Ljubljana
