Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Gee-Sern Hsu is active.

Publication


Featured research published by Gee-Sern Hsu.


computer vision and pattern recognition | 2009

Face verification and identification using Facial Trait Code

Ping-Han Lee; Gee-Sern Hsu; Yi-Ping Hung

We propose the Facial Trait Code (FTC) to encode human facial images. The proposed FTC is motivated by the discovery of basic patterns in certain local facial features. We call these basic patterns Distinctive Trait Patterns (DTPs), which can be extracted from a large number of faces. We have also found that the fusion of these DTPs can accurately capture the appearance of a face. The extraction of DTPs involves clustering and boosting to maximize the discrimination between human faces. The extracted DTPs can be symbolized and used to make up the n-ary facial trait codes. A given face can be encoded at a set of prescribed facial traits to render an n-ary facial trait code, with each symbol in its codeword corresponding to the closest DTP. We applied the FTC to face identification and verification problems with 3575 facial images from 840 people under different illumination conditions, and it yielded satisfactory results.
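
Below is a minimal sketch of the encoding step as described above: cluster each trait's local patches into DTPs, then encode a face by the index of the nearest DTP at each trait. Plain k-means stands in for the paper's combined clustering-and-boosting step (an assumption), and all function names and dimensions are illustrative.

    import numpy as np
    from sklearn.cluster import KMeans

    def learn_dtps(trait_patches, n_patterns):
        """Cluster one trait's patches into Distinctive Trait Patterns.
        Plain k-means stands in for the paper's clustering-and-boosting."""
        km = KMeans(n_clusters=n_patterns, n_init=10).fit(trait_patches)
        return km.cluster_centers_                    # (n_patterns, patch_dim)

    def encode_face(face_patches, dtps_per_trait):
        """n-ary facial trait code: per trait, the symbol is the index
        of the DTP closest to that trait's patch."""
        code = []
        for patch, dtps in zip(face_patches, dtps_per_trait):
            dists = np.linalg.norm(dtps - patch, axis=1)
            code.append(int(np.argmin(dists)))
        return code

    # toy usage: 5 traits, 8 DTPs each, 64-dim patch vectors
    rng = np.random.default_rng(0)
    dtps = [learn_dtps(rng.normal(size=(200, 64)), 8) for _ in range(5)]
    face = [rng.normal(size=64) for _ in range(5)]
    print(encode_face(face, dtps))                    # e.g. [3, 0, 7, 2, 5]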


international conference on computer vision | 2015

Regressive Tree Structured Model for Facial Landmark Localization

Gee-Sern Hsu; Kai-Hsiang Chang; Shih-Chieh Huang

Although the Tree Structured Model (TSM) is proven effective for solving face detection, pose estimation and landmark localization in a unified model, its sluggish run time makes it unfavorable in practical applications, especially when dealing with multiple faces. We propose the Regressive Tree Structured Model (RTSM) to improve run-time speed and localization accuracy. The RTSM is composed of two component TSMs, the coarse TSM (c-TSM) and the refined TSM (r-TSM), and a Bilateral Support Vector Regressor (BSVR). The c-TSM is built on the low-resolution octaves of samples so that it provides coarse but fast face detection. The r-TSM is built on the mid-resolution octaves so that it can locate the landmarks on the face candidates given by the c-TSM with better precision. The r-TSM-based landmarks are used in the forward BSVR as references to locate a dense set of landmarks, which are then used in the backward BSVR to relocate the landmarks with large localization errors. The forward and backward regressions proceed iteratively until convergence. The performance of the RTSM is validated on three benchmark databases, Multi-PIE, LFPW and AFW, and compared with the latest TSM to demonstrate its efficacy.
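
The forward/backward loop lends itself to a short skeleton. The sketch below only illustrates the alternate-until-convergence structure described above; the stand-in linear "regressors" are untrained placeholders for the actual BSVR, and all names are hypothetical.

    import numpy as np

    def iterate_bsvr(sparse_lms, forward, backward, tol=0.5, max_iter=20):
        """Forward regressor maps the sparse r-TSM landmarks to a dense
        set; backward regressor re-estimates poorly localized landmarks;
        the two alternate until the update falls below `tol` pixels."""
        dense = forward(sparse_lms)
        for _ in range(max_iter):
            corrected = backward(dense)
            if np.max(np.abs(corrected - dense)) < tol:
                break
            dense = corrected
        return dense

    # toy usage with stand-in "regressors" (hypothetical, untrained)
    rng = np.random.default_rng(1)
    sparse = rng.uniform(0, 100, size=(10, 2))          # 10 coarse landmarks
    forward = lambda s: np.repeat(s, 7, axis=0)         # 10 -> 70 landmarks
    backward = lambda d: 0.5 * d + 0.5 * d.mean(axis=0) # pull outliers inward
    print(iterate_bsvr(sparse, forward, backward).shape)  # (70, 2)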


systems man and cybernetics | 2012

Subject-Specific and Pose-Oriented Facial Features for Face Recognition Across Poses

Ping-Han Lee; Gee-Sern Hsu; Yun-Wen Wang; Yi-Ping Hung

Most face recognition scenarios assume that frontal faces or mug shots are available for enrollment in the database, while faces of other poses are collected in the probe set. Given a face from the probe set, one needs to determine whether a match exists in the database. This is under the assumption that, in forensic applications, most suspects have their mug shots available in the database, and face recognition aims at recognizing the suspects when their faces are captured at various poses by a surveillance camera. This paper considers a different scenario: given a face with multiple poses available, which may or may not include a mug shot, develop a method to recognize the face at poses different from those captured. That is, given two disjoint sets of poses of a face, one for enrollment and the other for recognition, this paper reports a method suited to handling such cases. The proposed method includes feature extraction and classification. For feature extraction, we first cluster the poses of each subject's face in the enrollment set into a few pose classes and then decompose the appearance of the face in each pose class using an Embedded Hidden Markov Model, which allows us to define a set of subject-specific and pose-oriented (SSPO) facial components for each subject. For classification, an AdaBoost weighting scheme is used to fuse the component classifiers with SSPO component features. The proposed method is shown to outperform other approaches, including a component-based classifier with local facial features cropped manually, in an extensive performance evaluation study.
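
A toy sketch of the two stages follows: k-means clustering of enrollment poses into pose classes, then AdaBoost fusion of component classifiers. The features and labels are random stand-ins; the Embedded Hidden Markov Model decomposition that yields the actual SSPO features is not reproduced here.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import AdaBoostClassifier

    # Stage 1: cluster a subject's enrollment poses into pose classes.
    rng = np.random.default_rng(2)
    yaw_pitch = rng.uniform(-45, 45, size=(60, 2))      # toy pose angles
    pose_class = KMeans(n_clusters=3, n_init=10).fit_predict(yaw_pitch)

    # Stage 2: AdaBoost fuses weak component classifiers over SSPO
    # features. Toy stand-in features and labels below; the paper derives
    # them from an EHMM decomposition of each pose class.
    X = rng.normal(size=(60, 16))
    y = (X[:, 0] + 0.2 * rng.normal(size=60) > 0).astype(int)
    clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
    print(clf.score(X, y))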


international conference on automation, robotics and applications | 2000

Real-time 3-D object recognition using Scale Invariant Feature Transform and stereo vision

Gee-Sern Hsu; Chyi-Yeu Lin; Jia-Shan Wu

Scale Invariant Feature Transform (SIFT) and stereo vision are applied together to recognize objects in real time. This work reports the performance of a GPU (Graphics Processing Unit) based real-time feature detector in capturing the features of 3D objects as the objects undergo rotational and translational motion in cluttered backgrounds. We compare the performance of the feature detector implemented on the GPU to that on the CPU, and show that the GPU-based solution substantially outperforms its CPU counterpart.
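
A CPU-only sketch of the pipeline is given below: SIFT features matched across a synthetic stereo pair, with depth recovered from disparity as Z = fB/d. OpenCV's CPU SIFT stands in for the paper's GPU implementation, and the focal length and baseline are assumed values.

    import cv2
    import numpy as np

    # synthetic textured scene; the right view is the left shifted 12 px
    rng = np.random.default_rng(3)
    left = cv2.GaussianBlur((rng.random((240, 320)) * 255).astype(np.uint8),
                            (9, 9), 0)
    right = np.roll(left, -12, axis=1)

    sift = cv2.SIFT_create()
    kL, dL = sift.detectAndCompute(left, None)
    kR, dR = sift.detectAndCompute(right, None)
    bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = sorted(bf.match(dL, dR), key=lambda m: m.distance)

    f, B = 500.0, 0.1               # focal (px) and baseline (m); assumed
    for m in matches[:5]:
        d = kL[m.queryIdx].pt[0] - kR[m.trainIdx].pt[0]  # horizontal disparity
        if d > 0:
            print("depth ~ %.2f m" % (f * B / d))        # Z = f * B / d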


computer vision and pattern recognition | 2013

Face Recognition across Poses Using a Single 3D Reference Model

Gee-Sern Hsu; Hsiao-Chia Peng

Approaches for cross-pose face recognition can be split into 2D image-based and 3D model-based. Many 2D-based methods are reported with promising performance, but they only work for the poses present in the training set. Although 3D-based methods can handle arbitrary poses, only a small number of approaches are available. Extending a recent face reconstruction method that uses a single 3D reference model, this study focuses on using the reconstructed 3D face for recognition. The reconstructed 3D face allows the generation of multi-pose samples for recognition. The recognition performance varies with pose: the closer the pose is to frontal, the better the performance attained. Several ways to improve the performance are attempted, including different numbers of fiducial points for alignment, multiple reference models in the reconstruction phase, and both frontal and profile poses available in the gallery. These attempts make this approach competitive with state-of-the-art methods.
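
The multi-pose sample generation can be illustrated in a few lines: rotate the reconstructed 3D face and project it with a pinhole camera. The rotation-then-projection below is a generic stand-in, not the paper's exact rendering procedure, and all parameter values are assumed.

    import numpy as np

    def render_pose(points3d, yaw_deg, f=800.0):
        """Rotate a reconstructed 3D face about the vertical axis and
        project with a pinhole camera to synthesize a sample at that pose."""
        t = np.deg2rad(yaw_deg)
        R = np.array([[ np.cos(t), 0, np.sin(t)],
                      [ 0,         1, 0        ],
                      [-np.sin(t), 0, np.cos(t)]])
        p = points3d @ R.T
        z = p[:, 2] + 5.0                        # push in front of the camera
        return f * p[:, :2] / z[:, None]         # perspective projection

    rng = np.random.default_rng(4)
    face = rng.normal(scale=0.5, size=(68, 3))   # toy 3D landmark cloud
    for yaw in (0, 15, 30, 45):
        print(yaw, render_pose(face, yaw).shape)  # (68, 2) per pose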


asian conference on computer vision | 2010

Local empirical templates and density ratios for people counting

Dao Huu Hung; Sheng-Luen Chung; Gee-Sern Hsu

We extract local empirical templates and density ratios from a large collection of surveillance videos, and develop a fast and low-cost scheme for people counting. The local empirical templates are extracted by clustering the foregrounds induced by single pedestrians with similar silhouette features. The density ratio is obtained by comparing the size of the foreground induced by a group of pedestrians to that of the local empirical template considered most appropriate for the region where the group foreground is captured. Because of the local scale normalization between sizes, the density ratio appears to have a bound closely related to the number of pedestrians that induce the group foreground. We estimate the bounds of density ratios for groups of different numbers of pedestrians in the learning phase, and use the estimated bounds to count the pedestrians in online settings. The results are promising.
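
The counting rule reduces to a lookup on the density ratio, as in the sketch below. The per-count ratio bounds are hypothetical values standing in for those learned offline.

    import numpy as np

    def count_people(group_area, template_area, bounds):
        """Count from the density ratio: group foreground area divided by
        the area of the local empirical template for that image region."""
        ratio = group_area / template_area
        for n, (lo, hi) in sorted(bounds.items()):
            if lo <= ratio <= hi:
                return n
        return max(bounds)            # ratio above all learned bounds

    # hypothetical learned bounds: count -> (low, high) ratio interval
    bounds = {1: (0.8, 1.3), 2: (1.3, 2.4), 3: (2.4, 3.5)}
    print(count_people(group_area=5600, template_area=2500, bounds=bounds))  # 2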


IEEE Transactions on Circuits and Systems for Video Technology | 2014

A Framework for Making Face Detection Benchmark Databases

Gee-Sern Hsu; Tsu-Ying Chu

The images in face detection benchmark databases are mostly taken by consumer cameras, and are thus constrained by popular preferences, including a frontal pose and balanced lighting conditions. A good face detector should look beyond such constraints and work well for other types of images, for example, those captured by a surveillance camera. To overcome such constraints, a framework is proposed to transform a mother database, originally made for benchmarking face recognition, into daughter datasets suited to benchmarking face detection. The daughter datasets can be customized to meet the requirements of various performance criteria; therefore, a face detector can be better evaluated on the desired datasets. The framework is composed of two phases: 1) intrinsic parametrization and 2) extrinsic parametrization. The former parametrizes the intrinsic variables that affect the appearance of a face, and the latter parametrizes the extrinsic variables that determine how faces appear on an image. Experiments reveal that the proposed framework can generate not just data similar to those available from popular benchmark databases, but also data hardly available from existing databases. The datasets generated by the proposed framework offer the following advantages: 1) they can define the performance specification of a face detector in terms of detection rates on variables with different variation scopes; 2) they can benchmark performance on a single variable or on multiple variables, which can be difficult to collect; and 3) their ground truth is available when the datasets are generated, avoiding time-consuming manual annotation.
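
The two-phase parametrization can be pictured as a small spec generator: each combination of intrinsic and extrinsic variables defines one sample of a daughter dataset, with ground truth known at generation time. The variable set and ranges below are hypothetical.

    from dataclasses import dataclass
    from itertools import product

    @dataclass
    class SampleSpec:
        yaw: int            # intrinsic: head pose (degrees)
        illumination: str   # intrinsic: lighting condition
        scale: float        # extrinsic: face size on the image
        blur: float         # extrinsic: motion/defocus blur level

    intrinsic = product((-30, 0, 30), ("even", "side"))
    extrinsic = list(product((0.5, 1.0), (0.0, 2.0)))
    daughter = [SampleSpec(y, i, s, b)
                for (y, i), (s, b) in product(intrinsic, extrinsic)]
    print(len(daughter), daughter[0])   # 24 specs, ground truth by construction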


advanced video and signal based surveillance | 2017

Robust license plate detection in the wild

Gee-Sern Hsu; ArulMurugan Ambikapathi; Sheng-Luen Chung; Cheng-Po Su

License Plate Detection (LPD) is the pivotal step for License Plate Recognition. In this work, we explore and customize state-of-the-art detection approaches for handling LPD in the wild. In-the-wild LPD considers license plates captured in challenging conditions caused by bad weather, lighting, traffic, and other factors. As conventional methods fail to handle these inevitable conditions, we explore the latest deep learning based detectors, namely YOLO (You-Only-Look-Once) and its variant YOLO-9000 (referred to here as YOLO-2), and customize them for effectively handling LPD. The prime customizations include modification of the grid size and of the bounding box parameter estimation, and the composition of a more challenging AOLPE (Application-Oriented License Plate Extended) database for performance evaluation. The AOLPE database is an extended version of the AOLP database [1] with additional images taken under extreme but frequently encountered conditions. As the original YOLO and YOLO-2 are not designed for LPD, they failed to handle LPD on the AOLPE without the customizations. This study is one of the pioneering works that revise state-of-the-art real-time deep networks for handling LPD. It also serves as a case study for those who wish to customize existing deep networks for detecting specific objects. In addition to a pioneering exploration of deep networks for handling in-the-wild LPD, our contribution includes the release of the AOLPE database and an evaluation protocol as a novel benchmark for LPD.
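
One plausible instance of the bounding-box parameter customization is YOLO-2-style dimension clustering, which picks anchor priors by k-means with 1 - IoU as the distance; the sketch below applies it to license-plate-shaped boxes. This illustrates the general technique, not necessarily the authors' exact procedure.

    import numpy as np

    def iou_wh(box, anchors):
        """IoU between a (w, h) box and each anchor, corner-aligned,
        as in YOLO-2's dimension clustering."""
        inter = (np.minimum(box[0], anchors[:, 0]) *
                 np.minimum(box[1], anchors[:, 1]))
        union = box[0] * box[1] + anchors[:, 0] * anchors[:, 1] - inter
        return inter / union

    def anchor_kmeans(wh, k, iters=50):
        """k-means over box dimensions with 1 - IoU as the distance."""
        rng = np.random.default_rng(5)
        anchors = wh[rng.choice(len(wh), k, replace=False)]
        for _ in range(iters):
            assign = np.array([np.argmax(iou_wh(b, anchors)) for b in wh])
            for j in range(k):
                if np.any(assign == j):
                    anchors[j] = wh[assign == j].mean(axis=0)
        return anchors

    # toy plate boxes: wide, low-aspect shapes typical of license plates
    rng = np.random.default_rng(6)
    wh = np.column_stack([rng.uniform(60, 180, 300), rng.uniform(20, 60, 300)])
    print(anchor_kmeans(wh, k=3))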


computer vision and pattern recognition | 2014

Landmark Based Facial Component Reconstruction for Recognition across Pose

Gee-Sern Hsu; Hsiao-Chia Peng; Kai-Hsiang Chang

Different from previous 3D face modeling approaches that consider the whole facial area, the proposed method reconstructs 3D facial components for handling cross-pose recognition. It has two phases: component reconstruction and component-based recognition. In the reconstruction phase, we first extract four component regions, namely the two eyes, nose, and mouth, from each gallery face using the pose-invariant landmarks obtained by a modified version of a landmark detection algorithm. A 3D model of each component region is reconstructed using a constrained minimization scheme with a gender- and ethnicity-oriented 3D model as the reference. In the recognition phase, the pose of a given probe is determined by a set of landmarks, which guides the rotation of the reconstructed components so that they can be aligned with the probe components. The match is determined by the components instead of the whole faces, so that different components can be considered at different poses. Experiments on the PIE and Multi-PIE databases show that the proposed component-based approach not only outperforms its holistic counterpart, but is also competitive with many contemporary methods.
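
The component extraction step can be sketched by grouping landmarks into the four regions and taking each group's bounding box. The 68-point index ranges below follow a common landmarking convention and are an assumption, not necessarily the paper's landmark set.

    import numpy as np

    # component regions as landmark index groups (68-point convention; assumed)
    COMPONENTS = {
        "left_eye":  range(36, 42),
        "right_eye": range(42, 48),
        "nose":      range(27, 36),
        "mouth":     range(48, 68),
    }

    def component_boxes(landmarks, margin=5):
        """Bounding box of each facial component from its landmarks;
        these regions are what the method reconstructs and matches."""
        boxes = {}
        for name, idx in COMPONENTS.items():
            pts = landmarks[list(idx)]
            boxes[name] = (pts.min(axis=0) - margin, pts.max(axis=0) + margin)
        return boxes

    rng = np.random.default_rng(7)
    lms = rng.uniform(0, 200, size=(68, 2))      # toy landmark set
    for name, (tl, br) in component_boxes(lms).items():
        print(name, tl.round(1), br.round(1))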


IEEE Transactions on Circuits and Systems for Video Technology | 2013

Facial Trait Code

Ping-Han Lee; Gee-Sern Hsu; Tsuhan Chen; Yi-Ping Hung

We propose a facial trait code (FTC) to encode human facial images, and apply it to face recognition. Extracted from an exhaustive set of local patches cropped from a large stack of faces, the facial traits and the associated trait patterns can accurately capture the appearance of a given face. The extraction has two phases. The first phase is composed of clustering and boosting upon a training set of faces with neutral expression, even illumination, and frontal pose. The second phase focuses on the extraction of the facial trait patterns from the set of faces with variations in expression, illumination, and poses. To apply the FTC to face recognition, two types of codewords, hard and probabilistic, with different metrics for characterizing the facial trait patterns are proposed. The hard codeword offers a concise representation of a face, while the probabilistic codeword enables matching with better accuracy. Our experiments compare the proposed FTC to other algorithms on several public datasets, all showing promising results.
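
The two codeword types can be contrasted in a few lines: the hard codeword keeps one symbol per trait, while the probabilistic codeword keeps a distribution over the trait patterns. The softmax form below is an assumed stand-in for the paper's actual metric.

    import numpy as np

    def hard_codeword(dists_per_trait):
        """Hard codeword: one symbol per trait, the index of the closest
        trait pattern (concise, matched Hamming-style)."""
        return [int(np.argmin(d)) for d in dists_per_trait]

    def prob_codeword(dists_per_trait, beta=1.0):
        """Probabilistic codeword: per trait, a distribution over trait
        patterns (softmax over negative distances; an assumed form),
        enabling finer-grained matching."""
        out = []
        for d in dists_per_trait:
            w = np.exp(-beta * (d - d.min()))
            out.append(w / w.sum())
        return out

    rng = np.random.default_rng(8)
    dists = [rng.random(8) for _ in range(5)]     # 5 traits, 8 patterns each
    print(hard_codeword(dists))
    print(prob_codeword(dists)[0].round(2))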

Collaboration


Dive into Gee-Sern Hsu's collaboration.

Top Co-Authors

Ping-Han Lee
National Taiwan University

Yi-Ping Hung
National Taiwan University

Hsiao-Chia Peng
National Taiwan University of Science and Technology

Sheng-Luen Chung
National Taiwan University of Science and Technology

Cheng-Hua Hsieh
National Taiwan University of Science and Technology

Chyi-Yeu Lin
National Taiwan University of Science and Technology

Kai-Hsiang Chang
National Taiwan University of Science and Technology

Shang-Min Yeh
National Taiwan University of Science and Technology

Shih-Chieh Huang
National Taiwan University of Science and Technology