Genquan Duan
Tsinghua University
Publication
Featured research published by Genquan Duan.
International Conference on Computer Vision | 2009
Genquan Duan; Chang Huang; Haizhou Ai; Shihong Lao
This paper proposes a novel approach to boost a set of Associated Pairing Comparison Features (APCFs) in Granular Space for pedestrian detection, in which Pairing Comparison of Color (PCC) and Pairing Comparison of Gradient (PCG) are the two essential elements. A PCC is a Boolean color comparison of two granules and a PCG is a Boolean gradient comparison of two granules, motivated by animal vision systems, which use simple comparison information in both color and gradient modes for visual perception. Unlike previous works that describe object shape, our method seeks the symbiosis of colors or gradient orientations. Experiments on multi-view, multi-pose pedestrian data demonstrate the efficacy of the proposed approach.
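The pairing comparisons above can be sketched as a minimal illustration. The granule encoding `(x, y, scale)` and the way an APCF conjoins several comparisons are assumptions for exposition, not the paper's exact design:

```python
import numpy as np

def granule_mean(img, granule):
    """Mean value of a square granule of side 2**scale anchored at (x, y)."""
    x, y, scale = granule
    s = 2 ** scale
    return img[y:y + s, x:x + s].mean()

def pcc(color_channel, g1, g2):
    """Pairing Comparison of Color: Boolean comparison of two granules."""
    return granule_mean(color_channel, g1) > granule_mean(color_channel, g2)

def pcg(grad_magnitude, g1, g2):
    """Pairing Comparison of Gradient: the same test on a gradient-magnitude map."""
    return granule_mean(grad_magnitude, g1) > granule_mean(grad_magnitude, g2)

def apcf(color_channel, grad_magnitude, pairs):
    """An associated feature conjoining several pairing comparisons into one Boolean."""
    return all(
        pcc(color_channel, g1, g2) if kind == "c" else pcg(grad_magnitude, g1, g2)
        for kind, g1, g2 in pairs
    )
```

On a vertical intensity ramp, a lower granule has a larger mean than an upper one, so `pcc` fires; a boosting algorithm would then select discriminative pairs automatically.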
European Conference on Computer Vision | 2010
Genquan Duan; Haizhou Ai; Shihong Lao
Occlusions and articulated poses make human detection much more difficult than detecting more rigid objects such as faces or cars. In this paper, a Structural Filter (SF) approach to human detection is presented to deal with occlusions and articulated poses. A three-level hierarchical object structure consisting of words, sentences and paragraphs, in analogy to text grammar, is proposed, and each level is associated with a kind of SF: Word Structural Filter (WSF), Sentence Structural Filter (SSF) and Paragraph Structural Filter (PSF). An SF is a set of detectors that infers which structures a test window possesses; specifically, WSF is composed of all detectors for words, SSF of all detectors for sentences, and likewise PSF for paragraphs. WSF works on the most basic units of an object. SSF deals with meaningful substructures of an object. Visible parts of a human in a crowded scene can be head-shoulder, left-part, right-part, upper-body or whole-body, and articulated humans vary greatly in pose, especially when doing sports. Visible parts and different poses are the appearance statuses of detected humans handled by PSF. The three levels of SFs, WSF, SSF and PSF, are integrated in an embedded structure to form a powerful classifier, named the Integrated Structural Filter (ISF). Detection experiments on pedestrians in highly crowded scenes and on articulated humans show the effectiveness and efficiency of our approach.
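A toy sketch of the word/sentence/paragraph hierarchy is given below. The detector functions, the window representation as a dict of part scores, and the gating between levels are all illustrative assumptions:

```python
class StructuralFilter:
    """A set of named detectors; infers which structures a window possesses."""
    def __init__(self, detectors):
        self.detectors = detectors          # {structure_name: fn(window) -> bool}

    def infer(self, window):
        return {name for name, fn in self.detectors.items() if fn(window)}

def integrated_structural_filter(window, wsf, ssf, psf):
    """Embedded WSF -> SSF -> PSF: higher levels run only if lower ones fire."""
    words = wsf.infer(window)
    if not words:
        return set()
    sentences = ssf.infer(window)
    if not sentences:
        return words
    return words | sentences | psf.infer(window)

# Toy detectors on a window represented as {part_name: confidence}.
wsf = StructuralFilter({"head": lambda w: w.get("head", 0) > 0.5,
                        "torso": lambda w: w.get("torso", 0) > 0.5})
ssf = StructuralFilter({"head-shoulder":
                        lambda w: w.get("head", 0) > 0.5 and w.get("torso", 0) > 0.3})
psf = StructuralFilter({"upper-body": lambda w: w.get("torso", 0) > 0.5})
```

The embedded structure means an occluded window that only triggers word-level detectors still yields a partial, usable inference instead of a hard rejection.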
European Conference on Computer Vision | 2012
Genquan Duan; Haizhou Ai; Song Cao; Shihong Lao
In this paper, we propose to track multiple previously unseen objects in unconstrained scenes. Instead of considering objects individually, we model objects in mutual context with each other to achieve robust and accurate tracking. We introduce a unified framework to combine both Individual Object Models (IOMs) and Mutual Relation Models (MRMs). The MRMs consist of three components: the relational graph indicating related objects, the mutual relation vectors calculated over related objects to capture their interactions, and the relational weights balancing all interactions and IOMs. As MRMs vary over temporal sequences, we propose online algorithms to adapt them to the current situation. We update relational graphs by analyzing object trajectories and cast the relational weight learning task as an online latent SVM problem. Extensive experiments on challenging real-world video sequences demonstrate the efficiency and effectiveness of our framework.
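The online relational-weight learning can be sketched as one subgradient step of a linear latent SVM. The feature layout (IOM score and relation-vector entries stacked into one vector) and the hyperparameters are assumptions; the latent assignment is taken as already solved:

```python
import numpy as np

def latent_svm_step(w, phi_pos, phi_neg, lr=0.1, C=1.0):
    """One online subgradient step on 0.5*||w||^2 + C*hinge(score_pos - score_neg).

    phi_pos / phi_neg stack the IOM score and mutual relation vector entries
    for a correct vs. an incorrect assignment of the tracked objects.
    """
    margin = w @ phi_pos - w @ phi_neg
    grad = w.copy()                          # gradient of the L2 regularizer
    if margin < 1.0:                         # hinge loss is active
        grad -= C * (phi_pos - phi_neg)
    return w - lr * grad
```

Repeating this step over incoming frames drifts the relational weights toward configurations that score correct assignments above incorrect ones by a margin.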
International Conference on Image Processing | 2012
Yuning Du; Genquan Duan; Haizhou Ai
Text detection in natural scenes is fundamental to text image analysis. In this paper, we propose a context-based approach for robust and fast text detection. Our main contribution is a new concept of key region, described with context according to stroke properties, appearance consistency and the specific spatial distribution of a text line. With such context descriptors, we train an SVM as a context-based classifier to find key regions among candidate regions. Candidate regions are connected components generated by a local binarization algorithm within areas detected by an offline-learned text patch detector. Experimental results on two benchmark datasets demonstrate that our approach achieves competitive performance compared with state-of-the-art algorithms, including the stroke width transform (SWT) [1] and the hybrid approach based on CRFs [2], with speedup rates of about 1.7x~4.4x.
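The key-region stage can be sketched as a linear SVM scoring each candidate's context descriptor. The three-entry toy descriptor and the weights below are assumptions chosen for illustration, not the learned model:

```python
import numpy as np

def find_key_regions(candidates, context_descriptor, w, b):
    """Keep candidates whose linear-SVM score on the context descriptor is positive."""
    return [r for r in candidates
            if float(np.dot(w, context_descriptor(r)) + b) > 0.0]

# Toy context descriptor: (stroke-width consistency, appearance consistency,
# alignment with the estimated text line), each normalized to [0, 1].
def toy_descriptor(region):
    return np.array([region["stroke"], region["appearance"], region["alignment"]])

w = np.array([1.0, 1.0, 1.0])   # assumed weights of the learned classifier
b = -1.5                        # assumed bias
```

Only connected components that score positively survive as key regions, which is what makes the downstream text-line grouping fast.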
Pattern Recognition Letters | 2014
Liwei Liu; Junliang Xing; Genquan Duan; Haizhou Ai
This paper focuses on detecting vehicles in different target scenes with the same pre-trained detector, which is very challenging due to view variations. To address this problem, we propose a novel approach for detection adaptation based on scene transformation, contributing both view transformation and automatic parameter estimation. Instead of modifying the pre-trained detector, we transform scenes into the frontal/rear view, handling pitch and yaw view variations. Without human interaction, using only some general prior knowledge, the transformation parameters are automatically initialized and then optimized online with spatial-temporal voting, which guarantees that the transformation matches the pre-trained detector. Since neither labeling new samples nor manual camera calibration is needed, our approach considerably reduces manual interaction. Experiments on challenging real-world videos demonstrate that our approach achieves significant improvements over the pre-trained detector, and is even comparable to a detector trained on fully labeled sequences.
Image and Vision Computing | 2012
Genquan Duan; Haizhou Ai; Junliang Xing; Song Cao; Shihong Lao
How far can human detection and tracking go in real-world crowded scenes? Many algorithms often fail in such scenes due to frequent and severe occlusions as well as viewpoint changes. To handle these difficulties, we propose Scene Aware Detection (SAD) and Block Assignment Tracking (BAT), which incorporate available scene models (e.g. background, layout, ground plane and camera models). SAD achieves accurate detection by utilizing 1) the camera model to deal with viewpoint changes by rectifying sub-images, 2) a structural filter approach to handle occlusions based on a feature sharing mechanism in which a three-level hierarchical structure is built for humans, and 3) foregrounds for pruning negative and false positive samples and for merging intermediate detection results. Many detection- or appearance-based tracking systems are prone to errors in occluded scenes because of detector failures and interactions of multiple objects. In contrast, BAT formulates tracking as a block assignment process, where blocks with the same label form the appearance of one object. In BAT, we model objects on two levels: the ensemble level, which measures how object-like a candidate is via discriminative models, and the block level, which measures how much it resembles a target object via appearance and motion models. The main advantage of BAT is that it can track an object even when all the part detectors fail, as long as the object has assigned blocks. Extensive experiments in many challenging real-world scenes demonstrate the efficiency and effectiveness of our approach.
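The block-level labeling can be sketched as a greedy assignment: each block takes the label of the object with the highest combined appearance/motion score, or stays unassigned below a threshold. The scoring function and the single greedy pass are simplifying assumptions; the paper's assignment process is richer than this:

```python
def assign_blocks(blocks, objects, score, threshold=0.0):
    """Greedy block-to-object labeling; blocks scoring below threshold stay unassigned."""
    labels = {}
    for block in blocks:
        best = max(objects, key=lambda obj: score(block, obj))
        labels[block] = best if score(block, best) > threshold else None
    return labels
```

Blocks that share a label jointly form one object's appearance, which is why the tracker can survive a frame in which every part detector misses.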
Asian Conference on Computer Vision | 2010
Genquan Duan; Haizhou Ai; Shihong Lao
In this paper, we aim to detect humans in video over large viewpoint changes, which is very challenging due to the diversity of human appearance and motion across a wide spread of viewpoints compared with the common frontal viewpoint. We propose 1) a new feature, the Intra-frame and Inter-frame Comparison Feature, to combine both appearance and motion information, 2) an Enhanced Multiple Clusters Boost algorithm to automatically co-cluster the samples of various viewpoints and discriminative features, and 3) a Multiple Video Sampling strategy to make the approach robust to human motion and frame rate changes. Due to the large number of samples and features, we propose a two-stage tree-structured detector, using only appearance in the first stage and both appearance and motion in the second stage. Our approach is evaluated on several challenging real-world scenes, the PETS2007 dataset, the ETHZ dataset and our own collected videos, demonstrating the effectiveness and efficiency of our approach.
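The two kinds of comparison can be sketched as Boolean tests, one within a frame (appearance) and one across frames (motion). The choice of pixel locations and the frame offset are illustrative assumptions:

```python
import numpy as np

def intra_frame_comparison(frame, p1, p2):
    """Appearance cue: compare two pixel (or granule) values within one frame."""
    return frame[p1] > frame[p2]

def inter_frame_comparison(frames, t, dt, p):
    """Motion cue: compare the same location across frames t and t - dt."""
    return frames[t][p] > frames[t - dt][p]
```

Sampling several values of `dt` is one way to make such features robust to frame-rate changes, in the spirit of the multiple video sampling strategy.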
International Conference on Image Processing | 2012
Zhifang Liu; Genquan Duan; Haizhou Ai; Takayoshi Yamashita
Adaptation of pre-trained boosted pedestrian detectors to specific scenes is an important yet difficult task in computer vision. To address this problem, a feature reselection strategy is proposed in this paper. The proposed method identifies weak classifiers which do not well adapt to the specific scene, and replaces them with retrained weak classifiers. This feature reselection strategy has the following advantages: 1) it does not need original offline training data, but only uses a few online samples from the target scene; 2) the adapted detector preserves the generality of the generic detector, resulting in very few false positives; and 3) it can adapt a generic detector to a specific scene with very fast speed due to its parallel nature. Experiments on challenging pedestrian detection datasets demonstrate that our proposed strategy can significantly improve the performance of pre-trained boosted detectors in specific scenes with very low computation cost and very little labeling work.
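The reselection strategy can be sketched as follows: weak classifiers whose error on the online scene samples is worse than a tolerance are swapped for retrained ones. The plain error rate and the retraining callback are simplified assumptions standing in for the paper's boosted-training details:

```python
def reselect_weak_classifiers(ensemble, samples, labels, retrain, tol=0.5):
    """Replace weak classifiers that misclassify more than tol of the scene samples."""
    adapted = []
    for h in ensemble:
        errors = sum(1 for x, y in zip(samples, labels) if h(x) != y)
        err_rate = errors / len(labels)
        adapted.append(retrain(samples, labels) if err_rate > tol else h)
    return adapted
```

Because each weak classifier is checked and retrained independently, the loop parallelizes trivially, which is the source of the speed advantage claimed above.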
IEEE Intelligent Vehicles Symposium | 2012
Liwei Liu; Genquan Duan; Haizhou Ai; Shihong Lao
Vehicle detection in traffic scenes is a fundamental task for intelligent transportation systems, with practical applications as diverse as traffic monitoring, intelligent scheduling and autonomous navigation. In recent years, the number of detection approaches for monocular images has grown rapidly. However, most of them focus on detecting other objects (such as faces, pedestrians, cats and dogs), and there is a lack both of vehicle datasets covering various conditions and of comprehensive comparisons. To address these problems, we perform an extensive evaluation of many state-of-the-art detection approaches on vehicles. Our main contributions are: (1) we collect a large dataset of real-world vehicles in frontal/rear view with -30°~30° yaw changes and 5°~45° pitch changes under different weather conditions (snowy, rainy, sunny and cloudy) and illumination variations; (2) we evaluate six types of state-of-the-art features in the Real AdaBoost framework on our own dataset and a public dataset using the same evaluation protocol, presenting a fair comparison and deep analysis of these features in vehicle detection, from which we explore the characteristics of good features for vehicle detection; and (3) we exploit these characteristics to propose a relatively effective and efficient detector, balancing performance, speed and memory cost, which can be put into practical use.
International Conference on Image Processing | 2011
Song Cao; Genquan Duan; Haizhou Ai
Detecting people under occlusion and in articulated poses remains a challenging problem in computer vision. To achieve a fast and accurate human detection algorithm, the Node-Combined Part Detector (NCPD) model is proposed in this paper. We make two major contributions: (1) we propose a novel method, torso-nodes combination, to integrate part detectors; (2) we adopt stable part detectors described by Associated Pairing Comparison Features (APCF) and trained with the Real AdaBoost algorithm. This new human detection algorithm is not only much faster than previous work but also maintains competitive accuracy with state-of-the-art human detection systems, and it performs better at low false alarm rates. In average time per image, our algorithm achieves a speedup of about 10x compared with the Deformable Part based Model (DPM) and over 125x compared with the Poselet Model.