Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Tony X. Han is active.

Publication


Featured research published by Tony X. Han.


International Conference on Computer Vision | 2009

An HOG-LBP human detector with partial occlusion handling

Xiaoyu Wang; Tony X. Han; Shuicheng Yan

By combining Histograms of Oriented Gradients (HOG) and Local Binary Pattern (LBP) as the feature set, we propose a novel human detection approach capable of handling partial occlusion. Two kinds of detectors, i.e., a global detector for whole scanning windows and part detectors for local regions, are learned from the training data using a linear SVM. For each ambiguous scanning window, we construct an occlusion likelihood map by using the response of each block of the HOG feature to the global detector. The occlusion likelihood map is then segmented by the mean-shift approach. The segmented portion of the window with a majority of negative responses is inferred as an occluded region. If partial occlusion is indicated with high likelihood in a certain scanning window, part detectors are applied on the unoccluded regions to achieve the final classification on the current scanning window. With the help of the augmented HOG-LBP feature and the global-part occlusion handling method, we achieve a detection rate of 91.3% with FPPW = 10⁻⁶, 94.7% with FPPW = 10⁻⁵, and 97.9% with FPPW = 10⁻⁴ on the INRIA dataset, which, to the best of our knowledge, is the best human detection performance on the INRIA dataset. The global-part occlusion handling method is further validated using synthesized occlusion data constructed from the INRIA and PASCAL datasets.
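The global-part decision logic above lends itself to a short sketch. This is a minimal illustration, assuming a per-block view of the linear SVM score; the block layout, the ambiguity band, and the part-detector interface are invented for the example, and a simple sign threshold on the per-block responses stands in for the mean-shift segmentation of the occlusion likelihood map.

```python
import numpy as np

def block_scores(hog_blocks, w_blocks):
    """Per-block contributions of a linear SVM to the global window score.

    hog_blocks: (n_blocks, block_dim) HOG features of one scanning window.
    w_blocks:   (n_blocks, block_dim) SVM weight vector reshaped per block.
    """
    return np.einsum('bd,bd->b', hog_blocks, w_blocks)

def classify_window(hog_blocks, w_blocks, bias, part_detectors,
                    ambiguous=(-1.0, 1.0)):
    """Global-part occlusion handling, heavily simplified.

    part_detectors maps a tuple of block indices (a local region) to a
    callable that scores the HOG blocks of that region.
    """
    scores = block_scores(hog_blocks, w_blocks)
    global_score = scores.sum() + bias
    if not (ambiguous[0] < global_score < ambiguous[1]):
        return global_score > 0                 # confident global decision
    # Occlusion likelihood: blocks responding negatively to the global
    # detector are treated as likely occluded (sign threshold here,
    # mean-shift segmentation in the paper).
    occluded = scores < 0
    votes = []
    for region, detector in part_detectors.items():
        idx = list(region)
        if not occluded[idx].any():             # part detector on an unoccluded region
            votes.append(detector(hog_blocks[idx]))
    return (np.mean(votes) > 0) if votes else (global_score > 0)
```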


International Conference on Computer Vision | 2011

Contextual weighting for vocabulary tree based image retrieval

Xiaoyu Wang; Ming Yang; Timothee Cour; Shenghuo Zhu; Kai Yu; Tony X. Han

In this paper we address the problem of image retrieval from millions of database images. We improve the vocabulary tree based approach by introducing contextual weighting of local features in both descriptor and spatial domains. Specifically, we propose to incorporate efficient statistics of neighbor descriptors both on the vocabulary tree and in the image spatial domain into the retrieval. These contextual cues substantially enhance the discriminative power of individual local features with very small computational overhead. We have conducted extensive experiments on benchmark datasets, i.e., the UKbench, Holidays, and our new Mobile dataset, which show that our method reaches state-of-the-art performance with much less computation. Furthermore, the proposed method demonstrates excellent scalability in terms of both retrieval accuracy and efficiency on large-scale experiments using 1.26 million images from the ImageNet database as distractors.
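The sketch below gives a rough sense of how contextual weighting can enter inverted-file scoring for a vocabulary tree. The weight used here (boosting features whose spatial neighbors quantize to high-IDF words) and the data layout are illustrative assumptions, not the statistics defined in the paper.

```python
import numpy as np
from collections import defaultdict

def contextual_weight(word_id, neighbor_word_ids, idf):
    """Toy stand-in for contextual weighting: a feature whose spatial
    neighbors quantize to informative (high-IDF) words gets a larger vote."""
    if len(neighbor_word_ids) == 0:
        return 1.0
    return 1.0 + np.mean([idf.get(w, 0.0) for w in neighbor_word_ids])

def score_query(query_words, query_neighbors, inverted_index, idf):
    """Accumulate weighted TF-IDF votes over an inverted file.

    query_words:     list of visual-word ids for the query image.
    query_neighbors: list of lists, spatial-neighbor word ids per feature.
    inverted_index:  dict word_id -> list of (db_image_id, tf).
    """
    scores = defaultdict(float)
    for w, nbrs in zip(query_words, query_neighbors):
        cw = contextual_weight(w, nbrs, idf)
        for image_id, tf in inverted_index.get(w, []):
            scores[image_id] += cw * tf * idf.get(w, 0.0)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```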


Asian Conference on Computer Vision | 2012

Histogram of oriented normal vectors for object recognition with a depth sensor

Shuai Tang; Xiaoyu Wang; Xutao Lv; Tony X. Han; James M. Keller; Zhihai He; Marjorie Skubic; Shihong Lao

We propose a feature, the Histogram of Oriented Normal Vectors (HONV), designed specifically to capture local geometric characteristics for object recognition with a depth sensor. Through our derivation, the normal vector orientation, represented as an ordered pair of azimuthal angle and zenith angle, can be easily computed from the gradients of the depth image. We form the HONV as a concatenation of local histograms of azimuthal angle and zenith angle. Since the HONV is inherently the local distribution of the tangent plane orientation of an object surface, we use it as a feature for object detection/classification tasks. The object detection experiments on the standard RGB-D dataset [1] and a self-collected Chair-D dataset show that the HONV significantly outperforms traditional features such as HOG on the depth image and HOG on the intensity image, with an improvement of 11.6% in average precision. For object classification, the HONV achieved a 5.0% improvement over state-of-the-art approaches.
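Since the surface z = d(x, y) has a normal proportional to (-∂d/∂x, -∂d/∂y, 1), the azimuth and zenith angles follow directly from the depth gradients, which is the core of the HONV construction. The sketch below is a minimal version; the cell size, bin counts, and per-cell 2-D (azimuth, zenith) binning are illustrative choices rather than the paper's exact settings.

```python
import numpy as np

def honv(depth, cell=8, az_bins=8, ze_bins=4):
    """Histogram of Oriented Normal Vectors from a depth image (sketch)."""
    dy, dx = np.gradient(depth.astype(np.float64))
    azimuth = np.arctan2(-dy, -dx)            # azimuthal angle, in (-pi, pi]
    zenith = np.arctan(np.hypot(dx, dy))      # zenith angle, in [0, pi/2)

    h, w = depth.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            az = azimuth[y:y + cell, x:x + cell].ravel()
            ze = zenith[y:y + cell, x:x + cell].ravel()
            # joint histogram over (azimuth, zenith) for this cell
            hist, _, _ = np.histogram2d(
                az, ze, bins=[az_bins, ze_bins],
                range=[[-np.pi, np.pi], [0, np.pi / 2]])
            hist = hist.ravel()
            feats.append(hist / (np.linalg.norm(hist) + 1e-8))
    return np.concatenate(feats)
```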


International Conference on Machine Learning | 2005

VACE multimodal meeting corpus

Lei Chen; R. Travis Rose; Ying Qiao; Irene Kimbara; Fey Parrill; Haleema Welji; Tony X. Han; Jilin Tu; Zhongqiang Huang; Mary P. Harper; Francis K. H. Quek; Yingen Xiong; David McNeill; Ronald F. Tuttle; Thomas S. Huang

In this paper, we report on the infrastructure we have developed to support our research on multimodal cues for understanding meetings. With our focus on multimodality, we investigate the interaction among speech, gesture, posture, and gaze in meetings. For this purpose, a high quality multimodal corpus is being produced.


European Conference on Computer Vision | 2010

Discriminative tracking by metric learning

Xiaoyu Wang; Gang Hua; Tony X. Han

We present a discriminative model that casts appearance modeling and visual matching into a single objective for visual tracking. Most previous discriminative models for visual tracking are formulated as supervised learning of binary classifiers, and the continuous output of the classification function is then utilized as the cost function for visual tracking. This may be less desirable, since the function is optimized for making binary decisions; such a learning objective may fail to capture the manifold structure of the discriminative appearances well. In contrast, our unified formulation is based on a principled metric learning framework, which seeks a discriminative embedding for appearance modeling. In our formulation, both appearance modeling and visual matching are performed online by efficient gradient-based optimization. Our formulation is also able to deal with multiple targets, where the exclusive principle is naturally reinforced to handle occlusions. Its efficacy is validated on a wide variety of challenging videos. It is shown that our algorithm achieves more persistent results when compared with previous appearance-model-based tracking algorithms.
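One generic way to realize online, gradient-based metric learning for matching is a Mahalanobis form d_M(x, y) = (x - y)^T M (x - y) updated as new target and candidate patches arrive. The push/pull update and the projection onto the positive semidefinite cone below are an illustrative stand-in, not the paper's exact objective.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance between two appearance vectors."""
    d = x - y
    return float(d @ M @ d)

def online_metric_update(M, target, candidate, is_match, lr=0.01):
    """One gradient step: pull matching pairs closer under M, push
    non-matching pairs apart, then project M back onto the PSD cone
    so it remains a valid metric."""
    d = (target - candidate).reshape(-1, 1)
    grad = d @ d.T                      # gradient of (x-y)^T M (x-y) w.r.t. M
    M = M - lr * grad if is_match else M + lr * grad
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)
    return vecs @ np.diag(vals) @ vecs.T
```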


Computer Vision and Pattern Recognition | 2013

Detection Evolution with Multi-order Contextual Co-occurrence

Guang Chen; Yuanyuan Ding; Jing Xiao; Tony X. Han

Context has been playing an increasingly important role in improving object detection performance. In this paper we propose an effective representation, Multi-Order Contextual co-Occurrence (MOCO), to implicitly model high-level context using solely the detection responses of a baseline object detector. The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector. The statistics of the 1st-order binary context features are further calculated to construct a high-order co-occurrence descriptor. Combining the MOCO feature with the original image feature, we can evolve the baseline object detector into a stronger, context-aware detector. With the updated detector, we can continue the evolution until the contextual improvements saturate. Using the successful deformable-part-model detector [13] as the baseline detector, we test the proposed MOCO evolution framework on the PASCAL VOC 2007 dataset [8] and the Caltech pedestrian dataset [7]: the proposed MOCO detector outperforms all known state-of-the-art approaches, contextually boosting deformable part models (ver. 5) [13] by 3.3% in mean average precision on the PASCAL 2007 dataset. For the Caltech pedestrian dataset, our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%, compared with the best prior art [6].
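The 1st-order context feature admits a compact sketch: randomized binary comparisons between locations on the baseline detector's response map, whose statistics across windows give a simple co-occurrence descriptor. The pair sampling, feature sizes, and the co-occurrence statistic below are illustrative stand-ins for the construction in the paper.

```python
import numpy as np

def first_order_context(response_map, n_pairs=256, seed=None):
    """1st-order context (sketch): randomized binary comparisons between
    pairs of locations on a detector response map."""
    rng = np.random.default_rng(seed)
    h, w = response_map.shape
    a = rng.integers(0, h, n_pairs), rng.integers(0, w, n_pairs)
    b = rng.integers(0, h, n_pairs), rng.integers(0, w, n_pairs)
    bits = response_map[a] > response_map[b]        # binary comparison tests
    return bits.astype(np.uint8)

def cooccurrence_descriptor(bits_per_window):
    """Higher-order statistics: co-occurrence rates of the binary tests
    across a set of windows (a simplified high-order descriptor)."""
    B = np.asarray(bits_per_window, dtype=np.float64)   # (n_windows, n_pairs)
    return (B.T @ B) / max(len(B), 1)
```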


Systems, Man, and Cybernetics | 2012

Detection of Sudden Pedestrian Crossings for Driving Assistance Systems

Yanwu Xu; Dong Xu; Stephen Lin; Tony X. Han; Xianbin Cao; Xuelong Li

In this paper, we study the problem of detecting sudden pedestrian crossings to assist drivers in avoiding accidents. This application has two major requirements: to detect crossing pedestrians as early as possible just as they enter the view of the car-mounted camera and to maintain a false alarm rate as low as possible for practical purposes. Although many current sliding-window-based approaches using various features and classification algorithms have been proposed for image-/video-based pedestrian detection, their performance in terms of accuracy and processing speed falls far short of practical application requirements. To address this problem, we propose a three-level coarse-to-fine video-based framework that detects partially visible pedestrians just as they enter the camera view, with low false alarm rate and high speed. The framework is tested on a new collection of high-resolution videos captured from a moving vehicle and yields a performance better than that of state-of-the-art pedestrian detection while running at a frame rate of 55 fps.
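The three-level coarse-to-fine idea reduces to cheap classifiers pruning most candidate windows so that only a small fraction reaches the expensive level. The callables and thresholds in this sketch are placeholders, not the paper's detectors or operating points.

```python
def coarse_to_fine_detect(candidate_windows, coarse, mid, fine):
    """Three-level cascade (sketch): each level is a callable returning a
    confidence score; cheap levels discard most windows so the expensive
    fine level runs on very few."""
    survivors = [w for w in candidate_windows if coarse(w) > 0.2]
    survivors = [w for w in survivors if mid(w) > 0.5]
    return [w for w in survivors if fine(w) > 0.8]
```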


IEEE Transactions on Circuits and Systems for Video Technology | 2009

Hierarchical Space-Time Model Enabling Efficient Search for Human Actions

Huazhong Ning; Tony X. Han; Dirk Walther; Ming Liu; Thomas S. Huang

We propose a five-layer hierarchical space-time model (HSTM) for representing and searching human actions in videos. From a feature point of view, both invariance and selectivity are desirable characteristics, yet they seem to contradict each other. To make these characteristics coexist, we introduce a coarse-to-fine search and verification scheme for action searching based on the HSTM model. Because moving through the layers of the hierarchy corresponds to progressively turning the knob between invariance and selectivity, this strategy enables searching for human actions ranging from the rapid movements of sports to the subtle motions of facial expressions. The introduction of the Histogram of Gabor Orientations feature makes the search for actions proceed smoothly across the hierarchical layers of the HSTM model. Efficient matching is achieved by applying integral histograms to compute the features in the top two layers. The HSTM model was tested on three selected challenging video sequences and on the KTH human action database, and it achieved improvements over other state-of-the-art algorithms. These promising results validate that the HSTM model is both selective and robust for searching human actions.
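The efficiency of matching in the top two layers rests on integral histograms, which make the histogram of any axis-aligned rectangle available with four lookups per bin. A minimal sketch, assuming a precomputed map of per-pixel bin indices, is given below.

```python
import numpy as np

def integral_histogram(bin_map, n_bins):
    """Integral histogram over a 2-D map of bin indices: after the two
    cumulative sums, any rectangle's histogram costs four lookups per bin."""
    h, w = bin_map.shape
    ih = np.zeros((h + 1, w + 1, n_bins))
    ih[1:, 1:] = np.eye(n_bins)[bin_map]      # one-hot encoding per pixel
    return ih.cumsum(axis=0).cumsum(axis=1)

def rect_histogram(ih, y0, x0, y1, x1):
    """Histogram of the rectangle [y0, y1) x [x0, x1) in O(n_bins) time."""
    return ih[y1, x1] - ih[y0, x1] - ih[y1, x0] + ih[y0, x0]
```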


Computer Vision and Pattern Recognition | 2006

Efficient Nonparametric Belief Propagation with Application to Articulated Body Tracking

Tony X. Han; Huazhong Ning; Thomas S. Huang

An efficient Nonparametric Belief Propagation (NBP) algorithm is developed in this paper. While the recently proposed nonparametric belief propagation algorithm has wide applications such as articulated tracking [22, 19], super-resolution [6], stereo vision, and sensor calibration [10], the core of the algorithm requires repeatedly sampling from products of mixtures of Gaussians, which makes the algorithm computationally very expensive. To avoid the slow sampling process, we apply a Gaussian mixture density approximation by mode propagation and kernel fitting [2, 7]. The products of Gaussian mixtures are approximated accurately by just a few mode propagation and kernel fitting steps, whereas a sampling method (e.g., a Gibbs sampler) needs many samples to achieve similar approximation results. The proposed algorithm is then applied to articulated body tracking in several scenarios. The experimental results show the robustness and efficiency of the proposed algorithm. The proposed efficient NBP algorithm also has potential in the other applications mentioned above.
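The bottleneck targeted here is that NBP messages are products of Gaussian mixtures: the exact product of mixtures with K1 and K2 components has K1 x K2 components, so repeated products explode combinatorially. The 1-D sketch below computes that exact product (the quantity that mode propagation and kernel fitting approximate); it illustrates the problem rather than the paper's approximation.

```python
import numpy as np

def product_of_mixtures(w1, mu1, var1, w2, mu2, var2):
    """Exact product of two 1-D Gaussian mixtures; the result has
    len(w1) * len(w2) components."""
    W, MU, VAR = [], [], []
    for a in range(len(w1)):
        for b in range(len(w2)):
            var = 1.0 / (1.0 / var1[a] + 1.0 / var2[b])
            mu = var * (mu1[a] / var1[a] + mu2[b] / var2[b])
            # component weight: Gaussian overlap of the two means
            s = var1[a] + var2[b]
            z = np.exp(-0.5 * (mu1[a] - mu2[b]) ** 2 / s) / np.sqrt(2 * np.pi * s)
            W.append(w1[a] * w2[b] * z)
            MU.append(mu)
            VAR.append(var)
    W = np.array(W)
    return W / W.sum(), np.array(MU), np.array(VAR)
```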


IEEE Transactions on Circuits and Systems for Video Technology | 2018

Residual Networks of Residual Networks: Multilevel Residual Networks

Ke Zhang; Miao Sun; Tony X. Han; Xingfang Yuan; Liru Guo; Tao Liu

A residual network family with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual network architecture, residual networks of residual networks (RoR), to further exploit the optimization ability of residual networks. RoR substitutes the optimization of a residual mapping of residual mappings for the optimization of the original residual mapping. In particular, RoR adds level-wise shortcut connections upon original residual networks to promote the learning capability of residual networks. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets, and WRN) and significantly boosts their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance among all residual-network-like structures. Our RoR-3-WRN58-4 + SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100, and SVHN, with test errors of 3.77%, 19.73%, and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared with ResNets on the ImageNet dataset.
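A minimal PyTorch sketch of the level-wise shortcut idea follows, assuming identity shortcuts and a constant channel count within a stage; the block design, depth, and shortcut placement are illustrative, not the exact RoR configurations evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Plain residual block with its original block-level shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                  # block-level shortcut

class RoRStage(nn.Module):
    """A group of residual blocks with an additional level-wise shortcut
    spanning the whole group, in the spirit of RoR."""
    def __init__(self, channels, n_blocks):
        super().__init__()
        self.blocks = nn.Sequential(*[BasicBlock(channels) for _ in range(n_blocks)])

    def forward(self, x):
        return F.relu(self.blocks(x) + x)       # level-wise shortcut over the stage
```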

Collaboration


Dive into Tony X. Han's collaborations.

Top Co-Authors

Zhihai He (University of Missouri)
Xiaoyu Wang (University of Missouri)
Miao Sun (University of Missouri)
Guang Chen (University of Missouri)
Shuicheng Yan (National University of Singapore)
Xi Zhou (Chinese Academy of Sciences)
Chen Huang (University of Missouri)
Top Co-Authors

Avatar