Antoni B. Chan
City University of Hong Kong
Publications
Featured research published by Antoni B. Chan.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007
Gustavo Carneiro; Antoni B. Chan; Pedro J. Moreno; Nuno Vasconcelos
A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.
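The classification view described above can be sketched in a few lines. This is only an illustration, not the authors' implementation: each semantic class is modeled here by a single one-dimensional Gaussian (a stand-in for the pooled mixture densities), feature vectors in a bag are assumed independent given the class, and the names `log_gauss`, `annotate`, `class_models`, and `priors` are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (stand-in for a class mixture density)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def annotate(bag, class_models, priors):
    """Pick the label with the highest posterior score for a bag of features.

    Assumes features are independent given the class, so the log-likelihood
    of the bag is the sum of per-feature log-likelihoods.
    """
    scores = {}
    for label, (mean, var) in class_models.items():
        log_lik = sum(log_gauss(x, mean, var) for x in bag)
        scores[label] = math.log(priors[label]) + log_lik
    return max(scores, key=scores.get)

# Hypothetical toy classes and bags of scalar features.
class_models = {"sky": (0.0, 1.0), "grass": (5.0, 1.0)}
priors = {"sky": 0.5, "grass": 0.5}
label_a = annotate([4.8, 5.3, 4.1], class_models, priors)
label_b = annotate([0.2, -0.4], class_models, priors)
```

Choosing the label with the largest posterior is the decision rule that minimizes the probability of error under the assumed model.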
Computer Vision and Pattern Recognition | 2008
Antoni B. Chan; Zhang-Sheng John Liang; Nuno Vasconcelos
We present a privacy-preserving system for estimating the size of inhomogeneous crowds, composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking. First, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic textures motion model. Second, a set of simple holistic features is extracted from each segmented region, and the correspondence between features and the number of people per segment is learned with Gaussian process regression. We validate both the crowd segmentation algorithm, and the crowd counting system, on a large pedestrian dataset (2000 frames of video, containing 49,885 total pedestrian instances). Finally, we present results of the system running on a full hour of video.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008
Antoni B. Chan; Nuno Vasconcelos
A dynamic texture is a spatio-temporal generative model for video, which represents video sequences as observations from a linear dynamical system. This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture. An expectation-maximization (EM) algorithm is derived for learning the parameters of the model, and the model is related to previous works in linear systems, machine learning, time-series clustering, control theory, and computer vision. Through experimentation, it is shown that the mixture of dynamic textures is a suitable representation for both the appearance and dynamics of a variety of visual processes that have traditionally been challenging for computer vision (for example, fire, steam, water, vehicle and pedestrian traffic, and so forth). When compared with state-of-the-art methods in motion segmentation, including both temporal texture methods and traditional representations (for example, optical flow or other localized motion representations), the mixture of dynamic textures achieves superior performance in the problems of clustering and segmenting video of such processes.
IEEE Transactions on Image Processing | 2012
Antoni B. Chan; Nuno Vasconcelos
An approach to the problem of estimating the size of inhomogeneous crowds, which are composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking is proposed. Instead, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic-texture motion model. A set of holistic low-level features is extracted from each segmented region, and a function that maps features into estimates of the number of people per segment is learned with Bayesian regression. Two Bayesian regression models are examined. The first is a combination of Gaussian process regression with a compound kernel, which accounts for both the global and local trends of the count mapping but is limited by the real-valued outputs that do not match the discrete counts. We address this limitation with a second model, which is based on a Bayesian treatment of Poisson regression that introduces a prior distribution on the linear weights of the model. Since exact inference is analytically intractable, a closed-form approximation is derived that is computationally efficient and kernelizable, enabling the representation of nonlinear functions. An approximate marginal likelihood is also derived for kernel hyperparameter learning. The two regression-based crowd counting methods are evaluated on a large pedestrian data set, containing very distinct camera views, pedestrian traffic, and outliers, such as bikes or skateboarders. Experimental results show that regression-based counts are accurate regardless of the crowd size, outperforming the count estimates produced by state-of-the-art pedestrian detectors. Results on 2 h of video demonstrate the efficiency and robustness of the regression-based crowd size estimation over long periods of time.
IEEE Transactions on Medical Imaging | 2006
Anthony P. Reeves; Antoni B. Chan; David F. Yankelevitz; Claudia I. Henschke; Bryan Kressler; William J. Kostis
The pulmonary nodule is the most common manifestation of lung cancer, the most deadly of all cancers. Most small pulmonary nodules are benign, however, and currently the growth rate of the nodule provides for one of the most accurate noninvasive methods of determining malignancy. In this paper, we present methods for measuring the change in nodule size from two computed tomography image scans recorded at different times; from this size change the growth rate may be established. The impact of partial voxels for small nodules is evaluated and isotropic resampling is shown to improve measurement accuracy. Methods for nodule location and sizing, pleural segmentation, adaptive thresholding, image registration, and knowledge-based shape matching are presented. The latter three techniques provide for a significant improvement in volume change measurement accuracy by considering both image scans simultaneously. Improvements in segmentation are evaluated by measuring volume changes in benign or slow growing nodules. In the analysis of 50 nodules, the variance in percent volume change was reduced from 11.54% to 9.35% (p=0.03) through the use of registration, adaptive thresholding, and knowledge-based shape matching.
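As a rough illustration of how a growth rate can follow from two volume measurements, the sketch below computes the percent volume change and a volume doubling time. It assumes exponential growth between scans, which the abstract does not state; the function names and the example volumes are hypothetical.

```python
import math

def percent_volume_change(v1, v2):
    """Percent change in nodule volume from the first scan to the second."""
    return 100.0 * (v2 - v1) / v1

def doubling_time(v1, v2, days_between):
    """Volume doubling time in days, ASSUMING exponential growth.

    A negative result means the measured volume shrank between scans.
    """
    if v1 <= 0 or v2 <= 0:
        raise ValueError("volumes must be positive")
    if v2 == v1:
        return math.inf  # no measurable growth
    return days_between * math.log(2.0) / math.log(v2 / v1)

# Hypothetical example: a nodule measured at 100 mm^3, then 200 mm^3 after 30 days.
change = percent_volume_change(100.0, 200.0)
dt = doubling_time(100.0, 200.0, 30.0)
```

Because the doubling time depends on the logarithm of the volume ratio, small errors in either volume measurement can shift the estimate noticeably when the two scans are close in size.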
Computer Vision and Pattern Recognition | 2005
Antoni B. Chan; Nuno Vasconcelos
We present a framework for the classification of visual processes that are best modeled with spatio-temporal autoregressive models. The new framework combines the modeling power of a family of models known as dynamic textures and the generalization guarantees, for classification, of the support vector machine classifier. This combination is achieved by the derivation of a new probabilistic kernel based on the Kullback-Leibler divergence (KL) between Gauss-Markov processes. In particular, we derive the KL-kernel for dynamic textures in both 1) the image space, which describes both the motion and appearance components of the spatio-temporal process, and 2) the hidden state space, which describes the temporal component alone. Together, the two kernels cover a large variety of video classification problems, including the cases where classes can differ in both appearance and motion and the cases where appearance is similar for all classes and only motion is discriminant. Experimental evaluation on two databases shows that the new classifier achieves superior performance over existing solutions.
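To make the idea of a KL-based kernel concrete, here is a minimal sketch that uses univariate Gaussians in place of Gauss-Markov processes. The kernel form used here, the exponential of the negative symmetrized divergence, is an assumption for illustration and is not taken from the abstract; `kl_gauss`, `kl_kernel`, and the scale parameter `a` are hypothetical names.

```python
import math

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """KL divergence KL(p || q) between two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)

def kl_kernel(p, q, a=1.0):
    """Kernel value exp(-a * (KL(p||q) + KL(q||p))) for p, q = (mean, variance)."""
    sym = kl_gauss(*p, *q) + kl_gauss(*q, *p)
    return math.exp(-a * sym)

# Hypothetical densities: identical models give 1.0, distant models approach 0.
p = (0.0, 1.0)
q = (1.5, 2.0)
k_same = kl_kernel(p, p)
k_diff = kl_kernel(p, q)
```

Symmetrizing the divergence makes the kernel value independent of argument order, which a kernel matrix passed to a support vector machine requires.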
Computer Vision and Pattern Recognition | 2007
Antoni B. Chan; Nuno Vasconcelos
The dynamic texture is a stochastic video model that treats the video as a sample from a linear dynamical system. The simple model has been shown to be surprisingly useful in domains such as video synthesis, video segmentation, and video classification. However, one major disadvantage of the dynamic texture is that it can only model video where the motion is smooth, i.e. video textures where the pixel values change smoothly. In this work, we propose an extension of the dynamic texture to address this issue. Instead of learning a linear observation function with PCA, we learn a non-linear observation function using kernel-PCA. The resulting kernel dynamic texture is capable of modeling a wider range of video motion, such as chaotic motion (e.g. turbulent water) or camera motion (e.g. panning). We derive the necessary steps to compute the Martin distance between kernel dynamic textures, and then validate the new model through classification experiments on video containing camera motion.
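The linear dynamical system underlying the basic dynamic texture can be sketched as follows: a hidden state evolves as x[t+1] = A x[t] + noise, and each frame is observed as y[t] = C x[t] + noise. This is a generic sampler under stated assumptions, not the paper's learning procedure or its kernel extension; the matrices `A` and `C`, the noise levels, and the function names are hypothetical.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sample_lds(A, C, x0, steps, state_noise=0.0, obs_noise=0.0, seed=0):
    """Sample `steps` observation vectors (frames) from a linear dynamical system.

    A: state transition matrix, C: linear observation matrix,
    state_noise / obs_noise: standard deviations of Gaussian noise.
    """
    rng = random.Random(seed)
    x = list(x0)
    frames = []
    for _ in range(steps):
        # Observe the current state as a frame of pixel values.
        frames.append([yi + rng.gauss(0.0, obs_noise) for yi in matvec(C, x)])
        # Advance the hidden state by one time step.
        x = [xi + rng.gauss(0.0, state_noise) for xi in matvec(A, x)]
    return frames

# Hypothetical 1-D state driving a 2-pixel "video", with noise switched off.
frames = sample_lds(A=[[0.5]], C=[[1.0], [2.0]], x0=[1.0], steps=3)
```

With the noise turned off, every frame is a fixed linear function of a state that decays geometrically. Because the observation map `C` is linear, the pixel values can only follow the state linearly, and the abstract's extension replaces this map with a non-linear one.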
International Conference on Computer Vision | 2009
Antoni B. Chan; Nuno Vasconcelos
Poisson regression models the noisy output of a counting function as a Poisson random variable, with a log-mean parameter that is a linear function of the input vector. In this work, we analyze Poisson regression in a Bayesian setting, by introducing a prior distribution on the weights of the linear function. Since exact inference is analytically unobtainable, we derive a closed-form approximation to the predictive distribution of the model. We show that the predictive distribution can be kernelized, enabling the representation of non-linear log-mean functions. We also derive an approximate marginal likelihood that can be optimized to learn the hyperparameters of the kernel. We then relate the proposed approximate Bayesian Poisson regression to Gaussian processes. Finally, we present experimental results using Bayesian Poisson regression for crowd counting from low-level features.
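The model in the first two sentences can be sketched as a Poisson likelihood with log-mean w·x and a zero-mean Gaussian prior on w. The code below finds only a MAP point estimate by gradient ascent; it is not the closed-form approximate predictive distribution derived in the paper. The names `fit_poisson_map` and `predict_mean`, the prior variance, the learning rate, and the toy data are all assumptions.

```python
import math

def fit_poisson_map(xs, ys, prior_var=100.0, lr=0.005, iters=5000):
    """MAP estimate of Poisson regression weights under a Gaussian prior.

    Maximizes sum_i [y_i * (w.x_i) - exp(w.x_i)] - ||w||^2 / (2 * prior_var)
    by plain gradient ascent (step size chosen for this toy problem).
    """
    dim = len(xs[0])
    w = [0.0] * dim
    for _ in range(iters):
        grad = [-wj / prior_var for wj in w]  # gradient of the log-prior
        for x, y in zip(xs, ys):
            mean = math.exp(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(dim):
                grad[j] += (y - mean) * x[j]  # gradient of the log-likelihood
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

def predict_mean(w, x):
    """Predicted Poisson mean exp(w.x) for input x."""
    return math.exp(sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data: counts double with each unit increase of the feature.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # first entry is a bias term
ys = [1, 2, 4, 8]
w = fit_poisson_map(xs, ys)
```

On this toy data the slope weight should land close to ln 2, since the counts double per unit of the feature. A full Bayesian treatment would also carry uncertainty in `w` through to the predicted counts, which a point estimate discards.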
Asian Conference on Computer Vision | 2014
Sijin Li; Antoni B. Chan
In this paper, we propose a deep convolutional neural network for 3D human pose estimation from monocular images. We train the network using two strategies: (1) a multi-task framework that jointly trains pose regression and body part detectors; (2) a pre-training strategy where the pose regressor is initialized using a network trained for body part detection. We compare our network on a large data set and achieve significant improvement over baseline methods. Human pose estimation is a structured prediction problem, i.e., the locations of each body part are highly correlated. Although we do not add constraints about the correlations between body parts to the network, we empirically show that the network has disentangled the dependencies among different body parts, and learned their correlations.
International Conference on Computer Vision | 2005
Antoni B. Chan; Nuno Vasconcelos
A dynamic texture is a linear dynamical system used to model a single video as a sample from a spatio-temporal stochastic process. In this work, we introduce the mixture of dynamic textures, which models a collection of videos consisting of different visual processes as samples from a set of dynamic textures. We derive the EM algorithm for learning a mixture of dynamic textures, and relate the learning algorithm and the dynamic texture mixture model to previous works. Finally, we demonstrate the applicability of the proposed model to problems that have traditionally been challenging for computer vision.