Roberto Cipolla | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Roberto Cipolla is active.

Explore More

Publication

Featured researches published by Roberto Cipolla.

computer vision and pattern recognition | 2008

Semantic texton forests for image categorization and segmentation

Jamie Shotton; Matthew Johnson; Roberto Cipolla

We propose semantic texton forests, efficient and powerful new low-level features. These are ensembles of decision trees that act directly on image pixels, and therefore do not need the expensive computation of filter-bank responses or local descriptors. They are extremely fast to both train and test, especially compared with k-means clustering and nearest-neighbor assignment of feature descriptors. The nodes in the trees provide (i) an implicit hierarchical clustering into semantic textons, and (ii) an explicit local classification estimate. Our second contribution, the bag of semantic textons, combines a histogram of semantic textons over an image region with a region prior category distribution. The bag of semantic textons is computed over the whole image for categorization, and over local rectangular regions for segmentation. Including both histogram and region prior allows our segmentation algorithm to exploit both textural and semantic context. Our third contribution is an image-level prior for segmentation that emphasizes those categories that the automatic categorization believes to be present. We evaluate on two datasets including the very challenging VOC 2007 segmentation dataset. Our results significantly advance the state-of-the-art in segmentation accuracy, and furthermore, our use of efficient decision forests gives at least a five-fold increase in execution speed.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2002

Real-time visual tracking of complex structures

Tom Drummond; Roberto Cipolla

Presents a framework for three-dimensional model-based tracking. Graphical rendering technology is combined with constrained active contour tracking to create a robust wire-frame tracking system. It operates in real time at video frame rate (25 Hz) on standard hardware. It is based on an internal CAD model of the object to be tracked which is rendered using a binary space partition tree to perform hidden line removal. A Lie group formalism is used to cast the motion computation problem into simple geometric terms so that tracking becomes a simple optimization problem solved by means of iterative reweighted least squares. A visual servoing system constructed using this framework is presented together with results showing the accuracy of the tracker. The paper then describes how this tracking system has been extended to provide a general framework for tracking in complex configurations. The adjoint representation of the group is used to transform measurements into common coordinate frames. The constraints are then imposed by means of Lagrange multipliers. Results from a number of experiments performed using this framework are presented and discussed.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations

Tae-Kyun Kim; Josef Kittler; Roberto Cipolla

We address the problem of comparing sets of images for object recognition, where the sets may represent variations in an objects appearance due to changing camera pose and lighting conditions. canonical correlations (also known as principal or canonical angles), which can be thought of as the angles between two d-dimensional subspaces, have recently attracted attention for image set matching. Canonical correlations offer many benefits in accuracy, efficiency, and robustness compared to the two main classical methods: parametric distribution-based and nonparametric sample-based matching of sets. Here, this is first demonstrated experimentally for reasonably sized data sets using existing methods exploiting canonical correlations. Motivated by their proven effectiveness, a novel discriminative learning method over sets is proposed for set classification. Specifically, inspired by classical linear discriminant analysis (LDA), we develop a linear discriminant function that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets. Image sets transformed by the discriminant function are then compared by the canonical correlations. Classical orthogonal subspace method (OSM) is also investigated for the similar purpose and compared with the proposed method. The proposed method is evaluated on various object recognition problems using face image sets with arbitrary motion captured under different illuminations and image sets of 500 general objects taken at different views. The method is also applied to object category recognition using ETH-80 database. The proposed method is shown to outperform the state-of-the-art methods in terms of accuracy and efficiency

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006

Model-based hand tracking using a hierarchical Bayesian filter

Björn Stenger; Arasanathan Thayananthan; Philip H. S. Torr; Roberto Cipolla

This paper sets out a tracking framework, which is applied to the recovery of three-dimensional hand motion from an image sequence. The method handles the issues of initialization, tracking, and recovery in a unified way. In a single input image with no prior information of the hand pose, the algorithm is equivalent to a hierarchical detection scheme, where unlikely pose candidates are rapidly discarded. In image sequences, a dynamic model is used to guide the search and approximate the optimal filtering equations. A dynamic model is given by transition probabilities between regions in parameter space and is learned from training data obtained by capturing articulated motion. The algorithm is evaluated on a number of image sequences, which include hand motion with self-occlusion in front of a cluttered background

european conference on computer vision | 2008

Segmentation and Recognition Using Structure from Motion Point Clouds

Gabriel J. Brostow; Jamie Shotton; Julien Fauqueur; Roberto Cipolla

We propose an algorithm for semantic segmentation based on 3D point clouds derived from ego-motion. We motivate five simple cues designed to model specific patterns of motion and 3D world structure that vary with object category. We introduce features that project the 3D cues back to the 2D image plane while modeling spatial layout and context. A randomized decision forest combines many such features to achieve a coherent 2D segmentation and recognize the object categories present. Our main contribution is to show how semantic segmentation is possible based solely on motion-derived 3D world structure. Our method works well on sparse, noisy point clouds, and unlike existing approaches, does not need appearance-based descriptors. Experiments were performed on a challenging new video database containing sequences filmed from a moving car in daylight and at dusk. The results confirm that indeed, accurate segmentation and recognition are possible using only motion and 3D world structure. Further, we show that the motion-derived information complements an existing state-of-the-art appearance-based method, improving both qualitative and quantitative performance.

Image and Vision Computing | 1997

Feature-based human face detection

Kin Choong Yow; Roberto Cipolla

Human face detection has always been an important problem for face, expression and gesture recognition. Though numerous attempts have been made to detect and localize faces, these approaches have made assumptions that restrict their extension to more general cases. We identify that the key factor in a generic and robust system is that of using a large amount of image evidence, related and reinforced by model knowledge through a probabilistic framework. In this paper, we propose a feature-based algorithm for detecting faces that is sufficiently generic and is also easily extensible to cope with more demanding variations of the imaging conditions. The algorithm detects feature points from the image using spatial filters and groups them into face candidates using geometric and gray level constraints. A probabilistic framework is then used to reinforce probabilities and to evaluate the likelihood of the candidate as a face. We provide results to support the validity of the approach and demonstrate its capability to detect faces under different scale, orientation and viewpoint.

computer vision and pattern recognition | 2006

Unsupervised Bayesian Detection of Independent Motion in Crowds

Gabriel J. Brostow; Roberto Cipolla

While crowds of various subjects may offer applicationspecific cues to detect individuals, we demonstrate that for the general case, motion itself contains more information than previously exploited. This paper describes an unsupervised data driven Bayesian clustering algorithm which has detection of individual entities as its primary goal. We track simple image features and probabilistically group them into clusters representing independently moving entities. The numbers of clusters and the grouping of constituent features are determined without supervised learning or any subject-specific model. The new approach is instead, that space-time proximity and trajectory coherence through image space are used as the only probabilistic criteria for clustering. An important contribution of this work is how these criteria are used to perform a one-shot data association without iterating through combinatorial hypotheses of cluster assignments. Our proposed general detection algorithm can be augmented with subject-specific filtering, but is shown to already be effective at detecting individual entities in crowds of people, insects, and animals. This paper and the associated video examine the implementation and experiments of our motion clustering framework.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Multiscale Categorical Object Recognition Using Contour Fragments

Jamie Shotton; Andrew P. Blake; Roberto Cipolla

Psychophysical studies show that we can recognize objects using fragments of outline contour alone. This paper proposes a new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale. The system first builds a class-specific codebook of local fragments of contour using a novel formulation of chamfer matching. These local fragments allow recognition that is robust to within-class variation, pose changes, and articulation. Boosting combines these fragments into a cascaded sliding-window classifier, and mean shift is used to select strong responses as a final set of detection. We show how learning can be performed iteratively on both training and test sets to bootstrap an improved classifier. We compare with other methods based on contour and local descriptors in our detailed evaluation over 17 challenging categories and obtain highly competitive results. The results confirm that contour is indeed a powerful cue for multiscale and multiclass visual object recognition.

computer vision and pattern recognition | 2003

Shape context and chamfer matching in cluttered scenes

Arasanathan Thayananthan; Bjoern Stenger; Philip H. S. Torr; Roberto Cipolla

This paper compares two methods for object localization from contours: shape context and chamfer matching of templates. In the light of our experiments, we suggest improvements to the shape context: shape contexts are used to find corresponding features between model and image. In real images it is shown that the shape context is highly influenced by clutters; furthermore, even when the object is correctly localized, the feature correspondence may be poor. We show that the robustness of shape matching can be increased by including a figural continuity constraint. The combined shape and continuity cost is minimized using the Viterbi algorithm on features, resulting in improved localization and correspondence. Our algorithm can be generally applied to any feature based shape matching method. Chamfer matching correlates model templates with the distance transform of the edge image. This can be done efficiently using a coarse-to-fine search over the transformation parameters. The method is robust in clutter, however, multiple templates are needed to handle scale, rotation and shape variation. We compare both methods for locating hand shapes in cluttered images, and applied to word recognition in EZ-Gimpy images.

international conference on computer vision | 2005

Contour-based learning for object detection

Jamie Shotton; Andrew Blake; Roberto Cipolla

We present a novel categorical object detection scheme that uses only local contour-based features. A two-stage, partially supervised learning architecture is proposed: a rudimentary detector is learned from a very small set of segmented images and applied to a larger training set of un-segmented images; the second stage bootstraps these detections to learn an improved classifier while explicitly training against clutter. The detectors are learned with a boosting algorithm which creates a location-sensitive classifier using a discriminative set of features from a randomly chosen dictionary of contour fragments. We present results that are very competitive with other state-of-the-art object detection schemes and show robustness to object articulations, clutter, and occlusion. Our major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.

Explore More