Publication


Featured research published by Paul Sturgess.


European Conference on Computer Vision | 2010

What, Where and How Many? Combining Object Detectors and CRFs

Ľubor Ladický; Paul Sturgess; Karteek Alahari; Chris Russell; Philip H. S. Torr

Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have shown impressive results in the recent past. The next challenge is to integrate all these algorithms and address the problem of scene understanding. This paper is a step towards this goal. We present a probabilistic framework for reasoning about regions, objects, and their attributes such as object class, location, and spatial extent. Our model is a Conditional Random Field defined on pixels, segments and objects. We define a global energy function for the model, which combines results from sliding window detectors, and low-level pixel-based unary and pairwise relations. One of our primary contributions is to show that this energy function can be solved efficiently. Experimental results show that our model achieves significant improvement over the baseline methods on CamVid and PASCAL VOC datasets.
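As an illustrative sketch only (the function names and the toy 1-D "image" are invented here, not taken from the paper), a global energy combining per-pixel unary costs, a Potts pairwise smoothness term, and sliding-window detector hypotheses might look like:

```python
def crf_energy(labels, unary, pairwise_weight, detections):
    # labels: one class label per pixel of a toy 1-D "image"
    # unary: unary[i][l] = cost of assigning label l to pixel i
    # detections: (start, end, cls, strength) sliding-window hypotheses
    E = sum(unary[i][l] for i, l in enumerate(labels))
    # Potts pairwise term: penalise label changes between neighbours
    E += pairwise_weight * sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    # Detector term: pay `strength` for each in-window pixel disagreeing
    # with the detector's class
    for start, end, cls, strength in detections:
        E += strength * sum(1 for i in range(start, end) if labels[i] != cls)
    return E
```

A labelling that agrees with both the unaries and the detector windows achieves a lower energy than one that does not; the paper's contribution is solving this kind of joint energy efficiently, which this sketch does not attempt.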


International Journal of Computer Vision | 2012

Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction

Lubor Ladický; Paul Sturgess; Chris Russell; Sunando Sengupta; Yalin Bastanlar; William Clocksin; Philip H. S. Torr

The problems of dense stereo reconstruction and object class segmentation can both be formulated as Random Field labeling problems, in which every pixel in the image is assigned a label corresponding to either its disparity, or an object class such as road or building. While these two problems are mutually informative, no attempt has been made to jointly optimize their labelings. In this work we provide a flexible framework configured via cross-validation that unifies the two problems and demonstrate that, by resolving ambiguities, which would be present in real world data if the two problems were considered separately, joint optimization of the two problems substantially improves performance. To evaluate our method, we augment the Leuven data set (http://cms.brookes.ac.uk/research/visiongroup/files/Leuven.zip), which is a stereo video shot from a car driving around the streets of Leuven, with 70 hand labeled object class and disparity maps. We hope that the release of these annotations will stimulate further work in the challenging domain of street-view analysis. Complete source code is publicly available (http://cms.brookes.ac.uk/staff/Philip-Torr/ale.htm).
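A minimal sketch of the joint-labelling idea (the names and the toy compatibility rule are assumptions for illustration, not the paper's actual energy): each pixel carries both an object-class label and a disparity label, and a link term charges for class/disparity pairs that are implausible together.

```python
def joint_energy(classes, disps, cls_unary, disp_unary, link_weight, compatible):
    # classes/disps: per-pixel object-class and disparity labels.
    # compatible(c, d) -> True if class c is plausible at disparity d;
    # incompatible pairs pay link_weight, coupling the two problems.
    E = sum(cls_unary[i][c] for i, c in enumerate(classes))
    E += sum(disp_unary[i][d] for i, d in enumerate(disps))
    E += link_weight * sum(1 for c, d in zip(classes, disps)
                           if not compatible(c, d))
    return E
```

The coupling term is what lets one problem resolve ambiguities in the other: a disparity hypothesis inconsistent with the inferred object class becomes expensive.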


IEEE/RSJ International Conference on Intelligent Robots and Systems | 2012

Automatic dense visual semantic mapping from street-level imagery

Sunando Sengupta; Paul Sturgess; Lubor Ladicky; Philip H. S. Torr

This paper describes a method for producing a semantic map from multi-view street-level imagery. We define a semantic map as an overhead, or bird's-eye, view of a region with associated semantic object labels, such as car, road and pavement. We formulate the problem using two conditional random fields. The first models the semantic image segmentation of the street-view imagery, treating each image independently. The outputs of this stage are then aggregated over many images to form the input for our semantic map, a second random field defined over a ground plane. Each image is related to the map by a simple, yet effective, geometrical function that back-projects a region from the street-view image into the overhead ground-plane map. We introduce, and make publicly available, a new dataset created from real-world data. Our qualitative evaluation is performed on this data, consisting of a 14.8 km track, and we also quantify our results on a representative subset.
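The back-projection step can be sketched with a planar homography (a standard model for an image-to-ground-plane mapping; the function names and voting scheme here are illustrative assumptions, not the paper's exact pipeline):

```python
def apply_homography(H, x, y):
    # Map an image pixel (x, y) to ground-plane coordinates via a 3x3
    # homography H (nested lists); dividing by W is the perspective divide.
    X = H[0][0] * x + H[0][1] * y + H[0][2]
    Y = H[1][0] * x + H[1][1] * y + H[1][2]
    W = H[2][0] * x + H[2][1] * y + H[2][2]
    return X / W, Y / W

def accumulate_votes(pixel_labels, H, grid):
    # Aggregate per-pixel class labels from one street-view image into
    # integer cells of the overhead map by back-projecting each pixel;
    # repeating this over many images gives the map CRF its input.
    for (x, y), label in pixel_labels:
        gx, gy = apply_homography(H, x, y)
        cell = (int(gx), int(gy))
        grid.setdefault(cell, {})
        grid[cell][label] = grid[cell].get(label, 0) + 1
    return grid
```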


ACM Transactions on Graphics | 2014

ImageSpirit: Verbal Guided Image Parsing

Ming-Ming Cheng; Shuai Zheng; Wen-Yan Lin; Vibhav Vineet; Paul Sturgess; Nigel Crook; Niloy J. Mitra; Philip H. S. Torr

Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging the gap between how humans would like to access images and how they are typically represented is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this article we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that can possibly be used to interact with new-generation devices (e.g., smartphones, Google Glass, living-room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the trade-offs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study.
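The noun/adjective handle idea can be sketched as follows (a toy selection over an already-parsed image; the data layout and function name are invented for this sketch):

```python
def select_pixels(parse, noun, adjective):
    # parse: {pixel: (object_label, set_of_attribute_labels)}.
    # A verbal command like "the wooden chair" selects pixels whose
    # object label matches the noun and whose attributes contain the
    # adjective, giving the system a handle for further refinement.
    return {p for p, (obj, attrs) in parse.items()
            if obj == noun and adjective in attrs}
```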


IEEE International Conference on Automatic Face and Gesture Recognition | 2013

Approximate structured output learning for Constrained Local Models with application to real-time facial feature detection and tracking on low-power devices

Shuai Zheng; Paul Sturgess; Philip H. S. Torr

Given a face detection, facial feature detection involves localizing facial landmarks such as the eyes, nose and mouth. In this paper we examine the learning of the appearance model in the Constrained Local Models (CLM) technique. We make two contributions. First, we examine an approximate method for structured learning, which jointly learns all the appearances of the landmarks. Even though this method has no guarantee of optimality, we find it performs better than training the appearance models independently. It also allows for efficient online learning of a particular instance of a face. Second, we use a binary approximation of our learnt model that, when combined with binary features, leads to efficient inference at runtime using bitwise AND operations. We quantify the generalization performance of our approximate SO-CLM by training the model parameters on a single dataset and testing on a total of five unseen benchmarks. Runtime speed is demonstrated on the iPad 2 platform. Our results clearly show that our proposed system runs in real time, yet still performs at state-of-the-art levels of accuracy.
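The bitwise trick can be sketched in a few lines (a toy version under the assumption that both the template and the features are packed into integer bitmasks; the function names are invented here):

```python
def binary_response(template_bits, feature_bits):
    # Score of a binarised appearance template against binary features:
    # a single bitwise AND followed by a popcount stands in for a dot
    # product, which is what makes inference cheap on low-power devices.
    return bin(template_bits & feature_bits).count("1")

def best_offset(template_bits, feature_windows):
    # Pick the candidate landmark position whose binary feature window
    # responds most strongly to the template.
    return max(range(len(feature_windows)),
               key=lambda i: binary_response(template_bits, feature_windows[i]))
```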


International Conference on Robotics and Automation | 2015

Semantic octree: Unifying recognition, reconstruction and representation via an octree constrained higher order MRF

Sunando Sengupta; Paul Sturgess

On the one hand, mainly within the computer vision community, multi-resolution image labelling problems with pixel, super-pixel and object levels have made great progress towards the modelling of holistic scene understanding. On the other hand, mainly within the robotics and graphics communities, multi-resolution 3D representations of the world have matured to be efficient and accurate. In this paper we bring together the two hands and move towards the new direction of unified recognition, reconstruction and representation. We tackle the problem by embedding an octree into a hierarchical robust P^N Markov Random Field. This allows us to jointly infer the multi-resolution 3D volume along with the object-class labels, all within the constraints of an octree data structure. The octree representation is chosen as this data structure is efficient for further processing such as dynamic updates, data compression, and surface reconstruction. We perform experiments in inferring our semantic octree on the KITTI Vision Benchmark Suite in order to demonstrate its efficacy.
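The octree constraint itself can be illustrated with a toy builder (a dict-based sketch, not the paper's MRF inference): a cube of voxels collapses to a single node when its labels are uniform and splits into eight children otherwise.

```python
def build_octree(voxels, x, y, z, size):
    # voxels maps (x, y, z) -> label over a cube of side `size` (a power
    # of two). Uniform cubes collapse to a single label node; mixed cubes
    # split into eight children keyed by octant offset (dx, dy, dz).
    labels = {voxels[(x + i, y + j, z + k)]
              for i in range(size) for j in range(size) for k in range(size)}
    if len(labels) == 1 or size == 1:
        return labels.pop()
    h = size // 2
    return {(dx, dy, dz): build_octree(voxels, x + dx * h, y + dy * h,
                                       z + dz * h, h)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)}
```

Collapsing uniform regions is what makes the representation compact: large homogeneous volumes (road, sky) cost a single node regardless of resolution.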


British Machine Vision Conference | 2012

Scalable Cascade Inference for Semantic Image Segmentation.

Paul Sturgess; Lubor Ladicky; Nigel Crook; Philip H. S. Torr

Semantic image segmentation is a problem of simultaneous segmentation and recognition of an input image into regions and their associated categorical labels, such as person, car or cow. A popular way to achieve this goal is to assign a label to every pixel in the input image and impose simple structural constraints on the output label space. Efficient approximation algorithms for solving this labelling problem such as α-expansion have, at best, linear runtime complexity with respect to the number of labels, making them practical only when working in a specific domain that has few classes of interest. However, when working in a more general setting where the number of classes could easily reach tens of thousands, sub-linear complexity is desired. In this paper we propose meeting this requirement by performing cascaded inference that wraps around the α-expansion algorithm. The cascade both divides the large label set into smaller, more manageable ones by way of a hierarchy, and dynamically subdivides the image into smaller and smaller regions during inference. We test our method on the SUN09 dataset with 107 accurately hand-labelled classes.
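The coarse-to-fine pruning can be sketched per pixel (a drastic simplification: no pairwise terms and no α-expansion, just the hierarchy-driven label pruning; the names are invented for the sketch):

```python
def cascade_labels(unary, hierarchy):
    # unary[i][l]: cost of leaf label l at pixel i.
    # hierarchy: {group_name: [leaf labels]}. Stage one picks the
    # cheapest coarse group per pixel; stage two searches only inside
    # that group, so each pixel inspects far fewer labels than one flat
    # pass over the full label set would.
    result = []
    for costs in unary:
        group = min(hierarchy,
                    key=lambda g: min(costs[l] for l in hierarchy[g]))
        result.append(min(hierarchy[group], key=lambda l: costs[l]))
    return result
```

With a balanced hierarchy of depth d over k labels, each stage considers roughly k^(1/d) candidates, which is the source of the sub-linear behaviour the paper targets.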


Computer Vision and Pattern Recognition | 2012

Efficient discriminative learning of parametric nearest neighbor classifiers

Ziming Zhang; Paul Sturgess; Sunando Sengupta; Nigel Crook; Philip H. S. Torr

Linear SVMs are efficient in both training and testing; however, the data in real applications is rarely linearly separable. Non-linear kernel SVMs are too computationally intensive for applications with large-scale data sets. Recently, locally linear classifiers have gained popularity due to their efficiency whilst remaining competitive with kernel methods. The vanilla nearest neighbor algorithm is one of the simplest locally linear classifiers, but it lacks robustness due to the noise often present in real-world data. In this paper, we introduce a novel local classifier, Parametric Nearest Neighbor (P-NN), and its extension, Ensemble of P-NN (EP-NN). We parameterize the nearest neighbor algorithm based on the minimum weighted squared Euclidean distances between the data points and the prototypes, where a prototype is represented by a locally linear combination of some data points. Meanwhile, our method attempts to jointly learn both the prototypes and the classifier parameters discriminatively via max-margin. This makes our classifiers suitable to approximate the classification decision boundaries locally based on nonlinear functions. During testing, the computational complexity of both classifiers is linear in the product of the dimension of data and the number of prototypes. Our classification results on MNIST, USPS, LETTER, and Chars74K are comparable to, and in some cases better than, those of many other methods such as the state-of-the-art locally linear classifiers.
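The test-time rule can be sketched directly from the description (learning the weights and prototypes via max-margin is the paper's contribution and is omitted; here they are given):

```python
def pnn_classify(x, prototypes):
    # prototypes: list of (weight, vector, class). The predicted class is
    # that of the prototype minimising the weighted squared Euclidean
    # distance weight * ||x - vector||^2, so cost per query is linear in
    # (data dimension) x (number of prototypes), as the abstract states.
    def score(proto):
        w, v, _ = proto
        return w * sum((a - b) ** 2 for a, b in zip(x, v))
    return min(prototypes, key=score)[2]
```

Note how the learned weights bend the decision boundary: a prototype with a small weight wins over a geometrically closer one, which is what lets the model approximate nonlinear boundaries locally.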


British Machine Vision Conference | 2012

Improved Initialization and Gaussian Mixture Pairwise Terms for Dense Random Fields with Mean-field Inference.

Vibhav Vineet; Jonathan Warrell; Paul Sturgess; Philip H. S. Torr

Recently, Krähenbühl and Koltun proposed an efficient inference method for densely connected pairwise random fields using the mean-field approximation for a Conditional Random Field (CRF). However, they restrict their pairwise weights to take the form of a weighted combination of Gaussian kernels, where each Gaussian component is allowed to take only zero mean and can only be rescaled by a single value for each label pair. Further, their method is sensitive to initialization. In this paper, we propose methods to alleviate these issues. First, we propose a hierarchical mean-field approach where the labelling from the coarser level is propagated to the finer level for better initialisation. Further, we use SIFT-flow-based label transfer to provide a good initial condition at the coarsest level. Second, we allow our approach to take general Gaussian pairwise weights, where we learn the mean, the covariance matrix, and the mixing coefficient for every mixture component. We propose a variation of Expectation Maximization (EM) for piecewise learning of the parameters of the mixture model determined by the maximum likelihood function. Finally, we demonstrate the efficiency and accuracy offered by our method for object class segmentation problems on two challenging datasets: the PASCAL VOC 2010 segmentation and CamVid datasets. We show that we are able to achieve state-of-the-art performance on the CamVid dataset, and an almost 3% improvement on the PASCAL VOC 2010 dataset compared to baseline graph-cut and mean-field methods, while also reducing the inference time by almost a factor of 3 compared to graph-cut-based methods.
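A single mean-field update can be sketched on a toy chain CRF (a drastic simplification of the dense, Gaussian-weighted case in the paper: here the pairwise term is plain Potts and only the two chain neighbours interact; names are invented for the sketch):

```python
import math

def meanfield_step(Q, unary, w):
    # One synchronous mean-field update on a chain CRF with a Potts
    # pairwise term of weight w: Q_i(l) is proportional to
    # exp(-unary[i][l] - w * sum over neighbours j of (1 - Q_j(l))).
    n, L = len(Q), len(Q[0])
    new_Q = []
    for i in range(n):
        logits = []
        for l in range(L):
            disagreement = sum(1 - Q[j][l]
                               for j in (i - 1, i + 1) if 0 <= j < n)
            logits.append(-unary[i][l] - w * disagreement)
        m = max(logits)                      # subtract max for stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        new_Q.append([e / total for e in exps])
    return new_Q
```

The paper's sensitivity-to-initialization point is visible even in this sketch: the fixed point the updates converge to depends on the starting Q, which motivates the coarse-to-fine initialisation the authors propose.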


British Machine Vision Conference | 2010

Joint Optimisation for Object Class Segmentation and Dense Stereo Reconstruction.

Lubor Ladicky; Paul Sturgess; Chris Russell; Sunando Sengupta; Yalin Bastanlar; William F. Clocksin; Philip H. S. Torr

Collaboration


Dive into Paul Sturgess's collaborations.

Top Co-Authors

Chris Russell (Oxford Brookes University)
Nigel Crook (Oxford Brookes University)
Vibhav Vineet (Oxford Brookes University)
Yalin Bastanlar (İzmir Institute of Technology)