Antonio Criminisi
Microsoft
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Antonio Criminisi.
european conference on computer vision | 2006
Jamie Shotton; John Winn; Carsten Rother; Antonio Criminisi
This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs. Our discriminative model exploits novel features, based on textons, which jointly model shape and texture. Unary classification and feature selection is achieved using shared boosting to give an efficient classifier which can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating these classifiers in a conditional random field. Efficient training of the model on very large datasets is achieved by exploiting both random feature selection and piecewise training methods. High classification and segmentation accuracy are demonstrated on three different databases: i) our own 21-object class database of photographs of real objects viewed under general lighting conditions, poses and viewpoints, ii) the 7-class Corel subset and iii) the 7-class Sowerby database used in [1]. The proposed algorithm gives competitive results both for highly textured (e.g. grass, trees), highly structured (e.g. cars, faces, bikes, aeroplanes) and articulated objects (e.g. body, cow).
international conference on computer vision | 2005
John Winn; Antonio Criminisi; Thomas P. Minka
This paper presents a new algorithm for the automatic recognition of object classes from images (categorization). Compact and yet discriminative appearance-based object class models are automatically learned from a set of training images. The method is simple and extremely fast, making it suitable for many applications such as semantic image retrieval, Web search, and interactive image editing. It classifies a region according to the proportions of different visual words (clusters in feature space). The specific visual words and the typical proportions in each object are learned from a segmented training set. The main contribution of this paper is twofold: i) an optimally compact visual dictionary is learned by pair-wise merging of visual words from an initially large dictionary. The final visual words are described by GMMs. ii) A novel statistical measure of discrimination is proposed which is optimized by each merge operation. High classification accuracy is demonstrated for nine object classes on photographs of real objects viewed under general lighting conditions, poses and viewpoints. The set of test images used for validation comprise: i) photographs acquired by us, ii) images from the Web and iii) images from the recently released Pascal dataset. The proposed algorithm performs well on both texture-rich objects (e.g. grass, sky, trees) and structure-rich ones (e.g. cars, bikes, planes)
international conference on computer vision | 1999
Antonio Criminisi; Ian D. Reid; Andrew Zisserman
We describe how 3D affine measurements may be computed from a single perspective view of a scene given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane, and a vanishing point for a direction not parallel to the plane. It is shown that affine scene structure may then be determined from the image, without knowledge of the cameras internal calibration (e.g. focal length), nor of the explicit relation between camera and world (pose).In particular, we show how to (i) compute the distance between planes parallel to the reference plane (up to a common scale factor); (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the cameras location. Simple geometric derivations are given for these results. We also develop an algebraic representation which unifies the three types of measurement and, amongst other advantages, permits a first order error propagation analysis to be performed, associating an uncertainty with each measurement.We demonstrate the technique for a variety of applications, including height measurements in forensic images and 3D graphical modelling from single images.
International Journal of Computer Vision | 2000
Antonio Criminisi; Ian D. Reid; Andrew Zisserman
We describe how 3D affine measurements may be computed from a single perspective view of a scene given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane, and a vanishing point for a direction not parallel to the plane. It is shown that affine scene structure may then be determined from the image, without knowledge of the cameras internal calibration (e.g. focal length), nor of the explicit relation between camera and world (pose).In particular, we show how to (i) compute the distance between planes parallel to the reference plane (up to a common scale factor); (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the cameras location. Simple geometric derivations are given for these results. We also develop an algebraic representation which unifies the three types of measurement and, amongst other advantages, permits a first order error propagation analysis to be performed, associating an uncertainty with each measurement.We demonstrate the technique for a variety of applications, including height measurements in forensic images and 3D graphical modelling from single images.
IEEE Transactions on Medical Imaging | 2015
Bjoern H. Menze; András Jakab; Stefan Bauer; Jayashree Kalpathy-Cramer; Keyvan Farahani; Justin S. Kirby; Yuliya Burren; Nicole Porz; Johannes Slotboom; Roland Wiest; Levente Lanczi; Elizabeth R. Gerstner; Marc-André Weber; Tal Arbel; Brian B. Avants; Nicholas Ayache; Patricia Buendia; D. Louis Collins; Nicolas Cordier; Jason J. Corso; Antonio Criminisi; Tilak Das; Hervé Delingette; Çağatay Demiralp; Christopher R. Durst; Michel Dojat; Senan Doyle; Joana Festa; Florence Forbes; Ezequiel Geremia
In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients - manually annotated by up to four raters - and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%-85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.
Foundations and Trends in Computer Graphics and Vision | 2012
Antonio Criminisi; Jamie Shotton; Ender Konukoglu
This review presents a unified, efficient model of random decision forests which can be applied to a number of machine learning, computer vision, and medical image analysis tasks. Our model extends existing forest-based techniques as it unifies classification, regression, density estimation, manifold learning, semi-supervised learning, and active learning under the same decision forest framework. This gives us the opportunity to write and optimize the core implementation only once, with application to many diverse tasks. The proposed model may be used both in a discriminative or generative way and may be applied to discrete or continuous, labeled or unlabeled data. The main contributions of this review are: (1) Proposing a unified, probabilistic and efficient model for a variety of learning tasks; (2) Demonstrating margin-maximizing properties of classification forests; (3) Discussing probabilistic regression forests in comparison with other nonlinear regression algorithms; (4) Introducing density forests for estimating probability density functions; (5) Proposing an efficient algorithm for sampling from a density forest; (6) Introducing manifold forests for nonlinear dimensionality reduction; (7) Proposing new algorithms for transductive learning and active learning. Finally, we discuss how alternatives such as random ferns and extremely randomized trees stem from our more general forest model. This document is directed at both students who wish to learn the basics of decision forests, as well as researchers interested in the new contributions. It presents both fundamental and novel concepts in a structured way, with many illustrative examples and real-world applications. Thorough comparisons with state-of-the-art algorithms such as support vector machines, boosting and Gaussian processes are presented and relative advantages and disadvantages discussed. The many synthetic examples and existing commercial applications demonstrate the validity of the proposed model and its flexibility.
Archive | 2013
Antonio Criminisi; Jamie Shotton
This practical and easy-to-follow text explores the theoretical underpinnings of decision forests, organizing the vast existing literature on the field within a new, general-purpose forest model. Topics and features: with a foreword by Prof. Y. Amit and Prof. D. Geman, recounting their participation in the development of decision forests; introduces a flexible decision forest model, capable of addressing a large and diverse set of image and video analysis tasks; investigates both the theoretical foundations and the practical implementation of decision forests; discusses the use of decision forests for such tasks as classification, regression, density estimation, manifold learning, active learning and semi-supervised classification; includes exercises and experiments throughout the text, with solutions, slides, demo videos and other supplementary material provided at an associated website; provides a free, user-friendly software library, enabling the reader to experiment with forests in a hands-on manner.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Jamie Shotton; Ross B. Girshick; Andrew W. Fitzgibbon; Toby Sharp; Mat Cook; Mark J. Finocchio; Richard Moore; Pushmeet Kohli; Antonio Criminisi; Alex Aben-Athar Kipman; Andrew Blake
We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, field-of-view cropping, and clothing. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features and parallelizable decision forests, both approaches can run super-real time on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Results on silhouettes suggest broader applicability to other imaging modalities.
international conference on computer vision | 2011
Ross B. Girshick; Jamie Shotton; Pushmeet Kohli; Antonio Criminisi; Andrew W. Fitzgibbon
We present a new approach to general-activity human pose estimation from depth images, building on Hough forests. We extend existing techniques in several ways: real time prediction of multiple 3D joints, explicit learning of voting weights, vote compression to allow larger training sets, and a comparison of several decision-tree training objectives. Key aspects of our work include: regression directly from the raw depth image, without the use of an arbitrary intermediate representation; applicability to general motions (not constrained to particular activities) and the ability to localize occluded as well as visible body joints. Experimental results demonstrate that our method produces state of the art results on several data sets including the challenging MSRC-5000 pose estimation test set, at a speed of about 200 frames per second. Results on silhouettes suggest broader applicability to other imaging modalities.
computer vision and pattern recognition | 2006
Antonio Criminisi; Geoffrey Cross; Andrew Blake; Vladimir Kolmogorov
This paper presents an algorithm capable of real-time separation of foreground from background in monocular video sequences. Automatic segmentation of layers from colour/contrast or from motion alone is known to be error-prone. Here motion, colour and contrast cues are probabilistically fused together with spatial and temporal priors to infer layers accurately and efficiently. Central to our algorithm is the fact that pixel velocities are not needed, thus removing the need for optical flow estimation, with its tendency to error and computational expense. Instead, an efficient motion vs nonmotion classifier is trained to operate directly and jointly on intensity-change and contrast. Its output is then fused with colour information. The prior on segmentation is represented by a second order, temporal, Hidden Markov Model, together with a spatial MRF favouring coherence except where contrast is high. Finally, accurate layer segmentation and explicit occlusion detection are efficiently achieved by binary graph cut. The segmentation accuracy of the proposed algorithm is quantitatively evaluated with respect to existing groundtruth data and found to be comparable to the accuracy of a state of the art stereo segmentation algorithm. Foreground/ background segmentation is demonstrated in the application of live background substitution and shown to generate convincingly good quality composite video.