Peter Kontschieder
Microsoft
Publication
Featured research published by Peter Kontschieder.
international conference on computer vision | 2011
Peter Kontschieder; Samuel Rota Bulò; Horst Bischof; Marcello Pelillo
In this paper we propose a simple and effective way to integrate structural information in random forests for semantic image labelling. By structural information we refer to the inherently available, topological distribution of object classes in a given image. Different object class labels will not be randomly distributed over an image but usually form coherently labelled regions. In this work we provide a way to incorporate this topological information in the popular random forest framework for performing low-level, unary classification. Our paper has several contributions: First, we show how random forests can be augmented with structured label information. In the second part, we introduce a novel data splitting function that exploits the joint distributions observed in the structured label space for learning typical label transitions between object classes. Finally, we provide two possibilities for integrating the structured output predictions into concise, semantic labellings. In our experiments on the challenging MSRC and CamVid databases, we compare our method to standard random forest and conditional random field classification results.
asian conference on computer vision | 2009
Peter Kontschieder; Michael Donoser; Horst Bischof
This paper considers two major applications of shape matching algorithms: (a) query-by-example, i.e. retrieving the most similar shapes from a database, and (b) finding clusters of shapes, each represented by a single prototype. Our approach goes beyond pairwise shape similarity analysis by considering the underlying structure of the shape manifold, which is estimated from the shape similarity scores between all the shapes within a database. We propose a modified mutual kNN graph as the underlying representation and demonstrate its performance for the task of shape retrieval. We further describe an efficient, unsupervised clustering method which uses the modified mutual kNN graph for initialization. Experimental evaluation proves the applicability of our method, e.g. by achieving the highest ever reported retrieval score of 93.40% on the well-known MPEG-7 database.
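As a rough illustration of the graph representation mentioned in this abstract, here is a minimal sketch of a plain mutual kNN graph built from a similarity matrix. The abstract does not say how the authors modify the graph, so this shows only the standard construction; the function name `mutual_knn_graph` and the toy similarity scores are assumptions made for the example.

```python
def mutual_knn_graph(similarity, k):
    """Return the edge set of a plain (unmodified) mutual kNN graph.

    similarity: square list of lists, where a higher value means more similar.
    An edge (i, j) is kept only if j is among the k nearest neighbours
    of i AND i is among the k nearest neighbours of j.
    """
    n = len(similarity)
    neighbours = []
    for i in range(n):
        # Rank all other items by decreasing similarity to item i.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        neighbours.append(set(ranked[:k]))
    # Keep each undirected edge once (i < j) and only if it is mutual.
    return {
        (i, j)
        for i in range(n)
        for j in neighbours[i]
        if i < j and i in neighbours[j]
    }


# Hypothetical similarity scores: shapes 0 and 1 are alike, shapes 2 and 3 are alike.
toy_similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
edges_k1 = mutual_knn_graph(toy_similarity, k=1)
edges_k2 = mutual_knn_graph(toy_similarity, k=2)
print(sorted(edges_k1))
print(sorted(edges_k2))
```

The mutuality requirement drops one-sided neighbour relations, which tends to leave the graph connected mainly within groups of similar shapes.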
international conference on computer vision | 2015
Peter Kontschieder; Madalina Fiterau; Antonio Criminisi; Samuel Rota Bulò
We present Deep Neural Decision Forests - a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain Top5-Errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
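The idea of a stochastic, differentiable decision tree can be sketched as soft routing: each split node outputs a probability of going left, and the prediction is a mixture of leaf class distributions weighted by the probability of reaching each leaf. The snippet below is a minimal sketch under that assumption for a single tree of depth 2. The function names, the fixed tree depth, and the toy numbers are all hypothetical, and the activations merely stand in for outputs of a network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaf_probabilities(split_activations):
    """Softly route a sample through a complete binary tree of depth 2.

    split_activations: three real numbers [root, left child, right child].
    Each is squashed to a probability of taking the left branch.
    Returns the probability of reaching each of the four leaves.
    """
    d_root, d_left, d_right = (sigmoid(a) for a in split_activations)
    return [
        d_root * d_left,
        d_root * (1.0 - d_left),
        (1.0 - d_root) * d_right,
        (1.0 - d_root) * (1.0 - d_right),
    ]


def predict(split_activations, leaf_distributions):
    """Mix the per-leaf class distributions by the leaf reaching probabilities."""
    mu = leaf_probabilities(split_activations)
    n_classes = len(leaf_distributions[0])
    return [
        sum(m * dist[c] for m, dist in zip(mu, leaf_distributions))
        for c in range(n_classes)
    ]


# Hypothetical leaf class distributions for a two-class problem.
toy_leaves = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
uniform_routing = leaf_probabilities([0.0, 0.0, 0.0])
toy_prediction = predict([2.0, 1.0, -1.0], toy_leaves)
print(uniform_routing)
print(toy_prediction)
```

Because every step is a smooth function of the activations, gradients can flow from the prediction back to whatever produces those activations, which is what makes joint training with a network possible in principle.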
computer vision and pattern recognition | 2013
Peter Kontschieder; Pushmeet Kohli; Jamie Shotton; Antonio Criminisi
Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently [3, 5, 8]. This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. Random field models, instead, encourage spatial consistency of labels at increased computational expense. This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. Our model can be thought of as a generalization of the successful Semantic Texton Forest, Auto-Context, and Entangled Forest models. A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. Our GeoF model is validated quantitatively on the task of semantic image segmentation, on four challenging and very diverse image datasets. GeoF outperforms both state-of-the-art forest models and the conventional pairwise CRF.
computer vision and pattern recognition | 2014
Samuel Rota Bulò; Peter Kontschieder
In this work we present Neural Decision Forests, a novel approach to jointly tackle data representation- and discriminative learning within randomized decision trees. Recent advances of deep learning architectures demonstrate the power of embedding representation learning within the classifier, an idea that is intuitively supported by the hierarchical nature of the decision forest model where the input space is typically left unchanged during training and testing. We bridge this gap by introducing randomized Multi-Layer Perceptrons (rMLP) as new split nodes which are capable of learning non-linear, data-specific representations and taking advantage of them by finding optimal predictions for the emerging child nodes. To prevent overfitting, we i) randomly select the image data fed to the input layer, ii) automatically adapt the rMLP topology to meet the complexity of the data arriving at the node and iii) introduce an l1-norm based regularization that additionally sparsifies the network. The key findings in our experiments on three different semantic image labelling datasets are consistently improved results and significantly compressed trees compared to conventional classification trees.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014
Peter Kontschieder; Samuel Rota Bulò; Marcello Pelillo; Horst Bischof
Ensembles of randomized decision trees, known as Random Forests, have become a valuable machine learning tool for addressing many computer vision problems. Despite their popularity, few works have tried to exploit contextual and structural information in random forests in order to improve their performance. In this paper, we propose a simple and effective way to integrate contextual information in random forests, which is typically reflected in the structured output space of complex problems like semantic image labelling. Our paper has several contributions: We show how random forests can be augmented with structured label information and be used to deliver structured low-level predictions. The learning task is carried out by employing a novel split function evaluation criterion that exploits the joint distribution observed in the structured label space. This allows the forest to learn typical label transitions between object classes and avoid locally implausible label configurations. We provide two approaches for integrating the structured output predictions obtained at a local level from the forest into a concise, global, semantic labelling. We integrate our new ideas also in the Hough-forest framework with the view of exploiting contextual information at the classification level to improve the performance on the task of object detection. Finally, we provide experimental evidence for the effectiveness of our approach on different tasks: Semantic image labelling on the challenging MSRCv2 and CamVid databases, reconstruction of occluded handwritten Chinese characters on the Kaist database and pedestrian detection on the TU Darmstadt databases.
medical image computing and computer assisted intervention | 2014
Peter Kontschieder; Jonas F. Dorn; Cecily Morrison; Robert Corish; Darko Zikic; Abigail Sellen; Marcus D’Souza; Christian P. Kamm; Jessica Burggraaff; Prejaas Tewarie; Thomas Vogel; Michela Azzarito; Ben Glocker; Peter Chin; Frank Dahlke; C.H. Polman; Ludwig Kappos; Bernard M. J. Uitdehaag; Antonio Criminisi
This paper presents new learning-based techniques for measuring disease progression in Multiple Sclerosis (MS) patients. Our system aims to augment conventional neurological examinations by adding quantitative evidence of disease progression. An off-the-shelf depth camera is used to image the patient at the examination, during which he/she is asked to perform carefully selected movements. Our algorithms then automatically analyze the videos, assessing the quality of each movement and classifying them as healthy or non-healthy. Our contribution is three-fold: We i) introduce ensembles of randomized SVM classifiers and compare them with decision forests on the task of depth video classification; ii) demonstrate automatic selection of discriminative landmarks in the depth videos, showing their clinical relevance; iii) validate our classification algorithms quantitatively on a new dataset of 1041 videos of both MS patients and healthy volunteers. We achieve average Dice scores well in excess of the 80% mark, confirming the validity of our approach in practical applications. Our results suggest that this technique could be fruitful for depth-camera supported clinical assessments for a range of conditions.
Computer Vision and Image Understanding | 2012
Peter Kontschieder; Samuel Rota Bulò; Michael Donoser; Marcello Pelillo; Horst Bischof
In this paper we propose a novel, game-theoretic approach for finding multiple instances of an object category as sets of mutually coherent votes in a generalized Hough space. Existing Hough-voting based detection systems have to inherently apply parameter-sensitive non-maxima suppression (NMS) or mode detection techniques for finding object center hypotheses. Moreover, the voting origins contributing to a particular maximum are lost and hence mostly bounding boxes are drawn to indicate the object hypotheses. To overcome these problems, we introduce a two-stage method, applicable on top of any Hough-voting based detection framework. First, we define a Hough environment, where the geometric compatibilities of the voting elements are captured in a pairwise fashion. Then we analyze this environment within a game-theoretic setting, where we model the competition between voting elements as a Darwinian process, driven by their mutual geometric compatibilities. In order to find multiple and possibly overlapping objects, we introduce a new enumeration method inspired by tabu search. As a result, we obtain locations and voting element compositions of each object instance while bypassing the task of NMS. We demonstrate the broad applicability of our method on challenging datasets like the extended TUD pedestrian crossing scene.
international symposium on mixed and augmented reality | 2011
Michael Donoser; Peter Kontschieder; Horst Bischof
In this paper we introduce a novel real-time method to track weakly textured planar objects and to simultaneously estimate their 3D pose. The basic idea is to adapt the classic tracking-by-detection approach, which searches for the object to be tracked independently in each frame, for tracking non-textured objects. In order to robustly estimate the 3D pose of such objects in each frame, we have to tackle three demanding problems. First, we need to find a stable representation of the object which is discriminable against the background and highly repetitive. Second, we have to robustly relocate this representation in every frame, also during considerable viewpoint changes. Finally, we have to estimate the pose from a single, closed object contour. Of course, all demands shall be accommodated at low computational costs and in real-time. To attack the above-mentioned problems, we propose to exploit the properties of Maximally Stable Extremal Regions (MSERs) for detecting the required contours in an efficient manner and to apply random ferns as an efficient and robust classifier for tracking. To estimate the 3D pose, we construct a perspectively invariant frame on the closed contour which is intrinsically provided by the extracted MSER. In our experiments we obtain robust tracking results with accurate poses on various challenging image sequences at a single requirement: One MSER used for tracking has to have at least one concavity that sufficiently deviates from its convex hull.
british machine vision conference | 2011
Peter Kontschieder; Hayko Riemenschneider; Michael Donoser; Horst Bischof
The goal of this work is to discriminatively learn contour fragment descriptors for the task of object detection. Unlike previous methods that incorporate learning techniques only for object model generation or for verification after detection, we present a holistic object detection system using solely shape as underlying cue. In the learning phase, we interrelate local shape descriptions (fragments) of the object contour with the corresponding spatial location of the object centroid. We introduce a novel shape fragment descriptor that abstracts spatially connected edge points into a matrix consisting of angular relations between the points. Our proposed descriptor fulfills important properties like distinctiveness, robustness and insensitivity to clutter. During detection, we hypothesize object locations in a generalized Hough voting scheme. The back-projected votes from the fragments allow the object contour to be approximately delineated. We evaluate our method e.g. on the well-known ETHZ shape database, where we achieve an average detection score of 87.5% at 1.0 FPPI only from Hough voting, outperforming the highest scoring Hough voting approaches by almost 8%.
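The voting step described in this abstract can be illustrated with a generic Hough accumulator: each fragment casts a vote for the object centroid by adding a learned offset to its own position, and the cell with the most votes becomes the object hypothesis. The following is a minimal sketch of that general scheme only, not of the authors' descriptor or learning procedure; the function name `hough_vote` and the toy fragment positions and offsets are hypothetical.

```python
from collections import Counter


def hough_vote(fragments):
    """Accumulate centroid votes from contour fragments.

    fragments: list of ((x, y), (dx, dy)) pairs, where (x, y) is the fragment
    position and (dx, dy) is its assumed learned offset to the object centroid.
    Returns a Counter mapping each voted centroid cell to its vote count.
    """
    accumulator = Counter()
    for (x, y), (dx, dy) in fragments:
        accumulator[(x + dx, y + dy)] += 1
    return accumulator


# Hypothetical fragments: three agree on centroid (5, 5), one is clutter.
toy_fragments = [
    ((2, 3), (3, 2)),
    ((8, 5), (-3, 0)),
    ((5, 9), (0, -4)),
    ((1, 1), (1, 1)),
]
votes = hough_vote(toy_fragments)
best_cell, best_count = votes.most_common(1)[0]
# Back-projection: the fragments that voted for the winning cell.
supporters = [pos for pos, (dx, dy) in toy_fragments
              if (pos[0] + dx, pos[1] + dy) == best_cell]
print(best_cell, best_count, supporters)
```

Keeping track of which fragments supported the winning cell is what allows the contour to be approximately delineated afterwards, rather than reporting only a bounding box.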