Samuel Schulter
Graz University of Technology
Publications
Featured research published by Samuel Schulter.
computer vision and pattern recognition | 2015
Samuel Schulter; Christian Leistner; Horst Bischof
The aim of single image super-resolution is to reconstruct a high-resolution image from a single low-resolution input. Although the task is ill-posed, it can be seen as finding a non-linear mapping from a low- to a high-dimensional space. Recent methods that rely on both neighborhood embedding and sparse coding have led to tremendous quality improvements. Yet, many of the previous approaches are hard to apply in practice because they are either too slow or demand tedious parameter tweaks. In this paper, we propose to directly map from low- to high-resolution patches using random forests. We show the close relation of previous work on single image super-resolution to locally linear regression and demonstrate how random forests fit naturally into this framework. During training of the trees, we optimize a novel and effective regularized objective that operates not only on the output space but also on the input space, which especially suits the regression task. During inference, our method retains the well-known computational efficiency that has made random forests popular for many computer vision problems. In the experimental part, we demonstrate on standard benchmarks for single image super-resolution that our approach yields highly accurate state-of-the-art results, while being fast in both training and evaluation.
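The connection to locally linear regression can be illustrated with a toy sketch: a tree node routes a low-resolution feature vector to a leaf, and each leaf stores its own least-squares linear map to the high-resolution output. This is an illustrative simplification (one split, synthetic piecewise-linear data), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: low-res patch features (dim 4) -> high-res patch values (dim 8),
# generated by two different local linear maps, i.e. a piecewise-linear mapping.
n, d_lo, d_hi = 200, 4, 8
X = rng.standard_normal((n, d_lo))
W0, W1 = rng.standard_normal((d_lo, d_hi)), rng.standard_normal((d_lo, d_hi))
Y = np.where((X[:, 0] > 0)[:, None], X @ W0, X @ W1)

# One "tree" split on a feature threshold; each leaf fits a ridge-regularized
# least-squares linear map -- the locally linear regressor.
feat, thr = 0, 0.0
leaves = {}
for side in (0, 1):
    mask = (X[:, feat] > thr) == bool(side)
    A = X[mask]
    leaves[side] = np.linalg.solve(A.T @ A + 1e-6 * np.eye(d_lo), A.T @ Y[mask])

def predict(x):
    side = int(x[feat] > thr)   # route to leaf, apply its linear map
    return x @ leaves[side]

err = np.mean([np.abs(predict(x) - y).max() for x, y in zip(X, Y)])
print("max abs error:", err)
```

Because the split here coincides with the true partition, the two leaves recover the maps W0 and W1 almost exactly; a real forest learns many such splits and averages over trees.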
computer vision and pattern recognition | 2013
Samuel Schulter; Paul Wohlhart; Christian Leistner; Amir Saffari; Peter M. Roth; Horst Bischof
This paper introduces a novel classification method termed Alternating Decision Forests (ADFs), which formulates the training of Random Forests explicitly as a global loss minimization problem. During training, the losses are minimized by maintaining an adaptive weight distribution over the training samples, similar to Boosting methods. To keep the method as flexible and general as possible, we adopt the principle of gradient descent in function space, which allows arbitrary losses to be minimized. Contrary to Boosted Trees, in our method the loss minimization is an inherent part of the tree growing process, thus retaining the benefits of common Random Forests, such as parallel processing. We derive the new classifier and give a discussion and evaluation on standard machine learning data sets. Furthermore, we show how ADFs can be easily integrated into an object detection application. Compared to both standard Random Forests and Boosted Trees, ADFs give better performance in our experiments, while yielding more compact models in terms of tree depth.
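The adaptive weight distribution can be sketched in a few lines. Assuming the exponential loss (one of the losses the gradient-in-function-space view admits, used here purely for illustration), each sample's weight is the loss gradient magnitude at its current ensemble margin, so badly classified samples get more influence on the next depth level:

```python
import numpy as np

# Binary labels y in {-1, +1} and current ensemble margins F(x_i).
y = np.array([1.0, 1.0, -1.0, 1.0, -1.0])
F = np.array([0.5, -0.2, -1.0, 2.0, 0.3])

# Boosting-style adaptive weights: with the exponential loss L = exp(-y * F),
# the gradient magnitude w.r.t. F is exp(-y * F) itself, so samples with a
# small or negative margin y*F receive larger weight.
w = np.exp(-y * F)
w /= w.sum()          # normalized weight distribution over the training samples
print(np.round(w, 3))
```

Sample 4 (label -1, positive score 0.3, i.e. the worst margin) ends up with the largest weight; in ADFs these weights steer the split selection at each successive tree depth rather than the training of a new tree, as in Boosting.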
british machine vision conference | 2011
Samuel Schulter; Christian Leistner; Peter M. Roth; Horst Bischof; Luc Van Gool
Recently, Gall & Lempitsky [6] and Okada [9] introduced Hough Forests (HF), which emerged as a powerful tool in object detection, tracking, and several other vision applications. HFs are based on the generalized Hough transform [2] and are ensembles of randomized decision trees, consisting of both classification and regression nodes, which are trained recursively. Densely sampled patches of the target object {Pi = (Ai, yi, di)} represent the training data, where Ai is the appearance, yi the label, and di a vector pointing to the center of the object. Each node tries to find an optimal splitting function by optimizing either the information gain for classification nodes or the variance of the offset vectors di for regression nodes. This yields quite clean leaf nodes with respect to both appearance and offset. However, HFs are typically trained in off-line mode, i.e., they assume access to the entire training set at once. This limits their application in situations where the data arrives sequentially, e.g., in object tracking, incremental learning, or large-scale learning. For all of these applications, on-line methods are inherently better suited. Thus, in this paper we propose an on-line learning scheme for Hough Forests, which extends their usability to further applications, such as tracking of arbitrary target instances or large-scale learning of visual classifiers. Growing such a tree in an on-line fashion is a difficult task, as errors in the hard splitting rules cannot easily be corrected further down the tree. While Godec et al. [8] circumvent the recursive on-line update of classification trees by randomly growing the trees to their full size and only updating the leaf node statistics, we integrate the ideas from [5, 10] that follow a tree-growing principle. The basic idea there is to start with a tree consisting of only one node, which is the root node and the only leaf at that time.
Each node collects the data falling into it and decides on its own, based on a certain splitting criterion, whether to split or to further update its statistics. Although the splitting criteria in [5, 10] have strong theoretical support, we will show in the experiments that it suffices to simply count the number n of samples Pi that a node has already incorporated and split when n > γ, where γ is a predefined threshold. An overview of this procedure is given in Figure 1. This splitting criterion requires finding reasonable splitting functions with only a small subset of the data, which does not necessarily have to be a disadvantage when building random forests. As stated by Breiman [4], the upper bound on the generalization error of random forests can be lowered by increasing the strength of the individual trees while keeping the correlation between them low. To this end, we derive a new but simple splitting procedure for off-line HFs based on subsampling the input space at the node level, which can further decrease the correlation between the trees. That is, each node in a tree randomly samples a predefined number γ of data samples uniformly over all data available at the current node, which is then used for finding a good splitting function. In the first experiment, we demonstrate on three object detection data sets that both our on-line formulation and our subsample splitting scheme reach performance similar to classical Hough Forests and can even outperform them; see Figures 2(a) and (b). Additionally, during training both proposed methods are orders of magnitude faster than the original approach (Figure 2(c)). In the second part of the experiments, we demonstrate the power of our method on visual object tracking. In particular, our focus lies on tracking objects of a priori unknown classes, as class-specific tracking with off-line forests has already been demonstrated before [7].
We present results on seven tracking data sets and show that our on-line HFs can outperform state-of-the-art tracking-by-detection methods.
Figure 1: While labeled samples arrive on-line, each tree propagates the sample to the corresponding leaf node, which decides whether to split the current leaf or to update its statistics.
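The counting-based split rule from the abstract above (split a leaf once it has buffered more than γ samples) can be sketched as a tiny on-line tree. The class and the median-threshold split function are illustrative stand-ins, not the authors' code; real HFs choose the split from random appearance tests.

```python
import random

GAMMA = 4  # predefined split threshold gamma

class Leaf:
    def __init__(self):
        self.samples = []      # buffered (appearance, label, offset) triples
        self.children = None   # (left, right) after splitting
        self.split_fn = None

    def update(self, sample):
        if self.children is not None:              # internal node: route down
            self.children[self.split_fn(sample)].update(sample)
            return
        self.samples.append(sample)                # leaf: update statistics
        if len(self.samples) > GAMMA:              # split criterion: n > gamma
            # Toy split function: threshold the appearance value at its median.
            thr = sorted(s[0] for s in self.samples)[len(self.samples) // 2]
            self.split_fn = lambda s, t=thr: int(s[0] > t)
            self.children = (Leaf(), Leaf())
            for s in self.samples:                 # push buffered samples down
                self.children[self.split_fn(s)].update(s)
            self.samples = []

random.seed(0)
root = Leaf()
for _ in range(20):                                # samples arrive on-line
    root.update((random.random(), 1, (0, 0)))
print("root has split:", root.children is not None)
```

After the fifth sample the root splits and routes its buffer down; later samples keep growing the tree one leaf at a time, mirroring Figure 1.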
international conference on computer vision | 2013
Samuel Schulter; Christian Leistner; Paul Wohlhart; Peter M. Roth; Horst Bischof
We present Alternating Regression Forests (ARFs), a novel regression algorithm that learns a Random Forest by optimizing a global loss function over all trees. This interrelates the information of single trees during the training phase and results in more accurate predictions. ARFs can minimize any differentiable regression loss without sacrificing the appealing properties of Random Forests, like low computational complexity during both training and testing. Inspired by recent developments for classification [19], we derive a new algorithm capable of dealing with different regression loss functions, discuss its properties, and investigate the relations to other methods like Boosted Trees. We evaluate ARFs on standard machine learning benchmarks, where we observe better generalization power compared to both standard Random Forests and Boosted Trees. Moreover, we apply the proposed regressor to two computer vision applications: object detection and head pose estimation from depth images. ARFs outperform the Random Forest baselines in both tasks, illustrating the importance of optimizing a common loss function for all trees.
computer vision and pattern recognition | 2011
Christian Leistner; Martin Godec; Samuel Schulter; Amir Saffari; Manuel Werlberger; Horst Bischof
Current state-of-the-art object classification systems are trained using large amounts of hand-labeled images. In this paper, we present an approach that shows how to use unlabeled video sequences, containing object categories only weakly related to the target class, to learn better classifiers for tracking and detection. The underlying idea is to exploit the space-time consistency of moving objects to learn classifiers that are robust to local transformations. In particular, we use dense optical flow to find moving objects in videos in order to train part-based random forests that are insensitive to natural transformations. Our method, which is called Video Forests, can be used in two settings: first, labeled training data can be regularized to force the trained classifier to generalize better towards small local transformations. Second, as part of a tracking-by-detection approach, it can be used to train a general codebook solely on pair-wise data that can then be applied to tracking of instances of a priori unknown object categories. In the experimental part, we show on benchmark datasets for both tracking and detection that incorporating unlabeled videos into the learning of visual classifiers leads to improved results.
medical image computing and computer assisted intervention | 2015
Philipp Kainz; Martin Urschler; Samuel Schulter; Paul Wohlhart; Vincent Lepetit
Automated cell detection in histopathology images is a hard problem due to the large variance of cell shape and appearance. We show that cells can be detected reliably in images by predicting, for each pixel location, a monotonous function of the distance to the center of the closest cell. Cell centers can then be identified by extracting local extrema of the predicted values. This approach results in a very simple method, which is easy to implement. We show on two challenging microscopy image datasets that our approach outperforms state-of-the-art methods in terms of accuracy, reliability, and speed. We also introduce a new dataset that we will make publicly available.
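The detection-by-regression idea above can be sketched end to end: build a score map that decreases monotonically with the distance to the nearest cell center, then recover the centers as strict local maxima. In this toy version the map is computed directly from known centers; in the paper it is the output of a learned regressor.

```python
import numpy as np

# Ground-truth cell centers on a small 24x24 image grid (toy example).
centers = [(5, 5), (5, 20), (18, 12)]
H, W = 24, 24
yy, xx = np.mgrid[0:H, 0:W]
dist = np.min([np.hypot(yy - r, xx - c) for r, c in centers], axis=0)
score = np.exp(-dist / 3.0)        # monotone decreasing in the distance

# Local-maximum extraction over a 3x3 neighborhood (borders padded with -inf),
# with a minimum-score threshold to suppress weak ridge responses.
pad = np.pad(score, 1, constant_values=-np.inf)
neigh = np.stack([pad[dy:dy + H, dx:dx + W]
                  for dy in range(3) for dx in range(3) if (dy, dx) != (1, 1)])
peaks = (score > neigh.max(axis=0)) & (score > 0.5)
detected = [tuple(map(int, p)) for p in zip(*np.nonzero(peaks))]
print(detected)
```

Because the score is a strictly decreasing function of the distance, each true center is a strict local maximum, and saddle points between cells fall below the threshold, so exactly the three centers are recovered.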
computer vision and pattern recognition | 2014
Samuel Schulter; Christian Leistner; Paul Wohlhart; Peter M. Roth; Horst Bischof
In this paper, we present a novel object detection approach that is capable of regressing the aspect ratio of objects. This results in accurately predicted bounding boxes having high overlap with the ground truth. In contrast to most recent works, we employ a Random Forest for learning a template-based model but exploit the nature of this learning algorithm to predict arbitrary output spaces. In this way, we can simultaneously predict the object probability of a window in a sliding window approach as well as regress its aspect ratio with a single model. Furthermore, we also exploit the additional information of the aspect ratio during the training of the Joint Classification-Regression Random Forest, resulting in better detection models. Our experiments demonstrate several benefits: (i) Our approach gives competitive results on standard detection benchmarks. (ii) The additional aspect ratio regression delivers more accurate bounding boxes than standard object detection approaches in terms of overlap with ground truth, especially when tightening the evaluation criterion. (iii) The detector itself becomes better by only including the aspect ratio information during training.
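The key point above, that a single forest model can emit both an object probability and an aspect-ratio estimate, comes down to what a leaf stores. A hypothetical minimal leaf for such a joint classification-regression tree (illustrative names, not the paper's implementation) might look like this:

```python
# One leaf of a joint classification-regression tree: it keeps a class
# histogram for the foreground probability AND running statistics of the
# aspect ratio of the positive samples that reached it, so a single tree
# traversal yields both outputs.
class JointLeaf:
    def __init__(self):
        self.pos = self.neg = 0
        self.ratio_sum = 0.0

    def update(self, label, aspect_ratio=None):
        if label == 1:                 # positive window: count + accumulate ratio
            self.pos += 1
            self.ratio_sum += aspect_ratio
        else:                          # negative window: count only
            self.neg += 1

    def predict(self):
        p_fg = self.pos / max(self.pos + self.neg, 1)
        ratio = self.ratio_sum / self.pos if self.pos else None
        return p_fg, ratio

leaf = JointLeaf()
for label, ar in [(1, 1.5), (1, 2.5), (0, None), (1, 2.0)]:
    leaf.update(label, ar)
print(leaf.predict())
```

In a sliding-window detector, each window is routed to such a leaf in every tree; averaging the per-leaf outputs gives both the detection score and the regressed aspect ratio with no second model.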
british machine vision conference | 2012
Paul Wohlhart; Samuel Schulter; Martin Köstinger; Peter M. Roth; Horst Bischof
Object detection models based on the Implicit Shape Model (ISM) [3] use small, local parts that vote for object centers in images. Since these parts vote completely independently from each other, this often leads to false-positive detections due to random constellations of parts. Thus, we introduce a verification step, which considers the activations of all voting elements that contribute to a detection. The levels of activation of each voting element of the ISM form a new description vector for an object hypothesis, which can be examined in order to discriminate between correct and incorrect detections. In particular, we observe the levels of activation of the voting elements in Hough Forests [2], which can be seen as a variant of ISM. In Hough Forests, the voting elements are all the positive training patches used to train the Forest. Each patch of the input image is classified by all decision trees in the Hough Forest. Whenever an input patch falls into the same leaf node as a patch from training, a certain amount of weight is added to the detection hypothesis at the relative position of the object center, which was recorded when cropping out the training patch. The total amount of weight one voting element (offset vector) adds to a detection hypothesis (the total activation) can be calculated by summing over all input patches and trees in the forest. Stacking the activations of all elements gives an activation vector for a hypothesis. We learn classifiers to discriminate correct and wrong part constellations based on these activation vectors and thus assign a better confidence to each detection. We use linear models as well as a histogram intersection kernel SVM. In the linear classifier, one weight is learned for each voting element. We additionally show how to use these weights, not only as a post processing step, but directly in the voting process. 
This has two advantages: First, it circumvents the explicit calculation of the activation vector for later reclassification, which is computationally more demanding. Second, the non-maxima suppression is performed on cleaner Hough maps, which allows for reducing the size of the suppression neighborhood and thus increases the recall at high levels of precision.
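The activation-vector construction described above can be sketched numerically: each voting element (training patch) accumulates the total weight it contributed to one detection hypothesis, and a learned linear model with one weight per element scores the resulting vector. The element count, votes, and model weights below are made-up illustration values.

```python
import numpy as np

n_elements = 6                      # number of voting elements (training patches)
activations = np.zeros(n_elements)

# (voting_element_id, weight) pairs: votes cast for one hypothesis by input
# patches that landed in the same leaves as these training patches.
votes = [(0, 0.4), (2, 0.3), (0, 0.2), (5, 0.1), (2, 0.5)]
for elem, w in votes:
    activations[elem] += w          # total activation per voting element

# Verification score from a hypothetical learned linear model: one weight
# per voting element, as with the linear classifier described in the text.
beta = np.array([1.0, 0.0, 0.8, 0.0, 0.0, -0.5])
score = float(activations @ beta)
print(activations, score)
```

Plugging the learned per-element weights directly into the voting step, instead of re-scoring after the fact, is exactly the second advantage the text mentions: the Hough map itself becomes cleaner before non-maxima suppression.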
international conference on computer vision | 2015
Gernot Riegler; Samuel Schulter; Matthias Rüther; Horst Bischof
Single image super-resolution is an important task in the field of computer vision and finds many practical applications. Current state-of-the-art methods typically rely on machine learning algorithms to infer a mapping from low- to high-resolution images. These methods use a single fixed blur kernel during training and, consequently, assume the exact same kernel underlying the image formation process for all test images. However, this setting is not realistic for practical applications, because the blur is typically different for each test image. In this paper, we loosen this restrictive constraint and propose conditioned regression models (including convolutional neural networks and random forests) that can effectively exploit the additional kernel information during both training and inference. This allows for training a single model, while previous methods need to be re-trained for every blur kernel individually to achieve good results, which we demonstrate in our evaluations. We also empirically show that the proposed conditioned regression models (i) can effectively handle scenarios where the blur kernel is different for each image and (ii) outperform related approaches trained for only a single kernel.
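The conditioning idea can be reduced to a toy regression: instead of fitting one model per blur kernel, append a kernel descriptor to the input so a single model covers all kernels. Here the "kernels" are scalars and the model is linear least squares; the paper's models are CNNs and random forests.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.standard_normal(n)
k = rng.choice([0.5, 2.0], size=n)   # two "blur kernels" (scalars in this toy)
y = k * x                            # the target mapping depends on the kernel

# Single conditioned model: the feature [x, x*k] lets one set of coefficients
# express the kernel-dependent mapping, instead of one model per kernel.
A = np.column_stack([x, x * k])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
resid = float(np.abs(pred - y).max())
print("coefficients:", np.round(coef, 6), "max residual:", resid)
```

A model trained on x alone would have to average the two kernels and fail on both; the conditioned feature makes the per-kernel behavior recoverable from a single fit.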
british machine vision conference | 2013
Samuel Schulter; Christian Leistner; Peter M. Roth; Horst Bischof
Unsupervised object discovery is the task of finding recurring objects in an unsorted set of images without any human supervision, which becomes more and more important as the amount of visual data grows exponentially. Existing approaches typically build on still images and rely on different kinds of prior knowledge to yield accurate results. In contrast, we propose a novel video-based approach, which also allows exploiting motion information, a strong and physically valid indicator for foreground objects that eases the task considerably. In particular, we show how to integrate motion information in parallel with appearance cues into a common conditional random field formulation to automatically discover object categories from videos. In the experiments, we show that our system can successfully extract, group, and segment most foreground objects and is also able to discover stationary objects in the given videos. Furthermore, we demonstrate that the appearance models learned without supervision also yield reasonable results for object detection on still images.