Peter M. Roth
Graz University of Technology
Publication
Featured research published by Peter M. Roth.
computer vision and pattern recognition | 2012
Martin Köstinger; Martin Hirzer; Paul Wohlhart; Peter M. Roth; Horst Bischof
In this paper, we raise important issues on scalability and the required degree of supervision of existing Mahalanobis metric learning methods. Often, rather tedious optimization procedures are applied that become computationally intractable on a large scale. Further, if one considers the constantly growing amount of data, it is often infeasible to specify fully supervised labels for all data points. Instead, it is easier to specify labels in the form of equivalence constraints. We introduce a simple yet effective strategy to learn a distance metric from equivalence constraints, based on a statistical inference perspective. In contrast to existing methods, we do not rely on complex optimization problems requiring computationally expensive iterations. Hence, our method is orders of magnitude faster than comparable methods. Results on a variety of challenging benchmarks of rather diverse nature demonstrate the power of our method. These include faces in unconstrained environments, matching previously unseen object instances, and person re-identification across spatially disjoint cameras. In the latter two benchmarks we clearly outperform the state of the art.
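The abstract above does not spell out the estimator, so the following is only a minimal sketch of one way to learn a metric from equivalence constraints without iterative optimization. It assumes the commonly cited form of this approach: the metric is the difference of the inverse covariance matrices of pairwise differences, computed separately for similar and dissimilar pairs. All function names are hypothetical, and the sketch omits any projection of the result onto the positive semi-definite cone.

```python
def mean_outer(diffs):
    """Mean of the outer products d d^T over a list of difference vectors."""
    n = len(diffs[0])
    acc = [[0.0] * n for _ in range(n)]
    for d in diffs:
        for i in range(n):
            for j in range(n):
                acc[i][j] += d[i] * d[j]
    return [[v / len(diffs) for v in row] for row in acc]


def invert(a):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pair_differences(pairs):
    """Element-wise differences x - y for each (x, y) pair."""
    return [[a - b for a, b in zip(x, y)] for x, y in pairs]


def learn_metric(similar_pairs, dissimilar_pairs):
    """Assumed closed form: M = inv(Cov_similar) - inv(Cov_dissimilar)."""
    inv_s = invert(mean_outer(pair_differences(similar_pairs)))
    inv_d = invert(mean_outer(pair_differences(dissimilar_pairs)))
    return [[s - d for s, d in zip(rs, rd)] for rs, rd in zip(inv_s, inv_d)]


def score(m, x, y):
    """Quadratic form (x - y)^T M (x - y); lower means more likely similar."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * m[i][j] * d[j] for i in range(n) for j in range(n))


# Toy equivalence constraints (illustrative data, not from the paper).
similar = [([0.0, 0.0], [0.1, 0.0]), ([1.0, 1.0], [1.0, 1.1])]
dissimilar = [([0.0, 0.0], [1.0, 0.0]), ([0.0, 0.0], [0.0, 1.0])]
M = learn_metric(similar, dissimilar)
```

Learning here amounts to two covariance estimates and two matrix inversions, which is consistent with the abstract's claim that no expensive iterative optimization is needed.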
scandinavian conference on image analysis | 2011
Martin Hirzer; Csaba Beleznai; Peter M. Roth; Horst Bischof
Person re-identification, i.e., recognizing a single person across spatially disjoint cameras, is an important task in visual surveillance. Existing approaches either try to find a suitable description of the appearance or learn a discriminative model. Since these different representational strategies capture a large extent of complementary information we propose to combine both approaches. First, given a specific query, we rank all samples according to a feature-based similarity, where appearance is modeled by a set of region covariance descriptors. Next, a discriminative model is learned using boosting for feature selection, which provides a more specific classifier. The proposed approach is demonstrated on two datasets, where we show that the combination of a generic descriptive statistical model and a discriminatively learned feature-based model attains considerably better results than the individual models alone. In addition, we give a comparison to the state-of-the-art on a publicly available benchmark dataset.
european conference on computer vision | 2012
Martin Hirzer; Peter M. Roth; Martin Köstinger; Horst Bischof
Matching persons across non-overlapping cameras is a rather challenging task. Thus, successful methods often build on complex feature representations or sophisticated learners. A recent trend to tackle this problem is to use metric learning to find a suitable space for matching samples from different cameras. However, most of these approaches ignore the transition from one camera to the other. In this paper, we propose to learn a metric from pairs of samples from different cameras. In this way, even less sophisticated features describing color and texture information are sufficient for finally getting state-of-the-art classification results. Moreover, once the metric has been learned, only linear projections are necessary at search time, where a simple nearest neighbor classification is performed. The approach is demonstrated on three publicly available datasets of different complexity, where it can be seen that state-of-the-art results can be obtained at much lower computational costs.
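The search-time step described above (a linear projection followed by nearest-neighbor matching) can be sketched as follows. This is a minimal illustration, assuming the learned Mahalanobis metric has been factored as M = LᵀL so that the Mahalanobis distance equals the Euclidean distance after projecting with L. The matrix `L`, the gallery, and the function names are hypothetical.

```python
import math


def project(L, x):
    """Apply the linear projection L to feature vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def nearest_neighbor(L, query, gallery):
    """Return (label, distance) of the gallery sample closest to the query
    under the metric M = L^T L, i.e. Euclidean distance after projection."""
    q = project(L, query)
    best_label, best_dist = None, math.inf
    for label, sample in gallery:
        dist = math.dist(q, project(L, sample))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical gallery from one camera and a projection that down-weights
# the second feature dimension (e.g. one that varies strongly across cameras).
gallery = [("A", [1.0, 4.0]), ("B", [2.0, 0.0])]
L_learned = [[1.0, 0.0], [0.0, 0.1]]
L_identity = [[1.0, 0.0], [0.0, 1.0]]

match_learned = nearest_neighbor(L_learned, [0.0, 0.0], gallery)
match_euclid = nearest_neighbor(L_identity, [0.0, 0.0], gallery)
```

In a real system the gallery projections would typically be computed once and cached, so that each query costs one projection plus a nearest-neighbor search.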
international conference on computer vision | 2011
Martin Köstinger; Paul Wohlhart; Peter M. Roth; Horst Bischof
Face alignment is a crucial step in face recognition tasks. In particular, using landmark localization for geometric face normalization has been shown to be very effective, clearly improving the recognition results. However, no adequate databases exist that provide a sufficient number of annotated facial landmarks. The databases are either limited to frontal views, provide only a small number of annotated images, or have been acquired under controlled conditions. Hence, we introduce a novel database overcoming these limitations: Annotated Facial Landmarks in the Wild (AFLW). AFLW provides a large-scale collection of images gathered from Flickr, exhibiting a large variety in face appearance (e.g., pose, expression, ethnicity, age, gender) as well as general imaging and environmental conditions. In total, 25,993 faces in 21,997 real-world images are annotated with up to 21 landmarks per image. Due to the comprehensive set of annotations, AFLW is well suited to train and test algorithms for multi-view face detection, facial landmark localization, and face pose estimation. Further, we offer a rich set of tools that ease the integration of other face databases and associated annotations into our joint framework.
international conference on computer vision | 2011
Martin Godec; Peter M. Roth; Horst Bischof
Online learning has been shown to be successful in tracking previously unknown objects. However, most approaches are limited to a bounding-box representation with fixed aspect ratio. Thus, they provide a less accurate foreground/background separation and cannot handle highly non-rigid and articulated objects. This, in turn, increases the amount of noise introduced during online self-training.
Person Re-Identification | 2014
Peter M. Roth; Martin Hirzer; Martin Köstinger; Csaba Beleznai; Horst Bischof
Recently, Mahalanobis metric learning has gained considerable interest for single-shot person re-identification. The main idea is to build on an existing image representation and to learn a metric that reflects the visual camera-to-camera transitions, allowing for a more powerful classification. The goal of this chapter is twofold. We first review the main ideas of Mahalanobis metric learning in general and then give a detailed study on different approaches for the task of single-shot person re-identification, also comparing to the state of the art. In particular, for our experiments, we used Linear Discriminant Metric Learning (LDML), Information Theoretic Metric Learning (ITML), Large Margin Nearest Neighbor (LMNN), Large Margin Nearest Neighbor with Rejection (LMNN-R), Efficient Impostor-based Metric Learning (EIML), and KISSME. For our evaluations we used four different publicly available datasets (i.e., VIPeR, ETHZ, PRID 2011, and CAVIAR4REID). Additionally, we generated the new, more realistic PRID 450S dataset, for which we also provide detailed segmentations. For the latter, we also evaluated the influence of using well-segmented foreground and background regions. Finally, the corresponding results are presented and discussed.
advanced video and signal based surveillance | 2012
Martin Hirzer; Peter M. Roth; Horst Bischof
Recognizing persons over a system of disjoint cameras is a hard task for human operators and even harder for automated systems. In particular, realistic setups show difficulties such as different camera angles or different camera properties. Additionally, the appearance of exactly the same person can change dramatically due to different views (e.g., frontal/back) or carried objects. In this paper, we mainly address the first problem by learning the transition from one camera to the other. This is realized by learning a Mahalanobis metric using pairs of labeled samples from different cameras. Building on the ideas of Large Margin Nearest Neighbor classification, we obtain a more efficient solution which additionally provides much better generalization properties. To demonstrate these benefits, we run experiments on three different publicly available datasets, showing state-of-the-art or even better results at much lower computational cost. This is particularly interesting since we use quite simple color and texture features, whereas other approaches build on rather complex image descriptions.
computer vision and pattern recognition | 2014
Thomas Mauthner; Peter M. Roth; Horst Bischof
Robust multi-object tracking-by-detection requires the correct assignment of noisy detection results to object trajectories. We address this problem by proposing an online approach based on the observation that object detectors primarily fail if objects are significantly occluded. In contrast to most existing work, we only rely on geometric information to efficiently overcome detection failures. In particular, we exploit the spatio-temporal evolution of occlusion regions, detector reliability, and target motion prediction to robustly handle missed detections. In combination with a conservative association scheme for visible objects, this allows for real-time tracking of multiple objects from a single static camera, even in complex scenarios. Our evaluations on publicly available multi-object tracking benchmark datasets demonstrate favorable performance compared to the state-of-the-art in online and offline multi-object tracking.
international conference on computer vision | 2009
Christian Leistner; Amir Saffari; Peter M. Roth; Horst Bischof
On-line boosting is one of the most successful on-line algorithms and is thus applied in many computer vision applications. However, even though boosting, in general, is well known to be susceptible to class-label noise, on-line boosting is mostly applied to self-learning applications such as visual object tracking, where label noise is an inherent problem. This paper studies the robustness of on-line boosting. Since the behavior of boosting is mainly determined by the applied loss function, we propose an on-line version of GradientBoost, which allows us to plug arbitrary loss functions into the on-line learner. Hence, we can easily study the importance and the behavior of different loss functions. We evaluate various on-line boosting algorithms in the form of a competitive study on standard machine learning problems as well as on common computer vision applications such as tracking and autonomous training of object detectors. Our results show that using on-line GradientBoost with robust loss functions leads to superior results in all our experiments.
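To illustrate why the choice of loss function matters under label noise, the sketch below compares the sample weights implied by two standard losses in gradient boosting. It is not the on-line algorithm from the paper. It only assumes the usual gradient-boosting view in which each sample's influence is the magnitude of the loss gradient at its margin m = y·F(x). The function names are hypothetical.

```python
import math


def exponential_weight(margin):
    """|dL/dm| for the exponential loss L(m) = exp(-m)."""
    return math.exp(-margin)


def logistic_weight(margin):
    """|dL/dm| for the logistic loss L(m) = log(1 + exp(-m))."""
    return 1.0 / (1.0 + math.exp(margin))


def negative_gradients(weight_fn, labels, scores):
    """Pluggable-loss step: negative gradient y * w(y * F) for each sample,
    where labels are in {-1, +1} and scores are current ensemble outputs."""
    return [y * weight_fn(y * f) for y, f in zip(labels, scores)]


# A confidently "misclassified" sample (margin -5), as produced by a wrong
# label, dominates under the exponential loss but stays bounded under the
# logistic loss.
noisy_margin = -5.0
w_exp = exponential_weight(noisy_margin)
w_log = logistic_weight(noisy_margin)

# Two clean samples and one sample whose label disagrees with a confident score.
grads_exp = negative_gradients(exponential_weight, [1, -1, 1], [2.0, -2.0, -5.0])
grads_log = negative_gradients(logistic_weight, [1, -1, 1], [2.0, -2.0, -5.0])
```

Because the logistic weight never exceeds 1, a mislabeled sample cannot dominate the update, whereas the exponential weight grows without bound as the margin becomes more negative.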
computer vision and pattern recognition | 2009
Peter M. Roth; Sabine Sternig; Helmut Grabner; Horst Bischof
In this paper, we present an adaptive but robust object detector for static cameras by introducing classifier grids. Instead of using a sliding window for object detection, we propose to train a separate classifier for each image location, obtaining a very specific object detector with a low false alarm rate. For each classifier corresponding to a grid element, we estimate two generative representations in parallel, one describing the object class and one describing the background. These are combined in order to obtain a discriminative model. To adapt to changing environments, these classifiers are learned on-line (i.e., via on-line boosting). Continuous learning (24 hours a day, 7 days a week) requires a stable system. In our method, this is ensured by keeping the object representation fixed while updating only the representation of the background. We demonstrate the stability in a long-term experiment by running the system for a whole week, which shows a stable performance over time. In addition, we compare the proposed approach to state-of-the-art methods in the field of person and car detection. In both cases we obtain competitive results.